scylladb

Author	SHA1	Message	Date
Botond Dénes	c37f5938fd	mutation_writer: feed_writer(): handle exceptions from consume_end_of_stream() Currently the exception handling code of feed_writer() assumes consume_end_of_stream() doesn't throw. This is false and an exception from said method can currently lead to an unclean destroy of the writer and reader. Fix by also handling exceptions from consume_end_of_stream() too. Closes #10147 (cherry picked from commit `1963d1cc25`)	2022-03-03 10:45:40 +01:00
Yaron Kaikov	fa90112787	release: prepare for 4.4.9	2022-02-16 14:24:54 +02:00
Nadav Har'El	f5895e5c04	docker: don't repeat "--alternator-address" option twice If the Docker startup script is passed both "--alternator-port" and "--alternator-https-port", a combination which is supposed to be allowed, it passes to Scylla the "--alternator-address" option twice. This isn't necessary, and worse - not allowed. So this patch fixes the scyllasetup.py script to only pass this parameter once. Fixes #10016. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220202165814.1700047-1-nyh@scylladb.com> (cherry picked from commit `cb6630040d`)	2022-02-03 18:40:12 +02:00
Avi Kivity	ce944911f2	Update seastar submodule (gratuitous exceptions on allocation failure) * seastar 59eeadc720...1fb2187322 (1): > core: memory: Avoid current_backtrace() on alloc failure when logging suppressed Fixes #9982.	2022-01-30 20:08:43 +02:00
Avi Kivity	b220130e4a	Revert "Merge 'scylla_raid_setup: use mdmonitor only when RAID level > 0' from Takuya ASADA" This reverts commit `de4f5b3b1f`. This branch doesn't support RAID 5, so it breaks at runtime. Ref #9540.	2022-01-30 11:00:21 +02:00
Avi Kivity	de4f5b3b1f	Merge 'scylla_raid_setup: use mdmonitor only when RAID level > 0' from Takuya ASADA We found that monitor mode of mdadm does not work on RAID0, and it is not a bug, expected behavior according to RHEL developer. Therefore, we should stop enabling mdmonitor when RAID0 is specified. Fixes #9540 ---- This reverts `0d8f932` and introduce correct fix. Closes #9970 * github.com:scylladb/scylla: scylla_raid_setup: use mdmonitor only when RAID level > 0 Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8" (cherry picked from commit `df22396a34`)	2022-01-27 10:27:45 +02:00
Avi Kivity	84a42570ec	Update tools/java submodule (maxPendingPerConnection default) * tools/java 14e635e5de...e8accfbf45 (2): > Fix NullPointerException in SettingsMode > cassandra-stress: Remove maxPendingPerConnection default Ref #7748.	2022-01-12 21:38:48 +02:00
Nadav Har'El	001f57ec0c	alternator: allow Authorization header to be without spaces The "Authorization" HTTP header is used in DynamoDB API to sign requests. Our parser for this header, in server::verify_signature(), required the different components of this header to be separated by a comma followed by a whitespace - but it turns out that in DynamoDB both spaces and commas are optional - one of them is enough. At least one DynamoDB client library - the old "boto" (which predated boto3) - builds this header without spaces. In this patch we add a test that shows that an Authorization header with spaces removed works fine in DynamoDB but didn't work in Alternator, and after this patch modifies the parsing code for this header, the test begins to pass (and the other tests show that the previously-working cases didn't break). Fixes #9568 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211101214114.35693-1-nyh@scylladb.com> (cherry picked from commit `56eb994d8f`)	2021-12-29 15:07:47 +02:00
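The separator rule described in the commit above (a comma, whitespace, or both may separate the components) can be sketched as follows. This is an illustrative sketch only, not Alternator's actual parser in server::verify_signature(); the function name `split_authorization` and the sample header values are hypothetical.

```python
import re
from typing import Dict, Tuple


def split_authorization(header: str) -> Tuple[str, Dict[str, str]]:
    """Split an Authorization header into (algorithm, key=value fields).

    Hypothetical helper: components after the algorithm name may be
    separated by commas, whitespace, or any mix of the two.
    """
    algorithm, _, rest = header.strip().partition(" ")
    fields: Dict[str, str] = {}
    # One or more commas/whitespace characters count as a single separator.
    for part in re.split(r"[,\s]+", rest.strip()):
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return algorithm, fields


# Made-up sample values; the "boto"-style header has no space after commas.
with_spaces = "AWS4-HMAC-SHA256 Credential=key/scope, SignedHeaders=host;x-amz-date, Signature=abc123"
without_spaces = "AWS4-HMAC-SHA256 Credential=key/scope,SignedHeaders=host;x-amz-date,Signature=abc123"
```

Both sample headers yield the same algorithm and the same three fields, which is the behavior the commit's new test checks for.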
Nadav Har'El	3279718d52	alternator: return the correct Content-Type header Although the DynamoDB API responses are JSON, additional conventions apply to these responses - such as how error codes are encoded in JSON. For this reason, DynamoDB uses the content type `application/x-amz-json-1.0` instead of the standard `application/json` in its responses. Until this patch, Scylla used `application/json` in its responses. This unexpected content-type didn't bother any of the AWS libraries which we tested, but it does bother the aiodynamo library (see HENNGE/aiodynamo#27). Moreover, we should return the x-amz-json-1.0 content type for future proofing: It turns out that AWS already defined x-amz-json-1.1 - see: https://awslabs.github.io/smithy/1.0/spec/aws/aws-json-1_1-protocol.html The 1.1 content type differs (only) in how it encodes error replies. If one day DynamoDB starts to use this new reply format (it doesn't yet) and if DynamoDB libraries need to differentiate between the two reply formats, Alternator had better return the right one. This patch also includes a new test that the Content-Type header is returned with the expected value. The test passes on DynamoDB, and after this patch it starts to pass on Alternator as well. Fixes #9554. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211031094621.1193387-1-nyh@scylladb.com> (cherry picked from commit `6ae0ea0c48`)	2021-12-29 14:14:24 +02:00
Takuya ASADA	c128994f90	scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8 On CentOS8, mdmonitor.service does not work correctly when using mdadm-4.1-15.el8.x86_64 and later versions. Until we find a solution, let's pin the package version to an older one that does not cause the issue (4.1-14.el8.x86_64). Fixes #9540 Closes #9782 (cherry picked from commit `0d8f932f0b`)	2021-12-28 11:38:33 +02:00
Nadav Har'El	9af2e5ead1	Update Seastar module with additional backports Backported an additional Seastar patch: > Merge 'metrics: Fix dtest->ulong conversion error' from Benny Halevy Fixes #9794.	2021-12-14 13:06:02 +02:00
Avi Kivity	be695a7353	Revert "cql3: Reject updates with NULL key values" This reverts commit `146f7b5421`. It causes a regression, and needs an additional fix. The bug is not important enough to merit this complication. Ref #9311.	2021-12-08 15:17:45 +02:00
Botond Dénes	cc9285697d	mutation_reader: shard_reader: ensure referenced objects are kept alive The shard reader can outlive its parent reader (the multishard reader). This creates a problem for lifecycle management: readers take the range and slice parameters by reference and users keep these alive until the reader is alive. The shard reader outliving the top-level reader means that any background read-ahead that it has to wait on will potentially have stale references to the range and the slice. This was seen in the wild recently when the evictable reader wrapped by the shard reader hit a use-after-free while wrapping up a background read-ahead. This problem was solved by `fa43d76` but any previous versions are susceptible to it. This patch solves this problem by having the shard reader copy and keep the range and slice parameters in stable storage, before passing them further down. Fixes: #9719 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211202113910.484591-1-bdenes@scylladb.com> (cherry picked from commit `417e853b9b`)	2021-12-06 15:25:57 +02:00
Nadav Har'El	21d140febc	alternator: add missing BatchGetItem metric Unfortunately, defining metrics in Scylla requires some code duplication, with the metrics declared in one place but exported in a different place in the code. When we duplicated this code in Alternator, we accidentally dropped the first metric - for BatchGetItem. The metric was accounted in the code, but not exported to Prometheus. In addition to fixing the missing metric, this patch also adds a test that confirms that the BatchGetItem metric increases when the BatchGetItem operation is used. This test failed before this patch, and passes with it. The test only currently tests this for BatchGetItem (and BatchWriteItem) but it can be later expanded to cover all the other operations as well. Fixes #9406 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210929121611.373074-1-nyh@scylladb.com> (cherry picked from commit `5cbe9178fd`)	2021-12-06 12:45:34 +02:00
Yaron Kaikov	77e05ca482	release: prepare for 4.4.8	2021-12-05 21:52:32 +02:00
Eliran Sinvani	5375b8f1a1	testlib: close index_reader to avoid race condition In order to avoid the race condition introduced in `9dce1e4`, the index_reader should be closed prior to its destruction. This only exposes 4.4 and earlier releases to this specific race. However, it is always a good idea to first close the index reader and only then destroy it, since this is most likely to be assumed by all developers who will change the index reader in the future. Ref #9704 (because 4.4 and earlier releases are vulnerable). Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Fixes #9704 (cherry picked from commit `ddd7248b3b`) Closes #9717	2021-12-05 12:02:13 +01:00
Juliusz Stasiewicz	7a82432e38	transport: Fix abort on certain configurations of native_transport_port(_ssl) The reason was accessing the `configs` table out of index. Also, native_transport_port-s can no longer be disabled by setting to 0, as per the table below. Rules for port/encryption (the same apply to shard_aware counterpart): np := native_transport_port.is_set() nps := native_transport_port_ssl.is_set() ceo := ceo.at("enabled") == "true" eq := native_transport_port_ssl() == native_transport_port() +-----+-----+-----+-----+ \| np \| nps \| ceo \| eq \| +-----+-----+-----+-----+ \| 0 \| 0 \| 0 \| * \| => listen on native_transport_port, unencrypted \| 0 \| 0 \| 1 \| * \| => listen on native_transport_port, encrypted \| 0 \| 1 \| 0 \| * \| => nonsense, don't listen \| 0 \| 1 \| 1 \| * \| => listen on native_transport_port_ssl, encrypted \| 1 \| 0 \| 0 \| * \| => listen on native_transport_port, unencrypted \| 1 \| 0 \| 1 \| * \| => listen on native_transport_port, encrypted \| 1 \| 1 \| 0 \| * \| => listen on native_transport_port, unencrypted \| 1 \| 1 \| 1 \| 0 \| => listen on native_transport_port, unencrypted + native_transport_port_ssl, encrypted \| 1 \| 1 \| 1 \| 1 \| => native_transport_port(_ssl), encrypted +-----+-----+-----+-----+ Fixes #7783 Fixes #7866 Closes #7992 (cherry picked from commit `29e4737a9b`)	2021-11-29 17:37:31 +02:00
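The port/encryption table in the commit above can be read as a small decision function. The sketch below is a direct transcription of the table rows and not the actual Scylla transport code; the function name `listeners` and the `(option_name, encrypted)` tuple shape are illustrative assumptions.

```python
from typing import List, Tuple


def listeners(np: bool, nps: bool, ceo: bool, eq: bool) -> List[Tuple[str, bool]]:
    """Return (port option, encrypted?) pairs to listen on.

    np  : native_transport_port is set
    nps : native_transport_port_ssl is set
    ceo : client encryption options have enabled == "true"
    eq  : both ports have the same value
    """
    if nps and not np:
        # Only the SSL port is set: it makes sense only with encryption on.
        return [("native_transport_port_ssl", True)] if ceo else []
    if np and nps and ceo:
        if eq:
            # Same port number for both options: one encrypted listener.
            return [("native_transport_port", True)]
        return [("native_transport_port", False),
                ("native_transport_port_ssl", True)]
    # Remaining rows: listen on native_transport_port only,
    # encrypted exactly when encryption is enabled.
    return [("native_transport_port", ceo)]
```

For example, `listeners(False, True, False, False)` returns an empty list, matching the "nonsense, don't listen" row.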
Dejan Mircevski	146f7b5421	cql3: Reject updates with NULL key values We were silently ignoring INSERTs with NULL values for primary-key columns, which Cassandra rejects. Fix it by rejecting any modification_statement that would operate on empty partition or clustering range. This is the most direct fix, because range and slice are calculated in one place for all modification statements. It covers not only NULL cases, but also impossible restrictions like c>0 AND c<0. Unfortunately, Cassandra doesn't treat all modification statements consistently, so this fix cannot fully match its behavior. We err on the side of tolerance, accepting some DELETE statements that Cassandra rejects. We add a TODO for rejecting such DELETEs later. Fixes #7852. Tests: unit (dev), cql-pytest against Cassandra 4.0 Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #9286 (cherry picked from commit `1fdaeca7d0`)	2021-11-29 17:31:54 +02:00
Nadav Har'El	e1c7a906f0	cql: fix error return from execution of fromJson() and other functions As reproduced in cql-pytest/test_json.py and reported in issue #7911, failing fromJson() calls should return a FUNCTION_FAILURE error, but currently produce a generic SERVER_ERROR, which can lead the client to think the server experienced some unknown internal error and the query can be retried on another server. This patch adds a new cassandra_exception subclass that we were missing - function_execution_exception - properly formats this error message (as described in the CQL protocol documentation), and uses this exception in two cases: 1. Parse errors in fromJson()'s parameters are converted into a function_execution_exception. 2. Any exceptions during the execute() of a native_scalar_function_for function is converted into a function_execution_exception. In particular, fromJson() uses a native_scalar_function_for. Note, however, that functions which already took care to produce a specific Cassandra error, this error is passed through and not converted to a function_execution_exception. An example is the blobAsText() which can return an invalid_request error, so it is left as such and not converted. This also happens in Cassandra. All relevant tests in cql-pytest/test_json.py now pass, and are no longer marked xfail. This patch also includes a few more improvements to test_json.py. Fixes #7911 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210118140114.4149997-1-nyh@scylladb.com> (cherry picked from commit `702b1b97bf`)	2021-11-29 16:59:56 +02:00
Piotr Jastrzebski	c5d6e75db8	sstables: Fix writing KA/LA sstables index Before this patch when writing an index block, the sstables writer was storing range tombstones that span the boundary of the block in order of end bounds. This led to a range tombstone being ignored by a reader if there was a row tombstone inside it. This patch sorts the range tombstones based on start bound before writing them to the index file. The assumption is that writing an index block is rare so we can afford sorting the tombstones at that point. Additionally this is a writer of an old format and writing to it will be dropped in the next major release so it should be rarely used already. Kudos to Kamil Braun <kbraun@scylladb.com> for finding the reproducer. Test: unit(dev) Fixes #9690 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit scylladb/scylla-enterprise@eb093afd6f) (cherry picked from commit `ab425a11a8`)	2021-11-28 11:09:39 +02:00
Tomasz Grabiec	da630e80ea	cql: Fix missing data in indexed queries with base table short reads Indexed queries are using paging over the materialized view table. Results of the view read are then used to issue reads of the base table. If base table reads are short reads, the page is returned to the user and paging state is adjusted accordingly so that when paging is resumed it will query the view starting from the row corresponding to the next row in the base which was not yet returned. However, paging state's "remaining" count was not reset, so if the view read was exhausted the reading will stop even though the base table read was short. Fix by restoring the "remaining" count when adjusting the paging state on short read. Tests: - index_with_paging_test - secondary_index_test Fixes #9198 Message-Id: <20210818131840.1160267-1-tgrabiec@scylladb.com> (cherry picked from commit `1e4da2dcce`)	2021-11-23 11:22:30 +02:00
Takuya ASADA	8ea1cbe78d	docker: add stopwaitsecs We need stopwaitsecs, just as we set TimeoutStopSec=900 in scylla-server.service, to avoid a timeout on scylla-server shutdown. Fixes #9485 Closes #9545 (cherry picked from commit `c9499230c3`)	2021-11-15 13:36:48 +02:00
Asias He	03b04d40f2	gossip: Fix use-after-free in real_mark_alive and mark_dead In commit `11a8912093` (gossiper: get_gossip_status: return string_view and make noexcept) get_gossip_status returns a pointer to an endpoint_state in endpoint_state_map. After commit `425e3b1182` (gossip: Introduce direct failure detector), gossiper::mark_dead and gossiper::real_mark_alive can yield in the middle of the function. It is possible that endpoint_state can be removed, causing use-after-free to access it. To fix, make a copy before we yield. Fixes #8859 Closes #8862 (cherry picked from commit `7a32cab524`)	2021-11-15 13:23:11 +02:00
Takuya ASADA	175d004513	scylla_util.py: On is_gce(), return False when it's on GKE The GKE metadata server does not provide the same metadata as GCE, so we should not return True from is_gce() there. Try to fetch machine-type from the metadata server, and return False if it responds with 404 Not Found. Fixes #9471 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #9582 (cherry picked from commit `9b4cf8c532`)	2021-11-15 13:18:13 +02:00
Asias He	091b794742	repair: Return HTTP 400 when repair id is not found There are two APIs for checking the repair status and they behave differently in case the id is not found. ``` {"host": "192.168.100.11:10001", "method": "GET", "uri": "/storage_service/repair_async/system_auth?id=999", "duration": "1ms", "status": 400, "bytes": 49, "dump": "HTTP/1.1 400 Bad Request\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate: Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 400}"} {"host": "192.168.100.11:10001", "method": "GET", "uri": "/storage_service/repair_status?id=999&timeout=1", "duration": "0ms", "status": 500, "bytes": 49, "dump": "HTTP/1.1 500 Internal Server Error\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate: Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 500}"} ``` The correct status code is 400 as this is a parameter error and should not be retried. Returning status code 500 makes smarter http clients retry the request in the hope of the server recovering. After this patch: curl -X GET 'http://127.0.0.1:10000/storage_service/repair_async/system_auth?id=9999' {"message": "unknown repair id 9999", "code": 400} curl -X GET 'http://127.0.0.1:10000/storage_service/repair_status?id=9999' {"message": "unknown repair id 9999", "code": 400} Fixes #9576 Closes #9578 (cherry picked from commit `f5f5714aa6`)	2021-11-15 13:16:08 +02:00
Calle Wilund	8be87bb0b1	cdc: fix broken function signature in maybe_back_insert_iterator Fixes #9103 compare overload was declared as "bool" even though it is a tri-cmp. causes us to never use the speed-up shortcut (lessen search set), in turn meaning more overhead for collections. Closes #9104 (cherry picked from commit `59555fa363`)	2021-11-15 13:13:51 +02:00
Takuya ASADA	a84142705a	scylla_io_setup: handle nr_disks on GCP correctly nr_disks is int, should not be string. Fixes #9429 Closes #9430 (cherry picked from commit `3b798afc1e`)	2021-11-15 13:06:40 +02:00
Michał Chojnowski	fc32534aee	utils: fragment_range: fix FragmentedView utils for views with empty fragments The copying and comparing utilities for FragmentedView are not prepared to deal with empty fragments in non-empty views, and will fall into an infinite loop in such case. But data coming in result_row_view can contain such fragments, so we need to fix that. Fixes #8398. Closes #8397 (cherry picked from commit `f23a47e365`)	2021-11-15 12:57:21 +02:00
Hagit Segev	4e526ad88a	release: prepare for 4.4.7	2021-11-14 19:54:05 +02:00
Avi Kivity	176f253aa3	build: clobber user/group info from node_exporter tarball node_exporter is packaged with some random uid/gid in the tarball. When extracting it as an ordinary user this isn't a problem, since the uid/gid are reset to the current user, but that doesn't happen under dbuild since `tar` thinks the current user is root. This causes a problem if one wants to delete the build directory later, since it becomes owned by some random user (see /etc/subuid). Reset the uid/gid information so this doesn't happen. Closes #9579 Fixes #9610. (cherry picked from commit `e1817b536f`)	2021-11-10 14:19:28 +02:00
Nadav Har'El	c49cd5d9b6	alternator: fix bug in ReturnValues=ALL_NEW This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in some cases returned the OLD (pre-modification) value of some of the attributes, instead of its NEW value. The bug was caused by a confusion in our JSON utility function, rjson::set(), which sounds like it can set any member of a map, but in fact may only be used to add a new member - if a member with the same name (key) already existed, the result is undefined (two values for the same key). In ReturnValues=ALL_NEW we did exactly this: we started with a copy of the original item, and then used set() to override some of the members. This is not allowed. So in this patch, we introduce a new function, rjson::replace(), which does what we previously thought that rjson::set() does - i.e., replace a member if it exists, or if not, add it. We call this function in the ReturnValues=ALL_NEW code. This patch also adds a test case that reproduces the incorrect ALL_NEW results - and gets fixed by this patch. In an upcoming patch, we should rename the confusingly-named set() functions and audit all their uses. But we don't do this in this patch yet. We just add some comments to clarify what set() does - but don't change it, and just add one new function for replace(). Fixes #9542 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211104134937.40797-1-nyh@scylladb.com> (cherry picked from commit `b95e431228`)	2021-11-08 14:10:36 +02:00
Dejan Mircevski	5d4abb521b	types: Unreverse tuple subtype for serialization When a tuple value is serialized, we go through every element type and use it to serialize element values. But an element type can be reversed, which is artificially different from the type of the value being read. This results in a server error due to the type mismatch. Fix it by unreversing the element type prior to comparing it to the value type. Fixes #7902 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8316 (cherry picked from commit `318f773d81`)	2021-11-07 19:25:43 +02:00
Asias He	cfc2562dec	storage_service: Abort restore_replica_count when node is removed from the cluster Consider the following procedure: - n1, n2, n3 - n3 is down - n1 runs nodetool removenode uuid_of_n3 to remove n3 from the cluster - n1 is down in the middle of removenode operation Node n1 will set n3 to removing gossip status during removenode operation. Whenever existing nodes learn a node is in removing gossip status, they will call restore_replica_count to stream data from other nodes for the ranges n3 loses if n3 was removed from the cluster. If the streaming fails, the streaming will sleep and retry. The current max number of retry attempts is 5. The sleep interval starts at 60 seconds and increases 1.5 times per sleep. This can leave the cluster in a bad state. For example, nodes can run out of disk space if the streaming continues. We need a way to abort such streaming attempts. To abort the removenode operation and forcibly remove the node, users can run `nodetool removenode force` on any existing node to move the node from removing gossip status to removed gossip status. However, the restore_replica_count will not be aborted. In this patch, a status checker is added in restore_replica_count, so that once a node is in removed gossip status, restore_replica_count will be aborted. This patch is for older releases without the new NODE_OPS_CMD infrastructure where such abort will happen automatically in case of error. Fixes #8651 Closes #8655 (cherry picked from commit `0858619cba`)	2021-11-02 17:26:35 +02:00
Benny Halevy	4a1171e2fa	large_data_handle: add sstable name to log messages Although the sstable name is part of the system.large_* records, it is not printed in the log. In particular, this is essential for the "too many rows" warning that currently does not record a row in any large_* table so we can't correlate it with a sstable. Fixes #9524 Test: unit(dev) DTest: wide_rows_test.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211027074104.1753093-1-bhalevy@scylladb.com> (cherry picked from commit `a21b1fbb2f`)	2021-10-29 10:48:35 +03:00
Asias He	542a508c50	repair: Handle everywhere_topology in bootstrap_with_repair The everywhere_topology returns the number of nodes in the cluster as RF. This makes only streaming from the node losing the range impossible since no node is losing the range after bootstrap. Shortcut to stream from all nodes in local dc in case the keyspace is everywhere_topology. Fixes #8503 (cherry picked from commit `3c36517598`)	2021-10-28 18:56:23 +03:00
Hagit Segev	dd018d4de4	release: prepare for 4.4.6	2021-10-28 18:00:13 +03:00
Benny Halevy	70098a1991	date_tiered_manifest: get_now: fix use after free of sstable_list The sstable_list is destroyed right after the temporary lw_shared_ptr<sstable_list> returned from `cf.get_sstables()` is dereferenced. Fixes #9138 Test: unit(dev) DTest: resharding_test.py:ReshardingTombstones_with_DateTieredCompactionStrategy.disable_tombstone_removal_during_reshard_test (debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210804075813.42526-1-bhalevy@scylladb.com> (cherry picked from commit `3ad0067272`)	2021-10-28 11:24:03 +03:00
Jan Ciolek	008f2ff370	cql3: Fix need_filtering on indexed table There were cases where a query on an indexed table needed filtering but need_filtering returned false. This is fixed by using new conditions in cases where we are using an index. Fixes #8991. Fixes #7708. For now this is an overly conservative implementation that returns true in some cases where filtering is not needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> (cherry picked from commit `54149242b4`)	2021-10-28 11:24:03 +03:00
Benny Halevy	f71cdede5e	bytes_ostream: max_chunk_size: account for chunk header Currently, if the data_size is greater than max_chunk_size - sizeof(chunk), we end up allocating up to max_chunk_size + sizeof(chunk) bytes, exceeding buf.max_chunk_size(). This may lead to allocation failures, as seen in https://github.com/scylladb/scylla/issues/7950, where we couldn't allocate 131088 (= 128K + 16) bytes. This change adjusted the expose max_chunk_size() to be max_alloc_size (128KB) - sizeof(chunk) so that the allocated chunks would normally be allocated in 128KB chunks in the write() path. Added a unit test - test_large_placeholder that stresses the chunk allocation path from the write_place_holder(size) entry point to make sure it handles large chunk allocations correctly. Refs #7950 Refs #8081 Test: unit(release), bytes_ostream_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210303143413.902968-1-bhalevy@scylladb.com> (cherry picked from commit `ff5b42a0fa`)	2021-10-28 11:24:03 +03:00
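The size arithmetic in the commit above can be checked with a few lines. This is only a worked example of the numbers quoted in the message, not the bytes_ostream implementation; the 16-byte header size is inferred from the message's "131088 (= 128K + 16)" and the constant and function names are illustrative.

```python
MAX_ALLOC_SIZE = 128 * 1024  # 128 KiB, the allocation size named in the message
CHUNK_HEADER_SIZE = 16       # assumed sizeof(chunk), inferred from "128K + 16"


def allocation_size(payload_bytes: int) -> int:
    """Total bytes requested for a chunk: header plus payload."""
    return CHUNK_HEADER_SIZE + payload_bytes


# Before the fix: the payload limit equals the allocation limit,
# so the header pushes the real allocation past 128 KiB.
old_max_chunk_size = MAX_ALLOC_SIZE
old_worst_case = allocation_size(old_max_chunk_size)

# After the fix: the exposed limit leaves room for the header.
new_max_chunk_size = MAX_ALLOC_SIZE - CHUNK_HEADER_SIZE
new_worst_case = allocation_size(new_max_chunk_size)
```

Under these assumptions the old worst case is 131088 bytes, the figure reported in issue #7950, while the new worst case is exactly 131072 bytes (128 KiB).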
Botond Dénes	0fd17af2ee	evictable_reader: reset _range_override after fast-forwarding `_range_override` is used to store the modified range the reader reads after it has to be recreated (when recreating a reader its read range is reduced to account for partitions it already read). When engaged, this field overrides the `_pr` field as the definitive range the reader is supposed to be currently reading. Fast forwarding conceptually overrides the range the reader is currently reading, however currently it doesn't reset the `_range_override` field. This resulted in `_range_override` (containing the modified pre-fast-forward range) incorrectly overriding the fast-forwarded-to range in `_pr` when validating the first partition produced by the just recreated reader, resulting in a false-positive validation failure. Fixes: #8059 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210217164744.420100-1-bdenes@scylladb.com> [avi: add #include] (cherry picked from commit `c3b4c3f451`)	2021-10-28 11:12:01 +03:00
Benny Halevy	77cb6596c4	utils: phased_barrier: advance_and_await: make noexcept As a function returning a future, simplify its interface by handling any exceptions and returning an exceptional future instead of propagating the exception. In this specific case, throwing from advance_and_await() will propagate through table::await_pending_* calls short-circuiting a .finally clause in table::stop(). Also, mark as noexcept methods of class table calling advance_and_await and table::await_pending_ops that depends on them. Fixes #8636 A followup patch will convert advance_and_await to a coroutine. This is done separately to facilitate backporting of this patch. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210511161407.218402-1-bhalevy@scylladb.com> (cherry picked from commit `c0dafa75d9`)	2021-10-13 12:26:12 +03:00
Avi Kivity	c81c7d2d89	Merge 'rjson: Add throwing allocator' from Piotr Sarna This series adds a wrapper for the default rjson allocator which throws on allocation/reallocation failures. It's done to work around several rapidjson (the underlying JSON parsing library) bugs - in a few cases, malloc/realloc return value is not checked, which results in dereferencing a null pointer (or an arbitrary pointer computed as 0 + `size`, with the `size` parameter being provided by the user). The new allocator will throw an `rjson:error` if it fails to allocate or reallocate memory. This series comes with unit tests which checks the new allocator behavior and also validates that an internal rapidjson structure which we indirectly rely upon (Stack) is not left in invalid state after throwing. The last part is verified by the fact that its destructor ran without errors. Fixes #8521 Refs #8515 Tests: * unit(release) * YCSB: inserting data similar to the one mentioned in #8515 - 1.5MB objects clustered in partitions 30k objects in size - nothing crashed during various YCSB workloads, but nothing also crashed for me locally before this patch, so it's not 100% robust relevant YCSB workload config for using 1.5MB objects: ```yaml fieldcount=150 fieldlength=10000 ``` Closes #8529 * github.com:scylladb/scylla: test: add a test for rjson allocation test: rename alternator_base64_test to alternator_unit_test rjson: add a throwing allocator (cherry picked from commit `c36549b22e`)	2021-10-12 13:57:15 +03:00
Benny Halevy	b3a762f179	streaming: stream_session: do not escape curly braces in format strings Those turn into '{}' in the formatted strings and trigger a logger error in the following sstlog.warn(err.c_str()) call. Fixes #8436 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210408173048.124417-1-bhalevy@scylladb.com> (cherry picked from commit `76cd315c42`)	2021-10-12 13:49:24 +03:00
Calle Wilund	2bba07bdf4	table: ensure memtable is actually in memtable list before erasing Fixes #8749 If a table::clear() was issued while we were flushing a memtable, the memtable is already gone from the list. We need to check this before erase. Otherwise we get random memory corruption via std::vector::erase v2: * Make interface more set-like (tolerate non-existence in erase). Closes #8904 (cherry picked from commit `373fa3fa07`)	2021-10-12 13:47:33 +03:00
Benny Halevy	87bfb57ccf	utils: merge_to_gently: prevent stall in std::copy_if std::copy_if runs without yielding. See https://github.com/scylladb/scylla/issues/8897#issuecomment-867522480 Note that the standard states that no iterators or references are invalidated on insert so we can keep inserting before last1 when merging the remainder of list2 at the tail of list1. Fixes #8897 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `453e7c8795`)	2021-10-12 13:05:58 +03:00
Michael Livshin	6ca8590540	avoid race between compaction and table stop Also add a debug-only compaction-manager-side assertion that tests that no new compaction tasks were submitted for a table that is being removed (debug-only because not constant-time). Fixes #9448. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20211007110416.159110-1-michael.livshin@scylladb.com> (cherry picked from commit `e88891a8af`)	2021-10-12 12:51:44 +03:00
Takuya ASADA	da57d6c7cd	scylla_cpuscaling_setup: add --force option To build an Ubuntu AMI with CPU scaling configuration, we need a forced running mode for scylla_cpuscaling_setup, which runs setup without checking scaling_governor support. See scylladb/scylla-machine-image#204 Closes #9326 (cherry picked from commit `f928dced0c`)	2021-10-05 16:20:22 +03:00
Takuya ASADA	61469d62b8	scylla_ntp_setup: support 'pool' directive on ntp.conf Currently, scylla_ntp_setup only supports 'server' directive, we should support 'pool' too. Fixes #9393 Closes #9397	2021-10-03 14:11:54 +03:00
Takuya ASADA	c63092038e	scylla_cpuscaling_setup: disable ondemand.service on Ubuntu On Ubuntu, scaling_governor becomes powersave after a reboot, even if we configured cpufrequtils. This is because ondemand.service unconditionally changes scaling_governor to ondemand or powersave. cpufrequtils starts before ondemand.service, so scaling_governor is overwritten by ondemand.service. To configure scaling_governor correctly, we have to disable this service. Fixes #9324 Closes #9325 (cherry picked from commit `cd7fe9a998`)	2021-10-03 14:08:37 +03:00
Raphael S. Carvalho	cb7fbb859b	compaction_manager: prevent unbounded growth of pending tasks There will be unbounded growth of pending tasks if they are submitted faster than retiring them. That can potentially happen if memtables are frequently flushed too early. It was observed that this unbounded growth caused task queue violations as the queue will be filled with tons of tasks being reevaluated. By avoiding duplication in pending task list for a given table T, growth is no longer unbounded and consequently reevaluation is no longer aggressive. Refs #9331. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210930125718.41243-1-raphaelsc@scylladb.com> (cherry picked from commit `52302c3238`)	2021-10-03 13:11:14 +03:00
Yaron Kaikov	01920c1293	release: prepare for 4.4.5	2021-09-23 14:37:52 +03:00
Eliran Sinvani	fd64cae856	dist: rpm: Add specific versioning and python3 dependency The Red Hat packages were missing two things: first, the metapackage did not depend on the python3 package at all, and second, the scylla-server package dependencies did not specify a version, which can cause problems during upgrade. Doing both of the things listed here is a bit of an overkill, as either one of them separately would solve the problem described in #XXXX, but both should be applied in order to express the correct concept. Fixes #8829 Closes #8832 (cherry picked from commit `9bfb2754eb`)	2021-09-12 16:01:15 +03:00
Calle Wilund	b1032a2699	snapshot: Add filter to check for existing snapshot Fixes #8212 Some snapshotting operations call in on a single table at a time. When checking for existing snapshots in this case, we should not bother with snapshots in other tables. Add an optional "filter" to check routine, which if non-empty includes tables to check. Use case is "scrub" which calls with a limited set of tables to snapshot. Closes #8240 (cherry picked from commit `f44420f2c9`)	2021-09-12 11:16:12 +03:00
Avi Kivity	90941622df	Merge "evictable_readers: don't drop static rows, drop assumption about snapshot isolation" from Botond " This mini-series fixes two loosely related bugs around reader recreation in the evictable reader (related by both being around reader recreation). A unit test is also added which reproduces both of them and checks that the fixes indeed work. More details in the patches themselves. This series replaces the two independent patches sent before: * [PATCH v1] evictable_reader: always reset static row drop flag * [PATCH v1] evictable_reader: relax partition key check on reader recreation As they depend on each other, it is easier to add a test if they are in a series. Fixes: #8923 Fixes: #8893 Tests: unit(dev, mutation_reader_test:debug) " * 'evictable-reader-recreation-more-bugs/v1' of https://github.com/denesb/scylla: test: mutation_reader_test: add more test for reader recreation evictable_reader: relax partition key check on reader recreation evictable_reader: always reset static row drop flag (cherry picked from commit `4209dfd753`)	2021-09-06 17:28:49 +03:00
Avi Kivity	4250ab27d8	Update seastar submodule (perftune failure on bond NIC) * seastar 4b7d434965...4a58d76fea (1): > perftune.py: instrument bonding tuning flow with 'nic' parameter Fixes #9225.	2021-08-19 16:59:41 +03:00
Takuya ASADA	475e0d0893	scylla_cpuscaling_setup: change scaling_governor path On some environments /sys/devices/system/cpu/cpufreq/policy0/scaling_governor does not exist even though CPU scaling is supported. Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is available on both kinds of environment, so we should switch to it. Fixes #9191 Closes #9193 (cherry picked from commit `e5bb88b69a`)	2021-08-12 12:10:15 +03:00
Raphael S. Carvalho	27333587a8	compaction: Prevent tons of compaction of fully expired sstable from happening in parallel Compaction manager can start tons of compaction of fully expired sstable in parallel, which may consume a significant amount of resources. This problem is caused by weight being released too early in compaction, after data is all compacted but before table is called to update its state, like replacing sstables and so on. Fully expired sstables aren't actually compacted, so the following can happen: - compaction 1 starts for expired sst A with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 2 starts for expired sst B with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 3 starts for expired sst C with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 1 is done updating table state, so it finally completes and releases all the resources. - compaction 2 is done updating table state, so it finally completes and releases all the resources. - compaction 3 is done updating table state, so it finally completes and releases all the resources. This happens because, with expired sstable, compaction will release weight faster than it will update table state, as there's nothing to be compacted. With my reproducer, it's very easy to reach 50 parallel compactions on a single shard, but that number can be easily worse depending on the amount of sstables with fully expired data, across all tables. This high parallelism can happen only with a couple of tables, if there are many time windows with expired data, as they can be compacted in parallel. Prior to `55a8b6e3c9`, weight was released earlier in compaction, before last sstable was sealed, but right now, there's no need to release weight earlier. 
Weight can be released in a much simpler way, after the compaction is actually done. So such compactions will be serialized from now on. Fixes #8710. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com> [avi: drop now unneeded storage_service_for_tests] (cherry picked from commit `a7cdd846da`)	2021-08-10 18:16:47 +03:00
Nadav Har'El	0cfe0e8c8e	secondary index: fix regression in CREATE INDEX IF NOT EXISTS The recent commit `0ef0a4c78d` added helpful error messages in case an index cannot be created because the intended name of its materialized view is already taken - but accidentally broke the "CREATE INDEX IF NOT EXISTS" feature. The checking code was correct, but in the wrong place: we need to first check whether the index already exists and "IF NOT EXISTS" was chosen - and only do this new error checking if this is not the case. This patch also includes a cql-pytest test for reproducing this bug. The bug is also reproduced by the translated Cassandra unit test cassandra_tests/validation/entities/secondary_index_test.py::testCreateAndDropIndex, and this is how I found this bug. After this patch, all these tests pass. Fixes #8717. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210526143635.624398-1-nyh@scylladb.com> (cherry picked from commit `97e827e3e1`)	2021-08-10 17:41:51 +03:00
Nadav Har'El	cb3225f2de	Merge 'Fix index name conflicts with regular tables' from Piotr Sarna When an index is created without an explicit name, a default name is chosen. However, there was no check if a table with conflicting name already exists. The check is now in place and if any conflicts are found, a new index name is chosen instead. When an index is created with an explicit name and a conflicting regular table is found, index creation should simply fail. This series comes with a test. Fixes #8620 Tests: unit(release) Closes #8632 * github.com:scylladb/scylla: cql-pytest: add regression tests for index creation cql3: fail to create an index if there is a name conflict database: check for conflicting table names for indexes (cherry picked from commit `cee4c075d2`)	2021-08-10 12:56:52 +03:00
Hagit Segev	69daa9fd00	release: prepare for 4.4.4	2021-08-01 14:22:01 +03:00
Piotr Jastrzebski	f91cea66a6	api: use proper type to reduce partition count Partition count is of a type size_t but we use std::plus<int> to reduce values of partition count in various column families. This patch changes the argument of std::plus to the right type. Using std::plus<int> for size_t compiles but does not work as expected. For example plus<int>(2147483648LL, 1LL) = -2147483647 while the code would probably want 2147483649. Fixes #9090 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #9074 (cherry picked from commit `90a607e844`)	2021-07-27 12:38:22 +03:00
Raphael S. Carvalho	9dce1e4b2b	sstables: Close promoted index readers when advancing to next summary index Problem fixed on master since `5ed559c`. So branch-4.5 and up aren't affected. Index reader fails to close input streams of promoted index readers when advancing to next summary entry, so Scylla can abort as a result of a stream being destroyed while there were reads in progress. This problem was seen when row cache issued a fast forward, so index reader was asked to advance to next summary entry while the previous one still had reads in progress. By closing the list of index readers when there's only one owner holding it, the problem is safely fixed, because it cannot happen that an index_bound like _lower_bound or _upper_bound will be left with a list that's already closed. Fixes #9049. test: mode(dev, debug). No observable perf regression: BEFORE: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 8.168640 4 100000 12242 108 12262 11982 50032.2 50049 6403116 20707 0 0 8 8 0 0 0 83.3% -> 1 1 22.257916 4 50000 2246 3 2249 2238 150025.0 150025 6454272 100001 0 49999 100000 149999 0 0 0 54.7% -> 1 8 9.384961 4 11112 1184 5 1184 1178 77781.2 77781 1439328 66618 11111 1 33334 44444 0 0 0 44.0% -> 1 16 4.976144 4 5883 1182 6 1184 1173 41180.0 41180 762053 35264 5882 0 17648 23530 0 0 0 44.1% -> 1 32 2.582744 4 3031 1174 4 1175 1167 21216.0 21216 392619 18176 3031 0 9092 12122 0 0 0 43.8% -> 1 64 1.308410 4 1539 1176 2 1178 1173 10772.0 10772 199353 9233 1539 0 4616 6154 0 0 0 44.0% -> 1 256 0.331037 4 390 1178 12 1190 1165 2729.0 2729 50519 2338 390 0 1169 1558 0 0 0 44.0% -> 1 1024 0.085108 4 98 1151 7 1155 1141 685.0 685 12694 587 98 0 293 390 0 0 0 42.9% -> 1 4096 0.024393 6 25 1025 5 1029 1020 174.0 174 3238 149 25 0 74 98 0 0 0 37.4% -> 64 1 8.765446 4 98462 11233 16 11236 11182 54642.0 54648 6405470 23632 1 1538 4615 4615 0 0 0 79.3% -> 64 8 8.456430 4 
88896 10512 48 10582 10464 55578.0 55578 6405971 24031 4166 0 5553 5553 0 0 0 77.3% -> 64 16 7.798197 4 80000 10259 108 10299 10077 51248.0 51248 5922500 22160 4996 0 4998 4998 0 0 0 74.8% -> 64 32 6.605148 4 66688 10096 64 10168 10033 42715.0 42715 4936359 18796 4164 0 4165 4165 0 0 0 75.5% -> 64 64 4.933287 4 50016 10138 28 10189 10111 32039.0 32039 3702428 14106 3124 0 3125 3125 0 0 0 75.3% -> 64 256 1.971701 4 20032 10160 57 10347 10103 12831.0 12831 1482993 5731 1252 0 1250 1250 0 0 0 74.1% -> 64 1024 0.587026 4 5888 10030 84 10277 9946 3770.0 3770 435895 1635 368 0 366 366 0 0 0 74.6% -> 64 4096 0.157401 4 1600 10165 69 10202 9698 1023.0 1023 118449 455 100 0 98 98 0 0 0 73.9% AFTER: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu -> 1 0 8.191639 4 100000 12208 46 12279 12161 50031.2 50025 6403108 20243 0 0 0 0 0 0 0 87.0% -> 1 1 22.933121 4 50000 2180 36 2198 2115 150025.0 150025 6454272 100001 0 49999 100000 149999 0 0 0 54.9% -> 1 8 9.471735 4 11112 1173 5 1178 1168 77781.2 77781 1439328 66663 11111 0 33334 44445 0 0 0 44.6% -> 1 16 5.001569 4 5883 1176 2 1176 1170 41180.0 41180 762053 35296 5882 1 17648 23529 0 0 0 44.6% -> 1 32 2.587069 4 3031 1172 1 1173 1164 21216.0 21216 392619 18185 3031 1 9092 12121 0 0 0 44.8% -> 1 64 1.310747 4 1539 1174 3 1177 1171 10772.0 10772 199353 9233 1539 0 4616 6154 0 0 0 44.9% -> 1 256 0.335490 4 390 1162 2 1167 1161 2729.0 2729 50519 2338 390 0 1169 1558 0 0 0 45.7% -> 1 1024 0.081944 4 98 1196 21 1210 1162 685.0 685 12694 585 98 0 293 390 0 0 0 46.2% -> 1 4096 0.022266 6 25 1123 3 1125 1105 174.0 174 3238 149 24 0 74 98 0 0 0 41.9% -> 64 1 8.731741 4 98462 11276 45 11417 11231 54642.0 54640 6405470 23686 0 1538 4615 4615 0 0 0 80.2% -> 64 8 8.396247 4 88896 10588 19 10596 10560 55578.0 55578 6405971 24275 4166 0 5553 5553 0 0 0 77.6% -> 64 16 7.700995 4 80000 10388 88 10405 10221 51248.0 51248 5922500 22100 5000 0 4998 
4998 0 0 0 76.4% -> 64 32 6.517276 4 66688 10232 31 10342 10201 42715.0 42715 4936359 19013 4164 0 4165 4165 0 0 0 75.3% -> 64 64 4.898669 4 50016 10210 60 10291 10150 32039.0 32039 3702428 14110 3124 0 3125 3125 0 0 0 74.4% -> 64 256 1.969972 4 20032 10169 22 10173 10091 12831.0 12831 1482993 5660 1252 0 1250 1250 0 0 0 74.3% -> 64 1024 0.575180 4 5888 10237 84 10316 10028 3770.0 3770 435895 1656 368 0 366 366 0 0 0 74.6% -> 64 4096 0.158503 4 1600 10094 81 10195 10014 1023.0 1023 118449 460 100 0 98 98 0 0 0 73.5% Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210722180302.64675-1-raphaelsc@scylladb.com>	2021-07-25 14:03:04 +03:00
Asias He	99b8c04a40	repair: Consider memory bloat when calculate repair parallelism The repair parallelism is calculated by the number of memory allocated to repair and memory usage per repair instance. Currently, it does not consider memory bloat issues (e.g., issue #8640) which cause repair to use more memory and cause std::bad_alloc. Be more conservative when calculating the parallelism to avoid repair using too much memory. Fixes #8641 Closes #8652 (cherry picked from commit `b8749f51cb`)	2021-07-15 13:02:01 +03:00
Takuya ASADA	74cd6928c0	scylla-fstrim.timer: drop BindsTo=scylla-server.service To avoid restarting scylla-server.service unexpectedly, drop BindsTo= from scylla-fstrim.timer. Fixes #8921 Closes #8973 (cherry picked from commit `def81807aa`)	2021-07-08 10:06:42 +03:00
Avi Kivity	a178098277	Update tools/java submodule for rack/dc properties * tools/java aab793d9f5...14e635e5de (1): > cassandra.in.sh: Add path to rack/dc properties file to classpath Fixes #7930.	2021-07-08 09:53:15 +03:00
Takuya ASADA	da1a9c7bc7	dist/redhat: fix systemd unit name of scylla-node-exporter systemd unit name of scylla-node-exporter is scylla-node-exporter.service, not node-exporter.service. Fixes #8966 Closes #8967 (cherry picked from commit `f19ebe5709`)	2021-07-07 18:37:55 +03:00
Takuya ASADA	3666bb84a7	dist: stop removing /etc/systemd/system/.mount on package uninstall Listing /etc/systemd/system/.mount as a ghost file seems incorrect, since the user may want to keep using the RAID volume / coredump directory after uninstalling Scylla, or may want to upgrade to the enterprise version. Also, we mixed two types of files as ghost files, which should be handled differently: 1. automatically generated by the postinst scriptlet 2. generated by user-invoked scylla_setup The package should remove only 1, since 2 is generated by user decision. However, just dropping .mount from the %files section causes another problem: rpm will remove these files during upgrade, instead of uninstall (#8924). To fix both problems, specify .mount files as "%ghost %config". This keeps the files on both package upgrade and package removal. See scylladb/scylla-enterprise#1780 Closes #8810 Closes #8924 Closes #8959 (cherry picked from commit `f71f9786c7`)	2021-07-07 18:37:55 +03:00
Pavel Emelyanov	df6d471e08	hasher: More picky noexcept marking of feed_hash() Commit `5adb8e555c` marked the ::feed_hash() and a visitor lambda of digester::feed_hash() as noexcept. This was quite reckless, as the appending_hash<>::operator()s called by ::feed_hash() are not all marked noexcept. In particular, the appending_hash<row>() is not marked noexcept and seems to throw. The original intent of the mentioned commit was to facilitate the partition_hasher in repair/ code. The hasher itself had been removed by `0af7a22c21`, so it no longer needs the feed_hash-s to be noexcept. The fix is to inherit noexcept from the called hashers, but for the digester::feed_hash part the noexcept is just removed until clang compilation bug #50994 is fixed. fixes: #8983 tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210706153608.4299-1-xemul@scylladb.com> (cherry picked from commit `63a2fed585`)	2021-07-07 18:36:18 +03:00
Raphael S. Carvalho	92b85da380	LCS: reshape: Fix overlapping check when determining if a sstable set is disjoint Wrong comparison operator is used when checking for overlapping. It would miss overlapping when last key of a sstable is equal to the first key of another sstable that comes next in the set, which is sorted by first key. Fixes #8531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `39ecddbd34`)	2021-07-07 14:04:22 +03:00
Juliusz Stasiewicz	d214d91a09	tests: Adjusted tests for DC checking in NTS The CQL test relied on quietly accepting non-existent DCs, so it had to be removed. Also, one boost-test referred to a non-existent `datacenter2` and had to be removed. (cherry picked from commit `97bb15b2f2`)	2021-06-21 17:53:47 +03:00
Nadav Har'El	a7a1e59594	cql-pytest: remove "xfail" tag from two passing tests Issue #7595 was already fixed last week, in commit `b6fb5ee912`, so the two tests which failed because of this issue no longer fail and their "xfail" tag can be removed. Refs #7595. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210216160606.1172855-1-nyh@scylladb.com> (cherry picked from commit `946e63ee6e`)	2021-06-20 19:37:28 +03:00
Juliusz Stasiewicz	856aeb5ddb	locator: Check DC names in NTS The same trick is used as in C*: `79e693e16e/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java (L241)` Fixes #7595 (cherry picked from commit `b6fb5ee912`)	2021-06-20 19:24:29 +03:00
Piotr Sarna	61659fdbdb	Merge 'view: fix use-after-move when handling view update failures' Backport of `6726fe79b6`. The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Fixes #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build) Closes #8834 * backport-6726fe7-4.4: view: fix use-after-move when handling view update failures db,view: explicitly move the mutation to its helper function db,view: pass base token by value to mutate_MV	2021-06-16 14:15:12 +02:00
Piotr Sarna	b06e9447b1	view: fix use-after-move when handling view update failures The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Refs #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build) (cherry picked from commit `8a049c9116`)	2021-06-16 13:40:57 +02:00
Piotr Sarna	6a407984d8	db,view: explicitly move the mutation to its helper function The `apply_to_remote_endpoints` helper function used to take its `mut` parameter by reference, but then moved the value from it, which is confusing and prone to errors. Since the value is moved-from, let's pass it to the helper function as rvalue ref explicitly. (cherry picked from commit `7cdbb7951a`)	2021-06-16 13:38:39 +02:00
Piotr Sarna	74df68c67f	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value. (cherry picked from commit `88d4a66e90`)	2021-06-16 13:38:01 +02:00
Raphael S. Carvalho	a6b3a2b945	LCS: Fix terrible write amplification when reshaping level 0 LCS reshape is basically 'major compacting' level 0 until it contains less than N sstables. That produces terrible write amplification, because any given byte will be compacted (initial # of sstables / max_threshold (32)) times. So if L0 initially contained 256 ssts, there would be a WA of about 8. This terrible write amplification can be reduced by performing STCS instead on L0, which will leave L0 in a good shape without hurting WA as it happens now. Fixes #8345. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210322150655.27011-1-raphaelsc@scylladb.com> (cherry picked from commit `bcbb39999b`)	2021-06-14 20:27:41 +03:00
Michał Chojnowski	2cf998e418	cdc: log: fix use-after-free in process_bytes_visitor Due to small value optimization used in `bytes`, views to `bytes` stored in `vector` can be invalidated when the vector resizes, resulting in use-after-free and data corruption. Fix that. Fixes #8117 (cherry picked from commit `8cc4f39472`)	2021-06-13 19:06:25 +03:00
Botond Dénes	6a23208ce4	mutation_test: test_mutation_diff_with_random_generator: compact input mutations This test checks that `mutation_partition::difference()` works correctly. One of the checks it does is: m1 + m2 == m1 + (m2 - m1). If the two mutations are identical but have compactable data, e.g. a shadowable tombstone shadowed by a row marker, the apply will collapse these, causing the above equality check to fail (as m2 - m1 is null). To prevent this, compact the two input mutations. Fixes: #8221 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210310141118.212538-1-bdenes@scylladb.com> (cherry picked from commit `cf28552357`)	2021-06-13 18:30:43 +03:00
Takuya ASADA	94d73d2d26	scylla_coredump_setup: avoid coredump failure when hard limit of coredump is set to zero In environments where the hard limit of coredump is set to zero, the coredump test script will fail since the system does not generate a coredump. To avoid this issue, set ulimit -c 0 before generating SEGV in the script. Note that scylla-server.service can generate a coredump even with ulimit -c 0 because we set LimitCORE=infinity in its systemd unit file. Fixes #8238 Closes #8245 (cherry picked from commit `af8eae317b`)	2021-06-13 18:27:03 +03:00
Avi Kivity	033d56234b	Update seastar submodule (nested exception logging) * seastar 61939b5b8a...4b7d434965 (2): > utils/log.cc: fix nested_exception logging (again) > log: skip on unknown nested mixing instead of stopping the logging Fixes #8327.	2021-06-13 18:22:59 +03:00
Benny Halevy	35d89298da	test: commitlog_test: test_allocation_failure: fill memory using smaller allocations commitlog was changed to use fragmented_temporary_buffer::ostream (db::commitlog::output). So if there are discontiguous small memory blocks, they can be used to satisfy an allocation even if no contiguous memory blocks are available. To prevent that, as Avi suggested, this change allocates in 128K blocks and frees the last one to succeed (so that we won't fail on allocating continuations). Fixes #8028 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210203100333.862036-1-bhalevy@scylladb.com> (cherry picked from commit `ca6f5cb0bc`)	2021-06-10 19:35:40 +03:00
Dejan Mircevski	8011d181b5	cql3: Skip indexed column for CK restrictions When querying an index table, we assemble clustering-column restrictions for that query by going over the base table token, partition columns, and clustering columns. But if one of those columns is the indexed column, there is a problem; the indexed column is the index table's partition key, not clustering key. We end up with invalid clustering slice, which can cause problems downstream. Fix this by skipping the indexed column when assembling the clustering restrictions. Tests: unit (dev) Fixes #7888 Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8320 (cherry picked from commit `0bd201d3ca`)	2021-06-10 10:43:14 +03:00
Hagit Segev	bfafb84567	release: prepare for 4.4.3	2021-06-09 19:51:57 +03:00
Yaron Kaikov	0d9c09ed04	install.sh: Setup aio-max-nr upon installation This is a follow-up change to #8512. Let's add the aio conf file during the Scylla installation process and make sure we also remove this file when uninstalling Scylla. As per Avi Kivity's suggestion, let's set the aio value as static configuration, and make it large enough to work with 500 cpus. Closes #8650 Refs: #8713 (cherry picked from commit `dd453ffe6a`)	2021-06-07 16:30:00 +03:00
Yaron Kaikov	36a4eba22e	scylla_io_setup: configure "aio-max-nr" before iotune On several instance types in AWS and Azure, we get the following failure during the scylla_io_setup process: ``` ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application ``` We have scylla_prepare:configure_io_slots() running before the scylla-server.service start, but scylla_io_setup takes place before that. 1) Let's move configure_io_slots() to scylla_util.py since both scylla_io_setup and scylla_prepare import functions from it 2) cleanup scylla_prepare since we don't need the same function twice 3) Let's use configure_io_slots() during scylla_io_setup to avoid such failure Fixes: #8587 Closes #8512 Refs: #8713 (cherry picked from commit `588a065304`)	2021-06-07 16:29:38 +03:00
Nadav Har'El	e9b1f10654	Update tools/java submodule with backported patches * tools/java 6ca351c221...aab793d9f5 (2): > nodetool: alternate way to specify table name which includes a dot > nodetool: do no treat table name with dot as a secondary index Fixes #6521 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-06-07 09:38:53 +03:00
Nadav Har'El	6057be3f42	alternator: fix equality check of nested document containing a set In issue #5021 we noticed that the equality check in Alternator's condition expressions needs to handle sets differently - we need to compare the set's elements ignoring their order. But the implementation we added to fix that issue was only correct when the entire attribute was a set... In the general case, an attribute can be a nested document, with only some inner set. The equality-checking function needs to traverse this nested document, and compare the sets inside it as appropriate. This is what we do in this patch. This patch also adds a new test comparing equality of a nested document with some inner sets. This test passes on DynamoDB, failed on Alternator before this patch, and passes with this patch. Refs #5021 Fixes #8514 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419184840.471858-1-nyh@scylladb.com> (cherry picked from commit `dae7528fe5`)	2021-06-07 09:10:08 +03:00
Nadav Har'El	673f823d8b	alternator: fix inequality check of two sets In issue #5021 we noted that Alternator's equality operator needs to be fixed for the case of comparing two sets, because the equality check needs to take into account the possibility of different element order. Unfortunately, we fixed only the equality check operator, but forgot there is also an inequality operator! So in this patch we fix the inequality operator, and also add a test for it that was previously missing. The implementation of the inequality operator is trivial - it's just the negation of the equality test. Our pre-existing tests verify that this is the correct implementation (e.g., if attribute x doesn't exist, then "x = 3" is false but "x <> 3" is true). Refs #5021 Fixes #8513 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419141450.464968-1-nyh@scylladb.com> (cherry picked from commit `50f3201ee2`)	2021-06-07 08:45:54 +03:00
Nadav Har'El	0082968bd8	alternator: fix equality check of two unset attributes When a condition expression (ConditionExpression, FilterExpression, etc.) checks for equality of two item attributes, i.e., "x = y", and when one of these attributes was missing we correctly returned false. However, we also need to return false when both attributes are missing in the item, because this is what DynamoDB does in this case. In other words an unset attribute is never equal to anything - not even to another unset attribute. This was not happening before this patch: When x and y were both missing attributes, Alternator incorrectly returned true for "x = y", and this patch fixes this case. It also fixes "x <> y", which should be true when both x and y are unset (but was false before this patch). The other comparison operators - <, <=, >, >=, BETWEEN, were all implemented correctly even before this patch. This patch also includes tests for all the two-unset-attribute cases of all the operators listed above. As usual, we check that these tests pass on both DynamoDB and Alternator to confirm our new behavior is the correct one - before this patch, two of the new tests failed on Alternator and passed on DynamoDB. Fixes #8511 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419123911.462579-1-nyh@scylladb.com> (cherry picked from commit `46448b0983`)	2021-06-06 16:28:27 +03:00
Takuya ASADA	542cd7aff1	scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem Currently, var-lib-scylla.mount may fail because it can start before the MDRAID volume is initialized. We may be able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for the device to become available, but the systemd manual says it automatically configures the dependency for a mount unit when we specify the filesystem path by "absolute path of a device node". So we need to replace What=UUID=<uuid> with What=/dev/disk/by-uuid/<uuid>. Fixes #8279 Closes #8681 (cherry picked from commit `3d307919c3`)	2021-05-24 17:24:07 +03:00
Raphael S. Carvalho	2b29568bf4	sstables/mp_row_consumer: Fix unbounded memory usage when consuming a large run of partition tombstones mp_row_consumer will not stop consuming large run of partition tombstones, until a live row is found which will allow the consumer to stop proceeding. So partition tombstones, from a large run, are all accumulated in memory, leading to OOM and stalls. The fix is about stopping the consumer if buffer is full, to allow the produced fragments to be consumed by sstable writer. Fixes #8071. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210514202640.346594-1-raphaelsc@scylladb.com> Upstream fix: `db4b9215dd`	2021-05-20 21:26:07 +03:00
Hagit Segev	93457807b8	release: prepare for 4.4.2	2021-05-20 00:02:31 +03:00
Takuya ASADA	cee62ab41b	install.sh: apply correct file security context when copying files Currently, the unified installer does not apply the correct file security context while copying files, which causes a permission error on scylla-server.service. We should apply the default file security context while copying files, using the '-Z' option of /usr/bin/install. Also, because install -Z requires a normalized path to apply the correct security context, use 'realpath -m <PATH>' on path variables in the script. Fixes #8589 Closes #8602 (cherry picked from commit `60c0b37a4c`)	2021-05-19 12:41:20 +03:00
Takuya ASADA	728a5e433f	install.sh: fix not such file or directory on nonroot Since we have added scylla-node-exporter, we needed to do 'install -d' for systemd directory and sysconfig directory before copying files. Fixes #8663 Closes #8664 (cherry picked from commit `6faa8b97ec`)	2021-05-19 12:41:20 +03:00
Avi Kivity	9a2d4a7cc7	Merge 'Fix type checking in index paging' from Piotr Sarna When recreating the paging state from an indexed query, a bunch of panic checks were introduced to make sure that the code is correct. However, one of the checks is too eager - namely, it throws an error if the base column type is not equal to the view column type. It usually works correctly, unless the base column type is a clustering key with DESC clustering order, in which case the type is actually "reversed". From the point of view of the paging state generation it's not important, because both types deserialize in the same way, so the check should be less strict and allow the base type to be reversed. Tests: unit(release), along with the additional test case introduced in this series; the test also passes on Cassandra Fixes #8666 Closes #8667 * github.com:scylladb/scylla: test: add a test case for paging with desc clustering order cql3: relax a type check for index paging (cherry picked from commit `593ad4de1e`)	2021-05-19 12:41:05 +03:00
Takuya ASADA	cc050fd499	dist/redhat: stop using systemd macros, call systemctl directly Fedora version of systemd macros does not work correctly on CentOS7, since CentOS7 does not support "file trigger" feature. To fix the issue we need to stop using systemd macros, call systemctl directly. See scylladb/scylla-jmx#94 Closes #8005 (cherry picked from commit `7b310c591e`)	2021-05-18 13:50:07 +03:00
Raphael S. Carvalho	61145af5d9	compaction_manager: Don't swallow exception in procedure used by reshape and resharding run_custom_job() was swallowing all exceptions, which is definitely wrong because failure in a resharding or reshape would be incorrectly interpreted as success, which means upper layer will continue as if everything is ok. For example, ignoring a failure in resharding could result in a shared sstable being left unresharded, so when that sstable reaches a table, scylla would abort as shared ssts are no longer accepted in the main sstable set. Let's allow the exception to be propagated, so failure will be communicated, and resharding and reshape will be all or nothing, as originally intended. Fixes #8657. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210515015721.384667-1-raphaelsc@scylladb.com> (cherry picked from commit `10ae77966c`)	2021-05-18 13:00:20 +03:00
Avi Kivity	11bd83e319	Update tools/jmx (rpm systemd macros) * tools/jmx c510a56...7a101a0 (1): > dist/redhat: stop using systemd macros, call systemctl directly Ref scylladb/jmx#94.	2021-05-13 18:24:52 +03:00
Raphael S. Carvalho	b58305d919	compaction_manager: Redefine weight for better control of parallel compactions Compaction manager allows compaction of different weights to proceed in parallel. For example, a small-sized compaction job can happen in parallel to a large-sized one, but similar-sized jobs are serialized. The problem is the current definition of weight, which is the log (base 4) of total size (size of all sstables) of a job. This is what we get with the current weight definition: weight=5 for sizes=[1K, 3K] weight=6 for sizes=[4K, 15K] weight=7 for sizes=[16K, 63K] weight=8 for sizes=[64K, 255K] weight=9 for sizes=[258K, 1019K] weight=10 for sizes=[1M, 3M] weight=11 for sizes=[4M, 15M] weight=12 for sizes=[16M, 63M] weight=13 for sizes=[64M, 254M] weight=14 for sizes=[256M, 1022M] weight=15 for sizes=[1033M, 4078M] weight=16 for sizes=[4119M, 10188M] total weights: 12 Note that for jobs smaller than 1MB, we have 5 different weights, meaning 5 jobs smaller than 1MB could proceed in parallel. High number of parallel compactions can be observed after repair, which potentially produces tons of small sstables of varying sizes. That causes compaction to use a significant amount of resources. To fix this problem, let's add a fixed tax to the size before taking the log, so that jobs smaller than 1M will all have the same weight. Look at what we get with the new weight definition: weight=10 for sizes=[1K, 2M] weight=11 for sizes=[3M, 14M] weight=12 for sizes=[15M, 62M] weight=13 for sizes=[63M, 254M] weight=14 for sizes=[256M, 1022M] weight=15 for sizes=[1033M, 4078M] weight=16 for sizes=[4119M, 10188M] total weights: 7 Fixes #8124. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210217123022.241724-1-raphaelsc@scylladb.com> (cherry picked from commit `81d773e5d8`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210512224405.68925-1-raphaelsc@scylladb.com>	2021-05-13 08:38:40 +03:00
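The two weight tables in the commit above can be reproduced with a short sketch. It assumes the fixed tax is 1 MiB (inferred from the tables, not stated in the message) and that the weight is the floor of the base-4 logarithm; the names `floor_log4`, `old_weight` and `new_weight` are illustrative and are not taken from the Scylla source.

```python
# Assumption: the "fixed tax" is 1 MiB, inferred from the tables in the message.
FIXED_SIZE_TAX = 1024 * 1024


def floor_log4(n: int) -> int:
    """Integer floor of log base 4 of n, for n >= 1 (avoids float rounding)."""
    if n < 1:
        raise ValueError("n must be positive")
    # floor(log2(n)) is bit_length - 1; halve it to get floor(log4(n)).
    return (n.bit_length() - 1) // 2


def old_weight(total_size: int) -> int:
    """Weight as described before the change: log base 4 of the job size."""
    return floor_log4(total_size)


def new_weight(total_size: int, tax: int = FIXED_SIZE_TAX) -> int:
    """Weight after the change: add a fixed tax before taking the log."""
    return floor_log4(total_size + tax)


if __name__ == "__main__":
    KIB, MIB = 1024, 1024 * 1024
    for size in (1 * KIB, 4 * KIB, 64 * KIB, 2 * MIB, 3 * MIB, 15 * MIB):
        print(size, old_weight(size), new_weight(size))
```

Under these assumptions every job below 1 MiB gets the same new weight, so such jobs are serialized instead of spreading over five weight classes.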
Lauro Ramos Venancio	065111b42b	TWCS: initialize _highest_window_seen The timestamp_type is an int64_t. So, it has to be explicitly initialized before using it. This missing initialization prevented the major compaction from happening when a time window finishes, as described in #8569. Fixes #8569 Signed-off-by: Lauro Ramos Venancio <lauro.venancio@incognia.com> Closes #8590 (cherry picked from commit `15f72f7c9e`)	2021-05-06 08:52:15 +03:00
Nadav Har'El	ebd2c9bab0	Update tools/java submodule Backport sstableloader fix in tools/java submodule. Fixes #8230. * tools/java a3e010ee4f...6ca351c221 (1): > sstableloader: Handle non-prepared batches with ":" in identifier names Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-03 10:08:54 +03:00
Avi Kivity	bf9e1f6d2e	Merge '[branch 4.4] Backport reader_permit: always forward resources to the semaphore ' from Botond Dénes This is a backport of `8aaa3a7` to branch-4.4. The main conflicts were around Benny's reader close series (`fa43d76`), but it also turned out that an additional patch (2f1d65c) has to be backported as well to make sure admission on signaling resources doesn't deadlock. Refs: #8493 Closes #8571 * github.com:scylladb/scylla: test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units reader_concurrency_semaphore: add dump_diagnostics() reader_permit: always forward resources test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front reader_concurrency_semaphore: make admission conditions consistent	2021-04-30 22:02:46 +03:00
Botond Dénes	a710866235	test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress This unit test checks that the semaphore doesn't get into a deadlock when contended, in the presence of many memory-only reads (that don't wait for admission). This is tested by simulating the 3 kinds of reads we currently have in the system: * memory-only: reads that don't pass admission and only own memory. * admitted: reads that pass admission. * evictable: admitted reads that are furthermore evictable. The test creates and runs a large number of these reads in parallel, read kinds being selected randomly, then creates a watchdog which kills the test if no progress is being made. (cherry picked from commit `45d580f056`)	2021-04-30 11:03:09 +03:00
Botond Dénes	3c3fc18777	test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units This unit test passes a read through admission again-and-again, just like an evictable reader would be during its lifetime. When readmitted the read sometimes has to wait and sometimes not. This is to check that readmitting a previously admitted reader doesn't leak any units. (cherry picked from commit `cadc26de38`)	2021-04-30 11:03:09 +03:00
Botond Dénes	960f93383b	reader_concurrency_semaphore: add dump_diagnostics() Allow semaphore related tests to include a diagnostics printout in error messages to help determine why the test failed. (cherry picked from commit `d246e2df0a`)	2021-04-30 09:08:18 +03:00
Botond Dénes	1c0557c638	reader_permit: always forward resources This commit conceptually reverts `4c8ab10`. Said commit was meant to prevent the scenario where memory-only permits -- those that don't pass admission but still consume memory -- completely prevent the admission of reads, possibly even causing a deadlock because a permit might even block its own admission. The protection introduced by said commit however proved to be very problematic. It made the status of resources on the permit very hard to reason about and created loopholes via which permits could accumulate without tracking or they could even leak resources. Instead of continuing to patch this broken system, this commit does away with this "protection" based on the observation that deadlocks are now prevented anyway by the admission criteria introduced by `0fe75571d9`, which admits a read anyway when all the initial count resources are available (meaning no admitted reader is alive), regardless of availability of memory. The benefit of this revert is that the semaphore now knows about all the resources and is able to do its job better as it is not "lied to" about resources by the permits. Furthermore the status of a permit's resources is much simpler to reason about, there are no more loopholes in unexpected state transitions to swallow/leak resources. To prove that this revert is indeed safe, in the next commit we add robust tests that stress test admission on a highly contested semaphore. This patch also does away with the registered/admitted differentiation of permits, as this doesn't make much sense anymore, instead these two are unified into a single "active" state. One can always tell whether a permit was admitted or not from whether it owns count resources anyway. (cherry picked from commit `caaa8ef59a`)	2021-04-30 09:08:17 +03:00
Botond Dénes	f23052ae64	test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front The fuzzy test consumes a large chunk of resource from the semaphore up-front to simulate a contested semaphore. This isn't an accurate simulation, because no permit will have more than 1 unit in reality. Furthermore this can even cause a deadlock since `8aaa3a7` as now we rely on all count units being available to make forward progress when memory is scarce. This patch just cuts out this part of the test, we now have a dedicated unit test for checking a heavily contested semaphore, that does it properly, so no need to try to fix this clumsy attempt that is just making trouble at this point. Refs: #8493 Tests: release(multishard_mutation_query_test:fuzzy_test) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210429084458.40406-1-bdenes@scylladb.com> (cherry picked from commit `26ae9555d1`)	2021-04-30 08:57:12 +03:00
Botond Dénes	15a157611a	reader_concurrency_semaphore: make admission conditions consistent Currently there are two places where we check admission conditions: `do_wait_admission()` and `signal()`. Both use `has_available_units()` to check resource availability, but the former has some additional resource related conditions on top (in `may_proceed()`), which lead to the two paths working with slightly different conditions. To fix, push down all resource availability related checks to `has_available_units()` to ensure admission conditions are consistent across all paths. (cherry picked from commit `d90cd6402c`)	2021-04-30 08:57:12 +03:00
Eliran Sinvani	d0b82e1e68	Materialized views: fix possibly old views coming from other nodes Migration manager has a function to get a schema (for read or write), this function queries a peer node and retrieves the schema from it. One scenario where it can happen is if an old node queries an old, unfixed index. This makes a hole through which views that are only adjusted for reading can slip through. Here we plug the hole by fixing such views before they are registered. Closes #8509 (cherry picked from commit `480a12d7b3`) Fixes #8554.	2021-04-29 14:03:03 +03:00
Botond Dénes	840ca41393	database: clear inactive reads in stop() If any inactive read is left in the semaphore, it can block `database::stop()` from shutting down, as sstables pinned by these reads will prevent `sstables::sstables_manager::close()` from finishing. This causes a deadlock. It is not clear how inactive reads can be left in the semaphore, as all users are supposed to clean up after themselves. Post 4.4 releases don't have this problem anymore as the inactive read handle was made a RAII object, removing the associated inactive read when destroyed. In 4.4 and earlier releases this wasn't so, so errors could be made. Normally this is not a big issue, as these orphaned inactive reads are just evicted when the resources they own are needed, but it does become a serious issue during shutdown. To prevent a deadlock, clear the inactive reads earlier, in `database::stop()` (currently they are cleared in the destructor). This is a simple and foolproof way of ensuring any leftover inactive reads don't cause problems. Fixes: #8561 Tests: unit(dev) Closes #8562	2021-04-28 19:32:46 +03:00
Takuya ASADA	07051f25f2	dist: increase fs.aio-max-nr value for other apps The current fs.aio-max-nr value, cpu_count() * 11026, is exactly the amount scylla uses; if other apps in the environment also try to use aio, the aio slots will run out. So increase the value by 65536 for other apps. Related #8133 Closes #8228 (cherry picked from commit `53c7600da8`)	2021-04-25 16:15:25 +03:00
Takuya ASADA	8437f71b1b	dist: tune fs.aio-max-nr based on the number of cpus Current aio-max-nr is set up statically to 1048576 in /etc/sysctl.d/99-scylla-aio.conf. This is sufficient for most use cases, but falls short on larger machines such as i3en.24xlarge on AWS that has 96 vCPUs. We need to tune the parameter based on the number of cpus, instead of static setting. Fixes #8133 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #8188 (cherry picked from commit `d0297c599a`)	2021-04-25 16:15:12 +03:00
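The sizing rule described in the two fs.aio-max-nr commits above can be written out as a small sketch. The constants come from the commit messages; the function name `aio_max_nr` is hypothetical, and the real setup scripts may apply further rules not described here.

```python
AIO_SLOTS_PER_CPU = 11026        # per-CPU figure quoted in the commit message
AIO_RESERVE_FOR_OTHERS = 65536   # headroom added for other applications
OLD_STATIC_VALUE = 1048576       # previous static setting in 99-scylla-aio.conf


def aio_max_nr(cpus: int) -> int:
    """Suggested fs.aio-max-nr for a machine with `cpus` CPUs (illustrative)."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return cpus * AIO_SLOTS_PER_CPU + AIO_RESERVE_FOR_OTHERS


if __name__ == "__main__":
    # 96 vCPUs is the i3en.24xlarge case mentioned in the commit message.
    value = aio_max_nr(96)
    print(value, value > OLD_STATIC_VALUE)
```

For 96 vCPUs the per-CPU term alone (96 * 11026 = 1058496) already exceeds the old static value, which matches the message's claim that the static setting falls short on such machines.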
Avi Kivity	9f32f5a60c	Update seastar submodule (io_queue request size) * seastar 37eb6022fc...61939b5b8a (1): > io_queue: Double max request size Fixes #8496	2021-04-25 12:35:34 +03:00
Avi Kivity	910bc2417a	Update seastar submodule (low bandwidth disks) * seastar a75171fc89...37eb6022fc (2): > io_queue: Honor disks with tiny request rate > io_queue: Shuffle fair_group creation Fixes #8378.	2021-04-21 14:02:15 +03:00
Piotr Jastrzebski	7790beb655	row_cache: remove redundant check in make_reader This check is always true because a dummy entry is added at the end of each cache entry. If that wasn't true, the check in else-if would be UB. Refs #8435. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `cb3dbb1a4b`)	2021-04-20 13:53:23 +02:00
Piotr Jastrzebski	1379f141c2	cache_flat_mutation_reader: fix do_fill_buffer Make sure that when a partition does not exist in underlying, do_fill_buffer does not try to fast forward within this nonexistent partition. Test: unit(dev) Fixes #8435 Fixes #8411 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `1f644df09d`)	2021-04-20 13:53:17 +02:00
Piotr Jastrzebski	d14ec86e7d	read_context: add _partition_exists This new state stores the information whether current partition represented by _key is present in underlying. Refs #8435. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `ceab5f026d`)	2021-04-20 13:53:10 +02:00
Piotr Jastrzebski	bbada5b9e4	read_context: remove skip_first_fragment arg from create_underlying All callers pass false for its value so no need to keep it around. Refs #8435. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `b3b68dc662`)	2021-04-20 13:53:01 +02:00
Piotr Jastrzebski	d73ec88916	read_context: skip first fragment in ensure_underlying This was previously done in create_underlying but ensure_underlying is a better place because we will add more related logic to this consumption in the following patches. Refs #8435. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `088a02aafd`)	2021-04-20 13:52:46 +02:00
Kamil Braun	2efb458c7a	time_series_sstable_set: return partition start if some sstables were ck-filtered out When a particular partition exists in at least one sstable, the cache expects any single-partition query to this partition to return a `partition_start` fragment, even if the result is empty. In `time_series_sstable_set::create_single_key_sstable_reader` it could happen that all sstables containing data for the given query get filtered out and only sstables without the relevant partition are left, resulting in a reader which immediately returns end-of-stream (while it should return a `partition_start` and if not in forwarding mode, a `partition_end`). This commit fixes that. We do it by extending the reader queue (used by the clustering reader merger) with a `dummy_reader` which will be returned by the queue as the very first reader. This reader only emits a `partition_start` and, if not in forwarding mode, a `partition_end` fragment. Fixes #8447. Closes #8448. (cherry picked from commit `5c7ed7a83f`)	2021-04-20 13:52:34 +02:00
Kamil Braun	c05d8fcef1	clustering_order_reader_merger: handle empty readers The merger could return end-of-stream if some (but not all) of the underlying readers were empty (i.e. not even returning a `partition_start`). This could happen in places where it was used (`time_series_sstable_set::create_single_key_sstable_reader`) if we opened an sstable which did not have the queried partition but passed all the filters (specifically, the bloom filter returned a false positive for this sstable). The commit also extends the random tests for the merger to include empty readers and adds an explicit test case that catches this bug (in a limited scope: when we merge a single empty reader). It also modifies `test_twcs_single_key_reader_filtering` (regression test for #8432) because the time where the clustering key filter is invoked changes (some invocations move from the constructor of the merger to operator()). I checked manually that it still catches the bug when I reintroduce it. Fixes #8445. Closes #8446. (cherry picked from commit `7ffb0d826b`)	2021-04-20 13:52:13 +02:00
Kamil Braun	d29960da47	sstables: fix TWCS single key reader sstable filter The filter passed to `min_position_reader_queue`, which was used by `clustering_order_reader_merger`, would incorrectly include sstables as soon as they passed through the PK (bloom) filter, and would include sstables which didn't pass the PK filter (if they passed the CK filter). Fortunately this wouldn't cause incorrect data to be returned, but it would cause sstables to be opened unnecessarily (these sstables would immediately return eof), resulting in a performance drop. This commit fixes the filter and adds a regression test which uses statistics to check how many times the CK filter was invoked. Fixes #8432. Closes #8433. (cherry picked from commit `3687757115`)	2021-04-20 13:51:52 +02:00
Avi Kivity	e0d67ad6e4	Update seastar submodule (fair_queue fixes) * seastar 2c884a7449...a75171fc89 (2): > fair_queue: Preempted requests got re-queued too far > fair_queue: Improve requests preemption while in pending state Fixes #8296.	2021-04-14 15:40:45 +03:00
Hagit Segev	00da6b5e9e	release: prepare for 4.4.1	2021-04-07 00:28:45 +03:00
Gleb Natapov	4200e52444	storage_proxy: do not crash on LOCAL_QUORUM access to a DC with zero replication If a table that is not replicated to a certain DC (rf=0) is accessed with LOCAL_QUORUM on that DC the current code will crash since the 'targets' array will be empty and read executor does not handle it. Fix it by replying with empty result. Fixes #8354 Message-Id: <YGro+l2En3fF80CO@scylladb.com> (cherry picked from commit `cd24dfc7e5`) [avi: re-added virtual keyword when backporting, since 4.4 and below don't have `020da49c89`]	2021-04-06 19:34:49 +03:00
Nadav Har'El	f5e402ea7a	update tools/java submodule Backport for refs #8390. * tools/java 56470fda09...a3e010ee4f (1): > sstableloader: fix handling of rewritten partition Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-04-05 18:39:05 +03:00
Botond Dénes	05c6a40f05	result_memory_accounter: abort unpaged queries hitting the global limit The `result_memory_accounter` terminates a query if it reaches either the global or shard-local limit. This used to be so only for paged queries, unpaged ones could grow indefinitely (until the node OOM'd). This was changed in `fea5067` which enforces the local limit on unpaged queries as well, by aborting them. However a loophole remained in the code: `result_memory_accounter::check_and_update()` has another stop condition, besides `check_local_limit()`, it also checks the global limit. This stop condition was not updated to enforce itself on unpaged queries by aborting them, instead it silently terminated them, causing them to return less data than requested. This was masked by most queries reaching the local limit first. This patch fixes this by aborting unpaged mutation queries when they hit the global limit. Fixes: #8162 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210226102202.51275-1-bdenes@scylladb.com> (cherry picked from commit `dd5a601aaa`)	2021-03-24 13:00:33 +02:00
Nadav Har'El	2b3bc9f174	Merge 'Fix reading whole requests during shedding' from Piotr Sarna When shedding requests (e.g. due to their size or number exceeding the limits), errors were returned right after parsing their headers, which resulted in their bodies lingering in the socket. The server always expects a correct request header when reading from the socket after the processing of a single request is finished, so shedding the requests should also take care of draining their bodies from the socket. Fixes #8193 Closes #8194 * github.com:scylladb/scylla: cql-pytest: add a shedding test transport: return error on correct stream during size shedding transport: return error on correct stream during shedding transport: skip the whole request if it is too large transport: skip the whole request during shedding (cherry picked from commit `0fea089b37`)	2021-03-24 12:49:57 +02:00
Piotr Sarna	4bfa605c38	Merge 'Fix inconsistencies in MV and SI (reworked)' from Eliran Sinvani This is a reworked submission of #7686 which has been reverted. This series fixes some race conditions in MV/SI schema creation and load; we spotted some places where a schema without a base table reference can sneak into the registry. This can cause an unrecoverable error since write commands with those schemas can't be issued from other nodes. Most of these can occur in 2 main, uncommon scenarios: in a mixed cluster (during an upgrade) and in a small window after a view or base table is altered. Fixes #7709 Closes #8091 * github.com:scylladb/scylla: database: Fix view schemas in place when loading global_schema_ptr: add support for view's base table materialized views: create view schemas with proper base table reference. materialized views: Extract fix legacy schema into its own logic (cherry picked from commit `d473bc9b06`)	2021-03-24 12:25:26 +02:00
Tomasz Grabiec	dbb550e1a7	sstable: writer: ka/la: Write row marker cell after row tombstone Row marker has a cell name which sorts after the row tombstone's start bound. The old code was writing the marker first, then the row tombstone, which is incorrect. This was harmless to our sstable reader, which recognized both as belonging to the current clustering row fragment, and collects both fine. However, if both atoms trigger creation of promoted index blocks, the writer will create a promoted index with entries which violate the cell name ordering. It's very unlikely to run into this in practice, since to trigger promoted index entries for both atoms, the clustering key would be so large so that the size of the marker cell exceeds the desired promoted index block size, which is 64KB by default (but user-controlled via column_index_size_in_kb option). 64KB is also the limit on clustering key size accepted by the system. This was caught by one of our unit tests: sstable_conforms_to_mutation_source_test ...which runs a battery of mutation reader tests with various desired promoted index block sizes, including the target size of 1 byte, which triggers an entry for every atom. The test started to fail for some random seeds after commit `ecb6abe` inside the test_streamed_mutation_forwarding_is_consistent_with_slicing test case, reporting a mutation mismatch in the following line: assert_that(sliced_m).is_equal_to(fwd_m, slice_with_ranges.row_ranges(*m.schema(), m.key())); It compares mutations read from the same sstable using different methods, slicing using clustering key restrictions, and fast forwarding. The reported mismatch was that fwd_m contained the row marker, but sliced_m did not. The sstable does contain the marker, so both reads should return it. After reverting the commit which introduced dynamic adjustments, the test passes, but both mutations are missing the marker, both are wrong! They are wrong because the promoted index contains entries whose starting positions violate the ordering, so binary search gets confused and selects the row tombstone's position, which is emitted after the marker, thus skipping over the row marker. The explanation for why the test started to fail after dynamic adjustments is the following. The promoted index cursor works by incrementally parsing buffers fed by the file input stream. It first parses the whole block and then does a binary search within the parsed array. The entries which the cursor touches during binary search depend on the size of the block read from the file. The commit which enabled dynamic adjustments causes the block size to be different for subsequent reads, which allows one of the reads to walk over the corrupted entries and read the correct data by selecting the entry corresponding to the row marker. Fixes #8324 Message-Id: <20210322235812.1042137-1-tgrabiec@scylladb.com> (cherry picked from commit `9272e74e8c`)	2021-03-24 10:38:54 +02:00
Avi Kivity	dffbcabbb1	Merge "mutation_writer: explicitly close writers" from Benny " _consumer_fut is expected to return an exception on the abort path. Wait for it and drop any exception so it won't be abandoned as seen in #7904. A future<> close() method was added to return _consumer_fut. It is called both after abort() in the error path, and after consume_end_of_stream, on the success path. With that, consume_end_of_stream was made void as it doesn't return a future<> anymore. Fixes #7904 Test: unit(release) " * tag 'close-bucket-writer-v5' of github.com:bhalevy/scylla: mutation_writer: bucket_writer: add close mutation_writer/feed_writers: refactor bucket/shard writers mutation_writer: update bucket/shard writers consume_end_of_stream (cherry picked from commit `f11a0700a8`)	2021-03-21 18:09:45 +02:00
Avi Kivity	a715c27a7f	Merge 'cdc: Limit size of topology description' from Piotr Jastrzębski Currently, whole topology description for CDC is stored in a single row. This means that for a large cluster of strong machines (say 100 nodes 64 cpus each), the size of the topology description can reach 32MB. This causes multiple problems. First of all, there's a hard limit on mutation size that can be written to Scylla. It's related to commit log block size which is 16MB by default. Mutations bigger than that can't be saved. Moreover, such big partitions/rows cause reactor stalls and negatively influence latency of other requests. This patch limits the size of topology description to about 4MB. This is done by reducing the number of CDC streams per vnode and can lead to CDC data not being fully colocated with Base Table data on shards. It can impact performance and consistency of data. This is just a quick fix to make it easily backportable. A full solution to the problem is under development. For more details see #7961, #7993 and #7985. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8048 * github.com:scylladb/scylla: cdc: Limit size of topology description cdc: Extract create_stream_ids from topology_description_generator (cherry picked from commit `c63e26e26f`)	2021-03-21 14:05:36 +02:00
Benny Halevy	6804332291	dist: scylla_util: prevent IndexError when no ephemeral_disks were found Currently we call firstNvmeSize before checking that we have enough (at least 1) ephemeral disks. When none are found, we hit the following error (see #7971): ``` File "/opt/scylladb/scripts/libexec/scylla_io_setup", line 239, in if idata.is_recommended_instance(): File "/opt/scylladb/scripts/scylla_util.py", line 311, in is_recommended_instance diskSize = self.firstNvmeSize File "/opt/scylladb/scripts/scylla_util.py", line 291, in firstNvmeSize firstDisk = ephemeral_disks[0] IndexError: list index out of range ``` This change reverses the order and first checks that we found enough disks before getting the first disk size. Fixes #7971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8027 (cherry picked from commit `55e3df8a72`)	2021-03-21 12:19:24 +02:00
Nadav Har'El	a20991ad62	storage_service: correct missing exception in logging rebuild failure When failing to rebuild a node, we would print the error with the useless explanation "<no exception>". The problem was a typo in the logging command which used std::current_exception() - which wasn't relevant at that point - instead of "ep". Refs #8089 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210314113118.1690132-1-nyh@scylladb.com> (cherry picked from commit `d73934372d`)	2021-03-21 10:51:04 +02:00
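The reordering described in the scylla_util commit above (check the disk count first, then read the first disk's size) can be illustrated with a minimal sketch. The function and parameter names here are hypothetical and do not mirror the actual `scylla_util.py` code.

```python
from typing import Mapping, Optional, Sequence


def first_ephemeral_disk_size(
    ephemeral_disks: Sequence[str],
    size_of: Mapping[str, int],
) -> Optional[int]:
    """Return the size of the first ephemeral disk, or None if there is none.

    The emptiness check comes before the indexing, so an empty list never
    reaches `ephemeral_disks[0]` and cannot raise IndexError.
    """
    if len(ephemeral_disks) < 1:
        return None
    return size_of[ephemeral_disks[0]]


if __name__ == "__main__":
    sizes = {"nvme0n1": 1900, "nvme1n1": 1900}  # made-up example sizes
    print(first_ephemeral_disk_size(["nvme0n1", "nvme1n1"], sizes))
    print(first_ephemeral_disk_size([], sizes))
```

Returning `None` for the empty case is a choice made for this sketch; a caller could equally treat "no ephemeral disks" as "not a recommended instance".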
Hagit Segev	ff585e0834	release: prepare for 4.4.0	2021-03-19 16:42:30 +02:00
Nadav Har'El	9a58deaaa2	alternator-test: increase read timeout and avoid retries By default the boto3 library waits up to 60 seconds for a response, and if it gets no response, it sends the same request again, multiple times. We already noticed in the past that it retries too many times thus slowing down failures, so in our test configuration we lowered the number of retries to 3, but the setting of 60-second-timeout plus 3 retries still causes two problems: 1. When the test machine and the build are extremely slow, and the operation is long (usually, CreateTable or DeleteTable involving multiple views), the 60 second timeout might not be enough. 2. If the timeout is reached, boto3 silently retries the same operation. This retry may fail because the previous one really succeeded at least partially! The symptom is tests which report an error when creating a table which already exists, or deleting a table which doesn't exist. The solution in this patch is first of all to never do retries - if a query fails on internal server error, or times out, just report this failure immediately. We don't expect to see transient errors during local tests, so this is exactly the right behavior. The second thing we do is to increase the default timeout. If 1 minute was not enough, let's raise it to 5 minutes. 5 minutes should be enough for every operation (famous last words...). Even if 5 minutes is not enough for something, at least we'll now see the timeout errors instead of some weird errors caused by retrying an operation which was already almost done. Fixes #8135 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210222125630.1325011-1-nyh@scylladb.com> (cherry picked from commit `0b2cf21932`)	2021-03-19 00:08:27 +02:00
Raphael S. Carvalho	ee48ed2864	LCS: reshape: tolerate more sstables in level 0 with relaxed mode Relaxed mode, used during initialization, of reshape only tolerates min_threshold (default: 4) L0 sstables. However, relaxed mode should tolerate more sstables in level 0, otherwise boot will have to reshape level 0 every time it crosses the min threshold. So let's make LCS reshape tolerate a max of max_threshold and 32. This change is beneficial because once table is populated, LCS regular compaction can decide to merge those sstables in level 0 into level 1 instead, therefore reducing WA. Refs #8297. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210318131442.17935-1-raphaelsc@scylladb.com> (cherry picked from commit `e53cedabb1`)	2021-03-18 19:19:46 +02:00
Raphael S. Carvalho	03f2eb529f	compaction_manager: Fix performance of cleanup compaction due to unlimited parallelism Prior to `463d0ab`, only one table could be cleaned up at a time on a given shard. Since then, all tables belonging to a given keyspace are cleaned up in parallel. Cleanup serialization on each shard was enforced with a semaphore, which was incorrectly removed by the aforementioned patch. So space requirement for cleanup to succeed can be up to the size of keyspace, increasing the chances of node running out of space. Node could also run out of memory if there are tons of tables in the keyspace. Memory requirement is at least #_of_tables * 128k (not taking into account write behind, etc). With 5k tables, it's ~0.64G per shard. Also all tables being cleaned up in parallel will compete for the same disk and cpu bandwidth, so making them all much slower, and consequently the operation time is significantly higher. This problem was detected with cleanup, but scrub and upgrade go through the same rewrite procedure, so they're affected by exactly the same problem. Fixes #8247. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com> (cherry picked from commit `7171244844`)	2021-03-18 14:28:57 +02:00
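The idea of serializing cleanup with a semaphore, as described in the commit above, can be sketched in Python with `asyncio`. This is only an analogy: Scylla itself is written in C++ on Seastar, and `cleanup_keyspace`, `cleanup_one` and `max_parallel` are hypothetical names chosen for this illustration.

```python
import asyncio
from typing import Iterable


async def cleanup_keyspace(tables: Iterable[str], max_parallel: int = 1) -> int:
    """Run a stand-in "cleanup" for each table and return the peak concurrency.

    With max_parallel=1 the semaphore lets only one table be cleaned at a
    time, which bounds the temporary disk and memory needed to one table.
    """
    sem = asyncio.Semaphore(max_parallel)
    running = 0
    peak = 0

    async def cleanup_one(table: str) -> None:
        nonlocal running, peak
        async with sem:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # stand-in for the actual sstable rewrite
            running -= 1

    await asyncio.gather(*(cleanup_one(t) for t in tables))
    return peak


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(10)]
    print(asyncio.run(cleanup_keyspace(tables)))        # serialized
    print(asyncio.run(cleanup_keyspace(tables, 10)))    # unlimited, for contrast
```

The memory estimate in the message follows the same reasoning: with no limit, all 5k tables are rewritten at once, so 5000 * 128k is about 0.64G per shard, whereas a serialized cleanup needs only one table's worth at a time.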
Dejan Mircevski	c270014121	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. This is a 4.4 backport of the #8265 fix. Tests: unit (dev) (cherry picked from commit `8db24fc03b`) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8307 Fixes #8265.	2021-03-18 10:35:55 +02:00
Pavel Emelyanov	35804855f9	test: Fix exit condition of row_cache_test::test_eviction_from_invalidated The test populates the cache, then invalidates it, then tries to push huge (10x times the segment size) chunks into seastar memory hoping that the invalid entries will be evicted. The exit condition on the last stage is -- total memory of the region (sum of both -- used and free) becomes less than the size of one chunk. However, the condition is wrong, because cache usually contains a dummy entry that's not necessarily on lru and on some test iteration it may happen that evictable size < chunk size < evictable size + dummy size In this case test fails with bad_alloc being unable to evict the memory from under the dummy. fixes: #7959 tests: unit(row_cache_test), unit(the failing case with the triggering seed from the issue + 200 times more with random seeds) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210309134138.28099-1-xemul@scylladb.com> (cherry picked from commit `096e452db9`)	2021-03-16 23:42:11 +01:00
Piotr Sarna	a810e57684	Merge 'Alternator: support nested attribute paths... in all expressions' from Nadav Har'El. This series fixes #5024 - which is about adding support for nested attribute paths (e.g., a.b.c[2]) to Alternator. The series adds complete support for this feature in ProjectionExpression, ConditionExpression, FilterExpression and UpdateExpression - and also its combination with ReturnValues. Many relevant tests - and also some new tests added in this series - now pass. The first patch in the series fixes #8043, a bug in some error cases in conditions, which was discovered while working on this series, and is conceptually separate from the rest of the series. Closes #8066 * github.com:scylladb/scylla: alternator: correct implementation of UpdateItem with nested attributes and ReturnValues alternator: fix bug in ReturnValues=UPDATED_NEW alternator: implemented nested attribute paths in UpdateExpression alternator: limit the depth of nested paths alternator: prepare for UpdateItem nested attribute paths alternator: overhaul ProjectionExpression hierarchy implementation alternator: make parsed::path object printable alternator-test: a few more ProjectionExpression conflict test cases alternator-test: improve tests for nested attributes in UpdateExpression alternator: support attribute paths in ConditionExpression, FilterExpression alternator-test: improve tests for nested attributes in ConditionExpression alternator: support attribute paths in ProjectionExpression alternator: overhaul attrs_to_get handling alternator-test: additional tests for attribute paths in ProjectionExpression alternator-test: harden attribute-path tests for ProjectionExpression alternator: fix ValidationException in FilterExpression - and more (cherry picked from commit `cbbb7f08a0`)	2021-03-15 18:40:12 +02:00
Nadav Har'El	7b19cc17d6	updated tools/java submodule * tools/java 8080009794...56470fda09 (1): > sstableloader: Only escape column names once Backporting fix to Refs #8229. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-03-15 16:47:47 +02:00
Benny Halevy	101e0e611b	storage_service: use atomic_vector for lifecycle_subscribers So it can be modified while walked to dispatch subscribed event notifications. In #8143, there is a race between scylla shutdown and notify_down(), causing use-after-free of cql_server. Using an atomic vector instead and futurizing unregister_subscriber allows deleting from _lifecycle_subscribers while walked using atomic_vector::for_each. Fixes #8143 Test: unit(release) DTest: update_cluster_layout_tests:TestUpdateClusterLayout.add_node_with_large_partition4_test(release) materialized_views_test.py:TestMaterializedViews.double_node_failure_during_mv_insert_4_nodes_test(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210224164647.561493-2-bhalevy@scylladb.com> (cherry picked from commit `baf5d05631`)	2021-03-15 15:25:18 +02:00
Benny Halevy	ba23eb733d	cql_server: event_notifier: unregister_subscriber in stop Move unregister_subscriber from the destructor to stop as preparation for moving storage_service lifecycle_subscribers to atomic_vector and futurizing unregister_subscriber. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210224164647.561493-1-bhalevy@scylladb.com> (cherry picked from commit `1ed04affab`) Ref #8143.	2021-03-15 15:25:10 +02:00
Hagit Segev	ec20ff0988	release: prepare for 4.4.rc4	2021-03-11 23:57:55 +02:00
Raphael S. Carvalho	3613b082bc	compaction: Prevent cleanup and regular from compacting the same sstable Due to regression introduced by `463d0ab`, regular can compact in parallel a sstable being compacted by cleanup, scrub or upgrade. This redundancy causes resources to be wasted, write amplification is increased and so is the operation time, etc. That's a potential source of data resurrection because the now-owned data from a sstable being compacted by both cleanup and regular will still exist in the node afterwards, so resurrection can happen if node regains ownership. Fixes #8155. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210225172641.787022-1-raphaelsc@scylladb.com> (cherry picked from commit `2cf0c4bbf1`) Includes fixup patch: compaction_manager: Fix use-after-free in rewrite_sstables() Use-after-free introduced by `2cf0c4bbf1`. That's because compacting is moved into then_wrapped() lambda, so it's potentially freed on the next iteration of repeat(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210309232940.433490-1-raphaelsc@scylladb.com> (cherry picked from commit `f7cc431477`)	2021-03-11 08:24:01 +02:00
Asias He	b94208009f	gossip: Handle timeout error in gossiper::do_shadow_round Currently, the rpc timeout error for the GOSSIP_GET_ENDPOINT_STATES verb is not handled in gossiper::do_shadow_round. If the GOSSIP_GET_ENDPOINT_STATES rpc call to any of the remote nodes times out, gossiper::do_shadow_round will throw an exception and fail the whole boot up process. It is fine if some of the remote nodes time out in the shadow round. It is not a must to talk to all nodes. This patch fixes an issue we saw recently in our sct tests: ``` INFO \| scylla[1579]: [shard 0] init - Shutting down gossiping INFO \| scylla[1579]: [shard 0] gossip - gossip is already stopped INFO \| scylla[1579]: [shard 0] init - Shutting down gossiping was successful ... ERR \| scylla[1579]: [shard 0] init - Startup failed: seastar::rpc::timeout_error (rpc call timed out) ``` Fixes #8187 Closes #8213 (cherry picked from commit `dc40184faa`)	2021-03-10 16:27:47 +02:00
Nadav Har'El	05c266c02a	Merge 'Fix alternator streams management regression' from Calle Wilund Refs: #8012 Fixes: #8210 With the update to CDC generation management, the way we retrieve and process these changed. One very bad bug slipped through though; the code for getting versioned streams did not take into account the late-in-pr change to make clustering of CDC gen timestamps reversed. So our alternator shard info became quite rump-stumped, leading to more or less no data depending on when generations changed w.r.t. data. Also, the way we track the above timestamps changed, so we should utilize this for our end-of-iterator check. Closes #8209 * github.com:scylladb/scylla: alternator::streams: Use better method for generation timestamp system_distributed_keyspace: Add better routine to get latest cdc gen. timestamp system_distributed_keyspace: Fix cdc_get_versioned_streams timestamp range (cherry picked from commit `e12e57c915`)	2021-03-10 16:27:43 +02:00
Avi Kivity	4b7319a870	Merge 'Split CDC streams table partitions into clustered rows ' from Kamil Braun Until now, the lists of streams in the `cdc_streams_descriptions` table for a given generation were stored in a single collection. This solution has multiple problems when dealing with large clusters (which produce large lists of streams): 1. large allocations 2. reactor stalls 3. mutations too large to even fit in commitlog segments This commit changes the schema of the table as described in issue #7993. The streams are grouped according to token ranges, each token range being represented by a separate clustering row. Rows are inserted in reasonably large batches for efficiency. The table is renamed to enable easy upgrade. On upgrade, the latest CDC generation's list of streams will be (re-)inserted into the new table. Yet another table is added: one that contains only the generation timestamps clustered in a single partition. This makes it easy for CDC clients to learn about new generations. It also enables an elegant two-phase insertion procedure of the generation description: first we insert the streams; only after ensuring that a quorum of replicas contains them, we insert the timestamp. Thus, if any client observes a timestamp in the timestamps table (even using a ONE query), it means that a quorum of replicas must contain the list of streams. --- Nodes automatically ensure that the latest CDC generation's list of streams is present in the streams description table. When a new generation appears, we only need to update the table for this generation; old generations are already inserted. However, we've changed the description table (from `cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The existing mechanism only ensures that the latest generation appears in the new description table. We add an additional procedure that rewrites the older generations as well, if we find that it is necessary to do so (i.e. 
when some CDC log tables may contain data in these generations). Closes #8116 * github.com:scylladb/scylla: tests: add a simple CDC cql pytest cdc: add config option to disable streams rewriting cdc: rewrite streams to the new description table cql3: query_processor: improve internal paged query API cdc: introduce no_generation_data_exception exception type docs: cdc: mention system.cdc_local table cdc: coroutinize do_update_streams_description sys_dist_ks: split CDC streams table partitions into clustered rows cdc: use chunked_vector for streams in streams_version cdc: remove `streams_version::expired` field system_distributed_keyspace: use mutation API to insert CDC streams storage_service: don't use `sys_dist_ks` before it is started (cherry picked from commit `f0950e023d`)	2021-03-09 14:08:44 +02:00
Takuya ASADA	5ce71f3a29	scylla_raid_setup: don't abort using raiddev when array_state is 'clear' On Ubuntu 20.04 AMI, scylla_raid_setup --raiddev /dev/md0 causes '/dev/md0 is already using' (issue #7627). So we merged the patch to find a free mdX (`587b909`). However, looking into /proc/mdstat of the AMI, it actually says no active md device is available: ubuntu@ip-10-0-0-43:~$ cat /proc/mdstat Personalities : unused devices: <none> We currently decide mdX is used when os.path.exists('/sys/block/mdX/md/array_state') == True, but according to the kernel doc, the file may be available even when the array is STOPPED: clear No devices, no size, no level Writing is equivalent to STOP_ARRAY ioctl https://www.kernel.org/doc/html/v4.15/admin-guide/md.html So we should also check array_state != 'clear', not just array_state existence. Fixes #8219 Closes #8220 (cherry picked from commit `2d9feaacea`)	2021-03-08 14:28:58 +02:00
Pekka Enberg	f06f4f6ee1	Update tools/jmx submodule * tools/jmx 2c95650...c510a56 (1): > APIBuilder: Unlock RW-lock in remove()	2021-03-04 14:36:45 +02:00
Hagit Segev	c2d9247574	release: prepare for 4.4.rc3	2021-03-04 13:38:43 +02:00
Raphael S. Carvalho	b4e393d215	compaction: Fix leak of expired sstable in the backlog tracker expired sstables are skipped in the compaction setup phase, because they don't need to be actually compacted, but rather only deleted at the end. that is causing such sstables to not be removed from the backlog tracker, meaning that backlog caused by expired sstables will not be removed even after their deletion, which means shares will be higher than needed, making compaction potentially more aggressive than it has to be. to fix this bug, let's manually register these sstables into the monitor, such that they'll be removed from the tracker once compaction completes. Fixes #6054. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210216203700.189362-1-raphaelsc@scylladb.com> (cherry picked from commit `5206a97915`)	2021-03-01 14:14:33 +02:00
Avi Kivity	1a2b7037cd	Update seastar submodule * seastar 74ae29bc17...2c884a7449 (1): > io_queue: Fix "delay" metrics Fixes #8166.	2021-03-01 13:57:04 +02:00
Raphael S. Carvalho	048f5efe1c	sstables: Fix TWCS reshape for windows with at least min_threshold sstables TWCS reshape was silently ignoring windows which contain at least min_threshold sstables (can happen with data segregation). When resizing candidates, size of multi_window was incorrectly used and it was always empty in this path, which means candidates was always cleared. Fixes #8147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210224125322.637128-1-raphaelsc@scylladb.com> (cherry picked from commit `21608bd677`)	2021-02-28 17:20:26 +02:00
Takuya ASADA	056293b95f	dist/debian: don't run dh_installinit for scylla-node-exporter when service name == package name dh_installinit --name <service> is for forcing installation of debian/.service and debian/.default files that do not match the package name. And if we have subpackages, the packager has the responsibility to rename debian/.service to debian/<subpackage>.service. However, we are currently mistakenly running dh_installinit --name scylla-node-exporter for debian/scylla-node-exporter.service, so the packaging system tries to find the destination package for the .service, does not find a subpackage name on it, and picks the first subpackage ordered by name, scylla-conf. To solve the issue, we just need to run dh_installinit without --name when $product == 'scylla'. Fixes #8163 Closes #8164 (cherry picked from commit `aabc67e386`)	2021-02-28 17:20:26 +02:00
Takuya ASADA	f96ea8e011	scylla_setup: allow running scylla_setup with strict umask setting We currently deny running scylla_setup when umask != 0022. To remove this limitation, run os.chmod(0o644) on every file creation to allow reading from scylla user. Note that perftune.yaml is not really needed to set 0644 since perftune.py is running in root user, but setting it to align permission with other files. Fixes #8049 Closes #8119 (cherry picked from commit `f3a82f4685`)	2021-02-26 08:49:59 +02:00
Hagit Segev	49cd0b87f0	release: prepare for 4.4.rc2	2021-02-24 19:15:29 +02:00
Asias He	0977a73ab2	messaging_service: Move gossip ack message verb to gossip group Fix a scheduling group leak: INFO [shard 0] gossip - gossiper::run sg=gossip INFO [shard 0] gossip - gossiper::handle_ack_msg sg=statement INFO [shard 0] gossip - gossiper::handle_syn_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_ack2_msg sg=gossip After the fix: INFO [shard 0] gossip - gossiper::run sg=gossip INFO [shard 0] gossip - gossiper::handle_ack_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_syn_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_ack2_msg sg=gossip Fixes #7986 Closes #8129 (cherry picked from commit `7018377bd7`)	2021-02-24 14:11:16 +02:00
Pekka Enberg	9fc582ee83	Update seastar submodule * seastar 572536ef...74ae29bc (3): > perftune.py: fix assignment after extend and add asserts > scripts/perftune.py: convert nic option in old perftune.yaml to list for compatibility > scripts/perftune.py: remove repeated items after merging options from file Fixes #7968.	2021-02-23 15:18:00 +02:00
Avi Kivity	4be14c2249	Revert "repair: Make removenode safe by default" This reverts commit `829b4c1438`. It ended up causing repair failures. Fixes #7965.	2021-02-23 14:14:07 +02:00
Tomasz Grabiec	3160dd4b59	table: Fix schema mismatch between memtable reader and sstable writer The schema used to create the sstable writer has to be the same as the schema used by the reader, as the former is used to interpret mutation fragments produced by the reader. Commit `9124a70` introduced a deferring point between reader creation and writer creation which can result in schema mismatch if there was a concurrent alter. This could lead to the sstable writer crashing, or generating a corrupted sstable. Fixes #7994 Message-Id: <20210222153149.289308-1-tgrabiec@scylladb.com>	2021-02-23 13:48:33 +02:00
Avi Kivity	50a8eab1a2	Update seastar submodule * seastar a287bb1a3...572536ef4 (1): > rpc: streaming sink: order outgoing messages Fixes #7552.	2021-02-23 10:19:45 +02:00
Avi Kivity	04615436a0	Point seastar submodule at scylla-seastar.git This allows us to backport Seastar fixes to this branch.	2021-02-23 10:17:22 +02:00
Takuya ASADA	d1ab37654e	scylla_util.py: resolve /dev/root to get actual device on aws When psutil.disk_partitions() reports / is /dev/root, aws_instance mistakenly reports that the root partition is part of the ephemeral disks, and RAID construction will fail. This prevents the error and reports correct free disks. Fixes #8055 Closes #8040 (cherry picked from commit `32d4ec6b8a`)	2021-02-21 16:22:51 +02:00
Nadav Har'El	b47bdb053d	alternator: fix ValidationException in FilterExpression - and more The first condition expressions we implemented in Alternator were the old "Expected" syntax of conditional updates. That implementation had some specific assumptions on how it handles errors: For example, in the "LT" operator in "Expected", the second operand is always part of the query, so an error in it (e.g., an unsupported type) resulted in a ValidationException error. When we implemented ConditionExpression and FilterExpression, we wrongly used the same functions check_compare(), check_BETWEEN(), etc., to implement them. This results in some inaccurate error handling. The worst example is what happens when you use a FilterExpression with an expression such as "x < y" - this filter is supposed to silently skip items whose "x" and "y" attributes have unsupported or different types, but in our implementation a bad type (e.g., a list) for y resulted in a ValidationException which aborted the entire scan! Interestingly, in one case (that of BEGINS_WITH) we actually noticed the slightly different behavior needed and implemented the same operator twice - with ugly code duplication. But in other operators we missed this problem completely. This patch first adds extensive tests of how the different expressions (Expected, QueryFilter, FilterExpression, ConditionExpression) and the different operators handle various input errors - unsupported types, missing items, incompatible types, etc. Importantly, the tests demonstrate that there is often different behavior depending on whether the bad input comes from the query, or from the item. Some of the new tests fail before this patch, but others pass and were useful to verify that the patch doesn't break anything that already worked correctly previously. As usual, all the tests pass on Cassandra. Finally, this patch fixes all these problems. 
The comparison functions like check_compare() and check_BETWEEN() now not only take the operands, they also take booleans saying if each of the operands came from the query or from an item. The old-syntax caller (Expected or QueryFilter) always says that the first operand is from the item and the second is from the query - but in the new-syntax caller (ConditionExpression or FilterExpression) any or all of the operands can come from the query and need verification. The old duplicated code for check_BEGINS_WITH() - which had a TODO to remove it - is finally removed. Instead we use the same idea of passing booleans saying if each of its operands came from an item or from the query. Fixes #8043 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `653610f4bc`)	2021-02-21 09:25:01 +02:00
Piotr Sarna	e11ae8c58f	test: fix a flaky timeout test depending on TTL One of the USING TIMEOUT tests relied on a specific TTL value, but that's fragile if the test runs on the boundary of 2 seconds. Instead, the test case simply checks if the TTL value is present and is greater than 0, which makes the test robust unless its execution lasts for more than 1 million seconds, which is highly unlikely. Fixes #8062 Closes #8063 (cherry picked from commit `2aa4631148`)	2021-02-14 13:08:39 +02:00
Benny Halevy	e4132edef3	stream_session: prepare: fix missing string format argument As seen in mv_populating_from_existing_data_during_node_decommission_test dtest: ``` ERROR 2021-02-11 06:01:32,804 [shard 0] stream_session - failed to log message: fmt::v7::format_error (argument not found) ``` Fixes #8067 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210211100158.543952-1-bhalevy@scylladb.com> (cherry picked from commit `d01e7e7b58`)	2021-02-14 13:08:20 +02:00
Shlomi Livne	492f0802fb	scylla_io_setup did not configure pre tuned gce instances correctly scylla_io_setup condition for nr_disks was using the bitwise operator (&) instead of logical and operator (and) causing the io_properties files to have incorrect values Fixes #7341 Reviewed-by: Lubos Kosco <lubos@scylladb.com> Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Closes #8019 (cherry picked from commit `718976e794`)	2021-02-14 13:08:00 +02:00
Takuya ASADA	34f22e1df1	dist/debian: install scylla-node-exporter.service correctly node-exporter systemd unit name is "scylla-node-exporter.service", not "node-exporter.service". Fixes #8054 Closes #8053 (cherry picked from commit `856fe12e13`)	2021-02-14 13:07:29 +02:00
Nadav Har'El	acb921845f	cql-pytest: fix flaky timeuuid_test.py The test timeuuid_test.py::testTimeuuid sporadically failed, and it turns out the reason was a bug in the test - which this patch fixes. The buggy test created a timeuuid and then compared the time stored in it to the result of the dateOf() CQL function. The problem is that dateOf() returns a CQL "timestamp", which has millisecond resolution, while the timeuuid may have finer than millisecond resolution. The reason why this test rarely failed is that in our implementation, the timeuuid almost always gets a millisecond-resolution timestamp. Only if now() gets called more than once in one millisecond, does it pick a higher time incremented by less than a millisecond. What this patch does is to truncate the time read from the timeuuid to millisecond resolution, and only then compare it to the result of dateOf(). We cannot hope for more. Fixes #8060 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210211165046.878371-1-nyh@scylladb.com> (cherry picked from commit `a03a8a89a9`)	2021-02-14 13:06:59 +02:00
Botond Dénes	5b6c284281	query: use local limit for non-limited queries in mixed cluster Since `fea5067df` we enforce a limit on the memory consumption of otherwise non-limited queries like reverse and non-paged queries. This limit is sent down to the replicas by the coordinator, ensuring that each replica is working with the same limit. This however doesn't work in a mixed cluster, when upgrading from a version which doesn't have this series. This has been worked around by falling back to the old max_result_size constant of 1MB in mixed clusters. This however resulted in a regression when upgrading from a pre-`fea5067df` version to a post-`fea5067df` one. Pre-`fea5067df` versions already had a limit for reverse queries, which was generalized to also cover non-paged ones by `fea5067df`. The regression manifested in previously working reverse queries being aborted. This happened because even though the user has set a generous limit for them before the upgrade, in the mixed cluster replicas fall back to the much stricter 1MB limit, temporarily ignoring the configured limit if the coordinator is an old node. This patch solves this problem by using the locally configured limit instead of the max_result_size constant. This means that the user has to take extra care to configure the same limit on all replicas, but at least they will have working reverse queries during the upgrade. Fixes: #8022 Tests: unit(release), manual test by user who reported the issue Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210209075947.1004164-1-bdenes@scylladb.com> (cherry picked from commit `3d001b5587`)	2021-02-09 18:06:43 +02:00
Yaron Kaikov	7d15319a8a	release: prepare for 4.4 Update Docker parameters for the 4.4 release. Closes #7932	2021-02-09 09:42:53 +02:00
Amnon Heiman	a06412fd24	API: Fix aggregation in column_family A few methods in the column_family API were doing the aggregation wrong, specifically, bloom filter disk size. The issue is not always visible; it happens when there are multiple filter files per shard. Fixes #4513 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes #8007 (cherry picked from commit `4498bb0a48`)	2021-02-08 17:03:45 +02:00
Avi Kivity	2500dd1dc4	Merge 'dist/offline_installer/redhat: fix umask error' from Takuya ASADA Since the makeself script changes the current umask, scylla_setup causes a "scylla does not work with current umask setting (0077)" error. To fix that we need to use the latest version of makeself, and specify the --keep-umask option. Fixes #6243 Closes #6244 * github.com:scylladb/scylla: dist/offline_redhat: fix umask error dist/offline_installer/redhat: support cross build (cherry picked from commit `bb202db1ff`)	2021-02-01 13:03:06 +02:00
Hagit Segev	fd868722dd	release: prepare for 4.4.rc1	2021-01-31 14:09:44 +02:00
Pekka Enberg	f470c5d4de	Update tools/python3 submodule * tools/python3 c579207...199ac90 (1): > dist: debian: adjust .orig tarball name for .rc releases	2021-01-25 09:26:33 +02:00
Pekka Enberg	3677a72a21	Update tools/python3 submodule * tools/python3 1763a1a...c579207 (1): > dist/debian: handle rc version correctly	2021-01-22 09:36:54 +02:00
Hagit Segev	46e6273821	release: prepare for 4.4.rc0	2021-01-18 20:29:53 +02:00
Jenkins Promoter	ce7e31013c	release: prepare for 4.4	2021-01-18 15:49:55 +02:00
Avi Kivity	60f5ec3644	Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski This is a revival of #7490. Quoting #7490: The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope. We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes. As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized(). Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view). By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure. This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator. 
Closes #7820 * github.com:scylladb/scylla: test: add hashers_test memtable: fix accounting of managed_bytes in partition_snapshot_accounter test: add managed_bytes_test utils: fragment_range: add a fragment iterator for FragmentedView keys: update comments after changes and remove an unused method mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator row_cache: more indentation fixes utils: remove unused linearization facilities in `managed_bytes` class misc: fix indentation treewide: remove remaining `with_linearized_managed_bytes` uses memtable, row_cache: remove `with_linearized_managed_bytes` uses utils: managed_bytes: remove linearizing accessors keys, compound: switch from bytes_view to managed_bytes_view sstables: writer: add write_* helpers for managed_bytes_view compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view types: change equal() to accept managed_bytes_view types: add parallel interfaces for managed_bytes_view types: add to_managed_bytes(const sstring&) serializer_impl: handle managed_bytes without linearizing utils: managed_bytes: add managed_bytes_view::operator[] utils: managed_bytes: introduce managed_bytes_view utils: fragment_range: add serialization helpers for FragmentedMutableView bytes: implement std::hash using appending_hash utils: mutable_view: add substr() utils: fragment_range: add compare_unsigned utils: managed_bytes: make the constructors from bytes and bytes_view explicit utils: managed_bytes: introduce with_linearized() utils: managed_bytes: constrain with_linearized_managed_bytes() utils: managed_bytes: avoid internal uses of managed_bytes::data() utils: managed_bytes: extract do_linearize_pure() thrift: do not depend on implicit conversion of keys to bytes_view clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view cql3: expression: linearize get_value_from_mutation() earlier bytes: add to_bytes(bytes) cql3: expression: mark do_get_value() as static	2021-01-18 11:01:28 +02:00
Avi Kivity	ab44464911	Revert "docker: remove sshd from the image" This reverts commit `32fd38f349`. Some tests (in scylla-cluster-tests) depend on it.	2021-01-17 14:34:40 +02:00
Raphael S. Carvalho	00c29e1e24	table: Move notify_bootstrap_or_replace_*() out of line Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210117045747.69891-9-raphaelsc@scylladb.com>	2021-01-17 10:36:13 +02:00
Michał Chojnowski	5b72fb65ae	test: add hashers_test This test is a sanity check. It verifies that our wrappers over well known hashes (xxhash, md5, sha256) actually calculate exactly those hashes. It also checks that the `update()` methods of used hashers are linear with respect to concatenation: that is, `update(a + b)` must be equivalent to `update(a); update(b)`. This wasn't relied on before, but now we need to confirm that hashing fragmented keys without linearizing them won't break backward compatibility.	2021-01-15 18:28:24 +01:00
Michał Chojnowski	85048b349b	memtable: fix accounting of managed_bytes in partition_snapshot_accounter managed_bytes has a small overhead per each fragment. Due to that, managed_bytes containing the same data can have different total memory usage in different allocators. The smaller the preferred max allocation size setting is, the more fragments are needed and the greater total per-fragment overhead is. In particular, managed_bytes allocated in the LSA could grow in memory usage when copied to the standard allocator, if the standard allocator had a preferred max allocation setting smaller than the LSA. partition_snapshot_accounter calculates the amount of memory used by mutation fragments in the memtable (where they are allocated with LSA) based on the memory usage after they are copied to the standard allocator. This could result in an overestimation, as explained above. But partition_snapshot_accounter must not overestimate the amount of freed memory, as doing otherwise might result in OOM situations. This patch prevents the overaccounting by adding minimal_external_memory_usage(): a new version of external_memory_usage(), which ignores allocator-dependent overhead. In particular, it includes the per-fragment overhead in managed_bytes only once, no matter how many fragments there are.	2021-01-15 18:21:13 +01:00
Michał Chojnowski	d31771c0b2	test: add managed_bytes_test	2021-01-15 18:21:13 +01:00
Michał Chojnowski	72ecbd6936	utils: fragment_range: add a fragment iterator for FragmentedView A stylistic change. Iterators are the idiomatic way to iterate in C++.	2021-01-15 14:05:44 +01:00
Michał Chojnowski	2e38647a95	keys: update comments after changes and remove an unused method The comments were outdated after the latest changes (bytes_view vs managed_bytes_view). compound_view_wrapper::get_component() is unused, so we remove it.	2021-01-15 14:05:44 +01:00
Piotr Sarna	6ae94d31c1	treewide: remove shared pointer usage from the pager The pager interface doesn't really need to be virtual, so the next step could be to remove the need for pointers entirely, but migrating from shared_ptr to unique_ptr is a low-hanging fruit. Message-Id: <a5bdecb17ae58e914da020fb58a41f4574565c66.1610709560.git.sarna@scylladb.com>	2021-01-15 15:03:14 +02:00
Avi Kivity	f20736d93d	Merge 'Support unofficial distributions' from Takuya ASADA Since we introduced the relocatable package and offline installer, the scylla binary itself can run on almost any distribution. However, the setup scripts are not designed to run on unsupported distributions and fail with errors in such environments. This PR adds minimal support for running offline installation on unsupported distributions, tested on SLES, Arch Linux and Gentoo. Closes #7858 * github.com:scylladb/scylla: dist: use sysconfig_parser to parse gentoo config file dist: add package name translation dist: support SLES/OpenSUSE install.sh: add systemd existance check install.sh: ignore error missing sysctl entries dist: show warning on unsupported distributions dist: drop Ubuntu 14.04 code dist: move back is_amzn2() to scylla_util.py dist: rename is_gentoo_variant() to is_gentoo() dist: support Arch Linux dist: make sysconfig directory detectable	2021-01-14 16:59:49 +02:00
Raphael S. Carvalho	97e076365e	Fix stalls on Memtable flush by preempting across fragment generation if needed Flush is facing stalls because partition_snapshot_flat_reader::fill_buffer() generates mutation fragments until the buffer is full[1] without yielding. This is the code path: flush_reader::fill_buffer() <---------\| flat_mutation_reader::consume_pausable() <--------\| partition_snapshot_flat_reader::fill_buffer() -\| [1]: https://github.com/scylladb/scylla/blob/6cfc949e/partition_snapshot_reader.hh#L261 This is fixed by breaking the loop in do_fill_buffer() if preemption is needed, allowing do_until() to yield in sequence and, when it resumes, to continue from where it left off until the buffer is full. Fixes #7885. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210114141417.285175-1-raphaelsc@scylladb.com>	2021-01-14 16:30:55 +02:00
Ivan Prisyazhnyy	32fd38f349	docker: remove sshd from the image Implicit revert of `6322293263`. sshd was previously used by Scylla Manager 1.0. The new version does not need it, so there is no point in keeping it. It also confuses everyone. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Closes #7921	2021-01-14 12:52:24 +02:00
Pavel Emelyanov	2b31be0daa	client-state,cdc: Remove call for storage_service from permissions check The client_state::check_access() calls for global storage service to get the features from it and check if the CDC feature is on. The latter is needed to perform CDC-specific checks. However it was noticed, that the check for the feature is excessive as all the guarded if-s will resolve to false in case CDC is off and the check_access will effectively work as it would with the feature check. With that observation, it's possible to ditch one more global storage service reference. tests: unit(dev), dtest(dev, auth) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210105063651.7081-1-xemul@scylladb.com>	2021-01-14 12:52:24 +02:00
Takuya ASADA	7a74f8cd2e	dist: use sysconfig_parser to parse gentoo config file Use sysconfig_parser instead of regex, to improve code readability.	2021-01-13 21:34:23 +09:00
Takuya ASADA	2a4d293841	dist: add package name translation Translate CentOS package names to the corresponding package names on other distributions, so that a single package name can be used for pkg_install().	2021-01-13 21:27:14 +09:00
Takuya ASADA	0a9843842d	dist: support SLES/OpenSUSE Add support for SLES/OpenSUSE to the setup script.	2021-01-13 19:32:46 +09:00
Takuya ASADA	a34edf8169	install.sh: add systemd existance check The offline installer can run on non-systemd distributions, but the installation won't work since we only have systemd units. So check for the existence of systemd and print an error message.	2021-01-13 19:32:45 +09:00
Takuya ASADA	b8c35772b3	install.sh: ignore error missing sysctl entries Some kernels may not have the specified sysctl parameter, so we should ignore the error.	2021-01-13 19:32:45 +09:00
Takuya ASADA	e8f74e800c	dist: show warning on unsupported distributions Add warning message on unsupported distributions, for scylla_cpuscaling_setup and scylla_ntp_setup.	2021-01-13 19:32:45 +09:00
Takuya ASADA	2f344cf50d	dist: drop Ubuntu 14.04 code We don't support Ubuntu 14.04 anymore, so drop the code for it.	2021-01-13 19:32:45 +09:00
Takuya ASADA	8e59f70080	dist: move back is_amzn2() to scylla_util.py Distribution detection functions should live in the same place, so move it back to scylla_util.py.	2021-01-13 19:32:45 +09:00
Takuya ASADA	921b1676c0	dist: rename is_gentoo_variant() to is_gentoo() is_redhat_variant() is the function to detect RHEL/CentOS/Fedora/OEL, and is_debian_variant() is the function to detect Debian/Ubuntu. Unlike these functions, is_gentoo_variant() does not detect "Gentoo variants", we should rename it to is_gentoo().	2021-01-13 19:32:45 +09:00
Takuya ASADA	fffa8f5ded	dist: support Arch Linux Add support for Arch Linux to the setup script.	2021-01-13 19:32:45 +09:00
Takuya ASADA	0d11f9463d	dist: make sysconfig directory detectable Currently, install.sh provides a way to customize the sysconfig directory, but the sysconfig directory is hardcoded in the scripts. Also, /etc/sysconfig seems to be the correct default value, but the current code specifies /etc/default for non-Red Hat distributions. Instead of hardcoding, generate a Python script in install.sh to save the specified sysconfig directory path in Python code.	2021-01-13 19:32:45 +09:00
Wojciech Mitros	93613e20a3	api: remove potential large allocation in /column_family/ GET request handler The reply to a /column_family/ GET request contains info about all column families. Currently, all this info is stored in a single string when replying, and this string may require a big allocation when there are many column families. To avoid that allocation, instead of a single string, use a body_writer function, which writes chunks of the message content to the output stream. Fixes #7916 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #7917	2021-01-13 12:04:18 +02:00
Avi Kivity	ed53b3347e	Merge 'idl: remove the large allocation in mutation_partition_view::rows()' from Wojciech Mitros After these changes the generated code deserializes the stream into a chunked vector, instead of a contiguous one, so even if there are many fields in it, there won't be any big allocations. I haven't run the scylla cluster test with it yet but it passes the unit tests. Closes #7919 * github.com:scylladb/scylla: idl: change the type of mutation_partition_view::rows() to a chunked_vector idl-compiler: allow fields of type utils::chunked_vector	2021-01-13 11:07:29 +02:00
Nadav Har'El	711b311d47	cql-pytest: tests for fromJson() integer overflow Numbers in JSON are not limited in range, so when the fromJson() function converts a number to a limited-range integer column in Scylla, this conversion can overflow. The following tests check that this conversion results in an error (FunctionFailure), not silent truncation. Scylla today does silently wrap around the number, so these tests xfail. They pass on Cassandra. Refs #7914. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210112151041.3940361-1-nyh@scylladb.com>	2021-01-13 11:07:29 +02:00
Nadav Har'El	617e1be1b6	cql-pytest: expand tests for fromJson() failures This patch adds more (failing) tests for issue #7911, where fromJson() failures should be reported as a clean FunctionFailure error, not an internal server error. The previous tests we had were about JSON parse failures, but a different type of error we should support is valid JSON which returned the wrong type - e.g., the JSON returning a string when an integer was expected, or the JSON returning a string with non-ASCII characters when ASCII was expected. So this patch adds more such tests. All of them xfail on Scylla, and pass on Cassandra. Refs #7911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210112122211.3932201-1-nyh@scylladb.com>	2021-01-13 11:07:29 +02:00
Nadav Har'El	2ebe8055ee	cql-pytest: add test for fromJson() null parameter. This patch adds a reproducer test for issue #7912: passing a null parameter to the fromJson() function is supposed to be legal (and return a null value), and is legal in Cassandra, but isn't allowed in Scylla. There are two tests - for a prepared and unprepared statement - which fail in different ways. The issue is still open so the tests xfail on Scylla - and pass on Cassandra. Refs #7912. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210112114254.3927671-1-nyh@scylladb.com>	2021-01-13 11:07:29 +02:00
dgarcia360	78e9f45214	docs: update url Related issue scylladb/sphinx-scylladb-theme#88 Once this commit is merged, the docs will be published under the new domain name https://scylla.docs.scylladb.com Frequently asked questions: Should we change the links in the README/docs folder? GitHub automatically handles the redirections. For example, https://scylladb.github.io/sphinx-scylladb-theme/stable/examples/index.html redirects to https://sphinx-theme.scylladb.com/stable/examples/index.html Nevertheless, it would be great to change URLs progressively to avoid the 301 redirections. Do I need to add this new domain in the custom dns domain section on GitHub settings? It is not necessary. We have already edited the DNS for this domain and the theme programmatically creates the required CNAME file. If everything goes well, GitHub should detect the new URL after this PR is merged. The DNS doesn't seem to have the right SSL certificates. GitHub handles the certificate provisioning but is not aware of the subdomain for this repo yet. make multi-version will create a new file "CNAME". This is published in gh-pages branch, therefore GitHub should create the missing cert. Closes #7877	2021-01-13 11:07:29 +02:00
Avi Kivity	d508a63d4b	row_cache: linearize key in cache_entry::do_read() do_read() does not linearize cache_entry::_key; this can cause a crash with keys larger than 13k. Fixes #7897. Closes #7898	2021-01-13 11:07:29 +02:00
dgarcia360	36f8d35812	docs: added multiversion_regex_builder Fixed makefile Added path Closes #7876	2021-01-13 11:07:29 +02:00
Benny Halevy	5e41228fe8	test: everywhere: use seastar::testing::local_random_engine Use the thread_local seastar::testing::local_random_engine in all seastar tests so they can be reproduced using the --random-seed option. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210112103713.578301-2-bhalevy@scylladb.com>	2021-01-13 11:07:29 +02:00
Benny Halevy	43ab094c88	configure: add utf8_test to pure_boost_tests Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210112103713.578301-1-bhalevy@scylladb.com>	2021-01-13 11:07:29 +02:00
Dejan Mircevski	d79c2cab63	cql3: Use correct comparator in timeuuid min/max The min/max aggregators use aggregate_type_for comparators, and the aggregate_type_for<timeuuid> is regular uuid. But that yields wrong results; timeuuids should be compared as timestamps. Fix it by changing aggregate_type_for<timeuuid> from uuid to timeuuid, so aggregators can distinguish between the two. Then specialize the aggregation utilities for timeuuid. Add a cql-pytest and change some unit tests, which relied on naive uuid comparators. Fixes #7729. Tests: unit (dev, debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #7910	2021-01-13 11:07:29 +02:00
Avi Kivity	96d64b7a1f	Merge "Wire interposer consumer for memtable flush" from Raphael " Without interposer consumer on flush, it could happen that a new sstable, produced by memtable flush, will not conform to the strategy invariant. For example, with TWCS, this new sstable could span multiple time windows, making it hard for the strategy to purge expired data. If interposer is enabled, the data will be correctly segregated into different sstables, each one spanning a single window. Fixes #4617. tests: - mode(dev). - manually tested it by forcing a flush of memtable spanning many windows " * 'segregation_on_flush_v2' of github.com:raphaelsc/scylla: test: Add test for TWCS interposer on memtable flush table: Wire interposer consumer for memtable flush table: Add write_memtable_to_sstable variant which accepts flat_mutation_reader table: Allow sstable write permit to be shared across monitors memtable: Track min timestamp table: Extend cache update to operate a memtable split into multiple sstables	2021-01-13 11:07:29 +02:00
Nadav Har'El	8164c52871	cql-pytest: add test for fromJson() parse error This patch adds a reproducer test for issue #7911, which is about a parse error in JSON string passed to the fromJson() function causing an internal error instead of the expected FunctionFailure error. The issue is still open so the test xfails on Scylla (and passes on Cassandra). Refs #7911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210112094629.3920472-1-nyh@scylladb.com>	2021-01-13 11:07:29 +02:00
Pavel Solodovnikov	10e3da692f	lwt: validate `paxos_grace_seconds` table option The option can only take integer values >= 0, since negative TTL is meaningless and is expected to fail the query when used with `USING TTL` clause. It's better to fail early on `CREATE TABLE` and `ALTER TABLE` statement with a descriptive message rather than catch the error during the first lwt `INSERT` or `UPDATE` while trying to insert to system.paxos table with the desired TTL. Tests: unit(dev) Fixes: #7906 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210111202942.69778-1-pa.solodovnikov@scylladb.com>	2021-01-13 11:07:29 +02:00
Gleb Natapov	51bf5f5846	raft: test: do not check snapshot during backpressure test Unfortunately, snapshot checking still does not work in the presence of log entry reordering. It is impossible to know exactly when the snapshot will be taken, and if it is taken before all entries with an index smaller than the snapshot idx are applied, the check will fail, since it assumes they have been applied. This patch disables snapshot checking for the SUM state machine that is used in the backpressure test. Message-Id: <20201126122349.GE1655743@scylladb.com>	2021-01-13 11:07:29 +02:00
Wojciech Mitros	59769efd3b	idl: change the type of mutation_partition_view::rows() to a chunked_vector The value of mutation_partition_view::rows() may be very large, but is used almost exclusively for iteration, so in order to avoid a big allocation for an std::vector, we change its type to an utils::chunked_vector. Fixes #7918 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-01-13 04:25:53 +01:00
Wojciech Mitros	88e750f379	idl-compiler: allow fields of type utils::chunked_vector The utils::chunked_vector has practically the same methods as a std::vector, so the same code can be generated for it. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-01-13 04:09:18 +01:00
Avi Kivity	ccd09f1398	Update seastar submodule * seastar 6b36e84c3...a287bb1a3 (1): > merge: file: correct dma alignment for odd filesystems Ref #7794.	2021-01-11 20:38:59 +02:00
Tomasz Grabiec	6cfc949e62	Merge "sstables: validate the writer's input with the mutation fragment stream validator" from Botond We have recently seen a suspected corrupt mutation fragment stream to get into an sstable undetected, causing permanent corruption. One of the suspected ways this could happen is the compaction sstable write path not being covered with a validator. To prevent events like this in the future make sure all sstable write paths are validated by embedding the validator right into the sstable writer itself. Refs: #7623 Refs: #7640 Tests: unit(release) * https://github.com/denesb/scylla.git sstable-writer-fragment-stream-validation/v2: sstable_writer: add validation test/boost/sstable_datafile_test: sstable_scrub_test: disable key validation mutation_fragment_stream_validator: make it easier to validate concrete fragment types flat_mutation_reader: extract fragment stream validator into its own header	2021-01-11 14:57:48 +01:00
Pekka Enberg	42806c6f40	Update seastar submodule * seastar ed345cdb...6b36e84c (3): > perftune.py: Don't print nic driver name to avoid Fixes #7905 > io_tester: Make file sizes configurable > io_queue: Limit tickets for oversized requests	2021-01-11 14:12:06 +02:00
Pavel Solodovnikov	0981b786a8	db/query_options: specify serial consistency for DEFAULT specific_options Cassandra constructs `QueryOptions.SpecificOptions` in the same way that we do (by not providing `serial_constency`), but they do have a user-defined constructor which does the following thing: this.serialConsistency = serialConsistency == null ? ConsistencyLevel.SERIAL : serialConsistency; This effectively means that DEFAULT `SpecificOptions` always have `SerialConsistency` set to `SERIAL`, while we leave this `std::nullopt`, since we don't have a constructor for `specific_options` which does this. Supply `db::consistency_level::SERIAL` explicitly to the `specific_options::DEFAULT` value. Tests: unit(dev) Fixes: #7850 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201231104018.362270-1-pa.solodovnikov@scylladb.com>	2021-01-11 12:12:29 +02:00
Nadav Har'El	a3f9bd9c3f	cql-pytest: add xfailing reproducer for issue #7888 This adds a simple reproducer for a bug involving a CONTAINS relation on frozen collection clustering columns when the query is restricted to a single partition - resulting in a strange "marshalling error". This bug still exists, so the test is marked xfail. Refs #7888. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210107191417.3775319-1-nyh@scylladb.com>	2021-01-11 08:49:16 +01:00
Nadav Har'El	678da50a10	cql-pytest: add reproducers for reversed frozen collection bugs We add a reproducer for issues #7868 and #7875 which are about bugs when a table has a frozen collection as its clustering key, and it is sorted in reverse order: If we tried to insert an item to such a table using an unprepared statement, it failed with a wrong error ("invalid set literal"), but if we try to set up a prepared statement, the result is even worse - an assertion failure and a crash. Interestingly, neither of these problems happen without reversed sort order (WITH CLUSTERING ORDER BY (b DESC)), and we also add a test which demonstrates that with default (increasing) order, everything works fine. All tests pass successfully when run against Cassandra. The fix for both issues was already committed, so I verified these tests reproduced the bug before that commit, and pass now. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210110232312.3844408-1-nyh@scylladb.com>	2021-01-11 08:48:30 +01:00
Nadav Har'El	f32c34d8ad	cql-pytest: port Cassandra's unit test validation/entities/frozen_collections_test In this patch, we port validation/entities/frozen_collections_test.java, containing 33 tests for frozen collections of all types, including nesting collections. In porting these tests, I uncovered four previously unknown bugs in Scylla: Refs #7852: Inserting a row with a null key column should be forbidden. Refs #7868: Assertion failure (crash) when clustering key is a frozen collection and reverse order. Refs #7888: A certain combination of filtering, index, and frozen collection causes a "marshalling error" failure. Refs #7902: Failed SELECT with tuple of reversed-ordered frozen collections. These tests also provide two more reproducers for an already known bug: Refs #7745: Length of map keys and set items is incorrectly limited to 64K in unprepared CQL. Due to these bugs, 7 out of the 33 tests here currently xfail. We actually had more failing tests, but we fixed issue #7868 before this patch went in, so its tests are passing at the time of this submission. As usual in this sort of test, all 33 pass when running against Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210110231350.3843686-1-nyh@scylladb.com>	2021-01-11 08:48:08 +01:00
Nadav Har'El	0516cd1609	alternator test: de-duplicate some duplicate code In test_streams.py we had some code to get a list of shards and iterators duplicated three times. Put it in a function, shards_and_latest_iterators(), to reduce this duplication. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201006112421.426096-1-nyh@scylladb.com>	2021-01-11 08:47:25 +01:00
Botond Dénes	cb4d92aae4	sstable_writer: add validation Add a mutation_fragment_stream_validating_filter to sstables::writer_impl and use it in sstable_writer to validate the fragment stream passed down to the writer implementation. This ensures that all fragment streams written to disk are validated, and we don't have to worry about validating each source separately. The current validator from sstable::write_components() is removed. This covers only part of the write paths. Ad-hoc validations in the reader implementations are removed as well as they are now redundant.	2021-01-11 09:12:56 +02:00
Botond Dénes	4b254a26ab	test/boost/sstable_datafile_test: sstable_scrub_test: disable key validation The test violates clustering key order on purpose to produce a corrupt sstable (to test scrub). Disable key validation so when we move the validator into the writer itself in the next patch it doesn't abort the test.	2021-01-11 09:12:56 +02:00
Botond Dénes	8dae6152bf	mutation_fragment_stream_validator: make it easier to validate concrete fragment types The current API is tailored to the `mutation_fragment` type. In the next patch we will want to use the validator from a context where the mutation fragments are already decomposed into their respective concrete types, e.g. static_row, clustering_row, etc. To avoid having to reconstruct a mutation fragment type just to use the validator, add an API which allows validating these concrete types conveniently too.	2021-01-11 08:07:42 +02:00
Botond Dénes	495f9d54ba	flat_mutation_reader: extract fragment stream validator into its own header To allow using it without pulling in the huge `flat_mutation_reader.hh`.	2021-01-11 08:07:42 +02:00
Dejan Mircevski	3aa80f47fe	abstract_type: Rework unreversal methods Replace two methods for unreversal (`as` and `self_or_reversed`) with a new one (`without_reversed`). More flexible and better named. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #7889	2021-01-10 19:30:12 +02:00
Tomasz Grabiec	15b5b286d9	Merge "frozen_mutation: better diagnostics for out-of-order and duplicate rows" from Botond Currently, frozen mutations, that contain partitions with out-of-order or duplicate rows will trigger (if they even do) an assert in `row::append_cell()`. However, this results in poor diagnostics (if at all) as the context doesn't contain enough information on what exactly went wrong. This results in a cryptic error message and an investigation that can only start after looking at a coredump. This series remedies this problem by explicitly checking for out-of-order and duplicate rows, as early as possible, when the supposedly empty row is created. If the row already existed (is a duplicate) or it is not the last row in the partition (out-of-order row) an exception is thrown and the deserialization is aborted. To further improve diagnostics, the partition context is also added to the exception. Tests: unit(release) * botond/frozen-mutation-bad-row-diagnostics/v3: frozen_mutation: add partition context to errors coming from deserializing partition_builder: accept_row(): use append_clustering_row() mutation_partition: add append_clustered_row()	2021-01-10 19:30:12 +02:00
Pekka Enberg	e5fe0acd15	Update seastar submodule * seastar 56cfe179...ed345cdb (1): > perftune.py: Fix the dump options after adding multiple nics option Refs #6266	2021-01-08 18:13:26 +01:00
Benny Halevy	60bde99e8e	flat_mutation_reader: consume_in_thread: always filter.on_end_of_stream on return Since we're calling _consumer.consume_end_of_stream() unconditionally when consume_pausable_in_thread returns. Refs #7623 Refs #7640 Test: unit(dev) Dtest: materialized_views_test.py:TestMaterializedViews.interrupt_build_process_with_resharding_low_to_half_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210106103024.3494569-1-bhalevy@scylladb.com>	2021-01-08 18:13:26 +01:00
Michał Chojnowski	f317b3c39f	mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator measuring_allocator is a wrapper around standard_allocator, but it exposed the default preferred_max_contiguous_allocation, not the one from standard_allocator. Thus managed_bytes allocated in those two allocators had fragments of different size, and their total memory usage differed, causing test_external_memory_usage to fail if standard_allocator::preferred_max_contiguous_allocation was changed from the default. Fix that.	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	907b73a652	row_cache: more indentation fixes Fixup indentation issues introduced in recent patches. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	eb523d4ac8	utils: remove unused linearization facilities in `managed_bytes` class Remove the following bits of `managed_bytes` since they are unused: * `with_linearized_managed_bytes` function template * `linearization_context_guard` RAII wrapper class for managing `linearization_context` instances. * `do_linearize` function * `linearization_context` class Since there are no more public or private methods in `managed_class` that linearize the value, except for the explicit `with_linearized()`, which doesn't use any of the aforementioned parts, we can safely remove these. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	8709844566	misc: fix indentation The patch fixes indentation issues introduced in previous patches related to removing `with_linearized_managed_bytes` uses from the code tree. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	e04eb68a9c	treewide: remove remaining `with_linearized_managed_bytes` uses There is no point in calling the wrapper since linearization code is private in `managed_bytes` class and there is no one to call `managed_bytes::data` because it was deleted recently. This patch is a prerequisite for removing `with_linearized_managed_bytes` function completely, alongside with the corresponding parts of implementation in `managed_bytes`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Pavel Solodovnikov	bf8b138b42	memtable, row_cache: remove `with_linearized_managed_bytes` uses Since `managed_bytes::data()` is deleted as well as other public APIs of `managed_bytes` which would linearize stored values except for explicit `with_linearized`, there is no point invoking `with_linearized_managed_bytes` hack which would trigger automatic linearization under the hood of managed_bytes. Remove useless `with_linearized_managed_bytes` wrapper from memtable and row_cache code. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-08 14:16:08 +01:00
Avi Kivity	3bf6b78668	utils: managed_bytes: remove linearizing accessors Accessors that require linearization, such as data(), begin(), and casting to bytes_view, are no longer used and are now removed.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	dbcf987231	keys, compound: switch from bytes_view to managed_bytes_view The keys classes (partition_key et al) already use managed_bytes, but they assume the data is not fragmented and make liberal use of that by casting to bytes_view. The view classes use bytes_view. Change that to managed_bytes_view, and adjust return values to managed_bytes/managed_bytes_view. The callers are adjusted. In some places linearization (to_bytes()) is needed, but this isn't too bad as keys are always <= 64k and thus will not be fragmented when out of LSA. We can remove this linearization later. The serialize_value() template is called from a long chain, and can be reached with either bytes_view or managed_bytes_view. Rather than trace and adjust all the callers, we patch it now with constexpr if. operator bytes_view (in keys) is converted to operator managed_bytes_view, allowing callers to defer or avoid linearization.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	a1a0839164	sstables: writer: add write_* helpers for managed_bytes_view We will use them in the upcoming patch where we transition keys from bytes_view to managed_bytes_view.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	45c1b90eb5	compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view The underlying view will change from bytes_view to managed_bytes_view in the next commits, so we prepare for that.	2021-01-08 14:16:08 +01:00
Avi Kivity	d9fcc4f4ef	types: change equal() to accept managed_bytes_view bytes_view can convert to managed_bytes_view, so the change is compatible with the existing representation and the next patches, which change compound types to use managed_bytes_view.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	1de0b9a425	types: add parallel interfaces for managed_bytes_view We will need those to transition keys and compound from bytes_view to managed_bytes_view.	2021-01-08 14:16:08 +01:00
Avi Kivity	d1f354f5fb	types: add to_managed_bytes(const sstring&) This is a helper for tests (similar to to_bytes(const sstring&)).	2021-01-08 14:16:08 +01:00
Michał Chojnowski	c6eb485675	serializer_impl: handle managed_bytes without linearizing With managed_bytes_view implemented, it's easy to de/serialize managed_bytes without linearization.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	bf0ec63e34	utils: managed_bytes: add managed_bytes_view::operator[] This operator has a single purpose: an easier port of legacy_compound_view from bytes_view to managed_bytes_view. It is inefficient and should be removed as soon as legacy_compound_view stops using operator[].	2021-01-08 14:16:08 +01:00
Michał Chojnowski	778269151a	utils: managed_bytes: introduce managed_bytes_view managed_bytes_view is a non-owning view into managed_bytes. It can also be implicitly constructed from bytes_view. It conforms to the FragmentedView concept and is mainly used through that interface. It will be used as a replacement for bytes_view occurrences currently obtained by linearizing managed_bytes.	2021-01-08 14:16:08 +01:00
Michał Chojnowski	cf7d25b98d	utils: fragment_range: add serialization helpers for FragmentedMutableView We will use them to write to managed_bytes_view in an upcoming patch, to avoid linearization in compound_type::serialize_value.	2021-01-08 14:16:07 +01:00
Michał Chojnowski	75898ee44e	bytes: implement std::hash using appending_hash This is a preparation for the upcoming introduction of managed_bytes_view, intended as a fragmented replacement for bytes_view. To ease the transition, we want both types to give equal hashes for equal contents.	2021-01-08 13:17:46 +01:00
Michał Chojnowski	4822730752	utils: mutable_view: add substr() Analogous to bytes_view::substr. This bit of functionality will be used to implement managed_bytes_mutable_view.	2021-01-08 13:17:46 +01:00
Dejan Mircevski	9eed26ca3d	cql3: Fix maps::setter_by_key for unset values Unset values for key and value were not handled. Handle them in a manner matching Cassandra. This fixes all cases in testMapWithUnsetValues, so re-enable it (and fix a comment typo in it). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Dejan Mircevski	4515a49d4d	cql3: Fix `IN ?` for unset values When the right-hand side of IN is an unset value, we must report an error, like Cassandra does. This fixes testListWithUnsetValues, so re-enable it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Dejan Mircevski	5bee97fa51	cql3: Fix handling of scalar unset value Make the bind() operation of the scalar marker handle the unset-value case (which it previously didn't). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Dejan Mircevski	8b2f459622	cql3: Fix crash when removing unset_value from set Avoid crash described in #7740 by ignoring the update when the element-to-remove is UNSET_VALUE. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Pekka Enberg	e81f4caf67	Update seastar submodule * seastar a2fc9d72...56cfe179 (1): > perftune.py: Fix nic_is_bond_iface() and other function signatures Refs #6266	2021-01-07 13:22:20 +02:00
Takuya ASADA	10184ba64f	redis: implement parse error, reply error message correctly Since we haven't implemented parse error on redis protocol parser, reply message is broken at parse error. Implemented parse error, reply error message correctly. Fixes #7861 Fixes #7114 Closes #7862	2021-01-07 13:22:20 +02:00
Dejan Mircevski	176ff0238a	cql3: Fix handling of reverse-order maps When the clustering order is reversed on a map column, the column type is reversed_type_impl, not map_type_impl. Therefore, we have to check for both reversed type and map type in some places. This patch handles reverse types in enough places to make test_clustering_key_reverse_frozen_map pass. However, it leaves other places (invocations of is_map() and *_cast<map_type_impl>()) as they currently are; some are protected by callers from being invoked on reverse types, but some are quite possibly bugs untriggered by existing tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Dejan Mircevski	6bb10fcf36	cql3: Fix handling of reverse-order lists When the clustering order is reversed on a list column, the column type is reversed_type_impl, not list_type_impl. Therefore, we have to check for both reversed type and list type in some places. This patch handles reverse types in enough places to make test_clustering_key_reverse_frozen_list pass. However, it leaves other places (invocations of is_list() and *_cast<list_type_impl>()) as they currently are; some are protected by callers from being invoked on reverse types, but some are quite possibly bugs untriggered by existing tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Dejan Mircevski	14fa39cfa6	cql3: Fix handling of reverse-order sets When the clustering order is reversed on a set column, the column type is reversed_type_impl, not set_type_impl. Therefore, we have to check for both reversed type and set type in some places. To make such checks easier, add convenience methods self_or_reversed() and as() to abstract_type. Invoke those methods (instead of is_set() and casts) enough to make test_clustering_key_reverse_frozen_set pass. Leave other invocations of is_set() and *_cast<set_type_impl>() as they are; some are protected by callers from being invoked on reverse types, but some are quite possibly bugs untriggered by existing tests. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-01-07 13:22:20 +02:00
Raphael Carvalho	28a2aca627	Fix doc for building pkgs for a specific build mode Closes #7878	2021-01-05 18:56:21 +02:00
Tomasz Grabiec	1d717f37e2	vint-serialization: Reference the correct spec We are not using the protocol buffers format for vint. Message-Id: <1609865471-22292-1-git-send-email-tgrabiec@scylladb.com>	2021-01-05 18:54:09 +02:00
Vojtech Havel	d858c57357	cql3: allow SELECTs restricted by "IN" to retrieve collections This patch enables select cql statements where collection columns are selected columns in queries where clustering column is restricted by "IN" cql operator. Such queries are accepted by cassandra since v4.0. The internals actually provide correct support for this feature already, this patch simply removes relevant cql query check. Tests: cql-pytest (testInRestrictionWithCollection) Fixes #7743 Fixes #4251 Signed-off-by: Vojtech Havel <vojtahavel@gmail.com> Message-Id: <20210104223422.81519-1-vojtahavel@gmail.com>	2021-01-05 14:39:18 +02:00
Pekka Enberg	e54cc078a1	Update seastar submodule * seastar d1b5d41b...a2fc9d72 (6): > perftune.py: support passing multiple --nic options to tune multiple interfaces at once > perftune.py recognize and sort IRQs for Mellanox NICs > perftune.py: refactor getting of driver name into __get_driver_name() Fixes #6266 > install-dependencies: support Manjaro > append_challenged_posix_file_impl: optimize_queue: use max of sloppy_size_hint and speculative_size > future: do_until: handle exception in stop condition	2021-01-05 13:32:21 +02:00
Avi Kivity	43a2636229	Merge "Remove proxy from size-estimates reader" from Pavel E " The size_estimates_mutation_reader calls the global proxy to get the database from it. The database is used to find keyspaces to work with. However, it's safe to keep a local database reference on the reader itself. tests: unit(debug) " * 'br-no-proxy-in-size-estimate-reader' of https://github.com/xemul/scylla: size_estimate_reader: Use local db reference not global size_estimate_reader: Keep database reference on mutation reader size_estimate_reader: Keep database reference on virtual_reader	2021-01-05 11:28:09 +02:00
Pavel Emelyanov	9632af5d6b	schema_tables: Drop unused merge_schema overload After the `d3aa1759` one of them became unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210105051724.5249-1-xemul@scylladb.com>	2021-01-05 11:25:22 +02:00
Michał Chojnowski	6c97027f85	utils: fragment_range: add compare_unsigned We will use it to compare fragmented buffers (mainly managed_bytes_view in types, compound, and tests) without linearization.	2021-01-04 22:50:45 +01:00
Michał Chojnowski	2d28471a59	utils: managed_bytes: make the constructors from bytes and bytes_view explicit Conversions from views to owners have no business being implicit. Besides, they would also cause various ambiguity problems when adding managed_bytes_view.	2021-01-04 22:22:12 +01:00
Raphael S. Carvalho	d265bb9bdb	test: Add test for TWCS interposer on memtable flush Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 16:55:06 -03:00
Raphael S. Carvalho	9124a708f1	table: Wire interposer consumer for memtable flush From now on, memtable flush will use the strategy's interposer consumer iff split_during_flush is enabled (disabled by default). It has effect only for TWCS users, as TWCS is the only strategy that implements this interposer consumer, which segregates data according to the window configuration. Fixes #4617. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 16:26:07 -03:00
Raphael S. Carvalho	c926a948e5	table: Add write_memtable_to_sstable variant which accepts flat_mutation_reader This new variant will be needed for interposer consumer. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 16:23:00 -03:00
Raphael S. Carvalho	32acb44fec	table: Allow sstable write permit to be shared across monitors As a preparation for interposer on flush, let's allow database write monitor to store a shared sstable write permit, which will be released as soon as any of the sstable writers reach the sealing stage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 14:46:43 -03:00
Nadav Har'El	ed31dd1742	cql-pytest: port Cassandra's unit test validation/entities/counters_test In this patch, we port validation/entities/counters_test.java, containing 7 tests for CQL counters. Happily, these tests did not uncover any bugs in Scylla and all pass on both Cassandra and Scylla. There is one small difference that I decided to ignore instead of reporting a bug. If you try a CREATE TABLE with both counter and non-counter columns, Scylla gives a ConfigurationException error, while Cassandra gives a more reasonable InvalidRequest. The ported test currently allows both. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201223181325.3148928-1-nyh@scylladb.com>	2021-01-04 18:25:48 +01:00
Nadav Har'El	05d6eff850	cql-pytest: add tests for non-support of unicode equivalence In issue #7843 there were questions raised on how much Scylla supports the notion of Unicode Equivalence, a.k.a. Unicode normalization. Consider the Spanish letter ñ - it can be represented by a single Unicode character 00F1, but can also be represented as a 006E (lowercase "n") followed by a 0303 ("combining tilde"). Unicode specifies that these two representations should be considered "equivalent" for purposes of sorting or searching. But the following tests demonstrate that this is not, in fact, supported in Scylla or Cassandra: 1. If you use one representation as the key, then looking up the other one will not find the row. Scylla (and Cassandra) do not consider the two strings equivalent. 2. The LIKE operator (a Scylla-only extension) doesn't know that the single-character ñ begins with an n, or that the two-character ñ is just a single character. This is despite the thinking on #7843 that by using ICU in the implementation of LIKE, we somehow got support for this. We didn't. Refs #7843 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201229125330.3401954-1-nyh@scylladb.com>	2021-01-04 18:25:28 +01:00
Nadav Har'El	feb028c97e	cql-pytest: add reproducer for issue 7856 This patch adds a reproducer for issue #7856, which is about frozen sets and how we can in Scylla (but not in Cassandra), insert one in the "wrong" order, but only in very specific circumstances which this reproducer demonstrates: The bug can only be reproduced in a nested frozen collection, and using prepared statements. Refs #7856 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201231085500.3514263-1-nyh@scylladb.com>	2021-01-04 18:25:12 +01:00
Raphael S. Carvalho	738049cba2	memtable: Track min timestamp Tracking both min and max timestamp will be required for memtable flush to short-circuit interposer consumer if needed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 13:24:43 -03:00
Raphael S. Carvalho	5519fdba72	table: Extend cache update to operate a memtable split into multiple sstables This extension is needed for future work where a memtable will be segregated during flush into one sstable or more. So now multiple sstables can be added to the set after a memtable flush, and compaction is only triggered at the end. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 13:24:10 -03:00
Piotr Sarna	d5da455d95	schema_tables: describe calculate_schema_digest better - the mystical `accept_predicate` is renamed to `accept_keyspace` to be more self-descriptive - a short comment is added to the original calculate_schema_digest function header, mentioning that it computes schema digest for non-system keyspaces Refs #7854 Message-Id: <04f1435952940c64afd223bd10a315c3681b1bef.1609763443.git.sarna@scylladb.com>	2021-01-04 14:46:17 +02:00
Amos Kong	8b231a3bd9	install.sh: switch to use realpath for EnvironmentFile In scylla-jmx, we fixed a hardcoded sysconfdir in the EnvironmentFile path; realpath was used to convert the path. This patch switches the scylla repo to realpath as well, to make it consistent with scylla-jmx. Suggested-by: Pekka Enberg <penberg@scylladb.com> Signed-off-by: Amos Kong <amos@scylladb.com> Closes #7860	2021-01-04 12:45:17 +02:00
Avi Kivity	33ee07a9d8	Merge 'Skip internal distributed tables in schema_change_test' from Piotr Sarna The original idea for `schema_change_test` was to ensure that if schema hasn't changed, the digest also remained unchanged. However, a cumbersome side effect of adding an internal distributed table (or altering one) is that all digests in `schema_change_test` are immediately invalid, because the schema changed. Until now, each time a distributed system table was added/amended, a new test case for `schema_change_test` was generated, but this effort is not worth the effect - when a distributed system table is added, it will always propagate on its own, so generating a new test case does not bring any tangible new test coverage - it's just a pain. To avoid this pain, `schema_change_test` now explicitly skips all internal keyspaces - which includes internal distributed tables - when calculating schema digest. That way, patches which change the way of computing the digest itself will still require adding a new test case, which is good, but, at the same time, changes to distributed tables will not force the developers to introduce needless schema features just for the sake of this test. Tests: * unit(dev) * manual(rebasing on top of a change which adds two distributed system tables - all tests still passed) Refs #7617 Closes #7854 * github.com:scylladb/scylla: schema_change_test: skip distributed system tables in digest schema_tables: allow custom predicates in schema digest calc alternator: drop unneeded sstring creation system_keyspace: migrate helper functions to string_view database: migrate find_keyspace to string views	2021-01-04 12:44:03 +02:00
Piotr Sarna	e26aa836a9	schema_change_test: skip distributed system tables in digest With previous design of the schema change test, a regeneration was necessary each time a new distributed system table was added. It was not the original purpose of the test to keep track of new distributed tables which simply propagate on their own, so the test case is now modified: internal distributed tables are not part of the schema digest anymore, which means that changes inside them will not cause mismatches. This change involves a one-shot regeneration of all digests, which due to historical reasons included internal distributed tables in the digest, but no further regenerations should ever be necessary when a new internal distributed table is added.	2021-01-04 10:24:40 +01:00
Piotr Sarna	13a60b02ea	schema_tables: allow custom predicates in schema digest calc For testing purposes it would be useful to be able to skip computing schema for certain tables (namely, internal distributed tables). In order to allow that, a function which accepts a custom predicate is added.	2021-01-04 10:11:41 +01:00
Piotr Sarna	12b5184933	alternator: drop unneeded sstring creation It's now possible to use string views to check if a particular table is a system table, so it's no longer needed to explicitly create an sstring instance.	2021-01-04 09:47:01 +01:00
Piotr Sarna	f293c59a46	system_keyspace: migrate helper functions to string_view Functions for checking if the keyspace is system/internal were based on sstring references, which is impractical compared to string views and may lead to unnecessary creation of sstring instances.	2021-01-04 09:47:01 +01:00
Piotr Sarna	aba9772eff	database: migrate find_keyspace to string views ... in order to avoid creating unnecessary sstring instances just to compare strings.	2021-01-04 09:47:01 +01:00
Gleb Natapov	d3aa17591c	migration_manager: drop announce_locally flag It looks like the history of the flag begins in Cassandra's https://issues.apache.org/jira/browse/CASSANDRA-7327 where it is introduced to speed up tests by not needing to start the gossiper. The thing is we always start gossiper in our cql tests, so the flag only introduces noise. And, of course, since we want to move schema to use raft it goes against the nature of the raft to be able to apply modification only locally, so we better get rid of the capability ASAP. Tests: units(dev, debug) Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>	2021-01-03 13:58:09 +02:00
Gleb Natapov	491f10bb70	schema-tables: make schema update global when fixing legacy SI tables When a node notices that it uses legacy SI tables, it converts them to the new format, but it updates only the local schema. That only causes schema discrepancy between nodes; the schema change should propagate globally. Fixes #7857. Message-Id: <20201230111101.4037543-1-gleb@scylladb.com>	2021-01-03 13:57:46 +02:00
Raphael S. Carvalho	d55d65d77c	compaction: Enable filtering reader only on behalf of cleanup compaction After `13fa2bec4c`, every compaction will be performed through a filtering reader because consumers cannot do the filtering if interposer consumer is enabled. It turns out that filtering_reader is adding significant overhead when regular compactions are running. As no other compaction type needs to actually do any filtering, let's limit filtering_reader to cleanup compaction. Alternatively, we could disable interposer consumer on behalf of cleanup, or add support for the consumers to do the filtering themselves but that would add lots of complexity. Fixes #7748. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201230194516.848347-2-raphaelsc@scylladb.com>	2021-01-03 12:02:43 +02:00
Raphael S. Carvalho	e42d277805	compaction: Drop needless partition filter for regular compaction This filter is used to discard data that doesn't belong to current shard, but scylla will only make a sstable available to regular compaction after it was resharded on either boot or refresh. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201230194516.848347-1-raphaelsc@scylladb.com>	2021-01-03 12:02:42 +02:00
Pekka Enberg	5872b754e0	Revert "dist/docker: Remove 'epel-release' from Docker image" This reverts commit `ceb67e7728`. The "epel-release" package is needed to install the "supervisord" package, which I somehow missed in testing... Fixes #7851	2021-01-02 12:49:12 +02:00
Nadav Har'El	93a2c52338	cql-pytest: add tests for inserting rows with missing key columns This patch adds two simple tests for what happens when a user tries to insert a row with one of the key columns missing. The first test confirms that if the column is completely missing, we correctly print an error (this was issue #3665, which was already marked fixed). However, the second test demonstrates that we still have a bug when the key column appears in the command, but with a null value. In this case, instead of failing the insert (as Cassandra does), we silently ignore it. This is the proper behavior for UNSET_VALUE, but not for null. So the second test is marked xfail, and I opened issue #7852 about it. Refs #3665 Refs #7852 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201230132350.3463906-1-nyh@scylladb.com>	2020-12-30 18:20:01 +01:00
Nadav Har'El	10fbef5bff	cql-pytest: clean up test_using_timeout.py In a previous version of test_using_timeout.py, we had tables pre-filled with some content labeled "everything". The current version of the tests doesn't use it, so drop it completely. One test, test_per_query_timeout_large_enough, still had code that did `res = list(cql.execute(f"SELECT * FROM {table} USING TIMEOUT 24h"))` followed by `assert res == everything`. This was a bug - it only works as expected if this test is run before any other test, and will fail if we ever reorder or parallelize these tests. So drop these two lines. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201229145435.3421185-1-nyh@scylladb.com>	2020-12-30 09:16:25 +01:00
Nadav Har'El	5f24ff9187	Merge 'Coroutinize alternator tagging requests' from Piotr Sarna This miniseries rewrites two alternator request handlers from seastar threads to coroutines - since these handlers are not on a hot path and using seastar threads is way too heavy for such a simple routine. NOTE: this pull request obviously has to wait until coroutines are fully supported in Seastar/Scylla. Closes #7453 * github.com:scylladb/scylla: alternator: coroutinize untagging a resource alternator: coroutinize tagging a resource	2020-12-29 23:36:25 +02:00
Avi Kivity	700ddd1914	Merge 'scylla_setup: enable node_exporter for offline installation' from Amos Kong node_exporter had been added to scylla-server package by commit `95197a09c9`. So we can enable it by default for offline installation. Closes #7832 * github.com:scylladb/scylla: scylla_setup: cleanup if judgments scylla_setup: enable node_exporter for offline installation	2020-12-28 22:07:36 +02:00
Avi Kivity	1716359455	Update tools/jmx submodule * tools/jmx 20469bf...2c95650 (1): > install.sh: set a valid WorkingDirectory for nonroot offline install	2020-12-28 21:19:04 +02:00
Avi Kivity	f7b731bc46	Merge 'Fix potential reactor stall on LCS compaction completion' from Raphael Carvalho On every compaction completion, sstable set is rebuilt from scratch. With LCS and ~160G of data per shard, it means we'll have to create a new sstable set with ~1000 entries whenever compaction completes, which will likely result in reactor stalling for a significant amount of time. Fixes #7758. Closes #7842 * github.com:scylladb/scylla: table: Fix potential reactor stall on LCS compaction completion table: decouple preparation from execution when updating sstable set table: change rebuild_sstable_list to return new sstable set row_cache: allow external updater to decouple preparation from execution	2020-12-28 21:16:17 +02:00
Pavel Emelyanov	7ac435f67c	test: Enhance test for range_tombstone_list de-overlapping The range_tombstone_list always (unless misused?) contains de-overlapped entries. There's a test_add_random that checks this, but it suffers from several problems: - generated "random" ranges are sequential and may only overlap on their borders - test uses the keys of the same prefix length Enhance the generator part to produce a purely random sequence of ranges with bound keys of arbitrary length. Just pay attention to generate the "valid" individual ranges, whose start is not ahead of the end. Also -- rename the test to reflect what it's doing and increase the number of iterations. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201228115525.20327-1-xemul@scylladb.com>	2020-12-28 18:26:48 +02:00
Raphael S. Carvalho	8dd7280107	table: Fix potential reactor stall on LCS compaction completion On every compaction completion, sstable set is rebuilt from scratch. With LCS and ~160G of data per shard, it means we'll have to create a new sstable set with ~1000 entries whenever compaction completes, which will likely result in reactor stalling for a significant amount of time. This is fixed by futurizing build_new_sstable_list(), so it will yield whenever needed. Fixes #7758. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:50 -03:00
Raphael S. Carvalho	6082da4703	table: decouple preparation from execution when updating sstable set row cache now allows updater to first prepare the work, and then execute the update atomically as the last step. let's do that when rebuilding the set, so now new set is created in the preparation phase, and the new set replaces the old one in the execution phase, satisfying the atomicity requirement of row cache. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:48 -03:00
Raphael S. Carvalho	43f0200b8f	table: change rebuild_sstable_list to return new sstable set procedure is changed to return the new set, so caller will be responsible for replacing the old set with the new one. this will allow our future work where building new set and enabling it will be decoupled. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:47 -03:00
Raphael S. Carvalho	198b87503f	row_cache: allow external updater to decouple preparation from execution External updater may do some preparatory work like constructing a new sstable list, and at the end atomically replace the old list by the new one. Decoupling the preparation from execution will give us the following benefits: - the preparation step can now yield if needed to avoid reactor stalls, as it's been futurized. - the execution step will now be able to provide strong exception guarantees, as it's now decoupled from the preparation step which can be non-exception-safe. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:45 -03:00
Avi Kivity	3325960486	Update seastar submodule * seastar 1f5e3d3419...d1b5d41b6d (1): > append_challenged_posix_file_impl: adjust sloppy_size only in optimize_queue Fixes #7439 (the coredump part).	2020-12-28 13:00:04 +02:00
Nadav Har'El	7eda6b1e90	cql-pytest: increase default request timeout The CQL tests in test/cql-pytest use the Python CQL driver's default timeout for execute(), which is 10 seconds. This is usually more than enough. However, in extreme cases like the one noted in issue #7838, 10 seconds may not be enough. In that issue, we run a very slow debug build on a very slow test machine, and encounter a very slow request (a DROP KEYSPACE that needs to drop multiple tables). So this patch increases the default timeout to an even larger 120 seconds. We don't care that this timeout is ridiculously large - under normal operations it will never be reached, there is no code which loops for this amount of time for example. Tested that this patch fixes #7838 by choosing a much lower timeout (1 second) and reproducing test failures caused by timeouts. Fixes #7838. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201228090847.3234862-1-nyh@scylladb.com>	2020-12-28 11:19:37 +02:00
Amos Kong	8723b0ce86	leveled_compaction_strategy: fix boundary of maximum sstable level The MAX_LEVELS is the levels count, but sstable level (index) starts from 0. So the maximum and valid level is MAX_LEVELS - 1. Signed-off-by: Amos Kong <amos@scylladb.com> Closes #7833	2020-12-27 18:59:54 +02:00
Benny Halevy	8a745a0ee0	compaction: compaction_writer: destroy shared_sstable after the sstable_writer sstable_writer may depend on the sstable throughout its whole lifecycle. If the sstable is freed before the sstable_writer we might hit use-after-free as in the following case: ``` std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket>::operator+=(long) at /usr/include/c++/10/bits/stl_deque.h:240 (inlined by) std::operator+(std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket> const&, long) at /usr/include/c++/10/bits/stl_deque.h:378 (inlined by) std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket>::operator[](long) const at /usr/include/c++/10/bits/stl_deque.h:252 (inlined by) std::deque<sstables::compression::segmented_offsets::bucket, std::allocator<sstables::compression::segmented_offsets::bucket> >::operator[](unsigned long) at /usr/include/c++/10/bits/stl_deque.h:1327 (inlined by) sstables::compression::segmented_offsets::push_back(unsigned long, sstables::compression::segmented_offsets::state&) at ./sstables/compress.cc:214 sstables::compression::segmented_offsets::writer::push_back(unsigned long) at ./sstables/compress.hh:123 (inlined by) compressed_file_data_sink_impl<crc32_utils, (compressed_checksum_mode)1>::put(seastar::temporary_buffer<char>) at ./sstables/compress.cc:519 seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at table.cc:? (inlined by) seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at ././seastar/include/seastar/core/iostream-impl.hh:432 seastar::output_stream<char>::flush() at table.cc:? seastar::output_stream<char>::close() at table.cc:? 
sstables::file_writer::close() at sstables.cc:? sstables::mc::writer::~writer() at writer.cc:? (inlined by) sstables::mc::writer::~writer() at ./sstables/mx/writer.cc:790 sstables::mc::writer::~writer() at writer.cc:? flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at compaction.cc:? (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_destroy() at /usr/include/c++/10/optional:260 (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_reset() at /usr/include/c++/10/optional:280 (inlined by) std::_Optional_payload<sstables::compaction_writer, false, false, false>::~_Optional_payload() at /usr/include/c++/10/optional:401 (inlined by) std::_Optional_base<sstables::compaction_writer, false, false>::~_Optional_base() at /usr/include/c++/10/optional:474 (inlined by) std::optional<sstables::compaction_writer>::~optional() at /usr/include/c++/10/optional:659 (inlined by) sstables::compacting_sstable_writer::~compacting_sstable_writer() at ./sstables/compaction.cc:229 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_mutation() at ././mutation_compactor.hh:468 (inlined by) compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_for_compaction() at ././mutation_compactor.hh:538 (inlined by) std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::operator()(compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>) const at /usr/include/c++/10/bits/unique_ptr.h:85 (inlined by) std::unique_ptr<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>, 
std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~unique_ptr() at /usr/include/c++/10/bits/unique_ptr.h:361 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::~stable_flattened_mutations_consumer() at ././mutation_reader.hh:342 (inlined by) flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at ././flat_mutation_reader.hh:201 auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:272 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:383 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, 
noop_compacted_fragments_consumer> >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:389 (inlined by) seastar::future<void> sstables::compaction::setup<noop_compacted_fragments_consumer>(noop_compacted_fragments_consumer)::{lambda(flat_mutation_reader)#1}::operator()(flat_mutation_reader)::{lambda()#1}::operator()() at ./sstables/compaction.cc:612 ``` What happens here is that: compressed_file_data_sink_impl(output_stream<char> out, sstables::compression* cm, sstables::local_compression lc) : _out(std::move(out)) , _compression_metadata(cm) , _offsets(_compression_metadata->offsets.get_writer()) , _compression(lc) , _full_checksum(ChecksumType::init_checksum()) _compression_metadata points to a buffer held by the sstable object. and _compression_metadata->offsets.get_writer returns a writer that keeps a reference to the segmented_offsets in the sstables::compression that is used in the ~writer -> close path. Fixes #7821 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201227145726.33319-1-bhalevy@scylladb.com>	2020-12-27 17:02:13 +02:00
Pavel Emelyanov	387889315e	mutation-partition: Relax putting a dummy entry into a continuous range When applying a mutation partition to another if a dummy entry from the source falls into a destination continuous range, it can be just dropped. However, current implementation still inserts it and then instantly removes. Relax this code-flow by dropping the unwanted entry without tossing it. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201224130438.11389-1-xemul@scylladb.com>	2020-12-27 14:47:32 +02:00
Amos Kong	9adc6f68ee	scylla_setup: cleanup if judgments This patch merged two nested if judgments. Signed-off-by: Amos Kong <amos@scylladb.com>	2020-12-26 04:45:25 +08:00
Amos Kong	632b01ce4e	scylla_setup: enable node_exporter for offline installation node_exporter had been added to scylla-server package by commit `95197a09c9`. So we can enable it by default for offline installation. Signed-off-by: Amos Kong <amos@scylladb.com>	2020-12-25 10:54:31 +08:00
Pavel Emelyanov	72c2482f73	mutation-partition: Construct rows_entry directly from clustering_row When a rows_entry is added to row_cache it's constructed from clustering_row by unpacking all its internals and putting them into the rows_entry's deletable_row. There's a shorter way -- the clustering_row already has the deletable_row onboard, from which rows_entry can copy-construct its own. This keeps the rows_entry and deletable_row sets of constructors a bit shorter. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201224161112.20394-1-xemul@scylladb.com>	2020-12-24 18:13:44 +02:00
Avi Kivity	8f06a687b4	Merge "idl: minor improvements to idl compiler" from Pavel S " This series does a lot of cleanups, dead code removal, and most importantly fixes the following things in IDL compiler tool: * The grammar now rejects invalid identifiers; previously it was possible, in some cases, to write things like `std:vector`. * Error reporting is improved significantly and failures are now pointing to the place of failure much more accurately. This is done by restricting rule backtracking on those rules which don't need it. " * 'idl-compiler-minor-fixes-v4' of https://github.com/ManManson/scylla: idl: move enum and class serializer code writers to the corresponding AST classes idl: extract writer functions for `write`, `read` and `skip` impls for classes and enums idl: minor fixes and code simplification idl: change argument name from `hout` to `cout` in all dependencies of `add_visitors` fn idl: fix parsing of basic types and discard unneeded terminals idl: remove unused functions idl: improve error tracing in the grammar and tighten-up some grammar rules idl: remove redundant `set_namespace` function idl: remove unused `declare_class` function idl: slightly change `str` and `repr` for AST types idl: place directly executed init code into if __name__=="__main__"	2020-12-24 15:14:09 +02:00
Takuya ASADA	95197a09c9	dist: add node_exporter to scylla-server package To support connection-less environments, we need to ship the node_exporter binary in the scylla-server package instead of downloading it from the internet. Related #7765 Fixes #2190 Closes #7796	2020-12-24 11:44:13 +02:00
Pavel Solodovnikov	219ac2bab5	large_data_handler: fix segmentation fault when constructing `data_value` from a `nullptr` It turns out that `cql_table_large_data_handler::record_large_rows` and `cql_table_large_data_handler::record_large_cells` were broken for reporting static cells and static rows from the very beginning: In case a large static cell or a large static row is encountered, it tries to execute `db::try_record` with `nullptr` additional values, denoting that there is no clustering key to be recorded. These values are next passed to `qctx.execute_cql()`, which creates `data_value` instances for each statement parameter, hence invoking `data_value(nullptr)`. This uses `const char*` overload which delegates to `std::string_view` ctor overload. It is UB to pass `nullptr` pointer to `std::string_view` ctor. Hence leading to segmentation faults in the aforementioned large data reporting code. What we want here is to make a null `data_value` instead, so just add an overload specifically for `std::nullptr_t`, which will create a null `data_value` with `text` type. A regression test is provided for the issue (written in `cql-pytest` framework). Tests: test/cql-pytest/test_large_cells_rows.py Fixes: #6780 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201223204552.61081-1-pa.solodovnikov@scylladb.com>	2020-12-24 11:37:43 +02:00
Nadav Har'El	79faaa34c7	alternator test: confirm that list index can't be a reference In Alternator's expression parser in alternator/expressions.g, a list can be indexed by a '[' INTEGER ']'. I had doubts whether maybe a value-reference for the index, e.g., "something[:xyz]", should also work. So this patch adds a test that checks whether "something[:xyz]" works, and confirms that both DynamoDB and Alternator don't accept it and consider it a syntax error. So Alternator's parser is correct to insist that the index be a literal integer. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201214100302.2807647-1-nyh@scylladb.com>	2020-12-24 11:37:29 +02:00
Piotr Sarna	b62457d5b0	test: add verification to using timeout prepared statements Previously the test cases only verified that the queries did not time out with sufficiently large timeout, but now they also check that appropriate data is inserted and can be read. Message-Id: <8bc979434fce977c30d8516dc82789d4fe317696.1608734455.git.sarna@scylladb.com>	2020-12-24 11:37:29 +02:00
Piotr Sarna	1577e6f632	test: add cases for using timeout with batches The test suite for USING TIMEOUT already included SELECT, INSERT and UPDATE statements, but missed batches. The suite is now updated to include batch tests. Tests: unit(dev) Message-Id: <a6738d2ed3d62681615523d01109362766c90325.1608734455.git.sarna@scylladb.com>	2020-12-24 11:37:29 +02:00
Piotr Sarna	4eb41b7d56	test: use random keys in tests for USING TIMEOUT Since the tables are written to and it's possible to run multiple test cases concurrently, the cases now use pseudorandom keys instead of hardcoded values. Message-Id: <d864dbb096360c17cdc2ebd8e79bfd983c19910e.1608734455.git.sarna@scylladb.com>	2020-12-24 11:37:29 +02:00
Avi Kivity	0bbd78037f	Update seastar submodule * seastar 2bd8c8d088...1f5e3d3419 (5): > Merge "Avoid fair-queue rovers overflow if not configured" from Pavel E > doc: add a coroutines section to the tutorial > Merge "tests/perf: add random-seed config option" from Benny > iotune: Print parameters affecting the measurement results > cook: Add patch cmd for ragel build (signed char confusion on aarch64)	2020-12-24 11:37:29 +02:00
Piotr Sarna	3b26fc01c2	alternator: coroutinize untagging a resource Historically, a seastar thread was used for this request because it's not on a critical path, but a coroutine makes the code simpler.	2020-12-23 15:53:57 +01:00
Piotr Sarna	1ca39cc8c1	alternator: coroutinize tagging a resource Historically, a seastar thread was used for this request because it's not on a critical path, but a coroutine makes the code simpler.	2020-12-23 15:53:57 +01:00
Pavel Solodovnikov	3a91f1127d	idl: move enum and class serializer code writers to the corresponding AST classes Expand the role of AST classes to also supply methods for actually generating the code. More changes will follow eventually until all generation code is handled by these classes. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-22 23:23:12 +03:00
dgarcia360	fd5f0c3034	docs: add organization Closes #7818	2020-12-22 15:33:31 +02:00
Pekka Enberg	ceb67e7728	dist/docker: Remove 'epel-release' from Docker image We no longer need the 'epel-release' package for anything as our scylla-server package bundles all the necessary dependencies. Closes #7823	2020-12-22 14:55:17 +02:00
Avi Kivity	e2dfa24540	Merge "token_metadata: add clear_gently" from Benny " We've encountered a number of reactor stalls related to token_metadata that were fixed in `052a8d036d`. This is a follow-up series that adds a clear_gently method to token_metadata that uses continuations to prevent reactor stalls when destroying token_metadata objects. Test: unit(dev), {network_topology_strategy,storage_proxy}_test(debug) " * tag 'token_metadata_clear_gently-v3' of github.com:bhalevy/scylla: token_metadata: add clear_gently token_metadata: shared_token_metadata: add mutate_token_metadata token_metdata: futurize update_normal_tokens abstract_replication_strategy: get_pending_address_ranges: invoke clone_only_token_map if can_yield repair: replace_with_repair: convert to coroutine	2020-12-22 13:23:31 +02:00
Nadav Har'El	f2978e1873	cql-pytest: port Cassandra's collection_test.py A previous patch added test/cql-pytest/cassandra_tests - a framework for porting Cassandra's unit tests to Python - but only ported two tiny test files with just 3 tests. In this patch, we finally port a much larger test file validation/entities/collection_test.java. This file includes 50 separate tests, which cover a lot of aspects of collection support, as well as how other stuff interact with collections. As of now, 23 (!) of these 50 tests fail, and exposed six new issues in Scylla which I carefully documented: Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax Refs #7740: CQL prepared statements incomplete support for "unset" values Refs #7743: Restrictions missing support for "IN" on tables with collections, added in Cassandra 4.0 Refs #7745: Length of map keys and set items are incorrectly limited to 64K in unprepared CQL Refs #7747: Handling of multiple list updates in a single request differs from recent Cassandra Refs #7751: Allow selecting map values and set elements, like in Cassandra 4.0 These issues vary in severity - some are simply new Cassandra 4.0 features that Scylla never implemented, but one (#7740) is an old Cassandra 2.2 feature which it seems we did not implement correctly in some cases that involve collections. Note that there are some things that the ported tests do not include. In a handful of places there are things which the Python driver checks, before sending a request - not giving us an opportunity to check how the server handles such errors. Another notable change in this port is that the original tests repeated a lot of tests with and without a "nodetool flush". In this port I chose to stub the flush() function - it does NOT flush. I think the point of these tests is to check the correctness of the CQL features - not to verify that memtable flush works correctly. 
Doing a real memtable flush is not only slow, it also doesn't really check much (Scylla may still serve data from cache, not sstables). So I decided it is pointless. An important goal of this patch is that all 50 tests (except three skipped tests because Python has client-side checking), pass when run on Cassandra (with test/cql-pytest/run-cassandra). This is very important: It was very easy to make mistakes while porting the tests, and I did make many such mistakes; But running them against Cassandra allowed me to fix those mistakes - because the correct tests should pass on Cassandra. And now they do. Unfortunately, the new tests are significantly slower than what we've been accustomed to in Alternator/CQL tests. The 50 tests create more than a hundred tables, udfs, udts, and similar slow operations - they do not reuse anything via fixtures. The total time for these 50 tests (in dev build mode) is around 18 seconds. Just one test - testMapWithLargePartition is responsible for almost half (!) of that time - we should consider in the future whether it's worth it or can be made smaller. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201215155802.2867386-1-nyh@scylladb.com>	2020-12-22 13:22:09 +02:00
Avi Kivity	5a33ce58a7	Update seastar submodule * seastar 3b8903d406...2bd8c8d088 (8): > core: remove unused chrono.h reference > cmake: force cxx standard if dialect is specified > queue: add front() > coroutine: deprecate coroutine forwarding > memory: Use 2^n sizes when searching for preferred span size > shared_ptr: define debug_shared_ptr_counter_type constructor as noexcept > install-dependencies: add pkg-config to Debian/Ubuntu packages > log: do_log: prevent garbling due to context switch	2020-12-22 13:22:09 +02:00
Benny Halevy	322aa2f8b5	token_metadata: add clear_gently clear_gently gently clears the token_metadata members. It uses continuations to allow yielding if needed to prevent reactor stalls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:21 +02:00
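The idea behind clear_gently can be illustrated with a minimal Python sketch. It is not the Seastar implementation: `clear_gently_sketch` and its `batch` parameter are hypothetical names, and `asyncio.sleep(0)` merely stands in for yielding to the reactor between chunks of work.

```python
import asyncio


async def clear_gently_sketch(container: dict, batch: int = 1024) -> int:
    """Empty `container` incrementally, yielding to the event loop every
    `batch` removals so that other tasks are not starved.

    Returns the number of times control was yielded (illustrative only).
    """
    removed = 0
    yields = 0
    while container:
        container.popitem()
        removed += 1
        if removed % batch == 0:
            # Hypothetical stand-in for a reactor preemption point.
            await asyncio.sleep(0)
            yields += 1
    return yields
```

Destroying a large map in one go is a single long synchronous operation; splitting it into batches bounds the time spent between yield points.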
Benny Halevy	56aa49ca81	token_metadata: shared_token_metadata: add mutate_token_metadata mutate_token_metadata acquires the shared_token_metadata lock, clones the token_metadata (using clone_async) and calls an asynchronous functor on the cloned copy of the token_metadata to mutate it. If the functor is successful, the mutated clone is set back to the shared_token_metadata, otherwise, the clone is destroyed. With that, get rid of shared_token_metadata::clone Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:19 +02:00
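The lock / clone / mutate / publish-on-success pattern described for mutate_token_metadata can be sketched in Python as follows. This is only an illustration of the pattern: the class `SharedTokenMetadataSketch` and its methods are hypothetical, `copy.deepcopy` stands in for clone_async, and the real code is C++ on Seastar.

```python
import asyncio
import copy
from typing import Any, Awaitable, Callable


class SharedTokenMetadataSketch:
    """Hypothetical holder of a shared value mutated via clone-and-swap."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._lock = asyncio.Lock()

    def get(self) -> Any:
        return self._value

    async def mutate(self, func: Callable[[Any], Awaitable[None]]) -> None:
        async with self._lock:
            # Work on a private copy so readers never see partial updates.
            clone = copy.deepcopy(self._value)
            # If func raises, the clone is simply dropped and the shared
            # value stays untouched.
            await func(clone)
            # Publish the mutated clone only after func succeeded.
            self._value = clone
```

The design choice is that a failed mutation needs no rollback logic, because the shared value is replaced only after the functor has completed.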
Benny Halevy	e089c22ec1	token_metdata: futurize update_normal_tokens The function complexity is O(#tokens) in the worst case, as for each endpoint token it traverses _token_to_endpoint_map linearly to erase the endpoint mapping if it exists. This change renames the current implementation of update_normal_tokens to update_normal_tokens_sync and clones the code as a coroutine that returns a future and may yield if needed. Eventually we should futurize the whole token_metadata and abstract_replication_strategy interface and get rid of the synchronous functions. Until then the sync version is still required from call sites that are neither returning a future nor run in a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 10:35:15 +02:00
Benny Halevy	e7f4cd89a9	abstract_replication_strategy: get_pending_address_ranges: invoke clone_only_token_map if can_yield Optimize the can_yield case by invoking the futurized version of clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 09:49:08 +02:00
Benny Halevy	55316df6bf	repair: replace_with_repair: convert to coroutine Prepare for futurizing update_normal_tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 09:49:08 +02:00
Piotr Sarna	da7e87dc56	test: add cases for using timeout with bind markers The test suite for USING TIMEOUT already included binding the timeout value, but only for wildcard (?). The test case is now extended with named bind markers. Tests: unit(dev) Message-Id: <b5344f40d26d90b36e90a04c2474127728535eaa.1608573624.git.sarna@scylladb.com>	2020-12-22 09:03:56 +02:00
Pekka Enberg	961b9e8390	install.sh: Add seastar-cpu-map.sh to $PATH Add the seastar-cpu-map.sh to the SBINFILES variable, which is used to create symbolic links to scripts so that they appear in $PATH. Please note that there are additional Python scripts (like perftune.py), which are not in $PATH. That's because Python scripts are handled separately in "install.sh" and no Python script has a "sbin" symlink. We might want to change this in the future, though. Fixes #6731 Closes #7809	2020-12-21 14:12:27 +02:00
Avi Kivity	0f7b6dd180	utils: managed_bytes: introduce with_linearized() This is a temporary scaffold for weaning ourselves off linearization. It differs from with_linearized_managed_bytes in that it does not rely on the environment (linearization_context) and so is easier to remove.	2020-12-20 15:14:44 +01:00
Avi Kivity	c37e495958	utils: managed_bytes: constrain with_linearized_managed_bytes() The passed function must be called with no parameters; document and enforce it.	2020-12-20 15:14:44 +01:00
Avi Kivity	a1df1b3c34	utils: managed_bytes: avoid internal uses of managed_bytes::data() We use managed_bytes::data() in a few places when we know the data is non-fragmented (such as when the small buffer optimization is in use). We'd like to remove managed_bytes::data() as linearization is bad, so in preparation for that, replace internal uses of data() with the equivalent direct access.	2020-12-20 15:14:44 +01:00
Avi Kivity	72a2554a86	utils: managed_bytes: extract do_linearize_pure() do_linearize() is an impure function as it changes state in linearization_context. Extract the pure parts into a new do_linearize_pure(). This will be used to linearize managed_bytes without a linearization_context, during the transition period where fragmented and non-fragmented values coexist.	2020-12-20 15:14:44 +01:00
Avi Kivity	4b3f0fd7c0	thrift: do not depend on implicit conversion of keys to bytes_view This implicit conversion will soon be gone, as it is dangerous. Ask for the representation explicitly.	2020-12-20 15:14:44 +01:00
Avi Kivity	8521248955	clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view This implicit conversion will soon be gone, as it is dangerous. Ask for the representation explicitly.	2020-12-20 15:14:44 +01:00
Avi Kivity	1dd6d7029a	cql3: expression: linearize get_value_from_mutation() earlier do_get_value() is careful to return a fragmented view, but its only caller get_value_from_mutation() linearizes it immediately afterwards. Linearize it sooner; this prevents mixing in fragmented values from cells (now via IMR) and fragmented values from partition/clustering keys. It only works now because keys are not fragmented outside LSA, and value_view has a special case for single-fragment values. This helps when keys become fragmented.	2020-12-20 15:14:44 +01:00
Avi Kivity	b59a21967c	bytes: add to_bytes(bytes) Converting from bytes to bytes is nonsensical, but it helps when transitioning to other types (managed_bytes/managed_bytes_view), and these types will have to_bytes() conversions.	2020-12-20 15:14:44 +01:00
Avi Kivity	28126257c2	cql3: expression: mark do_get_value() as static It is used only later in this file.	2020-12-20 15:14:44 +01:00
Avi Kivity	b3e39d81aa	Merge 'Avoid scanning sstables in parallel for TWCS single-partition queries' from Kamil Braun We introduce a new single-key sstable reader for sstables created by `TimeWindowCompactionStrategy`. The reader uses the fact that sstables created by TWCS are mostly disjoint with respect to the contained `position_in_partition`s in order to avoid having multiple sstable readers opened at the same time unnecessarily. In case there are overlapping ranges (for example, in the current time-window), it performs the necessary merging (it uses `clustering_order_reader_merger`, introduced recently). The reader uses min/max clustering key metadata present in `md` sstables in order to decide when to open or close a sstable reader. The following experiment was performed: 1. create a TWCS table with 1 minute windows 2. fill the table with 8 equal windows of data (each window flushed to a separate sstable) 3. perform `select * from ks.t where pk = 0 limit 1` query with and without the change The expectation is that with the commit, only one sstable will be opened to fetch that one row; without the commit all 8 sstables would be opened at once. The difference in the value of `scylla_reactor_aio_bytes_read` was measured (value after the query minus value before the query), both with and without the commit. With the commit, the difference was 67584. Without the commit, the difference was 528384. 528384 / 67584 ~= 7.8. Fixes #6418. 
Closes #7437 * github.com:scylladb/scylla: sstables: gather clustering key filtering statistics in TWCS single key reader sstables: use time_series_sstable_set in time_window_compaction_strategy sstable_set: new reader for TWCS single partition queries mutation_reader_test: test clustering_order_reader_merger with time_series_sstable_set sstable_set: introduce min_position_reader_queue sstable_set: introduce time_series_sstable_set sstables: add min_position and max_position accessors sstable_set: make create_single_key_sstable_reader a virtual method clustering_order_reader_merger: fix the 0 readers case	2020-12-19 23:53:18 +02:00
Kamil Braun	53414558a1	sstables: gather clustering key filtering statistics in TWCS single key reader	2020-12-18 16:33:27 +01:00
Kamil Braun	4f2d45001c	sstables: use time_series_sstable_set in time_window_compaction_strategy The following experiment was performed: 1. create a TWCS table with 1 minute windows 2. fill the table with 8 windows of data (each window flushed to a separate sstable) 3. perform `select * from ks.t where pk = 0 limit 1` query with and without the change The expectation is that with the commit, only one sstable will be opened to fetch that one row; without the commit all 8 sstables would be opened at once. The difference in the value of `scylla_reactor_aio_bytes_read` was measured (value after the query minus value before the query), both with and without the commit. With the commit, the difference was 67584. Without the commit, the difference was 528384. 528384 / 67584 ~= 7.8. Fixes https://github.com/scylladb/scylla/issues/6418.	2020-12-18 16:33:27 +01:00
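The arithmetic in the experiment above can be checked with a few lines of Python. The two byte counts are the ones quoted in the commit message; the helper name `read_reduction` is illustrative only.

```python
# Deltas of scylla_reactor_aio_bytes_read quoted in the commit message.
BYTES_READ_WITH_COMMIT = 67584
BYTES_READ_WITHOUT_COMMIT = 528384


def read_reduction(before: int, after: int) -> float:
    """Return how many times fewer bytes are read after the change."""
    return before / after


ratio = read_reduction(BYTES_READ_WITHOUT_COMMIT, BYTES_READ_WITH_COMMIT)
print(f"bytes read reduced by a factor of about {ratio:.1f}")
```

A factor close to 8 matches the expectation that one sstable is opened instead of all 8.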
Kamil Braun	f0842ba34e	sstable_set: new reader for TWCS single partition queries This commit introduces a new implementation of `create_single_key_sstable_reader` in `time_series_sstable_set` dedicated for TWCS-created sstables. It uses the fact that such sstables are mostly disjoint with respect to contained `position_in_partition`s in order to decrease the number of sstable readers that are opened at the same time. The implementation uses `clustering_order_reader_merger` under the hood. The reader assumes that the schema does not have static columns and none of the queried sstables contain partition tombstones; also, it assumes that the sstables have the min/max clustering key metadata in order for the implementation to be efficient. Thus, if we detect that some of these assumptions aren't true, we fall back to the old implementation.	2020-12-18 16:33:27 +01:00
Kamil Braun	b41139a07f	mutation_reader_test: test clustering_order_reader_merger with time_series_sstable_set	2020-12-18 16:33:27 +01:00
Kamil Braun	d0548aa77f	sstable_set: introduce min_position_reader_queue This is a queue of readers of sstables in a time_series_sstable_set, returning the readers in order of the smallest position_in_partition that the sstables have. It uses the min/max clustering key sstable metadata. The readers are opened lazily, at the moment of being returned.	2020-12-18 16:33:27 +01:00
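A rough Python sketch of such a queue is shown below. It assumes positions can be modelled as plain integers, and the names `SSTableStub`, `MinPositionReaderQueueSketch` and `pop_up_to` are hypothetical, not the actual C++ interface. It shows only the two properties the message names: ordering by smallest position and lazy opening.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class SSTableStub:
    """Stand-in for an sstable with min/max clustering position metadata."""
    name: str
    min_position: int
    max_position: int


class MinPositionReaderQueueSketch:
    def __init__(self, sstables: Sequence[SSTableStub],
                 open_reader: Callable[[SSTableStub], Any]) -> None:
        # Heap entries are (min_position, tie_breaker, sstable); the unique
        # tie_breaker keeps SSTableStub objects from ever being compared.
        self._heap = [(s.min_position, i, s) for i, s in enumerate(sstables)]
        heapq.heapify(self._heap)
        self._open_reader = open_reader

    def empty(self) -> bool:
        return not self._heap

    def pop_up_to(self, position: int) -> List[Any]:
        """Open and return readers for sstables whose min_position is at
        or before `position`; other sstables stay unopened."""
        readers = []
        while self._heap and self._heap[0][0] <= position:
            _, _, sstable = heapq.heappop(self._heap)
            readers.append(self._open_reader(sstable))  # opened lazily here
        return readers
```

With mostly disjoint sstables, a query that needs only the first rows calls `pop_up_to` with a small position and opens a single reader.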
Kamil Braun	52697022b0	sstable_set: introduce time_series_sstable_set At this moment it is a slightly less efficient version of bag_sstable_set, but in following commits we will use the new data structures to gain advantage in single partition queries for sstables created by TimeWindowCompactionStrategy.	2020-12-18 16:33:27 +01:00
Kamil Braun	2a160dd909	sstables: add min_position and max_position accessors The methods return a lower-bound and an upper-bound for the position-in-partitions appearing in a given sstable.	2020-12-18 16:33:27 +01:00
Kamil Braun	fe26da82ba	sstable_set: make create_single_key_sstable_reader a virtual method ... of sstable_set_impl. Soon we shall provide a specialized implementation in one of the `sstable_set_impl` derived classes. The existing implementation is used as the default one.	2020-12-18 12:31:16 +01:00
Kamil Braun	5e846b33b8	clustering_order_reader_merger: fix the 0 readers case With 0 readers the merger would produce a `partition_end` fragment when it should immediately return `end_of_stream` instead.	2020-12-18 12:30:40 +01:00
Gleb Natapov	85cffd1aeb	lwt: rewrite storage_proxy::cas using coroutines Makes code much simpler to understand. Message-Id: <20201201160213.GW1655743@scylladb.com>	2020-12-17 18:15:35 +01:00
Avi Kivity	a60c81b615	Merge 'cql3: Fix handling of impossible restrictions on a primary-key column' from Dejan Mircevski There were two problems with handling conflicting equalities on the same PK column (eg, c=1 AND c=0): 1. When the column is indexed, Scylla crashed (#7772) 2. Computing ranges and slices was throwing an exception This series fixes them both; it also happens to resolve some old TODOs from restriction_test. Tests: unit (dev, debug) Closes #7804 * github.com:scylladb/scylla: cql3: Fix value_for when restriction is impossible cql3: Fix range computation for p=1 AND p=1	2020-12-17 12:01:36 +02:00
Dejan Mircevski	46b4b59945	cql3: Fix value_for when restriction is impossible Previously, single_column_restrictions::value_for() assumed that a column's restriction specifies exactly one value for the column. But since `37ebe521e3`, multiple equalities on the same column are allowed, so the restriction could be a conjunction of conflicting equalities (eg, c=1 AND c=0). That violates an assert and crashes Scylla. This patch fixes value_for() by gracefully handling the impossible-restriction case. Fixes #7772 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-12-16 15:00:29 -05:00
Dejan Mircevski	4bb1107652	cql3: Fix range computation for p=1 AND p=1 Previously compute_bounds was assuming that primary-key columns are restricted by exactly one equality, resulting in the following error: query 'select p from t where p=1 and p=1' failed: std::bad_variant_access (std::get: wrong index for variant) This patch removes that assumption and deals correctly with the multiple-equalities case. As a byproduct, it also stops raising "invalid null value" exceptions for null RHS values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-12-16 14:46:48 -05:00
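The semantics the two cql3 fixes above aim for can be modelled in a few lines of Python: a conjunction of equalities on one column allows the intersection of the individual values, which is a single value when they agree and nothing when they conflict. The function `values_allowed_by` is a hypothetical model, not Scylla's value_for() or compute_bounds.

```python
from typing import List, Sequence


def values_allowed_by(equalities: Sequence[int]) -> List[int]:
    """Return the column values satisfying every `col = v` in `equalities`.

    Models `p=1 AND p=1` as [1, 1] and `c=1 AND c=0` as [1, 0].
    """
    if not equalities:
        raise ValueError("expected at least one equality restriction")
    # Each equality allows exactly one value; AND means set intersection.
    allowed = set.intersection(*({v} for v in equalities))
    return sorted(allowed)


print(values_allowed_by([1, 1]))  # redundant equalities: one value remains
print(values_allowed_by([1, 0]))  # conflicting equalities: nothing matches
```

An empty result is a valid answer (the query matches no rows) rather than an error, which is the behaviour both commits move towards.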
Pavel Solodovnikov	edf9ccee48	idl: extract writer functions for `write`, `read` and `skip` impls for classes and enums Split `write`, `read` and `skip` serializer function writers to separate functions in `handle_class` and `handle_enum` functions, which slightly improves readability. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 20:33:55 +03:00
Pavel Solodovnikov	8049cb0f91	idl: minor fixes and code simplification * Introduce `ns_qualified_name` and `template_params_str` functions to simplify code a little bit in `handle_enum` and `handle_class` functions. * Previously each serializer had a separate namespace open-close statements, unify them into a single namespace scope. * Fix a few more `hout` -> `cout` argument names. * Rename `template` pattern to `template_decl` to improve clarity. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:32:08 +03:00
Pavel Solodovnikov	0de96426db	idl: change argument name from `hout` to `cout` in all dependencies of `add_visitors` fn Prior to the patch all functions that are called from `add_visitors` and this function itself declared the argument denoting the output file as `hout`. This was quite misleading, since `hout` is meant to be the header file with declarations, while `cout` is the implementation file. These functions write to the implementation file, hence `hout` should be changed to `cout` to avoid confusion. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:32:03 +03:00
Pavel Solodovnikov	0defb52855	idl: fix parsing of basic types and discard unneeded terminals Prior to the patch `btype` production was using `with_colon` rule, which accidentally supported parsing both numbers and identifiers (along with other invalid inputs, such as "123asd"). It was changed to use `ns_qualified_ident` and those places which can accept numeric constants, are explicitly listing it as an alternative, e.g. template parameter list. Unfortunately, I had to make TemplateType to explicitly construct `BasicType` instances from numeric constants in template arguments list. This is exactly the way it was handled before, though. But nonetheless, this should be addressed sometime later. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:57 +03:00
Pavel Solodovnikov	0cc87ead3d	idl: remove unused functions Remove the following functions since they are not used: * `open_namespaces` * `close_namespaces` * `flat_template` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:51 +03:00
Pavel Solodovnikov	bea965a0a7	idl: improve error tracing in the grammar and tighten-up some grammar rules This patch replaces use of some handwritten rules to use their alternatives already defined in `pyparsing.pyparsing_common` class, i.e.: `number`, `identifier` productions. Changed ignore patterns for comments to use pre-defined `pp.cppStyleComment` instead of hand-written combination of '//'-style and C-style comment rules. Operator '-' is now used whenever possible to improve debugging experience: it disables default backtracking for productions so that compiler fails earlier and can now point more precisely to a place in the input string where it failed instead of backtracking to the top-level rule and reporting error there. Template names and class names now use `ns_qualified_ident` rule instead of `with_colon` which prevents grammar from matching invalid identifiers, such as `std:vector`. Many places are using the updated `identifier` production, which is working correctly unlike its predecessor: now inputs such as `1ident` are considered invalid. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:46 +03:00
Pavel Solodovnikov	3a037bc5b6	idl: remove redundant `set_namespace` function Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:40 +03:00
Pavel Solodovnikov	e76e8aec0e	idl: remove unused `declare_class` function Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:35 +03:00
Pavel Solodovnikov	745f4ac23b	idl: slightly change `str` and `repr` for AST types Surround string representation with angle brackets. This improves readability when printing debug output. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:31:20 +03:00
Pavel Solodovnikov	4a61270701	idl: place directly executed init code into if __name__=="__main__" Since idl compiler is not intended to be used as a module to other python build scripts, move initialization code under an if checking that current module name is "__main__". Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 19:30:33 +03:00
Gleb Natapov	37368726c9	migration_manager: remove unused announce() variant Message-Id: <20201216153150.GG3244976@scylladb.com>	2020-12-16 18:14:07 +02:00
Konstantin Osipov	2c46938c2a	commitlog: avoid a syscall in a most common case of segment recycle When recycling a segment in O_DSYNC mode if the size of the segment is neither shrunk nor grown, avoid calling file::truncate() or file::allocate(). Message-Id: <20201215182332.1017339-2-kostja@scylladb.com>	2020-12-16 14:57:36 +02:00
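The decision described above can be sketched as a small pure function. This is an illustrative model only: `resize_call_for_recycle` is a hypothetical name, and the returned strings merely name the file::truncate() and file::allocate() calls mentioned in the message.

```python
from typing import Optional


def resize_call_for_recycle(current_size: int, wanted_size: int) -> Optional[str]:
    """Name the resize call a recycled segment needs, or None when the
    size is unchanged and the syscall can be skipped entirely."""
    if current_size == wanted_size:
        return None          # most common case: nothing to do
    if current_size > wanted_size:
        return "truncate"    # segment must shrink
    return "allocate"        # segment must grow
```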
Avi Kivity	fdb47c954d	Merge "idl: allow IDL compiler to parse `const` specifiers for template arguments" from Pavel S " This patch series consists of the following patches: 1. The first one turned out to be a massive rewrite of almost everything in `idl-compiler.py`. It aims to decouple parser structures from the internal representation which is used in the code-generation itself. Prior to the patch everything was working with raw token lists and the code was extremely fragile and hard to understand and modify. Moreover, every change in the parser code caused a cascade effect of breaking things at many different places, since they were relying on the exact format of output produced by parsing rules. Now there is a bunch of supplementary AST structures which provide hierarchical and strongly typed structure as the output of parsing routine. It is much easier to verify (by the means of `isinstance`, for example) and extend since the internal structures used in code-generation are decoupled from the structure of parsing rules, which are now controlled by custom parse actions providing high-level abstractions. It is tested manually by checking that the old code produces exactly the same autogenerated sources for all Scylla IDLs as the new one. 2 and 3. Cosmetics changes only: fixed a few typos and moved from old-fashioned `string.Template` to python f-strings. This improves readability of the idl-compiler code by a lot. Only one non-functional whitespace change introduced. 4. This patch adds a very basic support for the parser to understand `const` specifier in case it's used with a template parameter for a data member in a class, e.g. struct my_struct { std::vector<const raft::log_entry> entries; }; It actually does two things: * Adjusts `static_asserts` in corresponding serializer methods to match const-ness of fields. * Defines a second serializer specialization for const type in `.dist.hh` right next to non-const one. 
This seems to be sufficient for raft-related uses for now. Please note there is no support for the following cases, though: const std::vector<raft::log_entry> entries; const raft::term_t term; None of the existing IDLs are affected by the change, so that we can gradually improve on the feature and write the idl unit-tests to increase test coverage with time. 5. A basic unit-test that writes a test struct with an `std::vector<S<const T>>` field and reads it back to verify that serialization works correctly. 6. Basic documentation for AST classes. TODO: should also update the docs in `docs/IDL.md`. But it is already quite outdated, and some changes would even be out of scope for this patch set. " * 'idl-compiler-refactor-v5' of https://github.com/ManManson/scylla: idl: add docstrings for AST classes idl: add unit-test for `const` specifiers feature idl: allow to parse `const` specifiers for template arguments idl: fix a few typos in idl-compiler idl: switch from `string.Template` to python f-strings and format string in idl-compiler idl: Decouple idl-compiler data structures from grammar structure	2020-12-16 14:05:33 +02:00
Gleb Natapov	61520a33d6	mutation_writer: pass exceptions through feed_writer feed_writer() eats exception and transforms it into an end of stream instead. Downstream validators hate when this happens. Fixes #7482 Message-Id: <20201216090038.GB3244976@scylladb.com>	2020-12-16 13:18:19 +02:00
Pavel Solodovnikov	8b8dce15c3	idl: add docstrings for AST classes Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-16 09:03:39 +03:00
Botond Dénes	978ec7a4bb	tools: introduce scylla-sstable-index A tool which lists all partitions contained in an sstable index. As all partitions in an sstable are indexed, this tool can be used to find out what partitions are contained in a given sstable. The printout has the following format: $pos: $human_readable_value (pk{$raw_hex_value}) Where: * $pos: the position of the partition in the (decompressed) data file * $human_readable_value: the human readable partition key * $raw_hex_value: the raw hexadecimal value of the binary representation of the partition key For now the tool requires the types making up the partition key to be specified on the command line, using the `--type|-t` command line argument, using the Cassandra type class name notation for types. As these are not assumed to be widely known, this patch includes a document mapping all cql3 types to their Cassandra type class name equivalent (and more). Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201208092323.101349-1-bdenes@scylladb.com>	2020-12-15 18:46:47 +02:00
Calle Wilund	71c5dc82df	database: Verify iff we actually are writing memtables to disk in truncate Fixes #7732 When truncating with auto_snapshot on, we try to verify the low rp mark from the CF against the sstables discarded by the truncation timestamp. However, in a scenario like: Fill memtables Flush Truncate with snapshot A Fill memtables some more Truncate Move snapshot A to upload + refresh (load old tables) Truncate The last op will assert, because while we have sstables loaded, which will be discarded now, we did not in fact generate any _new_ ones (since memtables are empty), and the RP we get back from discard is one from an earlier generation set. (Any permutation of events that create the situation "empty memtable" + "non-empty sstables with only old tables" will generate the same error). Added a check that before flushing checks if we actually have any data, and if not, does not uphold the RP relation assert. Closes #7799	2020-12-15 16:24:36 +02:00
Avi Kivity	7636799b18	Merge 'Add waiting for flushes on table drops' from Piotr Sarna This series makes sure that before the table is dropped, all pending memtable flushes related to its memtables would finish. Normally, flushes are not problematic in Scylla, because all tables are by default `auto_snapshot=true`, which also implies that a table is flushed before being dropped. However, with `auto_snapshot=false` the flush is not attempted at all. It leads to the following race: 1. Run a node with `auto_snapshot=false` 2. Schedule a memtable flush (e.g. via nodetool) 3. Get preempted in the middle of the flush 4. Drop the table 5. The flush that already started wakes up and starts operating on freed memory, which causes a segfault Tests: manual(artificially preempting for a long time in bullet point 2. to ensure that the race occurs; segfaults were 100% reproducible before the series and do not happen anymore after the series is applied) Fixes #7792 Closes #7798 * github.com:scylladb/scylla: database: add flushes to waiting for pending operations table: unify waiting for pending operations database: add a phaser for flush operations database: add waiting for pending streams on table drop	2020-12-15 16:02:47 +02:00
Pavel Solodovnikov	1e6df841a5	idl: add unit-test for `const` specifiers feature Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-15 16:03:18 +03:00
Pavel Solodovnikov	facf27dbe4	idl: allow to parse `const` specifiers for template arguments This patch introduces very limited support for declaring `const` template parameters in data members. It's not covering all the cases, e.g. `const type member_variable` and `const template_def<T1, T2, ...>` syntax is not supported at the moment. Though the changes are enough for raft-related use: this makes it possible to declare `std::vector<raft::log_entries_ptr>` (aka `std::vector<lw_shared_ptr<const raft::log_entry>>`) in the IDL. Existing IDL files are not affected in any way. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-15 16:03:11 +03:00
Pavel Solodovnikov	f02703fcd7	idl: fix a few typos in idl-compiler Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-15 16:02:55 +03:00
Pavel Solodovnikov	28b602833f	idl: switch from `string.Template` to python f-strings and format string in idl-compiler Move to a modern and lightweight syntax of f-strings introduced in python 3.6. It improves readability and provides greater flexibility. A few places are now using format strings instead, though. When a multiline substitution variable is used, the template string should first be re-indented and only after that should the formatting be applied, or we can end up with broken indentation in the generated sources. This change introduces one invisible whitespace change in `query.dist.impl.hh`, otherwise all generated code is exactly the same. Tests: build(dev) and diff generated IDL sources by hand Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-15 16:01:17 +03:00
Pavel Solodovnikov	4ab1f7f55d	idl: Decouple idl-compiler data structures from grammar structure Instead of operating on the raw lists of tokens, transform them into typed structures representation, which makes the code by many orders of magnitude simpler to read, understand and extend. This includes sweeping changes throughout the whole source code of the tool, because almost every function was tightly coupled to the way data was passed down from the parser right to the code generation routines. Tested manually by checking that old generated sources are precisely the same as the new generated sources. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-12-15 15:59:17 +03:00
Piotr Sarna	b1208d0fcc	database: add flushes to waiting for pending operations In order to prevent races with table drops, the helper function which waits for all pending operations to finish now also waits for pending flushes.	2020-12-15 13:11:33 +01:00
Piotr Sarna	cd1e351dc1	table: unify waiting for pending operations In order to reduce code duplication which already caused a bug, waiting for pending operations is now unified with a single helper function.	2020-12-15 13:11:25 +01:00
Piotr Sarna	df3204426d	database: add a phaser for flush operations Pending flushes can participate in races when a table with auto_snapshot==false is dropped. The race is as follows: 1. A flush of table T is initiated 2. The flush operation is preempted 3. Table T is dropped without flushing, because it has auto_snapshot off 4. The flush operation from (2.) wakes up and continues working on table T, which is already dropped 5. Segfault/memory corruption To prevent such races, a phaser for pending flushes is introduced	2020-12-15 12:59:36 +01:00
Piotr Sarna	57d63ca036	database: add waiting for pending streams on table drop We already wait for pending reads and writes, so for completeness we should also wait for all pending stream operations to finish before dropping the table to avoid inconsistencies.	2020-12-15 12:55:45 +01:00
Takuya ASADA	ebc4076fa5	tools: toolchain: add node_exporter Download node_exporter in frozen image to prepare adding node_exporter to relocatable package. Related #2190 Closes #7765 [avi: updated toolchain, x86_64/aarch64/s390x]	2020-12-14 20:34:17 +02:00
Piotr Sarna	13317f7698	alternator: ensure correct isolation level in tracing tests Taking advantage of the fact that isolation level can be defined for a table with a tag, the tracing test that relies on CAS can now be sure to have a correct isolation level. Message-Id: <43f005ab9d566c7d3d55ce93c553127b1df9e87f.1607954739.git.sarna@scylladb.com>	2020-12-14 17:37:55 +02:00
Piotr Sarna	7081e361cc	test: add isolation level requirement message to tracing tests Alternator tracing tests require the cluster to have the 'always' isolation level configured to work properly. If that's not the case, the tests will fail due to not having CAS-related traces present in the logs. In order to help the users fix their configuration, a helper message is printed before the test case is performed. Automatic tests do not need this, because they are all run with matching isolation level, but this message could greatly improve the user experience for manual tests. Message-Id: <62bcbf60e674f57a55c9573852b6a28f99cbf408.1607949754.git.sarna@scylladb.com>	2020-12-14 14:53:58 +02:00
Piotr Sarna	4b0303d8ae	tests: make alternator tracing tests idempotent The outcome of alternator tracing tests was that tracing probability was always set to 0 after the test was finished. That makes sense for most test runs, but manual tests can work on existing clusters with tracing probability set to some other value. To preserve the previous trace probability, the value is now extracted and stored, so that it can be restored after the test is done. Message-Id: <94f829b63f92847b4abb3b16f228bf9870f90c2e.1607949754.git.sarna@scylladb.com>	2020-12-14 14:53:23 +02:00
Avi Kivity	19ff528ef3	Update seastar submodule * seastar 2de43eb6bf...3b8903d406 (3): > coroutines: check preemption flag in co_await > memory: consider span freelist objects in small pool diagnostics > util: noncopyable_function: avoid gcc uninitialized error in move constructor	2020-12-14 12:50:32 +02:00
Pekka Enberg	8d00c16feb	transport/server: Code cleanups Fix up some coding style issues spotted while reading the code: - Fix indentation to be 4 spaces - Remove superfluous semicolons Closes #7793	2020-12-14 12:48:05 +02:00
Konstantin Osipov	b6c6cc275f	commitlog: align input of dma_write() during segment recycle Normally a file size should be aligned around block size, since we never write to it any unaligned size. However, we're not protected against partial writes. Just to be safe, align up the amount of bytes to zerofill when recycling a segment. Message-Id: <20201211142628.608269-4-kostja@scylladb.com>	2020-12-14 12:16:18 +02:00
Konstantin Osipov	ad6817bcde	commitlog: fix typo in a comment Message-Id: <20201211142628.608269-2-kostja@scylladb.com>	2020-12-14 12:16:14 +02:00
Benny Halevy	0e79e0f215	test: mutation_diff: extend section markers When the different mutations are printed via BOOST_REQUIRE_EQUAL, we don't get the "expect {} but got {}" section markers. Instead, the parts we're interested in are bracketed like "critical check X == Y has failed [{} != {}]" Test: with both formats: - https://github.com/scylladb/scylla/files/3890627/test_concurrent_reads_and_eviction.log - https://github.com/scylladb/scylla/files/4303117/flat_mutation_reader_test.118.log - https://github.com/scylladb/scylla/files/5687372/flat_mutation_reader_test.172.log.gz Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201214100521.3814909-1-bhalevy@scylladb.com>	2020-12-14 12:11:34 +02:00
Nadav Har'El	72cb3e9255	alternator test: add missing wait for update_table to finish Three tests in test_streams.py run update_table() on a table without waiting for it to complete, and then call update_table() on the same table or delete it. This always works in Scylla, and usually works in AWS, but if we reach the second call, it may fail because the previous update_table() did not take effect yet. We sometimes see these failures when running the Alternator test suite against AWS. So in this patch, after each update_table() we wait for the table to return from UPDATING to ACTIVE status. The entire Alternator test suite now passes (or is skipped) on AWS, so: Fixes #7778. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201213164931.2767236-1-nyh@scylladb.com>	2020-12-14 09:18:38 +01:00
Nadav Har'El	43ce0aef3d	alternator test: fix test wrongly failing on AWS The test test_query_filter.py::test_query_filter_paging fails on AWS and shouldn't fail, so this patch fixes the test. Note that this is only a test problem - no fix is needed for Alternator itself. The test reads 20 results with 1-result pages, and assumed that 21 pages are returned. The 21st page may happen because when the server returns the 20th, it might not yet know there will be no additional results, so another page is needed - and will be empty. Still a different implementation might notice that the last page completed the iteration, and not return an extra empty page. This is perfectly fine, and this is what AWS DynamoDB does today - and should not be considered an error. Refs #7778 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201213143612.2761943-1-nyh@scylladb.com>	2020-12-14 09:18:31 +01:00
Nadav Har'El	4ab98a4c68	alternator: use a more specific error when Authorization header is missing When request signature checking is enabled in Alternator, each request should come with the appropriate Authorization header. Most errors in preparing this header will result in an InvalidSignatureException response, but DynamoDB returns a more specific error when this header is completely missing: MissingAuthenticationTokenException. We should do the same, but before this patch we return InvalidSignatureException also for a missing header. The test test_authorization.py::test_no_authorization_header used to enshrine our wrong error message, and failed when run against AWS. After this patch, we fix the error message and the test - which now passes against both Alternator and AWS. Refs #7778. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201213133825.2759357-1-nyh@scylladb.com>	2020-12-14 09:18:24 +01:00
Avi Kivity	39afe14ad4	Merge 'Add per query timeout' from Piotr Sarna This series allows setting per-query timeout via CQL. It's possible via the existing `USING` clause, which is extended to be available for `SELECT` statement as well. This parameter accepts a duration and can also be provided as a marker. The parameter acts as a regular part of the `USING` clause, which means that it can be used along with `USING TIMESTAMP` and `USING TTL` without issues. The series comes with a pytest test suite. Examples: ```cql SELECT * FROM t USING TIMEOUT 200ms; ``` ```cql INSERT INTO t(a,b,c) VALUES (1,2,3) USING TIMESTAMP 42 AND TIMEOUT 50ms; ``` Working with prepared statements works as usual - the timeout parameter can be explicitly defined or provided as a marker: ```cql SELECT * FROM t USING TIMEOUT ?; ``` ```cql INSERT INTO t(a,b,c) VALUES (?,?,?) USING TIMESTAMP 42 AND TIMEOUT 50ms; ``` Tests: unit(dev) Fixes #7777 Closes #7781 * github.com:scylladb/scylla: test: add prepared statement tests to USING TIMEOUT suite docs: add an entry about USING TIMEOUT test: add a test suite for USING TIMEOUT storage_proxy: start propagating local timeouts as timeouts cql3: allow USING clause for SELECT statement cql3: add TIMEOUT attribute to the parser cql3: add per-query timeout to select statement cql3: add per-query timeout to batch statement cql3: add per-query timeout to modification statement cql3: add timeout to cql attributes	2020-12-14 09:46:46 +02:00
Piotr Sarna	d6e7e36280	test: add prepared statement tests to USING TIMEOUT suite	2020-12-14 07:50:40 +01:00
Piotr Sarna	da77ab832b	docs: add an entry about USING TIMEOUT The paragraph describes how USING TIMEOUT clause can be used along with some simple examples.	2020-12-14 07:50:40 +01:00
Piotr Sarna	0148b41a02	test: add a test suite for USING TIMEOUT The test suite is based on cql-pytest and checks if USING TIMEOUT works as expected.	2020-12-14 07:50:40 +01:00
Piotr Sarna	27fba35832	storage_proxy: start propagating local timeouts as timeouts A local timeout was previously propagated to the client as WriteFailure, while there exists a more concrete error type for that: WriteTimeout.	2020-12-14 07:50:40 +01:00
Piotr Sarna	ddd9cb1b2a	cql3: allow USING clause for SELECT statement In order to be able to specify a timeout for SELECT statements, it's now possible to use the USING clause with it.	2020-12-14 07:50:40 +01:00
Piotr Sarna	d3896a209b	cql3: add TIMEOUT attribute to the parser It's now possible to specify TIMEOUT as part of the USING clause.	2020-12-14 07:50:40 +01:00
Piotr Sarna	157be33b89	cql3: add per-query timeout to select statement First of all, select statement is extended with an 'attrs' field, which keeps the per-query attributes. Currently, only TIMEOUT parameter is legal to use, since TIMESTAMP and TTL bear no meaning for reads. Secondly, if TIMEOUT attribute is set, it will be used as the effective timeout for a particular query.	2020-12-14 07:50:40 +01:00
Piotr Sarna	20dedd0df7	cql3: add per-query timeout to batch statement If TIMEOUT attribute is set, it will be used as the effective timeout for a particular query.	2020-12-14 07:50:40 +01:00
Piotr Sarna	3c49b6bd88	cql3: add per-query timeout to modification statement If TIMEOUT attribute is set, it will be used as the effective timeout for a particular query.	2020-12-14 07:50:40 +01:00
Piotr Sarna	5bbd0b049b	cql3: add timeout to cql attributes This attribute will be used later to specify per-query timeout.	2020-12-14 07:50:40 +01:00
Benny Halevy	c60da2e90d	cdc: remove _token_metadata from db_context 1. It's unused since `cbe510d1b8` 2. It's unsafe to keep a reference to token_metadata& potentially across yield points. The higher-level motivation is to make storage_service::get_token_metadata() private so we can control better how it's used. For cdc, if the token_metadata is going to be needed in the future, it'd be better to get it from db_context::_proxy.get_token_metadata_ptr(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>	2020-12-13 18:32:17 +02:00
Avi Kivity	0f967f911d	Merge "storage_service: get_token_metadata_ptr to hold on to token_metadata" from Benny " This series fixes use-after-free via token_metadata& We may currently get a token_metadata& via get_token_metadata() and use it across yield points in a couple of sites: - do_decommission_removenode_with_repair - get_new_source_ranges To fix that, get_token_metadata_ptr and hold on to it across yielding. Fixes #7790 Dtest: update_cluster_layout_tests:TestUpdateClusterLayout.simple_removenode_2_test(debug) Test: unit(dev) " * tag 'storage_service-token_metadata_ptr-v2' of github.com:bhalevy/scylla: storage_service: get_new_source_ranges: don't hold token_metadata& across yield point storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield	2020-12-13 17:37:24 +02:00
Aleksandr Bykov	e74dc311e7	dist: scylla_util: fix aws_instance.ebs_disks method aws_instance.ebs_disks() method should return ebs disk instead of ephemeral Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com> Closes #7780	2020-12-13 17:33:37 +02:00
Benny Halevy	1fbc831dae	storage_service: get_new_source_ranges: don't hold token_metadata& across yield point Provide the token_metadata& to get_new_source_ranges by the caller, who keeps it valid throughout the call. Note that there is no need to clone_only_token_map since the token_metadata_ptr is immutable and can be used just as well for calling strat.get_range_addresses. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	f13913d251	storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range Now that we pass can_yield::yes to calculate_natural_endpoints for each token_range. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	89ed0705e8	storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner No need to hold on to the shared token_metadata_ptr after we got clone_after_all_left(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	684c4143df	storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield When yielding in clone_only_token_map or clone_after_all_left the token_metadata got with get_token_metadata() may go away. Use get_token_metadata_ptr() instead to hold on to it. And with that, we don't need to clone_only_token_map. `metadata` is not modified by calculate_natural_endpoints, so we can just refer to the immutable copy retrieved with get_token_metadata_ptr. Fixes #7790 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:41:58 +02:00
Avi Kivity	65a0244614	Update tools/jmx submodule * tools/jmx 6174a47...20469bf (1): > column_family: Return proper cardinality for toppartitions requests	2020-12-13 13:51:38 +02:00
Avi Kivity	9265b87610	Merge "Remove get_local_storage_proxy from validation" from Pavel E " The validate_column_family() helper uses the global proxy reference to get database from. Fortunately, all the callers of it can provide one via argument. tests: unit(dev) " * 'br-no-proxy-in-validate' of https://github.com/xemul/scylla: validation: Remove get_local_storage_proxy call client_state: Call validate_column_family() with database arg client_state: Add database& arg to has_column_family_access storage_proxy: Add .local_db() getters validate: Mark database argument const	2020-12-13 13:12:57 +02:00
Avi Kivity	19aaf8eb83	Merge "Remove global storage service from index manager" from Pavel E " The initial intent was to remove call for global storage service from secondary index manager's create_view_for_index(), but while fixing it one of intermediate schema table's helper managed to benefit from it by re-using the database reference flying by. The cleanup is done by simply pushing the database reference along the stack from the code that already has it down the create_view_for_index(). tests: unit(dev) " * 'br-no-storages-in-index-and-schema' of https://github.com/xemul/scylla: schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations schema-tables: Add database argument to make_update_table_mutations schema-tables: Factor out calls getting database instance index-manager: Move feature evaluation one level up	2020-12-13 12:41:51 +02:00
Benny Halevy	aae3991246	repair: do_decommission_removenode_with_repair: don't deref ops when null `ops` might be passed as a disengaged shared_ptr when called from `decommission_with_repair`. In this case we need to propagate to sync_data_using_repair a disengaged std::optional<utils::UUID>. Fixes #7788 DTest: update_cluster_layout_tests:TestUpdateClusterLayout.verify_latest_copy_decommission_node_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213073743.331253-1-bhalevy@scylladb.com>	2020-12-13 12:37:18 +02:00
Avi Kivity	18be57a4e5	Update seastar submodule * seastar 8b400c7b45...2de43eb6bf (3): > core: show span free sizes correctly in diagnostics > Merge "IO queues to share capacities" from Pavel E > file: make_file_impl: determine blockdev using st_mode	2020-12-12 21:57:01 +02:00
Pekka Enberg	c990f2bd34	Merge 'Reinstate [[nodiscard]] support' from Avi Kivity The switch to clang disabled the clang-specific -Wunused-value since it generated some harmless warnings. Unfortunately, that also prevented [[nodiscard]] violations from producing warnings. Fix by clearing all instances of the warning (including [[nodiscard]] violations that crept in while it was disabled) and reinstating the warning. Closes #7767 * github.com:scylladb/scylla: build: reinstate -Wunused-value warning for [[nodiscard]] test: lib: don't ignore future in compare_readers() test: mutation_test: check both ranges when comparing summaries serialializer: silence unused value warning in variant deserializer	2020-12-12 09:54:05 +02:00
Avi Kivity	615b8e8184	dist: rpm: uninstall tuned when installing scylla-kernel-conf tuned 2.11.0-9 and later writes to kernel.sched_wakeup_granularity_ns and other sysctl tunables that we so laboriously tuned, dropping performance by a factor of 5 (due to increased latency). Fix by obsoleting tuned during install (in effect, we are a better tuned, at least for us). Not needed for .deb, since Debian/Ubuntu do not install tuned by default. Fixes #7696 Closes #7776	2020-12-12 09:54:05 +02:00
Pavel Emelyanov	3a025cfa52	schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations Two halves of the tunnel finally connect -- the latter helper needs the local database instance and is only called by the former one which already has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:23:53 +03:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hand. The argument will be used by the next patch to remove the call for the global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Pavel Emelyanov	1bcef04c7a	schema-tables: Factor out calls getting database instance The make_update_indices_mutations gets database instance for two things -- to find the cf to work with and to get the value of a feature for index view creation. To suit both and to remove calls for global storage proxy and service instances get the database once in the function entrance. Next patch will clean this further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:17:11 +03:00
Pavel Emelyanov	6dd10e771d	index-manager: Move feature evaluation one level up The create_view_for_index needs to know the state of the correct-idx-token-in-secondary-index feature. To get one it takes quite a long route through global storage service instance. Since there's only one caller of the method in question, and the method is called in a loop, it's a bit faster to get the feature value in caller and pass it in argument. This will also help to get rid of the call for global storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:14:12 +03:00
Pavel Emelyanov	3a3ee45488	size_estimate_reader: Use local db reference not global The get_next_partition uses global proxy instance to get the local database reference. Now it's available in the reader object itself, so it's possible to remove this call for global storage proxy. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:38:21 +03:00
Pavel Emelyanov	107dcbfbd6	size_estimate_reader: Keep database reference on mutation reader This reader uses the local database instance in its get_next_partition method to find keyspaces to work with Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:34:54 +03:00
Pavel Emelyanov	48e494fb62	size_estimate_reader: Keep database reference on virtual_reader The database will be then used to create the mutation reader Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:31:35 +03:00
Pavel Emelyanov	83073f4e8b	validation: Remove get_local_storage_proxy call It is used in validate_column_family. The last caller of it was removed by previous patch, so we may kill the helper itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:52:42 +03:00
Pavel Emelyanov	12cc539835	client_state: Call validate_column_family() with database arg The previous patch brought the database reference arg. And since the currently called validate_column_family() overload _just_ gets the database from global proxy, it's better to shortcut. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:50:49 +03:00
Pavel Emelyanov	b0c4a9087d	client_state: Add database& arg to has_column_family_access It is called from cql3/statements' check_access methods and from thrift handlers. The former have proxy argument from which they can get the database. The latter already have the database itself on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:49:16 +03:00
Pavel Emelyanov	4c7bc8a3d1	storage_proxy: Add .local_db() getters To facilitate the next patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:48:02 +03:00
Avi Kivity	a11ecfe231	Merge 'types: don't linearize in validate()' from Michał Chojnowski A sequel to #7692. This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing). The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`. Refs: #6138 Closes #7770 * github.com:scylladb/scylla: types: add single-fragment optimization in validate() utils: fragment_range: add with_simplified() cql3: statements: select_statement: remove unnecessary use of with_linearized cql3: maps: remove unnecessary use of with_linearized cql3: lists: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: sets: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: attributes: remove unnecessary uses of with_linearized types: validate lists without linearizing types: validate tuples without linearizing types: validate sets without linearizing types: validate maps without linearizing types: template abstract_type::validate on FragmentedView types: validate_visitor: transition from FragmentRange to FragmentedView utils: fragmented_temporary_buffer: add empty() to FragmentedView utils: fragmented_temporary_buffer: don't add to null pointer	2020-12-11 17:33:59 +02:00
Pavel Emelyanov	563b466227	validate: Mark database argument const They are indeed used like that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:27:45 +03:00
Michał Chojnowski	150473f074	types: add single-fragment optimization in validate() Manipulating fragmented views is costlier than manipulating contiguous views, so let's detect the common situation when the fragmented view is actually contiguous underneath, and make use of that. Note: this optimization is only useful for big types. For trivial types, validation usually only checks the size of the view.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	e2d17879fc	utils: fragment_range: add with_simplified() Reading from contiguous memory (bytes_view) is significantly simpler runtime-wise than reading from a fragmented view, due to less state and less branching, so we often want to convert a fragmented view to a simple view before processing it, if the fragmented view contains at most one fragment, which is common. with_simplified() does just that.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	51ca5fa4c5	cql3: statements: select_statement: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	72186bee69	cql3: maps: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	3f3a10c588	cql3: lists: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	efa036329d	cql3: tuples: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	4f359a7a99	cql3: sets: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	281417917b	cql3: tuples: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	d1d1a00311	cql3: attributes: remove unnecessary uses of with_linearized We can validate and deserialize directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	0581b3ff31	types: validate lists without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	4fe41b69fd	types: validate tuples without linearizing We can validate tuples directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	a7dd736d03	types: validate sets without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	1459608375	types: validate maps without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	82befbe8c0	types: template abstract_type::validate on FragmentedView This is primarily a stylistic change. It makes the interface more consistent with deserialize(). It will also allow us to call `validate()` for collection elements in `validate_aux()`.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	15dbe00e8a	types: validate_visitor: transition from FragmentRange to FragmentedView This will allow us to easily get rid of linearizations when validating collections and tuples, because the helpers used in validate_aux() already have FragmentedView overloads.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	3647c0ba47	utils: fragmented_temporary_buffer: add empty() to FragmentedView It's redundant with size_bytes(), but sometimes empty() is more readable and reduces churn when replacing other types with FragmentedView.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	b4dd5d3bdb	utils: fragmented_temporary_buffer: don't add to null pointer When fragmented_temporary_buffer::view is created from a bytes_view, _current is null. In that case, in remove_current(), null pointer offset happens, and ubsan complains. Fix that.	2020-12-11 09:53:07 +01:00
Raphael S. Carvalho	e4b55f40f3	sstables: Fix sstable reshaping for STCS The heuristic of STCS reshape is correct, and it built the compaction descriptor correctly, but forgot to return it to the caller, so no reshape was ever done on behalf of STCS even when the strategy needed it. Fixes #7774. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201209175044.1609102-1-raphaelsc@scylladb.com>	2020-12-10 12:45:25 +02:00
Asias He	829b4c1438	repair: Make removenode safe by default Currently removenode works like below: - The coordinator node advertises the node to be removed in REMOVING_TOKEN status in gossip - Existing nodes learn the node in REMOVING_TOKEN status - Existing nodes sync data for the ranges they own - Existing nodes send notification to the coordinator - The coordinator node waits for notification and announces the node in REMOVED_TOKEN Current problems: - Existing nodes do not tell the coordinator if the data sync is ok or failed. - The coordinator cannot abort the removenode operation in case of error - A failed removenode operation leaves the node to be removed in REMOVING_TOKEN forever. - The removenode runs in best effort mode which may cause data consistency issues. It means if a node that owns the range after the removenode operation is down during the operation, the removenode operation will continue to succeed without requiring that node to perform data syncing. This can cause data consistency issues. For example, with five nodes in the cluster and RF = 3, for a given range n1, n2, n3 are the old replicas and n2 is being removed; after the removenode operation, the new replicas are n1, n5, n3. If n3 is down during the removenode operation, only n1 will be used to sync data with the new owner n5. This will break QUORUM read consistency if n1 happens to miss some writes. Improvements in this patch: - This patch makes the removenode safe by default. We require all nodes in the cluster to participate in the removenode operation and sync data if needed. We fail the removenode operation if any of them is down or fails. If the user wants the removenode operation to succeed even if some of the nodes are not available, the user has to explicitly pass a list of nodes that can be skipped for the operation. 
$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id> Example restful api: $ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5" - The coordinator can abort data sync on existing nodes For example, if one of the nodes fails to sync data, it makes no sense for other nodes to continue to sync data because the whole operation will fail anyway. - The coordinator can decide which nodes to ignore and pass the decision to other nodes Previously, there was no way for the coordinator to tell existing nodes to run in strict mode or best effort mode. Users had to modify the config file or run a restful api cmd on all the nodes to select strict or best effort mode. With this patch, the cluster wide configuration is eliminated. Fixes #7359 Closes #7626	2020-12-10 10:14:39 +02:00
Piotr Sarna	20bdeb315a	Merge ' types: add constraint on lexicographical_tri_compare()' from Avi Kivity Verify that the input types are iterators and their value types are compatible with the compare function. Because some of the inputs were not actually valid iterators, they are adjusted too. Closes #7631 * github.com:scylladb/scylla: types: add constraint on lexicographical_tri_compare() composite: make composite::iterator a real input_iterator compound: make compount_type::iterator a real input_iterator	2020-12-09 18:48:01 +01:00
Nadav Har'El	a8fdbf31cd	alternator: fix UpdateItem ADD for non-existent attribute UpdateItem's "ADD" operation usually adds elements to an existing set or adds a number to an existing counter. But it can also be used to create a new set or counter (as if adding to an empty set or zero). We unfortunately did not have a test for this case (creating a new set or counter), and when I wrote such a test now, I discovered the implementation was missing. So this patch adds both the test and the implementation. The new test used to fail before this patch, and passes with it - and passes on DynamoDB. Note that we only had this bug for the newer UpdateItem syntax. For the old AttributeUpdates syntax, we already support ADD actions on missing attributes, and already tested it in test_update_item_add(). I just forgot to test the same thing for the newer syntax, so I missed this bug :-( Fixes #7763. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207085135.2551845-1-nyh@scylladb.com>	2020-12-09 18:44:30 +01:00
Juliusz Stasiewicz	b150906d39	gossip: Added SNITCH_NAME to `application_state` Snitch name needs to be exchanged within cluster once, on shadow round, so joining nodes cannot use wrong snitch. The snitch names are compared on bootstrap and on normal node start. If the cluster already used mixed snitches, the upgrade to this version will fail. In this case customer needs to add a node with correct snitch for every node with the wrong snitch, then put down the nodes with the wrong snitch and only then do the upgrade. Fixes #6832 Closes #7739	2020-12-09 15:45:25 +02:00
Nadav Har'El	781f9d9aca	alternator: make default timeout configurable Whereas in CQL the client can pass a timeout parameter to the server, in the DynamoDB API there is no such feature; the server needs to choose reasonable timeouts for its own internal operations - e.g., writes to disk, querying other replicas, etc. Until now, Alternator had a fixed timeout of 10 seconds for its requests. This choice was reasonable - it is much higher than we expect during normal operations, and still lower than the client-side timeouts that some DynamoDB libraries have (boto3 has a one-minute timeout). However, there's nothing holy about this number of 10 seconds; some installations might want to change this default. So this patch adds a configuration option, "--alternator-timeout-in-ms", to choose this timeout. As before, it defaults to 10 seconds (10,000ms). In particular, some test runs are unusually slow - consider for example testing a debug build (which is already very slow) on an extremely over-committed test host. In some cases (see issue #7706) we noticed the 10 second timeout was not enough. So in this patch we increase the default timeout chosen in the "test/alternator/run" script to 30 seconds. Please note that as the code is structured today, this timeout only applies to some operations, such as GetItem, UpdateItem or Scan, but does not apply to CreateTable, for example. This is a pre-existing issue that this patch does not change. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207122758.2570332-1-nyh@scylladb.com>	2020-12-09 14:30:43 +01:00
Avi Kivity	f802356572	Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb"" This reverts commit `dc77d128e9`. It was reverted due to a strange and unexplained diff, which is now explained. The HEAD on the working directory being pulled from was set back, so git thought it was merging the intended commits, plus all the work that was committed from HEAD to master. So it is safe to restore it.	2020-12-08 19:19:55 +02:00
Avi Kivity	1badd315ef	Merge "Speed up devel tests 10 times" from Pavel E " The multishard_mutation_query test is toooo slow when built with clang in dev mode. By reducing the number of scans it's possible to shrink the full suite run time from half an hour down to ~3 minutes. tests: unit(dev) " * 'br-devel-mode-tests' of https://github.com/xemul/scylla: test: Make multishard_mutation_query test do less scans configure: Add -DDEVEL to dev build flags	2020-12-08 15:42:12 +02:00
Pavel Emelyanov	b837cf25b1	test: Make multishard_mutation_query test do less scans When built by clang this dev-mode test takes ~30 minutes to complete. Let's reduce this time by reducing the scale of the test if DEVEL is set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-08 15:55:04 +03:00
Pavel Emelyanov	703451311f	configure: Add -DDEVEL to dev build flags To let source code tell debug, dev and release builds from each other. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-08 15:54:30 +03:00
Avi Kivity	461c9826de	Merge 'scylla_setup: fix wrong command suggestion' from Takuya ASADA scylla_setup command suggestion does not show an argument of --io-setup, because we mistakenly store a bool value on it (recognized as 'store_true'). We always need to print '--io-setup X' on the suggestion instead. Also, --nic is currently ignored by command suggestion, need to print it just like other options. Related #7395 Closes #7724 * github.com:scylladb/scylla: scylla_setup: print --swap-directory and --swap-size on command suggestion scylla_setup: print --nic on command suggestion scylla_setup: fix wrong command suggestion on --io-setup scylla_setup command suggestion does not show an argument of --io-setup, because we mistakenly store a bool value on it (recognized as 'store_true'). We always need to print '--io-setup X' on the suggestion instead.	2020-12-08 13:58:55 +02:00
Avi Kivity	98271a5c57	Merge 'types: don't linearize in serialize_for_cql()' from Michał Chojnowski A sequel to #7692. This series gets rid of linearization in `serialize_for_cql`, which serializes collections and user types from `collection_mutation_view` to CQL. We switch from `bytes` to `bytes_ostream` as the intermediate buffer type. The only user of `serialize_for_cql` immediately copies the result to another `bytes_ostream`. We could avoid some copies and allocations by writing to the final `bytes_ostream` directly, but it's currently hidden behind a template. Before this series, `serialize_for_cql_aux()` delegated the actual writing to `collection_type_impl::pack` and `tuple_type_impl::build_value`, by passing them an intermediate `vector`. After this patch, the writing is done directly in `serialize_for_cql_aux()`. Pros: we avoid the overhead of creating an intermediate vector, without bloating the source code (because creating that intermediate vector requires just as much code as serializing the values right away). Cons: we duplicate the CQL collection format knowledge contained in `collection_type_impl::pack` and `tuple_type_impl::build_value`. Refs: #6138 Closes #7771 * github.com:scylladb/scylla: types: switch serialize_for_cql from bytes to bytes_ostream types: switch serialize_for_cql_aux from bytes to bytes_ostream types: serialize user types to bytes_ostream types: serialize lists to bytes_ostream types: serialize sets to bytes_ostream types: serialize maps to bytes_ostream utils: fragment_range: use range-based for loop instead of boost::for_each types: add write_collection_value() overload for bytes_ostream and value_view	2020-12-08 12:38:36 +02:00
Lubos Kosco	a0b1474bba	scylla_util.py: Increase disk to ram ratio for GCP Increase accepted disk-to-RAM ratio to 105 to accommodate even 7.5GB of RAM for one NVMe. Log various reasons for not recommending the instance type. Fixes #7587 Closes #7600	2020-12-08 11:20:30 +02:00
Piotr Wojtczak	c09ab3b869	api: Add cardinality to toppartitions results This change enhances the toppartitions api to also return the cardinality of the read and write sample sets. It now uses the size() method of space_saving_top_k class, counting the unique operations in the sampled set for up to the given capacity. Fixes #4089 Closes #7766	2020-12-08 09:38:59 +01:00
Nadav Har'El	86779664f4	alternator: fix broken Scan/Query paging with bytes keys When an Alternator table has partition keys or sort keys of type "bytes" (blobs), a Scan or Query which required paging used to fail - we used an incorrect function to output LastEvaluatedKey (which tells the user where to continue at the next page), and this incorrect function was correct for strings and numbers - but NOT for bytes (for bytes, we need to encode them as base-64). This patch also includes two tests - for bytes partition key and for bytes sort key - that failed before this patch and now pass. The test test_fetch_from_system_tables also used to fail after a Limit was added to it, because one of the tables it scans had a bytes key. That test is also fixed by this patch. Fixes #7768 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207175957.2585456-1-nyh@scylladb.com>	2020-12-08 09:38:23 +01:00
Eliran Sinvani	70770ff7fa	debian pkg: Make deb packages explicitly depend on versioned components Up until now, Scylla's debian packages dependencies versions were unspecified. This was due to a technical difficulty to determine the version of the dependent upon packages (such as scylla-python3 or scylla-jmx). Now, when those packages are also built as part of this repo and are built with a version identical to the server package itself we can depend all of our packages with explicit versions. The motivation for this change is that if a user tries to install a specific Scylla version by installing a specific meta package, it will silently drag in the latest components instead of the ones of the requested versions. The expected change in behavior is that after this change an attempt to install a metapackage with version which is not the latest will fail with an explicit error hinting the user what other packages of the same version should be explicitly included in the command line. Fixes #5514 Closes #7727	2020-12-07 18:58:15 +02:00
Michał Chojnowski	d43fd456cd	types: switch serialize_for_cql from bytes to bytes_ostream Now we can serialize collections from collection_mutation_view_description without linearizations.	2020-12-07 17:55:36 +01:00
Michał Chojnowski	81a55b032d	types: switch serialize_for_cql_aux from bytes to bytes_ostream We will switch serialize_for_cql itself to bytes_ostream soon.	2020-12-07 17:55:35 +01:00
Michał Chojnowski	71183cf0bd	types: serialize user types to bytes_ostream Avoids linearization by serializing to a fragmented type. It's still linearized at the very end, this will be changed in the near future.	2020-12-07 17:52:06 +01:00
Michał Chojnowski	41b889d0c8	types: serialize lists to bytes_ostream Avoids linearization by serializing to a fragmented type. It's still linearized at the very end, this will be changed in the near future.	2020-12-07 17:49:21 +01:00
Michał Chojnowski	2b3d2c193d	types: serialize sets to bytes_ostream Avoids linearization by serializing to a fragmented type. It's still linearized at the very end, this will be changed in the near future.	2020-12-07 17:47:49 +01:00
Michał Chojnowski	35823d12db	types: serialize maps to bytes_ostream Avoids linearization by serializing to a fragmented type. It's still linearized at the very end, this will be changed in the near future.	2020-12-07 17:47:12 +01:00
Botond Dénes	ba7cf2f5fd	tools/scylla-types: update name in description to use - instead of _ The executable was renamed from using _ to using - at one point but apparently the description wasn't updated. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201207161626.79013-1-bdenes@scylladb.com>	2020-12-07 18:34:52 +02:00
Avi Kivity	7580a93ec8	build: reinstate -Wunused-value warning for [[nodiscard]] The switch to clang disabled the clang-specific -Wunused-value since it generated some harmless warnings. Unfortunately, that also prevented [[nodiscard]] violations from warning. Fix by reinstating the warning, now that all instances of the warning have been fixed.	2020-12-07 16:51:19 +02:00
Avi Kivity	8fc0bbd487	test: lib: don't ignore future in compare_readers() A fast_forward_to() call is not waited on in compare_readers(). Since this is called in a thread, add a future::get() call to wait for it.	2020-12-07 16:50:20 +02:00
Avi Kivity	732d83dc0e	test: mutation_test: check both ranges when comparing summaries A copy/paste error means we ignore the termination of one of the ranges. Change the comma expression to a disjunction to avoid the unused value warning from clang. The code is not perfect, since if the two ranges are not the same size we'll invoke undefined behavior, but it is no worse than before (where we ignored the comparison completely).	2020-12-07 16:47:52 +02:00
Avi Kivity	fc0a45af5f	serialializer: silence unused value warning in variant deserializer The variant deserializer uses a fold expression to implement an if-tree with a short-circuit, producing an intermediate boolean value to terminate evaluation. This intermediate value is unneeded, but evokes a warning from clang when -Wunused-value is enabled. Since we want to enable the warning, add a cast to void to ignore the intermediate value.	2020-12-07 16:45:20 +02:00
Michał Chojnowski	60a3cecfea	utils: fragment_range: use range-based for loop instead of boost::for_each We want to pass bytes_ostream to this loop in later commits. bytes_ostream does not conform to some boost concepts required by boost::for_each, so let's just use C++'s native loop.	2020-12-07 12:50:36 +01:00
Piotr Sarna	1cc4ed50c1	db: fix getting local ranges for size estimates table When getting local ranges, an assumption is made that if a range does not contain an end or when its end is a maximum token, then it must contain a start. This assumption proved untrue during manual tests, so it's now fortified with an additional check. Here's a gdb output for a set of local ranges which causes an assertion failure when calling `get_local_ranges` on it: (gdb) p ranges $1 = std::vector of length 2, capacity 2 = {{_interval = {_start = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = {_kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = false}}, _end = std::optional<interval_bound<dht::token>> [no contained value], _singular = false}}, {_interval = { _start = std::optional<interval_bound<dht::token>> [no contained value], _end = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = { _kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = true}}, _singular = false}}} Closes #7764	2020-12-07 12:08:31 +02:00
Takuya ASADA	c3abba1913	scylla_setup: print --swap-directory and --swap-size on command suggestion We need to print --swap-directory and --swap-size on command suggestion just like other options. Related #7395	2020-12-07 18:40:59 +09:00
Takuya ASADA	582a3ffb2f	scylla_setup: print --nic on command suggestion We need to print --nic on command suggestion just like other options. Related #7395	2020-12-07 18:40:59 +09:00
Nadav Har'El	220d6dde17	alternator, test: make test_fetch_from_system_tables faster The test test_fetch_from_system_tables tests Alternator's system-table feature by reading from all system tables. The intention was to confirm we don't crash reading any of them - as they have different schemas and can run into different problems (we had such problems in the initial implementation). The intention was not to read a lot from each table - we only make a single "Scan" call on each, to read one page of data. However, the Scan call did not set a Limit, so the single page can get pretty big. This is not normally a problem, but in extremely slow runs - such as when running the debug build on an extremely overcommitted test machine (e.g., issue #7706) reading this large page may take longer than our default timeout. I'll send a separate patch for the timeout issue, but for now, there is really no reason why we need to read a big page. It is good enough to just read 50 rows (with Limit=50). This will still read all the different types and make the test faster. As an example, in the debug run on my laptop, this test spent 2.4 seconds to read the "compaction_history" table before this patch, and only 0.1 seconds after this patch. 2.4 seconds is close to our default timeout (10 seconds), 0.1 is very far. Fixes #7706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207075112.2548178-1-nyh@scylladb.com>	2020-12-07 08:52:31 +01:00
Michał Chojnowski	1fe7490970	types: add write_collection_value() overload for bytes_ostream and value_view We will use it to serialize collections to bytes_ostream in serialize_for_cql().	2020-12-07 08:48:31 +01:00
Nadav Har'El	0cd05dd0fd	cql-pytest: add tests for ALLOW FILTERING The original goal of this patch was to replace the two single-node dtests allow_filtering_test and allow_filtering_secondary_indexes_test, which recently caused us problems when we wanted to change the ALLOW FILTERING behavior but the tests were outside the tree. I'm hoping that after this patch, those two tests could be removed from dtest. But this patch actually tests more cases than those original dtests, and moreover tests not just whether ALLOW FILTERING is required or not, but also that the results of the filtering are correct. Currently, four of the included tests are expected to fail ("xfail") on Scylla, reproducing two issues: 1. Refs #5545: "WHERE x IN ..." on indexed column x wrongly requires ALLOW FILTERING 2. Refs #7608: "WHERE c=1" on clustering key c should require ALLOW FILTERING, but doesn't. All tests, except the one for issue #5545, pass on Cassandra. That one fails on Cassandra because it doesn't support IN on an indexed column at all (regardless of whether ALLOW FILTERING is used or not). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201115124631.1224888-1-nyh@scylladb.com>	2020-12-06 19:51:25 +02:00
Pavel Solodovnikov	56c0fcfcb2	cql_query_test: handle `bounce_to_shard` msg in `test_null_value_tuple_floating_types_and_uuids` Use `prepared_on_shard` helper function to handle `bounce_to_shard` messages that can happen when using LWT statements. Fixes: #7757 Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201204172944.601730-1-pa.solodovnikov@scylladb.com>	2020-12-06 19:34:13 +02:00
Amos Kong	6b1659ee80	schema.cc/describe: fix invalid compaction options in schema There is a typo in the schema.cql of a snapshot: a comma is missing after the compaction strategy, so restoring the schema from the file fails. AND compaction = {'class': 'SizeTieredCompactionStrategy''max_compaction_threshold': '32'} The map_as_cql_param() function has a `first` parameter to smartly add the comma, and the compaction_strategy_options is never the first. Fixes #7741 Signed-off-by: Amos Kong <amos@scylladb.com> Closes #7734	2020-12-06 17:40:05 +02:00
Avi Kivity	ca950e6f08	Merge "Remove get_local_storage_service() from counters" from Pavel E " The storage service is called there to get the cached value of db::system_keyspace::get_local_host_id(). Keeping the value on database decouples it from storage service and kills one more global storage service reference. tests: unit(dev) " * 'br-remove-storage-service-from-counters-2' of https://github.com/xemul/scylla: counters: Drop call to get_local_storage_service and related counters: Use local id arg in transform_counter_update_to_shards database: Have local id arg in transform_counter_updates_to_shards() storage_service: Keep local host id to database	2020-12-06 16:15:21 +02:00
Avi Kivity	6e460e121a	Merge 'docs: Add Sphinx and ScyllaDB theme' from David Garcia This PR adds the Sphinx documentation generator and the custom theme ``sphinx-scylladb-theme``. Once merged, the GitHub Actions workflow should automatically publish the developer notes stored under the ``docs`` directory on http://scylladb.github.io/scylla 1. Run the command ``make preview`` from the ``docs`` directory. 2. Check the terminal where you have executed the previous command. It should not raise warnings. 3. Open in a new browser tab http://127.0.0.1:5500/ to see the generated documentation pages. The table of contents displays the files sorted as they appear on GitHub. In a subsequent iteration, @lauranovich and I will submit an additional PR proposing a new folder organization structure. Closes #7752 * github.com:scylladb/scylla: docs: fixed warnings docs: added theme	2020-12-06 15:26:57 +02:00
Benny Halevy	64a4ffc579	large_data_handler: do not delete records in the absence of large_data_stats The previous way of deleting records based on the whole sstable data_size causes overzealous deletions (#7668) and inefficiency in the rows cache due to the large number of range tombstones created. Therefore we'd be better off by just letting the records expire using the 30-day TTL. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201206083725.1386249-1-bhalevy@scylladb.com>	2020-12-06 11:34:37 +02:00
Avi Kivity	dc77d128e9	Revert "Merge "raft: fix replication if existing log on leader" from Gleb" This reverts commit `0aa1f7c70a`, reversing changes made to `72c59e8000`. The diff is strange, including unrelated commits. There is no understanding of the cause, so to be safe, revert and try again.	2020-12-06 11:34:19 +02:00
Pavel Emelyanov	df0e26035f	counters: Drop call to get_local_storage_service and related The local host id is now passed by argument, so we don't need the counter_id::local() and some other methods that call or are called by it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 16:31:12 +03:00
Pavel Emelyanov	914613b3c3	counters: Use local id arg in transform_counter_update_to_shards Only a few places in it need the uuid. And since it's only 16 bytes it's possible to safely capture it by value in the called lambdas. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 16:30:31 +03:00
Pavel Emelyanov	62214e2258	database: Have local id arg in transform_counter_updates_to_shards() There are two places that call it -- database code itself and tests. The former already has the local host id, so just pass one. The latter are a bit trickier. Currently they use the value from storage_service created by storage_service_for_tests, but since this version of service doesn't pass through prepare_to_join() the local_host_id value there is default-initialized, so just default-initialize the needed argument in place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 15:09:30 +03:00
Pavel Emelyanov	5a286ee8d4	storage_service: Keep local host id to database The value in question is cached from db::system_keyspace for places that want to have it without waiting for futures. So far the only place is database counters code, so keep the value on database itself. Next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 15:09:29 +03:00
Piotr Sarna	2015988373	Merge 'types: get rid of linearization in deserialize()' from Michał Chojnowski Citing #6138: > In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator. > One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types. > > Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers. This series gets rid of intermediate linearizations in deserialization. The next steps are removing linearizations from serialization, validation and comparison code. Series summary: - Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered while testing. Since it wasn't discovered earlier, I guess it doesn't occur in any code path in master.) - Add a `FragmentedView` concept to allow uniform handling of various types of fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`, `ser::buffer_view` and likely `managed_bytes_view` in the future). - Implement `FragmentedView` for relevant fragmented buffer types. - Add helper functions for reading from `FragmentedView`. - Switch `deserialize()` and all its helpers from `bytes_view` to `FragmentedView`. - Remove `with_linearized()` calls which just became unnecessary. - Add an optimization for single-fragment cases. The addition of `FragmentedView` might be controversial, because another concept meant for the same purpose - `FragmentRange` - is already used. Unfortunately, it lacks the functionality we need. 
The main (only?) thing we want to do with a fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no way to do that, because it's immutable by design. We can work around that by wrapping it into a mutable view which will track the offset into the immutable `FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing around as a view - it stores a pair of fragment iterators, a fragment view and a size (11 words) to conform to the iterator-based design of `FragmentRange`, when one fragment iterator (4 words) already contains all needed state, just hidden. I suggest we replace `FragmentRange` with `FragmentedView` (or something similar) altogether. Refs: #6138 Closes #7692 * github.com:scylladb/scylla: types: collection: add an optimization for single-fragment buffers in deserialize types: add an optimization for single-fragment buffers in deserialize cql3: tuples: don't linearize in in_value::from_serialized cql3: expr: expression: replace with_linearize with linearized cql3: constants: remove unneeded uses of with_linearized cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell cql3: lists: remove unneeded use of with_linearized query-result-set: don't linearize in result_set_builder::deserialize types: remove unneeded collection deserialization overloads types: switch collection_type_impl::deserialize from bytes_view to FragmentedView cql3: sets: don't linearize in value::from_serialized cql3: lists: don't linearize in value::from_serialized cql3: maps: don't linearize in value::from_serialized types: remove unused deserialize_aux types: deserialize: don't linearize tuple elements types: deserialize: don't linearize collection elements types: switch deserialize from bytes_view to FragmentedView types: deserialize tuple types from FragmentedView types: deserialize set type from FragmentedView types: deserialize map type from 
FragmentedView types: deserialize list type from FragmentedView types: add FragmentedView versions of read_collection_size and read_collection_value types: deserialize varint type from FragmentedView types: deserialize floating point types from FragmentedView types: deserialize decimal type from FragmentedView types: deserialize duration type from FragmentedView types: deserialize IP address types from FragmentedView types: deserialize uuid types from FragmentedView types: deserialize timestamp type from FragmentedView types: deserialize simple date type from FragmentedView types: deserialize time type from FragmentedView types: deserialize boolean type from FragmentedView types: deserialize integer types from FragmentedView types: deserialize string types from FragmentedView types: remove unused read_simple_opt types: implement read_simple* versions for FragmentedView utils: fragmented_temporary_buffer: implement FragmentedView for view utils: fragment_range: add single_fragmented_view serializer: implement FragmentedView for buffer_view utils: fragment_range: add linearized and with_linearized for FragmentedView utils: fragment_range: add FragmentedView utils: fragmented_temporary_buffer: fix view::remove_prefix	2020-12-04 09:46:20 +01:00
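The prefix-extraction idea behind the merge above can be sketched as a small mutable view that tracks its position inside a sequence of immutable fragments. This is a Python illustration only, not the C++ `FragmentedView` concept from the series; the class and method names here (`FragmentedView`, `size_bytes`, `read_prefix`) are assumptions chosen for the example.

```python
class FragmentedView:
    """Hypothetical mutable view over a sequence of immutable byte fragments."""

    def __init__(self, fragments):
        self._fragments = [f for f in fragments if f]  # skip empty fragments
        self._index = 0    # index of the current fragment
        self._offset = 0   # bytes already consumed from the current fragment

    def size_bytes(self):
        """Number of bytes not yet consumed."""
        if self._index >= len(self._fragments):
            return 0
        rest = sum(len(f) for f in self._fragments[self._index + 1:])
        return len(self._fragments[self._index]) - self._offset + rest

    def read_prefix(self, n):
        """Remove the first n bytes from the view and return them."""
        if n > self.size_bytes():
            raise ValueError("prefix longer than view")
        parts = []
        while n > 0:
            fragment = self._fragments[self._index]
            chunk = fragment[self._offset:self._offset + n]
            parts.append(chunk)
            n -= len(chunk)
            self._offset += len(chunk)
            if self._offset == len(fragment):
                # Current fragment exhausted: advance to the next one.
                self._index += 1
                self._offset = 0
        return b"".join(parts)
```

The view keeps only an index and an offset as state, which is the point made in the description: the position can live in the view itself instead of in a separate wrapper around an immutable range.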
Michał Chojnowski	a1f7fabb3d	types: collection: add an optimization for single-fragment buffers in deserialize Helpers parametrized with single_fragmented_view should compile to better code, so let's use them when possible.	2020-12-04 09:21:05 +01:00
Michał Chojnowski	08c394726e	types: add an optimization for single-fragment buffers in deserialize Values usually come in a single fragment, but we pay the cost of fragmented deserialization nevertheless: bigger view objects (4 words instead of 2 words), more state to keep updated (i.e. total view size in addition to current fragment size), and more branches. This patch adds a special case for single-fragment buffers to abstract_type::deserialize. They are converted to a single_fragmented_view before doing anything else. Templates instantiated with single_fragmented_view should compile to better code than their multi-fragmented counterparts. If abstract_type::deserialize is inlined, this patch should completely prevent any performance penalties for switching from with_linearized to fragmented deserialization.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	f75db1fcf5	cql3: tuples: don't linearize in in_value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	68177a6721	cql3: expr: expression: replace with_linearize with linearized with_linearized creates an additional internal `bytes` when the input is fragmented. linearized copies the data directly to the output `bytes`, so it's more efficient.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	5ffe40d5a2	cql3: constants: remove unneeded uses of with_linearized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	3c98806df9	cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	c43ef3951b	cql3: lists: remove unneeded use of with_linearized We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	0d5c5b8645	query-result-set: don't linearize in result_set_builder::deserialize We can deserialize directly from fragmented buffers now.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	04786dee30	types: remove unneeded collection deserialization overloads Inherit the method from base class rather than reimplementing it in every child.	2020-12-04 09:19:39 +01:00
Michał Chojnowski	c08419e28d	types: switch collection_type_impl::deserialize from bytes_view to FragmentedView Devirtualizes collection_type_impl::deserialize (so it can be templated) and adds a FragmentedView overload. This will allow us to deserialize collections with explicit cql_serialization_format directly from fragmented buffers.	2020-12-04 09:19:37 +01:00
dgarcia360	1304f6a0bb	docs: fixed warnings docs: fixed warnings	2020-12-03 17:40:34 +01:00
dgarcia360	a340b46a79	docs: added theme	2020-12-03 17:37:18 +01:00
Michał Chojnowski	d731b34d95	cql3: sets: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	64e64fd2b3	cql3: lists: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	536a2f8c8d	cql3: maps: don't linearize in value::from_serialized We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	58d9f52363	types: remove unused deserialize_aux Dead code.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	8440279130	types: deserialize: don't linearize tuple elements We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:07 +01:00
Michał Chojnowski	a216b0545f	types: deserialize: don't linearize collection elements We can deserialize directly from fragmented buffers now.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	1ccdfc7a90	types: switch deserialize from bytes_view to FragmentedView The final part of the transition of deserialize from bytes_view to FragmentedView. Adds a FragmentedView overload to abstract_type::deserialize and switches deserialize_visitor from bytes_view to FragmentedView, allowing deserialization of all types with no intermediate linearization.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	898cea4cde	types: deserialize tuple types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	507883f808	types: deserialize set type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	9b211a7285	types: deserialize map type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	5f1939554c	types: deserialize list type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	ad7ab73cd0	types: add FragmentedView versions of read_collection_size and read_collection_value We will need those to deserialize collections from FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	495bf5c431	types: deserialize varint type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	0f8ad89740	types: deserialize floating point types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	0bb0291e50	types: deserialize decimal type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	760bc5fd60	types: deserialize duration type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	75a56f439b	types: deserialize IP address types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	9f668929db	types: deserialize uuid types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	3e1a24ca0d	types: deserialize timestamp type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	a4bc43ab19	types: deserialize simple date type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	24bd986aea	types: deserialize time type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	c03ad52513	types: deserialize boolean type from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	2f351928e2	types: deserialize integer types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	28b727082f	types: deserialize string types from FragmentedView A part of the transition of deserialize from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	426308f526	types: remove unused read_simple_opt Dead code.	2020-12-03 10:57:06 +01:00
Michał Chojnowski	e1145fe410	types: implement read_simple* versions for FragmentedView We will need those to switch deserialize() from bytes_view to FragmentedView.	2020-12-03 10:57:06 +01:00
Botond Dénes	71722d8b41	frozen_mutation: add partition context to errors coming from deserializing	2020-12-02 15:08:49 +02:00
Botond Dénes	8d944ff755	partition_builder: accept_row(): use append_clustering_row() The partition builder doesn't expect the looked-up row to exist. In fact it already existing is a sign of a bug. Currently bugs resulting in duplicate rows will manifest by tripping an assert in `row::append_cell()`. This however results in poor diagnostics, so we want to catch these errors sooner to be able to provide higher level diagnostics. To this end, switch to the freshly introduced `append_clustering_row()` so that duplicate rows are found early and in a context where their identity is known.	2020-12-02 15:08:49 +02:00
Botond Dénes	63ea36e277	mutation_partition: add append_clustered_row() A variant of `clustered_row()` which throws if the row already exists, or if any greater row already exists.	2020-12-02 15:08:32 +02:00
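The contract described for `append_clustered_row()` (reject a row that already exists, or that sorts before an existing row) can be sketched as follows. This is a minimal Python illustration, not Scylla's C++ implementation; `RowStore`, `append_row` and `keys` are hypothetical names.

```python
class RowStore:
    """Hypothetical append-only store of rows ordered by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys, kept in strictly increasing order
        self._rows = {}

    def append_row(self, key, row):
        # Reject a duplicate key or any key that sorts before an existing one:
        # both mean the caller is not feeding rows in strictly increasing order.
        if self._keys and key <= self._keys[-1]:
            raise ValueError(
                f"row {key!r} is not greater than last row {self._keys[-1]!r}"
            )
        self._keys.append(key)
        self._rows[key] = row

    def keys(self):
        return list(self._keys)
```

Because rows only ever arrive in increasing order, a single comparison against the last key covers both failure cases, and the error is raised where the offending key is still known.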
Benny Halevy	c7311d1080	docs: sstable-scylla-format: document large_data_type in more details This adds details about large_data_type on top of `ca5184052d` and introduces structured indentation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201202110539.634880-1-bhalevy@scylladb.com>	2020-12-02 13:25:49 +02:00
Avi Kivity	a95c2a946c	Merge 'mutation_reader: introduce clustering_order_reader_merger' from Kamil Braun This abstraction is used to merge the output of multiple readers, each opened for a single partition query, into a non-decreasing stream of mutation_fragments. It is similar to `mutation_reader_merger`, but an important difference is that the new merger may select new readers in the middle of a partition after it already returned some fragments from that partition. It uses the new `position_reader_queue` abstraction to select new readers. It doesn't support multi-partition (ring range) queries. The new merger will be later used when reading from sstable sets created by TimeWindowCompactionStrategy. This strategy creates many sstables that are mostly disjoint w.r.t the contained clustering keys, so we can delay opening sstable readers when querying a partition until after we have processed all mutation fragments with positions before the keys contained by these sstables. A microbenchmark was added that compares the existing combining reader (which uses `mutation_reader_merger` underneath) with a new combining reader built using the new `clustering_order_reader_merger` and a simple queue of readers that returns readers from some supplied set. The used set of readers is built from the following ranges of keys (each range corresponds to a single reader): `[0, 31]`, `[30, 61]`, `[60, 91]`, `[90, 121]`, `[120, 151]`. The microbenchmark runs the reader and divides the result by the number of mutation fragments. 
The results on my laptop were: ``` $ build/release/test/perf/perf_mutation_readers -t clustering_combined.* -r 10 single run iterations: 0 single run duration: 1.000s number of runs: 10 test iterations median mad min max clustering_combined.ranges_generic 2911678 117.598ns 0.685ns 116.175ns 119.482ns clustering_combined.ranges_specialized 3005618 111.015ns 0.349ns 110.063ns 111.840ns ``` `ranges_generic` denotes the existing combining reader, `ranges_specialized` denotes the new reader. Split from https://github.com/scylladb/scylla/pull/7437. Closes #7688 * github.com:scylladb/scylla: tests: mutation_source_test for clustering_order_reader_merger perf: microbenchmark for clustering_order_reader_merger mutation_reader_test: test clustering_order_reader_merger in memory test: generalize `random_subset` and move to header mutation_reader: introduce clustering_order_reader_merger	2020-12-02 12:15:35 +02:00
Kamil Braun	502ed2e9f7	tests: mutation_source_test for clustering_order_reader_merger	2020-12-02 11:13:58 +01:00
Nadav Har'El	fae2ba60e9	cql-pytest: start to port Cassandra's CQL unit tests In issue #7722, it was suggested that we should port Cassandra's CQL unit tests into our own repository, by translating the Java tests into Python using the new cql-pytest framework. Cassandra's CQL unit test framework is orders of magnitude faster than dtest, and in-tree, so Cassandra have been moving many CQL correctness tests there, and we can also benefit from their test cases. In this patch, we take the first step in a long journey: 1. I created a subdirectory, test/cql-pytest/cassandra_tests, where all the translated Cassandra tests will reside. The structure of this directory will mirror that of the test/unit/org/apache/cassandra/cql3 directory in the Cassandra repository. pytest conveniently looks for test files recursively, so when all the cql-pytest are run, the cassandra_tests files will be run as well. As usual, one can also run only a subset of all the tests, e.g., "test/cql-pytest/run -vs cassandra_tests" runs only the tests in the cassandra_tests subdirectory (and its subdirectories). 2. I translated into Python two of the smallest test files - validation/entities/{TimeuuidTest,DataTypeTest}.java - containing just three test functions. The plan is to translate entire Java test files one by one, and to mirror their original location in our own repository, so it will be easier to remember what we already translated and what remains to be done. 3. I created a small library, porting.py, of functions which resemble the common functions of the Java tests (CQLTester.java). These functions aim to make porting the tests easier. Despite the resemblance, the ported code is not 100% identical (of course) and some effort is still required in this porting. As we continue this porting effort, we'll probably need more of these functions, and can also continue to improve them to reduce the porting effort. Refs #7722. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201201192142.2285582-1-nyh@scylladb.com>	2020-12-02 09:29:22 +01:00
Avi Kivity	77466177ab	Merge 'Use large_data_counters in scylla_metadata to decide when to delete large_data records' from Benny Halevy This series introduces a `large_data_counters` element to `scylla_metadata` component to explicitly count the number of `large_{partitions,rows,cells}` and `too_many_rows` in the sstable. These are accounted for in the sstable writer whenever the respective large data entry is encountered. It is taken into account in `large_data_handler::maybe_delete_large_data_entries`, when engaged. Otherwise, if deleting a legacy sstable that has no such entry in `scylla_metadata`, just revert to using the current method of comparing the sstable's `data_size` to the various thresholds. Fixes #7668 Test: unit(dev) Dtest: wide_rows_test.py (in progress) Closes #7669 * github.com:scylladb/scylla: docs: sstable-scylla-format: add large_data_stats subcomponent large_data_handler: maybe_delete_large_data_entries: use sstable large data stats large_data_handler: maybe_delete_large_data_entries: accept shared_sstable large_data_handler: maybe_delete_large_data_entries: move out of line sstables: load large_data_stats from scylla_metadata sstables: store large_data_stats in scylla_metadata sstables: writer: keep track of large data stats large_data_handler: expose methods to get threshold sstables: kl/writer: never record too many rows large_data_handler: indicate recording of large data entries large_data_handler: move constructor out of line	2020-12-02 10:08:18 +02:00
Nadav Har'El	5c08489569	cql-pytest: don't run tests if Scylla boot timed out In test/cql-pytest/run.py we have a 200 second timeout to boot Scylla. I never expected to reach this timeout - it normally takes (in dev build mode) around 2 seconds, but in one run on Jenkins we did reach it. It turns out that the code did not recognize this timeout correctly: it assumed that Scylla had booted correctly, and then all the subtests failed when they could not connect to Scylla. This patch fixes the timeout logic. After the timeout, if Scylla's CQL port is still not responsive, the test run is failed - without trying to run many individual tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201201150927.2272077-1-nyh@scylladb.com>	2020-12-02 08:48:44 +02:00
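The fixed behaviour described above (after the timeout, check once more whether the server is responsive and fail the whole run if it is not) can be sketched like this. It is an illustrative Python helper, not the code in test/cql-pytest/run.py; `wait_until_ready` and its parameters are hypothetical.

```python
import time


def wait_until_ready(is_ready, timeout, interval=0.1):
    """Poll is_ready() until it returns True or `timeout` seconds pass.

    Returns True only if the server became ready. On False the caller
    should fail the whole run instead of starting individual tests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval)
    # One last check after the deadline, so the result reflects the real
    # state of the server rather than the mere fact that the loop ended.
    return is_ready()
```

The key design point is that leaving the polling loop is never treated as success on its own; the return value always comes from an actual readiness check.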
Kamil Braun	2da723b9c8	cdc: produce postimage when inserting with no regular columns When a row was inserted into a table with no regular columns, and no such row existed in the first place, postimage would not be produced. Fix this. Fixes #7716. Closes #7723	2020-12-01 18:01:23 +02:00
Benny Halevy	ca5184052d	docs: sstable-scylla-format: add large_data_stats subcomponent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	4406a2514e	large_data_handler: maybe_delete_large_data_entries: use sstable large data stats If the sstable has scylla_metadata::large_data_stats use them to determine whether to delete the corresponding large data records. Otherwise, defer to the current method of comparing the sstable data_size to the respective thresholds. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
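The decision described in this commit can be sketched as a single function. This is a Python illustration under stated assumptions, not the C++ code in `large_data_handler`: the function name, its parameters, and the exact comparison used in the legacy fallback (`>` rather than `>=`) are all assumptions.

```python
def should_delete_large_data_entries(data_size, threshold, recorded_count=None):
    """Decide whether large-data records may exist for a deleted sstable.

    recorded_count is the per-sstable counter if the sstable carries one,
    or None for a legacy sstable without such statistics.
    """
    if recorded_count is not None:
        # Exact answer: records exist only if the writer counted at least one.
        return recorded_count > 0
    # Legacy fallback: an sstable no larger than the threshold cannot contain
    # an entry above the threshold, so nothing can have been recorded.
    return data_size > threshold
```

With the counter available, a large sstable that never produced a large-data record no longer triggers a needless delete, which the size-based heuristic cannot rule out.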
Benny Halevy	8cebe7776f	large_data_handler: maybe_delete_large_data_entries: accept shared_sstable Since the actual deletion of the large data entries is done in the background, and we don't capture the shared_sstable, we can safely pass it to maybe_delete_large_data_entries when deleting the sstable in sstable::unlink and it will be released as soon as maybe_delete_large_data_entries returns. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	f7d0ae3d10	large_data_handler: maybe_delete_large_data_entries: move out of line It is called on the cold path, when the sstable is deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	be4a58c34c	sstables: load large_data_stats from scylla_metadata Load the large data stats from the scylla_metadata component if they are present. Otherwise, if we're opening a legacy sstable that has no scylla_metadata_type::LargeDataStats, leave sstable::_large_data_stats disengaged. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	92443ed71c	sstables: store large_data_stats in scylla_metadata Store the large data statistics in the scylla_metadata component. These will be retrieved when loading the sstable and be used for determining whether to delete the corresponding large data entries upon sstable deletion. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	79c19a166c	sstables: writer: keep track of large data stats In the next patch, this will be written to the sstable's scylla_metadata component. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:41 +02:00
Benny Halevy	8ab053bd44	large_data_handler: expose methods to get threshold To be used for keeping large_data statistics in sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Benny Halevy	f1257dfdc0	sstables: kl/writer: never record too many rows rows_count is not tracked prior to the mc format. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Benny Halevy	dd7422a713	large_data_handler: indicate recording of large data entries Return true from the maybe_{record,log}_* methods if a large data record or log entry was emitted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Benny Halevy	873107821b	large_data_handler: move constructor out of line No need for it to be inlined. Also, add debug logging to the large data handler options. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Dejan Mircevski	e45af3b9b8	index: Ensure restriction is supported in find_idx Previously, statement_restrictions::find_idx() would happily return an index for a non-EQ restriction (because it checked only the column name, not the operator). This is incorrect: when the selected index is for a non-EQ restriction, it is impossible to query that index table. Fixes #7659. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #7665	2020-12-01 15:16:48 +02:00
Avi Kivity	df572b41ae	Update seastar submodule * seastar 010fb0df1e...8b400c7b45 (6): > append_challenged_posix_file_impl::read_dma: allow iovec to cross _logical_size > Merge "Extend per task-queue timing statistics" from Pavel E > tls_test: Create test certs at build time > cook: upgrade hwloc version > memory: rate-limit diagnostics messages > util/log: add rate-limited version of writer version of log()	2020-12-01 15:12:25 +02:00
Tomasz Grabiec	0c5d23d274	thrift: Validate cell names when constructing clustering keys Currently, if the user provides a cell name with too many components, we will accept it and construct an invalid clustering key. This may result in undefined behavior down the stream. It was caught by ASAN in a debug build when executing dtest cql_tests.py:MiscellaneousCQLTester.cql3_insert_thrift_test with nodetool flush manually added after the write. Triggered during sstable writing to an MC-format sstable: seastar::shared_ptr<abstract_type const>::operator*() const at ././seastar/include/seastar/core/shared_ptr.hh:577 sstables::mc::clustering_blocks_input_range::next() const at ./sstables/mx/writer.cc:180 To prevent corrupting the state in this way, we should fail early. This patch adds validation which will fail thrift requests which attempt to create invalid clustering keys. Fixes #7568. Example error: Internal server error: Cell name of ks.test has too many components, expected 1 got 2 in 0x0004000000040000017600 Message-Id: <1605550477-24810-1-git-send-email-tgrabiec@scylladb.com>	2020-12-01 15:12:08 +02:00
Avi Kivity	2fd895a367	Merge 'dist/common/scripts/scylla_setup: Optionally config rsyslog destination' from Amnon Heiman This patch adds an option to scylla_setup to configure an rsyslog destination. The monitoring stack has an option to get information from rsyslog; it requires that rsyslog on the Scylla machines send the trace lines to it. The configuration will be in a Scylla configuration file, so it is safe to run it multiple times. Fixes #7589 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes #7634 * github.com:scylladb/scylla: dist/common/scripts/scylla_setup: Optionally config rsyslog destination Adding dist/common/scripts/scylla_rsyslog_setup utility	2020-12-01 13:12:32 +02:00
Amnon Heiman	4036cecdea	dist/common/scripts/scylla_setup: Optionally config rsyslog destination This patch adds an option to scylla_setup to configure an rsyslog destination. The monitoring stack has an option to get information from rsyslog; it requires that rsyslog on the Scylla machines send the trace lines to it. If the /etc/rsyslog.d/ directory exists (which means the current system runs rsyslog), it asks whether to add an rsyslog configuration and, if the answer is yes, runs scylla_rsyslog_setup. Fixes #7589 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-12-01 12:33:37 +02:00
Takuya ASADA	572d6b2a4e	scylla_setup: fix wrong command suggestion on --io-setup The scylla_setup command suggestion does not show the argument of --io-setup, because we mistakenly store a bool value for it (it is recognized as 'store_true'). We always need to print '--io-setup X' in the suggestion instead. Related #7395	2020-12-01 07:23:55 +09:00
Tomasz Grabiec	f8f81ec322	Merge "raft: various snapshot fixes" from Gleb * scylla-dev/snapshot_fixes_v1: raft: ignore append_reply from a peer in SNAPSHOT state raft: Ignore outdated snapshots raft: set next_idx to correct value after snapshot transfer	2020-11-30 21:34:31 +01:00
Alejo Sanchez	72a64b05ea	raft: replication test: fix total entries for initial snapshot Since now total expected entries are updated by load snapshot, do not trim the total entries expected values with the initial snapshot on test state machine initialization. reported by @gleb Branch URL: https://github.com/alecco/scylla/tree/raft-ale-tests-06-snapshot-total-entries Tests: unit ({dev}), unit ({debug}), unit ({release}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20201125171232.321992-1-alejo.sanchez@scylladb.com>	2020-11-30 21:34:31 +01:00
Kamil Braun	af49a95627	perf: microbenchmark for clustering_order_reader_merger	2020-11-30 11:55:44 +01:00
Kamil Braun	4f7e2bf920	mutation_reader_test: test clustering_order_reader_merger in memory	2020-11-30 11:55:44 +01:00
Kamil Braun	b22aa6dbde	test: generalize `random_subset` and move to header	2020-11-30 11:55:44 +01:00
Kamil Braun	0b36c5e116	mutation_reader: introduce clustering_order_reader_merger This abstraction is used to merge the output of multiple readers, each opened for a single partition query, into a non-decreasing stream of mutation_fragments. It is similar to `mutation_reader_merger`, but an important difference is that the new merger may select new readers in the middle of a partition after it already returned some fragments from that partition. It uses the new `position_reader_queue` abstraction to select new readers. It doesn't support multi-partition (ring range) queries. The new merger will be later used when reading from sstable sets created by TimeWindowCompactionStrategy. This strategy creates many sstables that are mostly disjoint w.r.t the contained clustering keys, so we can delay opening sstable readers when querying a partition until after we have processed all mutation fragments with positions before the keys contained by these sstables.	2020-11-30 11:55:44 +01:00
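The delayed-opening idea described above can be sketched as a k-way merge that opens a reader only once the merged stream reaches that reader's lowest possible position. This is a simplified Python illustration, not the C++ `clustering_order_reader_merger`; the function name, the `(lower_bound, factory)` input shape, and the use of plain integers as positions are assumptions made for the example.

```python
import heapq


def merge_in_clustering_order(reader_specs):
    """Merge sorted readers, opening each one only when it is first needed.

    reader_specs: list of (lower_bound, factory) pairs, where factory()
    returns an iterable of positions in non-decreasing order, none of
    them below lower_bound. Yields positions in non-decreasing order.
    """
    pending = sorted(reader_specs, key=lambda spec: spec[0])  # unopened readers
    next_pending = 0
    heap = []  # entries: (position, reader_index, iterator)

    def open_reader():
        nonlocal next_pending
        _, factory = pending[next_pending]
        it = iter(factory())
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, next_pending, it))
        next_pending += 1

    while heap or next_pending < len(pending):
        # Open every unopened reader that could contribute a position
        # not greater than the smallest one currently available.
        while next_pending < len(pending) and (
            not heap or pending[next_pending][0] <= heap[0][0]
        ):
            open_reader()
        if not heap:
            continue
        position, index, it = heapq.heappop(heap)
        yield position
        following = next(it, None)
        if following is not None:
            heapq.heappush(heap, (following, index, it))
```

For readers covering mostly disjoint ranges, at most a couple of readers are open at any point, which is the saving the commit message aims for with TimeWindowCompactionStrategy sstables.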
Avi Kivity	ea9c058be3	Merge 'Don't use secondary indices for multi-column restrictions' from Dejan Mircevski Fix #7680 by never using secondary index for multi-column restrictions. Modify expr::is_supported_by() to handle multi-column correctly. Tests: unit (dev) Closes #7699 * github.com:scylladb/scylla: cql3/expr: Clarify multi-column doesn't use indexing cql3: Don't use index for multi-column restrictions test: Add eventually_require_rows	2020-11-30 12:38:26 +02:00
Avi Kivity	12c20c4101	Merge 'test/cql-pytest: tests for string validation (UTF-8 and ASCII)' from Nadav Har'El The first two patches in this series are small improvements to cql-pytest to prepare for the third and main patch. This third patch adds cql-pytest tests which check that we fail CQL queries that try to inject non-ASCII and non-UTF-8 strings for ascii and text columns, respectively. The tests do not discover any unknown bug in Scylla, however, they do show that Scylla is more strict in its definition of "valid UTF-8" compared to Cassandra. Closes #7719 * github.com:scylladb/scylla: test/cql-pytest: add tests for validation of inserted strings test/cql-pytest: add "scylla_only" fixture test/cpy-pytest: enable experimental features	2020-11-30 12:26:25 +02:00
Piotr Wojtczak	3560acd311	cql_metrics: Add metrics for CQL errors This change adds tracking of all the CQL errors that can be raised in response to a CQL message from a client, as described in the CQL v4 protocol and with Scylla's CDC_WRITE_FAILUREs included. Fixes #5859 Closes #7604	2020-11-30 12:18:37 +02:00
Takuya ASADA	6238d105d9	dist/redhat: drop Conflicts with older kernel We have "Conflicts: kernel < 3.10.0-514" on the rpm package to make sure the environment is running a newer kernel. However, the user may use a non-standard kernel which has a different package name, like kernel-ml or kernel-uek. In such an environment the Conflicts tag does not work correctly: even if the system is running a newer kernel, rpm only checks the version number of the "kernel" package. To avoid this issue, we need to drop the Conflicts tag. Fixes #7675	2020-11-30 11:38:42 +02:00
Nadav Har'El	48c78ade33	test/cql-pytest: add tests for validation of inserted strings This patch adds comprehensive cql-pytest tests for checking the validation of strings - ASCII or UTF-8 - in CQL. Strings can be represented in CQL using several methods - a string can be a string literal as part of the statement, can be encoded as a blob (0x...), or can be a binding parameter for a prepared statement, or returned by user-defined functions - and these tests check all of them. We already have low-level unit tests for UTF-8 parsing in test/boost/utf8_test.cc, but the new tests here confirm that we really call these low-level functions in the correct way. Moreover, since these are CQL tests, they can also be run against Cassandra, and doing that demonstrated that Scylla's UTF-8 parsing is stricter than Cassandra's - Scylla's UTF-8 parser rejects the following sequences which Cassandra's accepts: 1. \xC0\x80 as another non-minimal representation of null. Note that other non-minimal encodings are rejected by Cassandra, as expected. 2. Characters beyond the official Unicode range (or what Scylla considers the end of the range). 3. UTF-16 surrogates - these are not considered valid UTF-8, but Cassandra accepts them, and Scylla does not. In the future, we should consider whether Scylla is more correct than Cassandra here (so we're fine), or whether compatibility is more important than correctness (so this exposed a bug). The ASCII tests reproduce issue #5421 - that trying to insert a non-ASCII string into an "ascii" column should produce an error on insert - not later when fetching the string. This test now passes, because issue 5421 was already fixed. These tests did not expose any bug in Scylla (other than the differences with Cassandra mentioned above), so all of them pass on Scylla. Two of the tests fail on Cassandra, because Cassandra does not recognize some invalid UTF-8 (according to Scylla's definition) as invalid. Refs #5421. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-11-29 17:43:20 +02:00
Dejan Mircevski	5bc7e31284	restrictions: Forbid mixing ck=0 and (ck)=(0) Reject the previously accepted case where the multi-column restriction applied to just a single column, as it causes a crash downstream. The user can drop the parentheses to avoid the rejection. Fixes #7710 Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #7712	2020-11-29 17:06:41 +02:00
Avi Kivity	0584db1eb3	Merge "Unstall cleanup_compaction::get_ranges_for_invalidation" from Benny " This series adds maybe_yield called from cleanup_compaction::get_ranges_for_invalidation to avoid reactor stalls. To achieve that, we first extract bool_class can_yield to utils/maybe_yield.hh, and add a convience helper: utils::maybe_yield(can_yield) that conditionally calls seastar::thread::maybe_yield if it can (when called in a seastar thread). With that, we add a can_yield parameter to dht::to_partition_ranges and dht::partition_range::deoverlap (defaults to false), and use it from cleanup_compaction::get_ranges_for_invalidation, as the latter is always called from `consume_in_thread`. Fixes #7674 Test: unit(dev) " * tag 'unstall-get_ranges_for_invalidation-v2' of github.com:bhalevy/scylla: compaction: cleanup_compaction: get_ranges_for_invalidation: add yield points dht/i_partitioner: to_partition_ranges: support yielding locator: extract can_yield to utils/maybe_yield.hh	2020-11-29 14:10:39 +02:00
Asias He	0a3a2a82e1	api: Add force_remove_endpoint for gossip It is used to force remove a node from gossip membership if something goes wrong. Note: run the force_remove_endpoint api at the same time on _all_ the nodes in the cluster in order to prevent the removed nodes from coming back, because nodes that have not run the force_remove_endpoint api cmd can gossip the removed node's information to other nodes within 2 * ring_delay (2 * 30 seconds by default). For instance, in a 3-node cluster where node 3 is decommissioned, to remove node 3 from gossip membership prior to the auto removal (3 days by default), run the api cmd on both node 1 and node 2 at the same time. $ curl -X POST --header "Accept: application/json" "http://127.0.0.1:10000/gossiper/force_remove_endpoint/127.0.0.3" $ curl -X POST --header "Accept: application/json" "http://127.0.0.2:10000/gossiper/force_remove_endpoint/127.0.0.3" Then run 'nodetool gossipinfo' on all the nodes to check the removed nodes are not present. Fixes #2134 Closes #5436	2020-11-29 13:58:46 +02:00
Nadav Har'El	0864933d4d	test/cql-pytest: add "scylla_only" fixture This patch adds a fixture "scylla_only" which can be used to mark tests for Scylla-specific features. These tests are skipped when running against other CQL servers - like Apache Cassandra. We recognize Scylla by looking at whether any system table exists with the name "scylla" in its name - Scylla has several of those, and Cassandra has none. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-11-29 10:18:58 +02:00
Nadav Har'El	91ccb2afb5	test/cql-pytest: enable experimental features Enable experimental features, and in particular UDF, so we can test those features in our tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-11-29 10:18:58 +02:00
Michał Chojnowski	fcb258cb01	utils: fragmented_temporary_buffer: implement FragmentedView for view fragmented_temporary_buffer::view is one of the types we want to directly deserialize from.	2020-11-27 15:26:13 +01:00
Michał Chojnowski	f6cc2b6a48	utils: fragment_range: add single_fragmented_view bytes_view is one of the types we want to deserialize from (at least for now), so we want to be able to pass it to deserialize() after it's transitioned to FragmentedView. single_fragmented_view is a wrapper implementing FragmentedView for bytes_view. It's constructed from bytes_view explicitly, because it's typically used in contexts where we want to phase linearization (and by extension, bytes_view) out.	2020-11-27 15:26:13 +01:00
Michał Chojnowski	0b20c7ef65	serializer: implement FragmentedView for buffer_view buffer_view is one of the types we want to directly deserialize from.	2020-11-27 15:26:13 +01:00
Michał Chojnowski	2008c0f62f	utils: fragment_range: add linearized and with_linearized for FragmentedView We would like those helpers to disappear one day but for now we still need them until everything can handle fragmented buffers.	2020-11-27 15:26:13 +01:00
Michał Chojnowski	fc90bd5190	utils: fragment_range: add FragmentedView This patch introduces FragmentedView - a concept intended as a general-purpose interface for fragmented buffers. Another concept made for this purpose, FragmentRange, already exists in the codebase. However, it's unwieldy. The iterator-based design of FragmentRange is harder to implement and requires more code, but more importantly it makes FragmentRange immutable. Usually we want to read the beginning of the buffer and pass the rest of it elsewhere. This is impossible with FragmentRange. FragmentedView can do everything FragmentRange can do and more, except for playing nicely with iterator-based collection methods, but those are useless for fragmented buffers anyway.	2020-11-27 15:26:13 +01:00
Lubos Kosco	4d0587ed11	scylla_util.py: fix metadata gcp call for disks to get details disk parsing expects output from recursive listing of GCP metadata REST call, the method used to do it by default, but now it requires a boolean flag to run in recursive mode Fixes #7684 Closes #7685	2020-11-27 15:20:56 +02:00
Pekka Enberg	c84754a634	Update tools/java submodule * tools/java ad48b44a26...8080009794 (1): > sstableloader: Fix command line parsing of "ignore-missing-columns"	2020-11-27 15:19:48 +02:00
Avi Kivity	390e07d591	dist: sysctl: configure more inotify instances Since `f3bcd4d205` ("Merge 'Support SSL Certificate Hot Reloading' from Calle"), we reload certificates as they are modified on disk. This uses inotify, which is limited by a sysctl fs.inotify.max_user_instances, with a default of 128. This is enough for 64 shards only, if both rpc and cql are encrypted; above that startup fails. Increase to 1200, which is enough for 6 instances * 200 shards. Fixes #7700. Closes #7701	2020-11-26 23:44:48 +02:00
Takuya ASADA	5f81f97773	install.sh: apply sysctl.d files on non-packaging installation We don't apply sysctl.d files on non-packaging installation; apply them, just as the rpm/deb packages do. Fixes #7702 Closes #7705	2020-11-26 09:52:14 +02:00
Takuya ASADA	ba4d54efa3	dist/redhat: packaging dependencies.conf as normal file, not ghost When we introduced dependencies.conf, we mistakenly added it to the rpm as %ghost, but it should be a normal file that is installed normally on package installation. Fixes #7703 Closes #7704	2020-11-26 09:50:05 +02:00
Dejan Mircevski	7f8ed811c1	cql3/expr: Clarify multi-column doesn't use indexing Although not currently used, the old code was wrong and confusing to readers. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-25 10:59:13 -05:00
Avi Kivity	956f031a68	Merge 'Add missing sharded<>::stop in exceptional startup code for CQL/redis' from Calle Wilund Fixes #7211 If we start a sharded<> object, then proceed to do potentially exceptional stuff, we should destroy it on said exception. Otherwise, the exception propagation will abort on RAII destruction of the sharded<>. And we get no exception logging. Closes #7697 * github.com:scylladb/scylla: redis::service: Shut down sharded<> subobject on startup exception transport::controller: Shut down distributed object on startup exception	2020-11-25 17:57:53 +02:00
Calle Wilund	55acf09662	redis::service: Shut down sharded<> subobject on startup exception Refs #7211 If we start a sharded<> object, then proceed to do potentially exceptional stuff, we should destroy it on said exception. Otherwise, the exception propagation will abort on RAII destruction of the sharded<>. And we get no exception logging.	2020-11-25 15:52:47 +00:00
Calle Wilund	ae4d5a60ca	transport::controller: Shut down distributed object on startup exception Fixes #7211 If we start a sharded<> object, then proceed to do potentially exceptional stuff, we should destroy it on said exception. Otherwise, the exception propagation will abort on RAII destruction of the sharded<>. And we get no exception logging.	2020-11-25 15:52:47 +00:00
Dejan Mircevski	db63b40347	cql3: Don't use index for multi-column restrictions The downstream code expects a single-column restriction when using an index. We could fix it, but we'd still have to filter the rows fetched from the index table, unlike the code that queries the base table directly. For instance, WHERE (c1,c2,c3) = (1,2,3) with an index on c3 can fetch just the right rows from the base table but all the c3=3 rows from the index table. Fixes #7680 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-25 10:39:04 -05:00
Dejan Mircevski	ab7aa57b24	test: Add eventually_require_rows Makes it easier to combine eventually{assert_that} with useful error messages. Refs #7573. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-25 10:34:44 -05:00
Benny Halevy	e1fe1f18c7	compaction: cleanup_compaction: get_ranges_for_invalidation: add yield points Avoid reactor stalls by allowing yielding in long-running loops as seen in #7674. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-25 13:46:32 +02:00
Gleb Natapov	be6119b350	raft: ignore append_reply from a peer in SNAPSHOT state If append_reply is received from a node that is currently having a snapshot transferred to it, ignore it; it is a stray reply.	2020-11-25 12:36:41 +02:00
Gleb Natapov	851e3000c4	raft: Ignore outdated snapshots Do not try to install snapshots that are older than current one.	2020-11-25 12:36:41 +02:00
Gleb Natapov	2ce9473037	raft: set next_idx to correct value after snapshot transfer After snapshot is transferred progress::next_idx is set to its index, but the code uses current snapshot to set it instead of the snapshot that was transferred. Those can be different snapshots.	2020-11-25 11:34:49 +02:00
Tomasz Grabiec	0aa1f7c70a	Merge "raft: fix replication if existing log on leader" from Gleb * scylla-dev/add_dummy_v2: raft: test: replication works on leader change without adding an entry raft: commit a dummy entry after leader change raft: test: fix snapshot correctness check sstables: add `may_have_partition_tombstones` method	2020-11-24 11:35:18 +01:00
Gleb Natapov	51d1d20687	raft: test: replication works on leader change without adding an entry Check that a newly elected leader commits all the entries in its log without waiting for more entries to be submitted.	2020-11-24 11:35:18 +01:00
Gleb Natapov	6130fb8b39	raft: commit a dummy entry after leader change After a node becomes leader it needs to do two things: send an append message to establish its leadership and commit one entry to make sure all previous entries with smaller terms are committed as well.	2020-11-24 11:35:18 +01:00
Gleb Natapov	e3a886738b	raft: test: fix snapshot correctness check Snapshot index cannot be used to check snapshot correctness since some entries may not be commands and thus do not affect the snapshot value. Let's use the applied entries count instead.	2020-11-24 11:35:18 +01:00
Benny Halevy	37e971ad87	dht/i_partitioner: to_partition_ranges: support yielding Allow yielding to prevent reactor stalls when called with a long vector of ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-24 12:23:56 +02:00
Benny Halevy	157a964a63	locator: extract can_yield to utils/maybe_yield.hh Move the definition of bool_class can_yield to a standalone header file and define there a maybe_yield(can_yield) helper. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-24 12:23:56 +02:00
Asias He	1b2155eb1d	repair: Use same description for the same metric In commit `9b28162f88` (repair: Use label for node ops metrics), we switched to use label for different node operations. We should use the same description for the same metric name. Fixes #7681 Closes #7682	2020-11-24 09:35:39 +02:00
Avi Kivity	e8ff77c05f	Merge 'sstables: a bunch of refactors' from Kamil Braun 1. sstables: move `sstable_set` implementations to a separate module All the implementations were kept in sstables/compaction_strategy.cc which is quite large even without them. `sstable_set` already had its own header file, now it gets its own implementation file. The declarations of implementation classes and interfaces (`sstable_set_impl`, `bag_sstable_set`, and so on) were also exposed in a header file, sstable_set_impl.hh, for the purposes of potential unit testing. 2. mutation_reader: move `mutation_reader::forwarding` to flat_mutation_reader.hh Files which need this definition won't have to include mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are in total smaller; mutation_reader.hh includes flat_mutation_reader.hh). 3. sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files. 4. sstables: pass `ring_position` to `create_single_key_sstable_reader` instead of `partition_range`. It would be best to pass `partition_key` or `decorated_key` here. However, the implementation of this function needs a `partition_range` to pass into `sstable_set::select`, and `partition_range` must be constructed from `ring_position`s. We could create the `ring_position` internally from the key but that would involve a copy which we want to avoid. 5. 
sstable_set: refactor `filter_sstable_for_reader_by_pk` Introduce a `make_pk_filter` function, which given a ring position, returns a boolean function (a filter) that given a sstable, tells whether the sstable may contain rows with the given position. The logic has been extracted from `filter_sstable_for_reader_by_pk`. Split from #7437. Closes #7655 * github.com:scylladb/scylla: sstable_set: refactor filter_sstable_for_reader_by_pk sstables: pass ring_position to create_single_key_sstable_reader sstables: move sstable reader creation functions to `sstable_set` mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh sstables: move sstable_set implementations to a separate module	2020-11-24 09:23:57 +02:00
Michał Chojnowski	9bceaac44c	utils: fragmented_temporary_buffer: fix view::remove_prefix This piece of logic was wrong for two unrelated reasons: 1. When fragmented_temporary_buffer::view is constructed from bytes_view, _current is null. When remove_prefix was used on such view, null pointer dereference happened. 2. It only worked for the first remove_prefix call. A second call would put a wrong value in _current_position.	2020-11-24 03:05:13 +01:00
Kamil Braun	d158921966	sstables: add `may_have_partition_tombstones` method For sstable versions greater than or equal to md, the `min_max_column_names` sstable metadata gives a range of position-in-partitions such that all clustering rows stored in this sstable have positions in this range. Partition tombstones in this context are understood as covering the entire range of clustering keys; thus, if the sstable contains at least one partition tombstone, the sstable position range is set to be the range of all clustered rows. Therefore, by checking that the position range is not the range of all clustered rows we know that the sstable cannot have any partition tombstones. Closes #7678	2020-11-23 23:30:19 +02:00
Kamil Braun	72c59e8000	flat_mutation_reader: document assumption about fast_forward_to It is not legal to fast forward a reader before it enters a partition. One must ensure that there even is a partition in the first place. For this one must fetch a `partition_start` fragment. Closes #7679	2020-11-23 17:39:46 +01:00
Pavel Emelyanov	fea4a5492f	system-keyspace: Remove dead code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201123151453.27341-1-xemul@scylladb.com>	2020-11-23 17:16:15 +02:00
Tomasz Grabiec	36f9da6420	Merge "raft: testing: snapshots and partitioning elections" from Alejo Fixes, features needed for testing, snapshot testing. Free election after partitioning (replication test) . * https://github.com/alecco/scylla/tree/raft-ale-tests-05e: raft: replication test: partitioning with leader raft: replication test: run free election after partitioning raft: expose fsm tick() to server for testing raft: expose is_leader() for testing raft: replication test: test take and load snapshot raft: fix a bug in leader election raft: fix default randomized timeout raft: replication test: fix custom next leader raft: replication test: custom next leader noop for same raft: replication test: fix failure detector for disconnected	2020-11-23 14:36:39 +01:00
Kamil Braun	6c8b0af505	sstable_set: refactor filter_sstable_for_reader_by_pk Introduce a `make_pk_filter` function, which given a ring position, returns a boolean function (a filter) that given a sstable, tells whether the sstable may contain rows with the given position. The logic has been extracted from `filter_sstable_for_reader_by_pk`.	2020-11-23 12:35:10 +01:00
Kamil Braun	68663d0de0	sstables: pass ring_position to create_single_key_sstable_reader instead of partition_range. It would be best to pass `partition_key` or `decorated_key` here. However, the implementation of this function needs a `partition_range` to pass into `sstable_set::select`, and `partition_range` must be constructed from `ring_position`s. We could create the `ring_position` internally from the key but that would involve a copy which we want to avoid.	2020-11-23 12:33:24 +01:00
Takuya ASADA	b90ddc12c9	scylla_prepare: add --tune system when SET_CLOCKSOURCE=yes perftune.py only runs clocksource setup when --tune system is specified, so we need to add it to the parameters when SET_CLOCKSOURCE=yes. Fixes #7672	2020-11-23 10:51:16 +02:00
Avi Kivity	f8e0517bc7	cql: do not advance timeouts on internal pages Currently, each internal page fetched during aggregating gets a timeout based on the time the page fetch was started, rather than the query start time. This means the query can continue processing long after the client has abandoned it due to its own timeout, which is based on the query start time. Fix by establishing the timeout once when the query starts, and not advancing it. Test: manual (SELECT count(*) FROM a large table). Fixes #1175. Closes #7662	2020-11-23 08:14:18 +01:00
Alejo Sanchez	1f8ca4e06d	raft: replication test: partitioning with leader For test simplicity support partition{leader{A},B,C,D} Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 22:39:00 -04:00
Avi Kivity	3eac976e24	build: remove non-C/C++ jobs from submodule_pools The C and C++ sub-builds were placed in submodule_pool to reduce concurrency, as they are memory intensive (well, at least the C++ jobs are), and we choose build concurrency based on memory. But the other submodules are not memory intensive, and certainly the packaging jobs are not (and they are single-threaded too). To allow these simple jobs to utilize multiple cores more efficiently, remove them from submodule_pool so they can run in parallel. Closes #7671	2020-11-23 00:32:41 +02:00
Avi Kivity	bcced9f56b	build: compress unified package faster The unified package is quite large (1GB compressed), and it is the last step in the build so its build time cannot be parallelized with other tasks. Compress it with pigz to take advantage of multiple cores and speed up the build a little. Closes #7670	2020-11-23 00:31:04 +02:00
Takuya ASADA	3fefa520bd	dist/common/scripts: drop run() and out(), switch to subprocess.run() We initially implemented run() and out() functions because we couldn't use subprocess.run() since we were on Python 3.4. But since we moved to relocatable python3, we don't need to implement it ourselves. The reason we kept using these functions is that we needed to set an environment variable to set PATH. Since we recently moved that code to the python thunk, we are finally able to drop run() and out() and switch to subprocess.run().	2020-11-22 17:59:27 +02:00
Alejo Sanchez	f12fed0809	raft: replication test: run free election after partitioning When partitioning without keeping the existing leader, run an election without forcing a particular leader. To force a leader after partitioning, a test can just set it with new_leader{X}. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:34 -04:00
Alejo Sanchez	d610d5a7b8	raft: expose fsm tick() to server for testing For tests to advance servers they need to invoke tick(). This is needed to advance free elections. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:34 -04:00
Alejo Sanchez	9e7e14fc50	raft: expose is_leader() for testing Expose fsm leader check to allow tests to find out the leader after an election. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:34 -04:00
Alejo Sanchez	f4d0131f02	raft: replication test: test take and load snapshot Through configuration trigger automatic snapshotting. For now, handle expected log index within the test's state machine and pass it with snapshot_value (within the test file). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:34 -04:00
Konstantin Osipov	bce8cb11a7	raft: fix a bug in leader election If a server responds favourably to RequestVote RPC, it should reset its election timer, otherwise it has very high chances of becoming a candidate with an even newer term, despite successful elections. A candidate with a term larger than the leader rejects AppendEntries RPCs and cannot become a leader itself (because of protection against disruptive leaders), so is stuck in this state.	2020-11-22 10:32:34 -04:00
Alejo Sanchez	08f8c418df	raft: fix default randomized timeout Range after election timeout should start at +1. This matches existing update_current_term() code adding dist(1, 2*n). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:34 -04:00
Alejo Sanchez	ab3a8b7bcd	raft: replication test: fix custom next leader Adjustments after changes due to free election in partitioning and changes in the code. Elapse previous leader after isolating it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:32:22 -04:00
Alejo Sanchez	3bff7d1d21	raft: replication test: custom next leader noop for same If custom specified leader is same do nothing. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 10:15:20 -04:00
Amnon Heiman	9e116d136e	Adding dist/common/scripts/scylla_rsyslog_setup utility scylla_rsyslog_setup adds a configuration file to rsyslog to forward the traces to a remote server. It will override any existing file, so it is safe to run it multiple times. It takes an ip, or an ip and port, from the user for that configuration; if no port is provided, the default port of Scylla-Monitoring promtail is used. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-11-22 15:48:48 +02:00
Avi Kivity	1e170ebfc1	Merge 'Changing hints configuration followup' from Piotr Dulikowski Follow-up to https://github.com/scylladb/scylla/pull/6916. - Fixes wrong usage of `resource_manager::prepare_per_device_limits`, - Improves locking in `resource_manager` so that it is safer to call its methods concurrently, - Adds comments around `resource_manager::register_manager` so that it's clearer what this method does and why. Closes #7660 * github.com:scylladb/scylla: hints/resource_manager: add comments to register_manager hints/resource_manager: fix indentation hints/resource_manager: improve mutual exclusion hints/resource_manager: correct prepare_per_device_limits usage	2020-11-22 15:06:35 +02:00
Alejo Sanchez	1436e4a323	raft: replication test: fix failure detector for disconnected For a disconnected server all other servers is_alive() is false. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-22 09:04:58 -04:00
Pekka Enberg	2c8dcbe5c5	reloc: Remove "build_reloc.sh" script as obsolete The "ninja dist-server-tar" command is a full replacement for the "build_reloc.sh" script. The release engineering infrastructure has been switched to ninja, so let's remove "build_reloc.sh" as obsolete.	2020-11-20 22:41:26 +02:00
Piotr Sarna	5a9dc6a3cc	Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski Now that CDC is GA, it should be enabled in all the tests by default. To achieve that the PR adds a special db::config::add_cdc_extension() helper which is used in cql_test_env to make sure CDC is usable in all the tests that use cql_test_env. As a result, cdc_tests can be simplified. Finally, some trailing whitespaces are removed from cdc_tests. Tests: unit(dev) Closes #7657 * github.com:scylladb/scylla: cdc: Remove trailing whitespaces from cdc_tests cdc: Remove mk_cdc_test_config from tests config: Add add_cdc_extension function for testing cdc: Add missing includes to cdc_extension.hh	2020-11-20 13:56:29 +01:00
Konstantin Osipov	269c049a16	test.py: enable back CQL based tests The patch which introduces build-dependent testing has a regression: it quietly filters out all tests which are not part of ninja output. Since ninja doesn't build any CQL tests (including CQL-pytest), all such tests were quietly disabled. Fix the regression by only doing the filtering in unit and boost test suites. test: dev (unit), dev + --build-raft Message-Id: <20201119224008.185250-1-kostja@scylladb.com>	2020-11-20 11:45:15 +02:00
Pekka Enberg	6a04ae69a2	Update seastar submodule * seastar c861dbfb...010fb0df (3): > build: clean up after failed -fconcepts detection > logger: issue std::endl to output stream > util/log: improve discoverability of log rate-limiting	2020-11-20 11:43:11 +02:00
Avi Kivity	82b508250e	tools: toolchain: dbuild: don't confine with seccomp Some systems (at least, Centos 7, aarch64) block the membarrier() syscall via seccomp. This causes Scylla or unit tests to burn cpu instead of sleeping when there is nothing to do. Fix by instructing podman/docker not to block any syscalls. I tested this with podman, and it appears [1] to be supported on docker. [1] https://docs.docker.com/engine/security/seccomp/#run-without-the-default-seccomp-profile Closes #7661	2020-11-20 09:11:52 +02:00
Kamil Braun	40d8bfa394	sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files.	2020-11-19 17:52:39 +01:00
Kamil Braun	708093884c	mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh Files which need this definition won't have to include mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are in total smaller; mutation_reader.hh includes flat_mutation_reader.hh).	2020-11-19 17:52:39 +01:00
Kamil Braun	b02b441c2e	sstables: move sstable_set implementations to a separate module All the implementations were kept in sstables/compaction_strategy.cc which is quite large even without them. `sstable_set` already had its own header file, now it gets its own implementation file. The declarations of implementation classes and interfaces (`sstable_set_impl`, `bag_sstable_set`, and so on) were also exposed in a header file, sstable_set_impl.hh, for the purposes of potential unit testing.	2020-11-19 17:52:37 +01:00
Avi Kivity	70689088fd	Merge "Remove reference on database from global qctx" from Pavel E " The qctx is a global object that references the query processor and database to let the rest of the code query the system keyspace. As the first step of de-globalizing it -- remove the database reference from it. After the set, the qctx remains a simple wrapper over the query processor (which is already de-globalized) and the query processor in turn is mostly needed only to parse the query string into a prepared statement. This, in turn, makes it possible to remove the qctx later by parsing the query strings on boot and carrying _them_ around, not the qctx itself. tests: unit(dev), dtest(simple_cluster_driver_test:dev), manual start/stop " * 'br-remove-database-from-qctx' of https://github.com/xemul/scylla: query-context: Remove database from qctx schema-tables: Use query processor reference in save_system(_keyspace)?_schema system-keyspace: Rewrite force_blocking_flush system-keyspace: Use cluster_name string in check_health system-keyspace: Use db::config in setup_version query-context: Kill global helpers test: Use cql_test_env::execute_cql instead of qctx version code: Use qctx::execute_cql methods, not global ones system-keyspace: Do not call minimal_setup for the 2nd time system-keyspace: Fix indentation after previous patch system-keyspace: Do not do invoke_on_all by hands system-keyspace: Remove dead code	2020-11-19 18:31:51 +02:00
Pavel Emelyanov	689fd029a1	query-context: Remove database from qctx No users of qctx::db are left. One global database reference less. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	464c8990d4	schema-tables: Use query processor reference in save_system(_keyspace)?_schema The save_system_schema and save_system_keyspace_schema are both called on start and can get the needed query processor reference from arguments. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	66dcc47571	system-keyspace: Rewrite force_blocking_flush The method is called after query_processor::execute_internal to flush the cf. Encapsulating this flush inside database and getting the database from query_processor lets us remove the database reference from the global qctx object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	6cad18ad33	system-keyspace: Use cluster_name string in check_health The check_health needs global qctx to get db.config.cluster_name, which is already available at the caller side. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	36a3ee6ad4	system-keyspace: Use db::config in setup_version This is the beginning of de-globalizing global qctx thing. The setup_version() needs global qctx to get config from. It's possible to get the config from the caller instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	43039a0812	query-context: Kill global helpers Now the db::execute_cql* callers are patched, the global helpers can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	64eef0a4f7	test: Use cql_test_env::execute_cql instead of qctx version Similar to previous patch, but for tests. Since cql_test_env doesn't have qctx on board, the patch makes one step forward and calls what is called by qctx::execute_cql. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	303ebe4a36	code: Use qctx::execute_cql methods, not global ones There are global db::execute_cql() helpers that just forward the args into qctx::execute_cql(). The former are going away, so patch all callers to use qctx themselves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	8bf6b1298c	system-keyspace: Do not call minimal_setup for the 2nd time The system_keyspace::minimal_setup is called by main.cc by hands already, some steps before the regular ::setup(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	7b82ec2f9e	system-keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	1773dadc72	system-keyspace: Do not do invoke_on_all by hands The cache_truncation_record needs to run cf.cache_truncation_record on each shard's DB, so the invoke_on_all can be used. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	fb20d9cd1e	system-keyspace: Remove dead code Not called anywhere. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Piotr Dulikowski	60ac68b7a2	hints/resource_manager: add comments to register_manager Adds more comments to resource_manager::register_manager in order to better explain what this function is doing.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	c0c10b918c	hints/resource_manager: fix indentation Fixes indentation in prepare_per_device_limits.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	ead6a3f036	hints/resource_manager: improve mutual exclusion This commit causes start, stop and register_manager methods of the resource_manager to be serialized with respect to each other using the _operation_lock. Those functions modify internal state, so it's best if they are protected with a semaphore. Additionally, those functions are not going to be used frequently, therefore it's perfectly fine to protect them in such a coarse manner. Now, space_watchdog has a dedicated lock for serializing its on_timer logic with resource_manager::register_manager. The reason for the separate lock is that resource_manager::stop cannot use the same lock as the space_watchdog - otherwise a situation could occur in which space_watchdog waits for semaphore units held by resource_manager::stop(), and resource_manager::stop() waits until the space_watchdog stops its asynchronous event loop.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	362aebee7b	hints/resource_manager: correct prepare_per_device_limits usage The resource_manager::prepare_per_device_limits function calculates disk quota for registered hints managers, and creates an association map: from a storage device id to those hints manager which store hints on that device (_per_device_limits_map) This function was used with an assumption that it is idempotent - which is a wrong assumption. In resource_manager::register_manager, if the resource_manager is already started, prepare_per_device_limits would be called, and those hints managers which were previously added to the _per_device_limits_map would be added again. This would cause the space used by those managers to be calculated twice, which would artificially lower the limit which we impose on the space hints are allowed to occupy on disk. This patch fixes this problem by changing the prepare_per_device_limits function to operate on a hints manager passed by argument. Now, we make sure that this function is called on each hints manager only once.	2020-11-19 16:34:37 +01:00
Piotr Jastrzebski	debd10cc55	cdc: Remove trailing whitespaces from cdc_tests The change was performed automatically using vim and :%s/\s\+$//e Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:25:22 +01:00
Piotr Jastrzebski	6bdbfbafb7	cdc: Remove mk_cdc_test_config from tests Now that CDC is GA and enabled by default, there's no longer a need for a specific config in CDC tests. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:21:32 +01:00
Avi Kivity	2deb8e6430	Merge 'mutation_reader: generalize `combined_mutation_reader`' from Kamil Braun It is now called `merging_reader`, and is used to change a `FragmentProducer` that produces a non-decreasing stream of mutation fragment batches into a `flat_mutation_reader` producing a non-decreasing stream of fragments. The resulting stream of fragments is increasing except for places where we encounter range tombstones (multiple range tombstones may be produced with the same position_in_partition). `merging_reader` is a simple adapter over `mutation_fragment_merger`. The old `combined_mutation_reader` is simply a specialization of `merging_reader` where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that merges the output of multiple readers into one non-decreasing stream of fragment batches. There is no separate class for `combined_mutation_reader` now. Instead, `make_combined_reader` works directly with `merging_reader`. The PR also improves some comments. Split from https://github.com/scylladb/scylla/pull/7437. Closes #7656 * github.com:scylladb/scylla: mutation_reader: generalize `combined_mutation_reader` mutation_reader: fix description of mutation_fragment_merger	2020-11-19 17:19:01 +02:00
Piotr Jastrzebski	9ede193f0a	config: Add add_cdc_extension function for testing and use it in cql_test_env to enable cdc extension for all tests that use it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:16:07 +01:00
Piotr Jastrzebski	89f4298670	cdc: Add missing includes to cdc_extension.hh Without those additional includes, a .cc file that includes cdc_extension.hh won't compile. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:11:33 +01:00
Nadav Har'El	5f37c1ef33	Merge 'Don't add delay to the timestamp of the first CDC generation' from Piotr Jastrzębski After the concept of the seed nodes was removed we can distinguish whether the node is the first node in the cluster or not. Thanks to this we can avoid adding delay to the timestamp of the first CDC generation. The delay is added to the timestamp to make sure that all the nodes in the cluster manage to learn about it before the timestamp is in the past. It is safe to not add the delay for the first node because we know it's the only node in the cluster and no one else has to learn about the timestamp. Fixes #7645 Tests: unit(dev) Closes #7654 * github.com:scylladb/scylla: cdc: Don't add delay to the timestamp of the first generation cdc: Change for_testing to add_delay in make_new_cdc_generation	2020-11-19 16:47:16 +02:00
Kamil Braun	857911d353	mutation_reader: generalize `combined_mutation_reader` It is now called `merging_reader`, and is used to change a `FragmentProducer` that produces a non-decreasing stream of mutation fragment batches into a `flat_mutation_reader` producing a non-decreasing stream of fragments. The resulting stream of fragments is increasing except for places where we encounter range tombstones (multiple range tombstones may be produced with the same position_in_partition). `merging_reader` is a simple adapter over `mutation_fragment_merger`. The old `combined_mutation_reader` is simply a specialization of `merging_reader` where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that merges the output of multiple readers into one non-decreasing stream of fragment batches. There is no separate class for `combined_mutation_reader` now. Instead, `make_combined_reader` works directly with `merging_reader`.	2020-11-19 14:35:11 +01:00
Kamil Braun	60adee6900	mutation_reader: fix description of mutation_fragment_merger The resulting sequence is not necessarily strictly increasing (e.g. if there are range tombstones).	2020-11-19 14:29:04 +01:00
Avi Kivity	a1be71b388	Merge "Harden network_topology_strategy_test.calculate_natural_endpoints" from Benny " We've recently seen failures in this unit test as follows: ``` test/boost/network_topology_strategy_test.cc(0): Entering test case "testCalculateEndpoints" unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at ./seastar/src/testing/seastar_test.cc(43): last checkpoint test/boost/network_topology_strategy_test.cc(0): Leaving test case "testCalculateEndpoints"; testing time: 15192us test/boost/network_topology_strategy_test.cc(0): Entering test case "test_invalid_dcs" network_topology_strategy_test: ./seastar/include/seastar/core/future.hh:634: void seastar::future_state<seastar::internal::monostate>::set(A &&...) [T = seastar::internal::monostate, A = <>]: Assertion `_u.st == state::future' failed. Aborting on shard 0. ``` This series fixes 2 issues in this test: 1. The core issue where std::out_of_range exception is not handled in calculate_natural_endpoints(). 2. A secondary issue where the static `snitch_inst` isn't stopped when the first exception is hit, failing the next time the snitch is started, as it wasn't stopped properly. Test: network_topology_strategy_test(release) " * tag 'nts_test-harden-calculate_natural_endpoints-v1' of github.com:bhalevy/scylla: test: network_topology_strategy_test: has_sufficient_replicas: handle empty dc endpoints case test: network_topology_strategy_test: fixup indentation test: network_topology_strategy_test: always stop_snitch after create_snitch	2020-11-19 14:11:42 +02:00
Piotr Jastrzebski	93a7f7943c	cdc: Don't add delay to the timestamp of the first generation After the concept of the seed nodes was removed we can distinguish whether the node is the first node in the cluster or not. Thanks to this we can avoid adding delay to the timestamp of the first CDC generation. Fixes #7645 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 13:03:18 +01:00
Tomasz Grabiec	d3a5814f4f	api: Connect nodetool resetlocalschema to schema version recalculation It doesn't really do what the nodetool command is documented to do, which is to truncate local schema tables, but it is still an improvement. Message-Id: <1605740190-30332-1-git-send-email-tgrabiec@scylladb.com>	2020-11-19 13:55:09 +02:00
Piotr Jastrzebski	3024795507	cdc: Change for_testing to add_delay in make_new_cdc_generation The meaning of the parameter changes from defining whether the function is called in a testing environment to deciding whether a delay should be added to the timestamp of a newly created CDC generation. This is a preparation for an improvement in the following patch, which adds the delay only for non-first nodes rather than for every node. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 12:19:42 +01:00
Pekka Enberg	ba39bfa1be	dist-check: Fix script name to work on Windows filesystem Asias He reports that git on Windows filesystem is unhappy about the colon character (":") present in dist-check files: $ git reset --hard origin/master error: invalid path 'tools/testing/dist-check/docker.io/centos:7.sh' fatal: Could not reset index file to revision 'origin/master'. Rename the script to use a dash instead. Closes #7648	2020-11-19 13:16:30 +02:00
Gleb Natapov	43dc5e7dc2	test: add support for different state machines Current tests use a hash state machine that checks for a specific order of entry application. The order is not always guaranteed, though. Backpressure may delay the submission of some entries, and when they are released together they may be reordered in debug mode due to SEASTAR_SHUFFLE_TASK_QUEUE. Introduce the ability for a test to choose the state machine type and implement a commutative state machine that does not care about ordering.	2020-11-18 19:14:37 +01:00
Gleb Natapov	8d9b6f588e	raft: stop accepting requests on a leader after the log reaches the limit To prevent the log from taking too much memory, introduce a mechanism that limits the log to a certain size. If the size is reached, no new log entries can be submitted until previous entries are committed and snapshotted.	2020-11-18 19:14:37 +01:00
Evgeniy Naydanov	587b909c5c	scylla_raid_setup: try /dev/md[0-9] if no --raiddev provided If the scylla_raid_setup script is called without the --raiddev argument, try to use any of the /dev/md[0-9] devices instead of only /dev/md0. This is needed because on Ubuntu 20.04 /dev/md0 is already used by the OS. Closes #7628	2020-11-18 18:42:31 +02:00
Pavel Emelyanov	dbb2722e46	auth: Fix class name vs field name compilation by gcc gcc fails to compile current master like this In file included from ./service/client_state.hh:44, from ./cql3/cql_statement.hh:44, from ./cql3/statements/prepared_statement.hh:47, from ./cql3/statements/raw/select_statement.hh:45, from build/dev/gen/cql3/CqlParser.hpp:64, from build/dev/gen/cql3/CqlParser.cpp:44: ./auth/service.hh:188:21: error: declaration of ‘const auth::resource& auth::command_desc::resource’ changes meaning of ‘resource’ [-fpermissive] 188 \| const resource& resource; ///< Resource impacted by this command. \| ^~~~~~~~ In file included from ./auth/authenticator.hh:57, from ./auth/service.hh:33, from ./service/client_state.hh:44, from ./cql3/cql_statement.hh:44, from ./cql3/statements/prepared_statement.hh:47, from ./cql3/statements/raw/select_statement.hh:45, from build/dev/gen/cql3/CqlParser.hpp:64, from build/dev/gen/cql3/CqlParser.cpp:44: ./auth/resource.hh:98:7: note: ‘resource’ declared here as ‘class auth::resource’ 98 \| class resource final { \| ^~~~~~~~ clang doesn't fail Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201118155905.14447-1-xemul@scylladb.com>	2020-11-18 18:40:55 +02:00
Asias He	f7c954dc1e	repair: Use decorated_key::tri_compare to compare keys It is faster than the legacy_equal because it compares the token first. Fixes #7643 Closes #7644	2020-11-18 14:12:59 +02:00
Piotr Sarna	c0d72b4491	db,view: remove duplicate entries from the list of target endpoints If a list of target endpoints for sending view updates contains duplicates, it results in benign (but annoying) broken promise errors happening due to duplicated write response handlers being instantiated for a single endpoint. In order to avoid such errors, target remote endpoints are deduplicated from the list of pending endpoints. A similar issue (#5459) solved the case for duplicated local endpoints, but that didn't solve the general case. Fixes #7572 Closes #7641	2020-11-18 13:43:49 +02:00
Avi Kivity	d612ca78f3	Merge 'Allow changing hinted handoff configuration in runtime' from Piotr Dulikowski This PR allows changing the hinted_handoff_enabled option in runtime, either by modifying and reloading YAML configuration, or through HTTP API. This PR also introduces an important change in semantics of hinted_handoff_enabled: - Previously, hinted_handoff_enabled controlled whether _both writing and sending_ hints is allowed at all, or to particular DCs, - Now, hinted_handoff_enabled only controls whether _writing hints_ is enabled. Sending hints from disk is now always enabled. Fixes: #5634 Tests: - unit(dev) for each commit of the PR - unit(debug) for the last commit of the PR Closes #6916 * github.com:scylladb/scylla: api: allow changing hinted handoff configuration storage_proxy: fix wrong return type in swagger hints_manager: implement change_host_filter storage_proxy: always create hints manager config: plug in hints::host_filter object into configuration db/hints: introduce host_filter hints/resource_manager: allow registering managers after start hints: introduce db::hints::directory_initializer directories.cc: prepare for use outside main.cc	2020-11-18 13:41:02 +02:00
Calle Wilund	9f48dc7dac	locator::ec2_multi_region_snitch: Handle ipv6 broadcast/public ip Fixes #7064 Iff broadcast address is set to ipv6 from main (meaning prefer ipv6), determine the "public" ipv6 address (which should be the same, but might not be), via aws metadata query. Closes #7633	2020-11-18 12:48:25 +02:00
Asias He	9b28162f88	repair: Use label for node ops metrics Make it easier to be consumed by the scylla-monitor. Fixes #7270 Closes #7638	2020-11-18 10:12:39 +02:00
Avi Kivity	f55b522c1b	database: detect misconfigured unit tests that don't set available_memory available_memory is used to seed many caches and controllers. Usually it's detected from the environment, but unit tests configure it on their own with fake values. If they forget, then the undefined behavior sanitizer will kick in in random places (see `8aa842614a` ("test: gossip_test: configure database memory allocation correctly") for an example). Prevent this early by asserting that available_memory is nonzero. Closes #7612	2020-11-18 08:49:32 +02:00
Avi Kivity	13c6c90d8c	Merge 'Remove std::iterator usage' from Piotr Jastrzębski std::iterator is deprecated since C++17 so define all the required iterator_traits directly and stop using std::iterator at all. More context: https://www.fluentcpp.com/2018/05/08/std-iterator-deprecated Tests: unit(dev) Closes #7635 * github.com:scylladb/scylla: log_heap: Remove std::iterator from hist_iterator types: Remove std::iterator from tuple_deserializing_iterator types: Remove std::iterator from listlike_partial_deserializing_iterator sstables: remove std::iterator from const_iterator token_metadata: Remove std::iterator from tokens_iterator size_estimates_virtual_reader: Remove std::iterator token_metadata: Remove std::iterator from tokens_iterator_impl counters: Remove std::iterator from iterators compound_compat: Remove std::iterator from iterators compound: Remove std::iterator from iterator clustering_interval_set: Remove std::iterator from position_range_iterator cdc: Remove std::iterator from collection_iterator cartesian_product: Remove std::iterator from iterator bytes_ostream: Remove std::iterator from fragment_iterator	2020-11-17 19:22:17 +02:00
Benny Halevy	5171590d83	test: network_topology_strategy_test: has_sufficient_replicas: handle empty dc endpoints case We saw this intermittent failure in testCalculateEndpoints: ``` unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at ``` It turns out that there are no endpoints associated with the dc passed to has_sufficient_replicas in the `all_endpoints` map. Handle this case by returning true. The dc is still required to appear in `dc_replicas`, so if it's not found there, fail the test gracefully. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-17 18:57:19 +02:00
Piotr Jastrzebski	2fe9d879df	log_heap: Remove std::iterator from hist_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	957d4c3532	types: Remove std::iterator from tuple_deserializing_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	5f64e57b10	types: Remove std::iterator from listlike_partial_deserializing_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	bacda100ec	sstables: remove std::iterator from const_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	661b52c7df	token_metadata: Remove std::iterator from tokens_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	c0bc6b5795	size_estimates_virtual_reader: Remove std::iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	87bf577450	token_metadata: Remove std::iterator from tokens_iterator_impl std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	651849e0c1	counters: Remove std::iterator from iterators std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	742b5b7fc5	compound_compat: Remove std::iterator from iterators std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	493c2bfc96	compound: Remove std::iterator from iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	c5d6ee0e45	clustering_interval_set: Remove std::iterator from position_range_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	6b1167ea0d	cdc: Remove std::iterator from collection_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	a2fa10a0bc	cartesian_product: Remove std::iterator from iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	0605d9e8ed	bytes_ostream: Remove std::iterator from fragment_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Benny Halevy	a38709b6bb	test: network_topology_strategy_test: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-17 16:10:35 +02:00
Benny Halevy	5c73d4f65b	test: network_topology_strategy_test: always stop_snitch after create_snitch Currently stop_snitch is not called if the test fails on exception. This causes a failure in create_snitch where snitch_inst fails to start since it wasn't stopped earlier. For example: ``` test/boost/network_topology_strategy_test.cc(0): Entering test case "testCalculateEndpoints" unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at ./seastar/src/testing/seastar_test.cc(43): last checkpoint test/boost/network_topology_strategy_test.cc(0): Leaving test case "testCalculateEndpoints"; testing time: 15192us test/boost/network_topology_strategy_test.cc(0): Entering test case "test_invalid_dcs" network_topology_strategy_test: ./seastar/include/seastar/core/future.hh:634: void seastar::future_state<seastar::internal::monostate>::set(A &&...) [T = seastar::internal::monostate, A = <>]: Assertion `_u.st == state::future' failed. Aborting on shard 0. Backtrace: 0x0000000002825e94 0x000000000282ffa9 0x00007fd065f971df /lib64/libc.so.6+0x000000000003dbc4 /lib64/libc.so.6+0x00000000000268a3 /lib64/libc.so.6+0x0000000000026788 /lib64/libc.so.6+0x0000000000035fc5 0x0000000000b484cf 0x0000000002a7c69f 0x0000000002a7c62f 0x0000000000b47b9e 0x0000000002595da2 0x0000000002595913 0x0000000002a83a31 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-17 16:09:43 +02:00
Piotr Jastrzebski	f2b98b0aad	Replace disable_failure_guard with scoped_critical_alloc_section scoped_critical_alloc_section was recently introduced to replace disable_failure_guard and made the old class deprecated. This patch replaces all occurrences of disable_failure_guard with scoped_critical_alloc_section. Without this patch the build prints many warnings like: warning: 'disable_failure_guard' is deprecated: Use scoped_critical_section instead [-Wdeprecated-declarations] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <ca2a91aaf48b0f6ed762a6aa687e6ac5e936355d.1605621284.git.piotr@scylladb.com>	2020-11-17 16:01:25 +02:00
Avi Kivity	006e0e4fe0	Merge "Add scylla specific information to the OOM diagnostics report" from Botond " Use the recently introduced seastar mechanism which allows the application running on top of seastar to add its own part to the diagnostics report to add scylla specific information to said report. The report now closely resembles that produced by `scylla memory` from `scylla-gdb.py`, with the exception of coordinator-specific information. This should greatly speed up the debugging of OOM, as the diagnostics report will be available from the logs, without having to obtain a coredump and set up a debugging environment in which it can be opened. Example report: INFO 2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics Used memory: 2029M Free memory: 19M Total memory: 2G LSA allocated: 1770M used: 1766M free: 3M Cache: total: 1770M used: 1716M free: 54M Memtables: total: 0B Regular: real dirty: 0B virt dirty: 0B System: real dirty: 0B virt dirty: 0B Replica: Read Concurrency Semaphores: user: 100/100, 33M/41M, queued: 477 streaming: 0/10, 0B/41M, queued: 0 system: 0/100, 0B/41M, queued: 0 compaction: 0/∞, 0B/∞ Execution Stages: data query stage: statement 987 Total: 987 mutation query stage: Total: 0 apply stage: Total: 0 Tables - Ongoing Operations: Pending writes (top 10): 0 Total (all) Pending reads (top 10): 1564 ks.test 1564 Total (all) Pending streams (top 10): 0 Total (all) Small pools: objsz spansz usedobj memory unused wst% 8 4K 11k 88K 6K 6 10 4K 10 8K 8K 98 12 4K 2 8K 8K 99 14 4K 4 8K 8K 99 16 4K 15k 244K 5K 2 32 4K 2k 52K 3K 5 32 4K 20k 628K 2K 0 32 4K 528 20K 4K 17 32 4K 5k 144K 480B 0 48 4K 17k 780K 3K 0 48 4K 3k 140K 3K 2 64 4K 50k 3M 6K 0 64 4K 66k 4M 7K 0 80 4K 131k 10M 1K 0 96 4K 37k 3M 192B 0 112 4K 65k 7M 10K 0 128 4K 21k 3M 2K 0 160 4K 38k 6M 3K 0 192 4K 15k 3M 12K 0 224 4K 3k 720K 10K 1 256 4K 148 56K 19K 33 320 8K 13k 4M 14K 0 384 8K 3k 1M 20K 1 448 4K 11k 5M 5K 0 512 4K 2k 1M 39K 3 640 12K 163 144K 42K 29 768 12K 
1k 832K 59K 7 896 8K 131 144K 29K 20 1024 4K 643 732K 89K 12 1280 20K 11k 13M 26K 0 1536 12K 12 128K 110K 85 1792 16K 12 144K 123K 85 2048 8K 601 1M 14K 1 2560 20K 70 224K 48K 21 3072 12K 13 240K 201K 83 3584 28K 6 288K 266K 92 4096 16K 10k 39M 88K 0 5120 20K 7 416K 380K 91 6144 24K 24 480K 336K 70 7168 28K 27 608K 413K 67 8192 32K 256 3M 736K 26 10240 40K 11k 105M 550K 0 12288 48K 21 960K 708K 73 14336 56K 59 1M 378K 31 16384 64K 8 1M 1M 89 Page spans: index size free used spans 0 4K 48M 48M 12k 1 8K 6M 6M 822 2 16K 41M 41M 3k 3 32K 18M 18M 579 4 64K 108M 108M 2k 5 128K 1774M 2G 14k 6 256K 512K 0B 2 7 512K 2M 2M 4 8 1M 0B 0B 0 9 2M 2M 0B 1 10 4M 0B 0B 0 11 8M 0B 0B 0 12 16M 16M 0B 1 13 32M 32M 32M 1 14 64M 0B 0B 0 15 128M 0B 0B 0 16 256M 0B 0B 0 17 512M 0B 0B 0 18 1G 0B 0B 0 19 2G 0B 0B 0 20 4G 0B 0B 0 21 8G 0B 0B 0 22 16G 0B 0B 0 23 32G 0B 0B 0 24 64G 0B 0B 0 25 128G 0B 0B 0 26 256G 0B 0B 0 27 512G 0B 0B 0 28 1T 0B 0B 0 29 2T 0B 0B 0 30 4T 0B 0B 0 31 8T 0B 0B 0 Fixes: #6365 " * 'dump-memory-diagnostics-oom/v1' of https://github.com/denesb/scylla: database: hook-in to the seastar OOM diagnostics report generation database: table: add accessors to the operation counts of the phasers utils: logalloc: add lsa_global_occupancy_stats() utils: phased_barrier: add operations_in_progress() mutation_query: mutation_query_stage: add get_stats() reader_concurrency_semaphore: add is_unlimited()	2020-11-17 15:50:21 +02:00
Avi Kivity	1cf02cb9d8	types: add constraint on lexicographical_tri_compare() Verify that the input types are iterators and their value types are compatible with the compare function.	2020-11-17 15:19:46 +02:00
Avi Kivity	71e93d63c5	composite: make composite::iterator a real input_iterator Iterators require a default constructor, so add one. This helps a later patch use std::input_iterator to constrain template parameters.	2020-11-17 15:19:46 +02:00
Avi Kivity	867b41b124	compound: make compound_type::iterator a real input_iterator Iterators require a default constructor, so add one. This helps a later patch use std::input_iterator to constrain template parameters.	2020-11-17 15:19:38 +02:00
Botond Dénes	34c213f9bb	database: hook-in to the seastar OOM diagnostics report generation Use the mechanism provided by seastar to add scylla specific information to the memory diagnostics report. The information added is mostly the same contained in the output of `scylla memory` from `scylla-gdb.py`, with the exception of the coordinator-specific metrics. The report is generated in the database layer, where the storage-proxy is not available and it is not worth pulling it in just for this purpose. An example report: INFO 2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics Used memory: 2029M Free memory: 19M Total memory: 2G LSA allocated: 1770M used: 1766M free: 3M Cache: total: 1770M used: 1716M free: 54M Memtables: total: 0B Regular: real dirty: 0B virt dirty: 0B System: real dirty: 0B virt dirty: 0B Replica: Read Concurrency Semaphores: user: 100/100, 33M/41M, queued: 477 streaming: 0/10, 0B/41M, queued: 0 system: 0/100, 0B/41M, queued: 0 compaction: 0/∞, 0B/∞ Execution Stages: data query stage: statement 987 Total: 987 mutation query stage: Total: 0 apply stage: Total: 0 Tables - Ongoing Operations: Pending writes (top 10): 0 Total (all) Pending reads (top 10): 1564 ks.test 1564 Total (all) Pending streams (top 10): 0 Total (all) Small pools: objsz spansz usedobj memory unused wst% 8 4K 11k 88K 6K 6 10 4K 10 8K 8K 98 12 4K 2 8K 8K 99 14 4K 4 8K 8K 99 16 4K 15k 244K 5K 2 32 4K 2k 52K 3K 5 32 4K 20k 628K 2K 0 32 4K 528 20K 4K 17 32 4K 5k 144K 480B 0 48 4K 17k 780K 3K 0 48 4K 3k 140K 3K 2 64 4K 50k 3M 6K 0 64 4K 66k 4M 7K 0 80 4K 131k 10M 1K 0 96 4K 37k 3M 192B 0 112 4K 65k 7M 10K 0 128 4K 21k 3M 2K 0 160 4K 38k 6M 3K 0 192 4K 15k 3M 12K 0 224 4K 3k 720K 10K 1 256 4K 148 56K 19K 33 320 8K 13k 4M 14K 0 384 8K 3k 1M 20K 1 448 4K 11k 5M 5K 0 512 4K 2k 1M 39K 3 640 12K 163 144K 42K 29 768 12K 1k 832K 59K 7 896 8K 131 144K 29K 20 1024 4K 643 732K 89K 12 1280 20K 11k 13M 26K 0 1536 12K 12 128K 110K 85 1792 16K 12 144K 123K 85 2048 8K 601 1M 
14K 1 2560 20K 70 224K 48K 21 3072 12K 13 240K 201K 83 3584 28K 6 288K 266K 92 4096 16K 10k 39M 88K 0 5120 20K 7 416K 380K 91 6144 24K 24 480K 336K 70 7168 28K 27 608K 413K 67 8192 32K 256 3M 736K 26 10240 40K 11k 105M 550K 0 12288 48K 21 960K 708K 73 14336 56K 59 1M 378K 31 16384 64K 8 1M 1M 89 Page spans: index size free used spans 0 4K 48M 48M 12k 1 8K 6M 6M 822 2 16K 41M 41M 3k 3 32K 18M 18M 579 4 64K 108M 108M 2k 5 128K 1774M 2G 14k 6 256K 512K 0B 2 7 512K 2M 2M 4 8 1M 0B 0B 0 9 2M 2M 0B 1 10 4M 0B 0B 0 11 8M 0B 0B 0 12 16M 16M 0B 1 13 32M 32M 32M 1 14 64M 0B 0B 0 15 128M 0B 0B 0 16 256M 0B 0B 0 17 512M 0B 0B 0 18 1G 0B 0B 0 19 2G 0B 0B 0 20 4G 0B 0B 0 21 8G 0B 0B 0 22 16G 0B 0B 0 23 32G 0B 0B 0 24 64G 0B 0B 0 25 128G 0B 0B 0 26 256G 0B 0B 0 27 512G 0B 0B 0 28 1T 0B 0B 0 29 2T 0B 0B 0 30 4T 0B 0B 0 31 8T 0B 0B 0	2020-11-17 15:13:21 +02:00
Botond Dénes	4d7f2f45c2	database: table: add accessors to the operation counts of the phasers	2020-11-17 15:13:21 +02:00
Botond Dénes	7b56ed6057	utils: logalloc: add lsa_global_occupancy_stats() Allows querying the occupancy stats of all the lsa memory.	2020-11-17 15:13:21 +02:00
Botond Dénes	f69942424d	utils: phased_barrier: add operations_in_progress() Allows querying the number of operations in-flight in the current phase.	2020-11-17 15:13:21 +02:00
Botond Dénes	f097bf3005	mutation_query: mutation_query_stage: add get_stats()	2020-11-17 15:13:21 +02:00
Botond Dénes	8c083c17fc	reader_concurrency_semaphore: add is_unlimited() Allows determining whether the semaphore was created without limits.	2020-11-17 15:13:21 +02:00
Avi Kivity	100ad4db38	Merge 'Allow ALTERing the properties of system_auth tables' from Dejan Mircevski As requested in #7057, allow certain alterations of system_auth tables. Potentially destructive alterations are still rejected. Tests: unit (dev) Closes #7606 * github.com:scylladb/scylla: auth: Permit ALTER options on system_auth tables auth: Add command_desc auth: Add tests for resource protections	2020-11-17 12:15:20 +02:00
Botond Dénes	318b0ef259	reader_concurrency_semaphore: rate-limit diagnostics messages And since now there is no danger of them filling the logs, the log-level is promoted to info, so users can see the diagnostics messages by default. The rate-limit chosen is 1/30s. Refs: #7398 Tests: manual Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201117091253.238739-1-bdenes@scylladb.com>	2020-11-17 11:57:51 +02:00
Piotr Dulikowski	0fd36e2579	api: allow changing hinted handoff configuration This commit makes it possible to change hints manager's configuration at runtime through HTTP API. To preserve backwards compatibility, we keep the old behavior of not creating and checking hints directories if they are not enabled at startup. Instead, hint directories are lazily initialized when hints are enabled for the first time through HTTP API.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	6465dd160b	storage_proxy: fix wrong return type in swagger The GET `hinted_handoff_enabled_by_dc` endpoint had an incorrect return type specified. Although it does not have an implementation yet, it was supposed to return a list of strings with DC names for which generating hints is enabled - not a list of string pairs. Such a return type is expected by the JMX.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	220a2ca800	hints_manager: implement change_host_filter Implements a function which is responsible for changing hints manager configuration while it is running. It first starts new endpoint managers for endpoints which weren't allowed by previous filter but are now, and then stops endpoint managers which are rejected by the new filter. The function is blocking and waits until all relevant ep managers are started or stopped.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	1302f1b5bf	storage_proxy: always create hints manager Now, the hints manager object for regular hints is always created, even if hints are disabled in configuration. Please note that the behavior of hints will be unchanged - no hints will be sent when they are disabled. The intent of this change is to make enabling and disabling hints in runtime easier to implement.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	cefe5214ff	config: plug in hints::host_filter object into configuration Uses db::hints::host_filter as the type of hinted_handoff_enabled configuration option. Previously, hinted_handoff_enabled used to be a string option, and it was parsed later in a separate function during startup. The function returned a std::optional<std::unordered_set<sstring>>, whose meaning in the context of hints is rather enigmatic for an observer not familiar with hints. Now, hinted_handoff_enabled has type of db::hints::host_filter, and it is plugged into the config parsing framework, so there is no need for later post-processing.	2020-11-17 10:24:42 +01:00
Piotr Dulikowski	5c3c7c946b	db/hints: introduce host_filter Adds a db::hints::host_filter structure, which determines if generating hints towards a given target is currently allowed. It supports serialization and deserialization between the hinted_handoff_enabled configuration/cli option. This patch only introduces this structure, but does not make other code use it. It will be plugged into the configuration architecture in the following commits.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	a4f03d72b3	hints/resource_manager: allow registering managers after start This change modifies db::hints::resource_manager so that it is now possible to add hints::managers after it was started. This change will make it possible to register the regular hints manager later in runtime, if it wasn't enabled at boot time.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	40710677d0	hints: introduce db::hints::directory_initializer Introduces a db::hints::directory_initializer object, which encapsulates the logic of initializing directories for hints (creating/validating directories, segment rebalancing). It will be useful for lazy initialization of hints manager.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	81a568c57a	directories.cc: prepare for use outside main.cc Currently, the `directories` class is used exclusively during initialization, in the main() function. This commit refactors this class so that it is possible to use it to initialize directories much later after startup. The intent of this change is to make it possible for hints manager to create directories for hints lazily. Currently, when Scylla is booted with hinted handoff disabled, the `hints_directory` config parameter is ignored and directories for hints are neither created nor verified. Because we would like to preserve this behavior and introduce possibility to switch hinted handoff on in runtime, the hints directories will have to be created lazily the first time hinted handoff is enabled.	2020-11-17 10:15:47 +01:00
Piotr Sarna	5c66291ab9	Update seastar submodule * seastar 043ecec7...c861dbfb (3): > Merge "memory: allow configuring when to dump memory diagnostics on allocation failures" from Botond > perftune.py: support kvm-clock on tune-clock > execution_stage: inheriting_concrete_execution_stage: add get_stats()	2020-11-17 08:37:39 +01:00
Dejan Mircevski	1beb57ad9d	auth: Permit ALTER options on system_auth tables These alterations cannot break the database irreparably, so allow them. Expand command_desc as required. Add a type (rather than command_desc) parameter to has_column_family_access() to minimize code changes. Fixes #7057 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-16 22:32:32 -05:00
Dejan Mircevski	9a6c1b4d50	auth: Add command_desc Instead of passing various bits of the command around, pass one command_desc object. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-16 20:23:52 -05:00
Kamil Braun	d74f303406	cdc: ensure that CDC generation write is flushed to commitlog before ack When a node bootstraps or upgrades from a pre-CDC version, it creates a new CDC generation, writes it to a distributed table (system_distributed.cdc_generation_descriptions), and starts gossiping its timestamp. When other nodes see the timestamp being gossiped, they retrieve the generation from the table. The bootstrapping/upgrading node therefore assumes that the generation is made durable and other nodes will be able to retrieve it from the table. This assumption could be invalidated if periodic commitlog mode was used: replicas would acknowledge the write and then immediately crash, losing the write if they were unlucky (i.e. commitlog wasn't synced to disk before the write was acknowledged). This commit enforces all writes to the generations table to be synced to commitlog immediately. It does not matter for performance as these writes are very rare. Fixes https://github.com/scylladb/scylla/issues/7610. Closes #7619	2020-11-17 00:01:13 +02:00
Gleb Natapov	df197e36fb	raft: store an entry as a shared ptr in an outgoing message An entry can be snapshotted, before the outgoing message is sent, so the message has to hold to it to avoid use after free. Message-Id: <20201116113323.GA1024423@scylladb.com>	2020-11-16 17:54:21 +01:00
Piotr Sarna	fc8ffe08b9	storage_proxy: unify retiring view response handlers Materialized view updates participate in a retirement program, which makes sure that they are immediately taken down once their target node is down, without having to wait for timeout (since views are a background operation and it's wasteful to wait in the background for minutes). However, this mechanism has very delicate lifetime issues, and it already caused problems more than once, most recently in #5459. In order to make another bug in this area less likely, the two implementations of the mechanism, in on_down() and drain_on_shutdown(), are unified. Possibly refs #7572 Closes #7624	2020-11-16 18:50:49 +02:00
Avi Kivity	5d45662804	database, streaming: remove remnants of memtable-base streaming Commit `e5be3352cf` ("database, streaming, messaging: drop streaming memtables") removed streaming memtables; this removes the mechanisms to synchronize them: _streaming_flush_gate and _streaming_flush_phaser. The memory manager for streaming is removed, and its 10% reserve is evenly distributed between memtables and general use (e.g. cache). Note that _streaming_flush_phaser and _streaming_flush_gate are no longer used to synchronize anything - the gate is only used to protect the phaser, and the phaser isn't used for anything. Closes #7454	2020-11-16 14:32:19 +01:00
Takuya ASADA	2ce8ca0f75	dist/common/scripts/scylla_util.py: move DEBIAN_FRONTEND environment variable to apt_install()/apt_uninstall() The DEBIAN_FRONTEND environment variable was added only to prevent a dialog from opening when running 'apt-get install mdadm'; no other program depends on it. So we can move it inside apt_install()/apt_uninstall() and drop scylla_env, since we don't have any other environment variables. To pass the variable, an env argument is added to run()/out().	2020-11-16 14:21:36 +02:00
Avi Kivity	fcec68b102	Merge "storage_service: add mutate_token_metadata helper" from Benny " This is a follow-up on `052a8d036d` "Avoid stalls in token_metadata and replication strategy" The added mutate_token_metadata helper combines: - with_token_metadata_lock - get_mutable_token_metadata_ptr - replicate_to_all_cores Test: unit(dev) " * tag 'mutate_token_metadata-v1' of github.com:bhalevy/scylla: storage_service: fixup indentation storage_service: mutate_token_metadata: do replicate_to_all_cores storage_service: add mutate_token_metadata helper	2020-11-15 20:00:19 +02:00
Benny Halevy	51e4d6490b	storage_service: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-15 15:18:48 +02:00
Benny Halevy	e861c352f8	storage_service: mutate_token_metadata: do replicate_to_all_cores Replicate the mutated token_metadata to all cores on success. This moves replication out of update_pending_ranges(mutable_token_metadata_ptr, sstring), so add explicit call to replicate_to_all_cores where it is called outside of mutate_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-15 14:34:20 +02:00
Benny Halevy	25b5db0b72	storage_service: add mutate_token_metadata helper Replace a repeating pattern of: with_token_metadata_lock([] { return get_mutable_token_metadata_ptr([] (mutable_token_metadata_ptr tmptr) { // mutate token_metadata via tmptr }); }); With a call to mutate_token_metadata that does both and calls the function with then mutable_token_metadata_ptr. A following patch will also move the replication to all cores to mutate_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-15 14:31:39 +02:00
Pekka Enberg	31389d1724	configure.py: Fix unified-package version and release to unbreak "dist" target The "dist" target fails as follows: $ ./tools/toolchain/dbuild ninja dist ninja: error: 'build/dev/scylla-unified-package-..tar.gz', needed by 'dist-unified-tar', missing and no known rule to make it Fix two issues: - Fix Python variable references to "scylla_version" and "scylla_release", broken by commit `bec0c15ee9` ("configure.py: Add version to unified tarball filename"). The breakage went unnoticed because ninja default target does not call into dist... - Remove dependencies to build/<mode>/scylla-unified-package.tar.gz. The file is now in build/<mode>/dist/tar/ directory and contains version and release in the filename. Message-Id: <20201113110706.150533-1-penberg@scylladb.com>	2020-11-15 11:10:26 +02:00
Dejan Mircevski	d554610f32	auth: Add tests for resource protections Try to mess up system_auth tables and verify that Scylla rejects that. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-13 21:18:38 -05:00
Tomasz Grabiec	0a2adf4555	Merge "raft: replication test: simple partitioning" from Alejo To test handling of connectivity issues and recovery add support for disconnecting servers. This is not full partitioning yet as it doesn't allow connectivity across the disconnected servers (having multiple active partitions). * https://github.com/alecco/scylla/pull/new/raft-ale-partition-simple-v3: raft: replication test: connectivity partitioning support raft: replication test: block rpc calls to disconnected servers raft: replication test: add is_disconnected helper raft: replication test: rename global variable raft: replication test: relocate global connection state map	2020-11-13 13:49:33 +01:00
Pekka Enberg	f57b894d42	configure.py: Remove duplicate scylla-package.tar.gz artifact We currently keep a copy of scylla-package.tar.gz in "build/<mode>" for compatibility. However, we've long since switched our CI system over to the new location, so let's remove the duplicate and use the one from "build/<mode>/dist/tar" instead. Message-Id: <20201113075146.67265-1-penberg@scylladb.com>	2020-11-13 11:27:39 +01:00
Nadav Har'El	62551b3bd3	docs/alternator: mention that Alternator Streams is experimental Add to the DynamoDB compatibility document, docs/alternator/compatibility.md, a mention that Alternator Streams are still an experimental feature, and how to turn it on (at this point CDC is no longer an experimental feature, but Alternator Streams are). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201112184436.940497-1-nyh@scylladb.com>	2020-11-12 21:20:04 +02:00
Nadav Har'El	450de2d89d	docs/alternator: Alternator is no longer "experimental" Drop the adjective "experimental" used to describe Alternator in docs/alternator/getting-started.md. In Scylla, the word "experimental" carries a specific meaning (no support for upgrades, not enough QA, not ready for general use) and Alternator is no longer experimental in that sense. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201112185249.941484-1-nyh@scylladb.com>	2020-11-12 21:20:03 +02:00
Nadav Har'El	e40fa4b7fd	test/cql-pytest: remove xfail mark from passing secondary-index test Issue #7443 (the wrong sort order of partitions in a secondary index) was already fixed in commit `7ff72b0ba5`. So the test for it is now passing, and we can remove its "xfail" mark. Refs #7443 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201112183441.939604-1-nyh@scylladb.com>	2020-11-12 20:43:59 +02:00
Pekka Enberg	274717c97d	cql-pytest/test_keyspace.py: Add ALTER KEYSPACE test cases This adds some test cases for ALTER KEYSPACE: - ALTER KEYSPACE happy path - ALTER KEYSPACE with invalid options - ALTER KEYSPACE for non-existing keyspace - CREATE and ALTER KEYSPACE using NetworkTopologyStrategy with non-existing data center in configuration, which triggers a bug in Scylla: https://github.com/scylladb/scylla/issues/7595 Message-Id: <20201112073110.39475-1-penberg@scylladb.com>	2020-11-12 20:07:12 +02:00
Alejo Sanchez	5d8752602b	raft: replication test: connectivity partitioning support Introduce partition update command consisting of nodes still seeing each other. Nodes not included are disconnected from everything else. If the previous leader is not part of the new partition, the first node specified in the partition will become leader. For other nodes to accept a new leader it has to have a committed log. For example, if the desired leader is being re-connected and it missed entries other nodes saw it will not win the election. Example A B C: partition{A,C},entries{2},partition{B,C} In this case node C won't accept B as a new leader as it's missing 2 entries. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-12 10:01:17 -04:00
Alejo Sanchez	2fc5b3a620	raft: replication test: block rpc calls to disconnected servers Use global connection state with rpc, too. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-12 10:01:05 -04:00
Alejo Sanchez	c9e593a6d7	raft: replication test: add is_disconnected helper Simplify disconnection logic with helper is_disconnected() function Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-12 10:00:58 -04:00
Alejo Sanchez	e1b0aad149	raft: replication test: rename global variable Lowercase for global disconnection map. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-12 09:59:06 -04:00
Alejo Sanchez	7a2c6d08a1	raft: replication test: relocate global connection state map Needed for using by rpc class. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-12 09:58:48 -04:00
Piotr Dulikowski	5b12375842	main.cc: wait for hints manager to start In main.cc, we spawn a future which starts the hints manager, but we don't wait for it to complete. This can have the following consequences: - The hints manager does some asynchronous operations during startup, so it can take some time to start. If it is started after we start handling requests, and we admit some requests which would result in hints being generated, those hints will be dropped instead because we check if hints manager is started before writing them. - Initialization of hints manager may fail, and Scylla won't be stopped because of it (e.g. we don't have permissions to create hints directories). The consequence of this is that hints manager won't be started, and hints will be dropped instead of being written. This may affect both the regular hints manager and the view hints manager. This commit causes us to wait until the hints manager starts and see if there were any errors during initialization. Fixes #7598 Closes #7599	2020-11-12 14:17:10 +02:00
Nadav Har'El	78649c2322	Merge 'Mark CDC as GA' from Piotr Jastrzębski CDC is ready to be a non-experimental feature so remove the experimental flag for it. Also, guard Alternator Streams with their own experimental flag. Previously, they were using CDC experimental flag as they depend on CDC. Tests: unit(dev) Closes #7539 * github.com:scylladb/scylla: alternator: guard streams with an experimental flag Mark CDC as GA cdc: Make it possible for CDC generation creation to fail	2020-11-12 13:49:27 +02:00
Piotr Jastrzebski	d2897d8f8b	alternator: guard streams with an experimental flag Add new alternator-streams experimental flag for alternator streams control. CDC becomes GA and won't be guarded by an experimental flag any more. Alternator Streams stay experimental so now they need to be controlled by their own experimental flag. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:36:16 +01:00
Piotr Jastrzebski	e9072542c1	Mark CDC as GA Enable CDC by default. Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc flag. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:36:13 +01:00
Piotr Jastrzebski	2091408478	cdc: Make it possible for CDC generation creation to fail The following patch enables CDC by default, and this means CDC has to work with all the clusters now. There is a problematic case when an existing cluster with no CDC support is stopped and all the binaries are updated to a newer version with CDC enabled by default. In such a case, nodes know that they are already members of the cluster but they can't find any CDC generation, so they will try to create one. This creation may fail due to lack of QUORUM for the write. Before this patch such a situation would lead to the node failing to start. After the change, the node will start but the CDC generation will be missing. This will mean CDC won't be able to work on such a cluster before nodetool checkAndRepairCdcStreams is run to fix the CDC generation. We still fail to bootstrap if the creation of CDC generation fails. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:29:31 +01:00
Lubos Kosco	5c488b6e9a	scylla_util.py: properly parse GCP instances without size fixes #7577 Closes #7592	2020-11-12 13:01:40 +02:00
Piotr Sarna	d43ac783c6	db,view: degrade helper message from error to warn When a missing base column happens to be named `idx_token`, an additional helper message is printed in logs. This additional message does not need to have `error` severity, since the previous, generic message is already marked as `error`. This patch simply makes it easier to write tests, because in case this error is expected, only one message needs to be explicitly ignored instead of two. Closes #7597	2020-11-12 12:28:26 +02:00
Avi Kivity	6091dc9b79	Merge 'Add more overload-related metrics' from Piotr Sarna This miniseries adds metrics which can help the users detect potential overloads: * due to having too many in-flight hints * due to exceeding the capacity of the read admission queue, on replica side Closes #7584 * github.com:scylladb/scylla: reader_concurrency_semaphore: add metrics for shed reads storage_proxy: add metrics for too many in-flight hints failures	2020-11-12 12:27:31 +02:00
Raphael S. Carvalho	13fa2bec4c	compaction: Make sure a partition is filtered out only by producer If interposer consumer is enabled, partition filtering will be done by the consumer instead, but that's not possible because only the producer is able to skip to the next partition if the current one is filtered out, so scylla crashes when that happens with a bad function call in queue_reader. This is a regression which started here: `55a8b6e3c9` To fix this problem, let's make sure that partition filtering will only happen on the producer side. Fixes #7590. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201111221513.312283-1-raphaelsc@scylladb.com>	2020-11-12 12:22:10 +02:00
Avi Kivity	052a8d036d	Merge "Avoid stalls in token_metadata and replication strategy" from Benny " This series is a rebased version of 3 patchsets that were sent separately before: 1. [PATCH v4 00/17] Cleanup storage_service::update_pending_ranges et al. This patchset cleans up service/storage_service use of update_pending_ranges and replicate_to_all_cores. It also moves some functionality from gossiping_property_file_snitch::reload_configuration into a new method - storage_service::update_topology. This prepares storage_service for using a shared ptr to token_metadata, updating a copy out of line under a semaphore that serializes writers, and eventually replicating the updated copy to all shards and releasing the lock. This is a follow up to #7044. 2. [PATCH v8 00/20] token_metadata versioned shared ptr Rather than keeping references on token_metadata use a shared_token_metadata containing a lw_shared_ptr<token_metadata> (a.k.a token_metadata_ptr) to keep track of the token_metadata. Get token_metadata_ptr for a read-only snapshot of the token_metadata or clone one for a mutable snapshot that is later used to safely update the base versioned_shared_object. token_metadata_ptr is used to modify token_metadata out of line, possibly with multiple calls, that could be preempted in-between so that readers can keep a consistent snapshot of it while writers prepare an updated version. Introduce a token_metadata_lock used to serialize mutators of token_metadata_ptr. It's taken by the storage_service before cloning token_metadata_ptr and held until the updated copy is replicated on all shards. In addition, this series introduces token_metadata::clone_async() method to copy the token_metadata class using an asynchronous function with continuations to avoid reactor stalls as seen in #7220. Fixes #7044 3. [PATCH v3 00/17] Avoid stalls in token_metadata and replication strategy This series uses the shared_token_metadata infrastructure. 
First patches in the series deal with cloning token_metadata using continuations to allow preemption while cloning (See #7220). Then, the rest of the series makes sure to always run `update_pending_ranges` and `calculate_pending_ranges_for_` in a thread, it then adds a `can_yield` parameter to the token_metadata and abstract_replication_strategy `get_pending_ranges` and friends, and finally it adds `maybe_yield` calls in potentially long loops. Fixes #7313 Fixes #7220 Test: unit (dev) Dtest: gating(dev) " tag 'replication_strategy_can_yield-v4' of github.com:bhalevy/scylla: (54 commits) token_metadata_impl: set_pending_ranges: add can_yield_param abstract_replication_strategy: get rid of get_ranges_in_thread repair: call get_ranges_in_thread where possible abstract_replication_strategy: add can_yield param to get_pending_ranges and friends abstract_replication_strategy: define can_yield bool_class token_metadata_impl: calculate_pending_ranges_for_* reindent token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref token_metadata_impl: calculate_pending_ranges_for_* call in thread token_metadata: update_pending_ranges: create seastar thread abstract_replication_strategy: add get_address_ranges method for specific endpoint token_metadata_impl: clone_after_all_left: sort tokens only once token_metadata: futurize clone_after_all_left token_metadata: futurize clone_only_token_map token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_* repair: replace_with_repair: use token_metadata::clone_async storage_service: reindent token_metadata blocks token_metadata: add clone_async abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods abstract_replication_strategy: accept a token_metadata_ptr in get_ranges methods boot_strapper: get_*_tokens: use token_metadata_ptr ...	2020-11-12 11:56:05 +02:00
Nadav Har'El	b01bdcf910	alternator streams: add test for StartingSequenceNumber Add a test that better clarifies what StartingSequenceNumber returned by DescribeStream really guarantees (this question was raised in a review of a different patch). The main thing we can guarantee is that reading a shard from that position returns all the information in that shard - similar to TRIM_HORIZON. This test verifies this, and it passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201112081250.862119-1-nyh@scylladb.com>	2020-11-12 10:40:41 +01:00
Piotr Sarna	3ce7848bdf	reader_concurrency_semaphore: add metrics for shed reads When the admission queue capacity reaches its limits, excessive reads are shed in order to avoid overload. Each such operation now bumps the metrics, which can help the user judge if a replica is overloaded.	2020-11-11 19:01:38 +01:00
Piotr Wojtczak	d9810ec8eb	cql_metrics: Add counters for CQL request messages This change adds metrics for counting request message types listed in the CQL v.4 spec under section 4.1 (https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec). To organize things properly, we introduce a new cql_server::transport_stats object type for aggregating the message and server statistics. Fixes #4888 Closes #7574	2020-11-11 20:00:17 +02:00
Avi Kivity	d5a6aa4533	Merge 'cql3: Rewrite the need_filtering logic' from Dejan Mircevski Rewrite in a more readable way that will later allow us to split the WHERE expression in two: a storage-reading part and a post-read filtering part. Tests: unit (dev,debug) Closes #7591 * github.com:scylladb/scylla: cql3: Rewrite need_filtering() from scratch cql3: Store index info in statement_restrictions	2020-11-11 20:00:17 +02:00
Nadav Har'El	940ac80798	cql-pytest: rename test_object_name() function The name of the utility function test_object_name() is confusing - by starting with the word "test", pytest can think (if it's imported to the top-level namespace) that it is a test... So this patch gives it a better name - unique_name(). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201111140638.809189-1-nyh@scylladb.com>	2020-11-11 20:00:17 +02:00
Nadav Har'El	90eba0ce04	alternator, docs: add a new compatibility.md document This patch adds a new document, docs/alternator/compatibility.md, which focuses on what users switching from DynamoDB to Alternator need to know about where Alternator differs from DynamoDB and which features are missing. The compatibility information in the old alternator.md is not deleted yet. It probably should. Fixes #7556 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201110180242.716295-1-nyh@scylladb.com>	2020-11-11 20:00:17 +02:00
Avi Kivity	06c949b452	Update seastar submodule * seastar a62a80ba1d...043ecec732 (8): > semaphore: make_expiry_handler: explicitly use this lambda capture > configure: add --{enable,disable}-debug-shared-ptr option > cmake: add SEASTAR_DEBUG_SHARED_PTR also in dev mode > tls_test: Update the certificates to use sha256 > logger: allow applying a rate-limit to log messages > Merge "Handle CPUs not attached to any NUMA nodes" from Pavel E > memory: fix malloc_usable_size() during early initialization > Merge "make semaphore related functions noexcept" from Benny	2020-11-11 20:00:17 +02:00
Dejan Mircevski	9150a967c6	cql3: Rewrite need_filtering() from scratch Makes it easier to understand, in preparation for separating the WHERE expression into filtering and storage-reading parts. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-11 08:25:36 -05:00
Dejan Mircevski	e754026010	cql3: Store index info in statement_restrictions To rewrite need_filtering() in a more readable way, we need to store info on found indexes in statement_restrictions data members. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-11-11 08:25:36 -05:00
Benny Halevy	275fe30628	token_metadata_impl: set_pending_ranges: add can_yield_param To prevent a > 10 ms stall when inserting to boost::icl::interval_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	1e2138e8ef	abstract_replication_strategy: get rid of get_ranges_in_thread Use the can_yield param to get_ranges instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	e4e0e71b50	repair: call get_ranges_in_thread where possible To prevent reactor stalls during repair-based operations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	ba31350239	abstract_replication_strategy: add can_yield param to get_pending_ranges and friends To prevent reactor stalls as seen in #7313. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	6c2a089a6f	abstract_replication_strategy: define can_yield bool_class To be used by convention by several other methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	7fb489d338	token_metadata_impl: calculate_pending_ranges_for_* reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	6ce2436a4c	token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref We can use the seastar thread to keep the vector rather than creating a lw_shared_ptr for it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	0ca423dcfc	token_metadata_impl: calculate_pending_ranges_for_* call in thread The functions can be simplified as they are all now being called from a seastar thread. Make them sequential, returning void, and yielding if necessary. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	84d086dc77	token_metadata: update_pending_ranges: create seastar thread So we can yield in this path to prevent reactor stalls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	1e6c181678	abstract_replication_strategy: add get_address_ranges method for specific endpoint Some of the callers of get_address_ranges are interested in the ranges of a specific endpoint. Rather than building a map for all endpoints and then traversing it looking for this specific endpoint, build a multimap of token ranges relating only to the specified endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	2ce6773dae	token_metadata_impl: clone_after_all_left: sort tokens only once Currently the sorted tokens are copied needlessly on this path by `clone_only_token_map` and then recalculated after calling remove_endpoint for each leaving endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	0abd8e62cd	token_metadata: futurize clone_after_all_left Call the futurized clone_only_token_map and remove the _leaving_endpoints from the cloned token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	4a622c14e1	token_metadata: futurize clone_only_token_map Does part of clone_async() using continuations to prevent stalls. Rename synchronous variant to clone_only_token_map_sync that is going to be deprecated once all its users will be futurized. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	d1a73ec7b3	token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_* Replacing old code using lw_shared_ptr<token_metadata> with the "modern" mutable_token_metadata_ptr alias. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6af7b689f3	repair: replace_with_repair: use token_metadata::clone_async Clone the input token_metadata asynchronously using clone_async() before modifying it using update_normal_tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	d4d9f3e8a9	storage_service: reindent token_metadata blocks Many code blocks using with_token_metadata_lock and get_mutable_token_metadata_ptr now need re-indenting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	4fc5997949	token_metadata: add clone_async Clone token_metadata object using async continuation to prevent reactor stalls. Refs https://github.com/scylladb/scylla/issues/7220 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	5ab7b0b2ea	abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	349aa966ba	abstract_replication_strategy: accept a token_metadata_ptr in get_ranges methods In preparation to returning future<dht::token_range_vector> from async variants. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	1cbe54a9cf	boot_strapper: get_*_tokens: use token_metadata_ptr To facilitate preempting of long running loops if needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	63137b35ea	range_streamer: convert to token_metadata_ptr Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6cba82a792	repair: accept a token_metadata_ptr in repair based node ops Only replace_with_repair needs to clone the token_metadata and update the local copy, so we can safely pass a read-only snapshot of the token_metadata rather than copying it in all cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	7697c0f129	cdc: generation: use token_metadata_ptr So it could be safely held across continuations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	ecda21224e	storage_service: replicate_to_all_cores: make exception safe Perform replication in 2 phases. First phase just clones the mutable_token_metadata_ptr on all shards. Second phase applies the cloned copies onto each local_ss._shared_token_metadata. That phase should never fail. To add suspenders over the belt, in the impossible case we do get an exception, it is logged and we abort. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	41c7efd0c0	storage_service: convert to token_metadata_ptr Clone _token_metadata for updating into _updated_token_metadata and use it to update the local token_metadata on all shards via do_update_pending_ranges(). Adjust get_token_metadata to get either the updated_token_metadata, if available, or the base token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	fa880439c9	storage_service: use token_metadata_lock to serialize updates to token_metadata Rather than using `serialized_action`, grab a lock before mutating _token_metadata and hold it until it's replicated to all shards. A following patch will use a mutable token_metadata_ptr that is updated out of line under the lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	476b4daa48	storage_service: convert to shared_token_metadata In preparation to using token_metadata_ptr and token_metadata_lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	88a4c6de13	storage_service: init_server: replicate_to_all_cores after updating token_metadata Currently the replication to other shards happens later in `prepare_to_join` that is called in `init_server`. We should isolate the changes made by init_server and update them first to all shards so that we can serialize them easily using a lock and a mutable_token_metadata_ptr, otherwise the lock and the mutable_token_metadata_ptr will have to be handed over (from this call path) to `prepare_to_join`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	b13156de7d	storage_service: use get_token_metadata and get_mutable_token_metadata methods In preparation to converting to using shared_token_metadata internally. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	572638671c	storage_proxy: query_ranges_to_vnodes_generator ranges_to_vnodes: use token_metadata_ptr Fixes use-after-free seen with putget_with_reloaded_certificates_test: ``` ==215==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000a8b180 at pc 0x000012eb5a83 bp 0x7ffd2c16d4c0 sp 0x7ffd2c16d4b0 READ of size 8 at 0x603000a8b180 thread T0 #0 0x12eb5a82 in std::__uniq_ptr_impl<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::_M_ptr() const /usr/include/c++/10/bits/unique_ptr.h:173 #1 0x12ea230d in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::get() const /usr/include/c++/10/bits/unique_ptr.h:422 #2 0x12e8d3e8 in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::operator->() const /usr/include/c++/10/bits/unique_ptr.h:416 #3 0x12e5d0a2 in locator::token_metadata::ring_range(std::optional<interval_bound<dht::ring_position> > const&, bool) const locator/token_metadata.cc:1712 #4 0x112d0126 in service::query_ranges_to_vnodes_generator::process_one_range(unsigned long, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > >&) service/storage_proxy.cc:4658 #5 0x112cf3c5 in service::query_ranges_to_vnodes_generator::operator()(unsigned long) service/storage_proxy.cc:4616 #6 0x112b2261 in service::storage_proxy::query_partition_key_range_concurrent(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, std::allocator<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, unsigned long, unsigned int, std::unordered_map<nonwrapping_interval<dht::token>, std::vector<utils::UUID, 
std::allocator<utils::UUID> >, std::hash<nonwrapping_interval<dht::token> >, std::equal_to<nonwrapping_interval<dht::token> >, std::allocator<std::pair<nonwrapping_interval<dht::token> const, std::vector<utils::UUID, std::allocator<utils::UUID> > > > >, service_permit) service/storage_proxy.cc:4023 #7 0x112b094e in operator() service/storage_proxy.cc:4160 #8 0x1139c8bb in invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2088 #9 0x1136625b in futurize_invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2119 #10 0x11366372 in operator()<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1480 #11 0x1139cc3b in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145 #12 0x116f4944 in 
seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::operator()(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201 #13 0x116b3397 in seastar::future<service::query_partition_key_range_concurrent_result> std::__invoke_impl<seastar::future<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(std::__invoke_other, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:60 #14 0x1165c3a6 in std::__invoke_result<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::type std::__invoke<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:96 #15 0x115e6542 in decltype(auto) std::__apply_impl<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> 
(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >, 0ul>(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, std::integer_sequence<unsigned long, 0ul>) /usr/include/c++/10/tuple:1724 #16 0x115e6663 in decltype(auto) std::apply<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&) /usr/include/c++/10/tuple:1736 #17 0x115e63f9 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > 
>&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const::{lambda()#1}::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530 #18 0x1165c4b9 in void seastar::futurize<seastar::future<service::query_partition_key_range_concurrent_result> >::satisfy_with_result_of<seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) 
const::{lambda()#1}>(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2073 #19 0x115e61f5 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1528 #20 0x1176e9cc in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> 
(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:746 #21 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196 #22 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575 #23 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730 #24 0x168ae4f7 in seastar::app_template::run_deprecated(int, char, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207 #25 0x168ac541 in seastar::app_template::run(int, char, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115 #26 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504 #27 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041) #28 0xd67c9ed in 
_start (/local/home/bhalevy/.dtest/dtest-o0qoqmkr/test/node3/bin/scylla+0xd67c9ed) 0x603000a8b180 is located 16 bytes inside of 24-byte region [0x603000a8b170,0x603000a8b188) freed by thread T0 here: #0 0x7f8d92a190cf in operator delete(void, unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb30cf) #1 0xd7ebe54 in seastar::internal::lw_shared_ptr_accessors_no_esft<locator::token_metadata>::dispose(seastar::lw_shared_ptr_counter_base) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:213 #2 0x112b155d in seastar::lw_shared_ptr<locator::token_metadata const>::~lw_shared_ptr() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:300 #3 0x112b155d in ~<lambda> service/storage_proxy.cc:4137 #4 0x1132e92d in ~<lambda> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1479 #5 0x1139cc91 in destroy /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:148 #6 0x11565673 in seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::~noncopyable_function() /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:181 #7 0x1176e783 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> 
>(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::~continuation() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:729 #8 0x1176ea06 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:750 #9 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196 #10 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575 #11 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730 #12 0x168ae4f7 in seastar::app_template::run_deprecated(int, char, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207 #13 0x168ac541 in seastar::app_template::run(int, char, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115 #14 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504 #15 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041) previously allocated by thread T0 here: #0 0x7f8d92a18067 in operator new(unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb2067) #1 0x13cf7132 in seastar::lw_shared_ptr<locator::token_metadata> seastar::lw_shared_ptr<locator::token_metadata>::make<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:266 #2 0x13cc3bfa in seastar::lw_shared_ptr<locator::token_metadata> seastar::make_lw_shared<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:422 #3 0x13ca3007 in seastar::lw_shared_ptr<locator::token_metadata> locator::make_token_metadata_ptr<locator::token_metadata>(locator::token_metadata) locator/token_metadata.hh:338 #4 0x13c9bdd4 in locator::shared_token_metadata::clone() const locator/token_metadata.hh:358 #5 0x13c9c18a in service::storage_service::get_mutable_token_metadata_ptr() service/storage_service.hh:184 #6 0x13a5a445 in 
service::storage_service::handle_state_normal(gms::inet_address) service/storage_service.cc:1129 #7 0x13a6371c in service::storage_service::on_change(gms::inet_address, gms::application_state, gms::versioned_value const&) service/storage_service.cc:1421 #8 0x12a86269 in operator() gms/gossiper.cc:1639 #9 0x12ad3eea in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145 #10 0x12be2aff in seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>::operator()(seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201 #11 0x12bb8e98 in atomic_vector<seastar::shared_ptr<gms::i_endpoint_state_change_subscriber> >::for_each(seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>) utils/atomic_vector.hh:62 #12 0x12a8662b in gms::gossiper::do_on_change_notifications(gms::inet_address, gms::application_state const&, gms::versioned_value const&) gms/gossiper.cc:1638 #13 0x12a9387c in operator() gms/gossiper.cc:1978 #14 0x12b49b20 in __invoke_impl<void, gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:60 #15 0x12b21fd6 in __invoke<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:95 #16 0x12b02865 in __apply_impl<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1723 #17 0x12b028d8 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, 
gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1734 #18 0x12b02967 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2052 #19 0x12ad866a in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/thread.hh:258 #20 0x12b609c2 in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:116 #21 0xdfabb5f in seastar::noncopyable_function<void ()>::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201 #22 0x16e21bb4 in seastar::thread_context::main() /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:297 #23 0x16e2190f in seastar::thread_context::s_main(int, int) /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:275 #24 0x7f8d9060322f (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x5222f) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	3fab0f8694	storage_proxy: convert to shared_token_metadata get() the latest token_metadata_ptr from the shared_token_metadata before each use. expose get_token_metadata_ptr() rather than get_token_metadata() so that caller can keep it across continuations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	a0436ea324	gossiper: convert to shared_token_metadata get() the latest token_metadata& from the shared_token_metadata before each use. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	f5f28e9b36	test: network_topology_strategy_test: constify calculate_natural_endpoints In preparation to changing network_topology_strategy to accept a const shared_token_metadata& rather than token_metadata&. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	45fb57a2ec	abstract_replication_strategy: pass token_metadata& to get_cached_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	ade8c77a7c	abstract_replication_strategy: pass token_metadata& to do_get_natural_endpoints Rather than accessing abstract_replication_strategy::_token_metadata directly. In preparation to changing it to a shared_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	29ed59f8c4	main: start a shared_token_metadata And use it to get a token_metadata& compatible with current usage, until the services are converted to use token_metadata_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	9d2cffe7ab	storage_service: make class a peering_storage_service No need to call the global service::get_storage_service() from within the class non-static methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	b41a1cf472	storage_service: report all errors from update_pending_ranges and replicate_to_all_cores Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	4188a0b384	storage_service: do_replicate_to_all_cores: call on_internal_error if failed Now that `replicate_tm_only` doesn't throw, we handle all errors in `replicate_tm_only().handle_exception`. We can't just proceed with business as usual if we failed to replicate token_metadata on all shards and continue working with inconsistent copies. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	585a447168	storage_service: make replicate_tm_only noexcept And with that mark also do_replicate_to_all_cores as noexcept. The motivation to do so is to catch all errors in replicate_tm_only and calling on_internal_error in the `handle_exception` continuation in do_replicate_to_all_cores. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	f287346186	storage_service: update_topology: use replicate_to_all_cores Rather than calling invalidate_cached_rings and update_topology on all shards do that only on shard 0 and then replicate to all other shards using replicate_to_all_cores as we do in all other places that modify token_metadata. Do this in preparation to using a token_metadata_ptr with which updating of token_metadata is done on a cloned copy (serialized under a lock) that becomes visible only when applied with replicate_to_all_cores. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	9217d5661a	storage_service: make get_mutable_token_metadata private Now that update_topology was moved to class storage_service. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	0e739aa801	storage_service: add update_topology method Move the functionality from gossiping_property_file_snitch::reload_configuration to the storage_service class. With that we can make get_mutable_token_metadata private. TODO: update token_metadata on shard 0 and then replicate_to_all_cores rather than updating on all shards in parallel. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	d629aa22f5	storage_service: keyspace_changed invoke update_pending_ranges on shard 0 keyspace_changed just calls update_pending_ranges (ignoring any errors returned from it), so invoke it on shard 0. With that, update_pending_ranges() is always called on shard 0 and doesn't need to use `invoke_on` shard 0 by itself. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	ffee694a43	storage_service: make keyspace_changed and update_pending_ranges private Both are called only internally in the class. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6eb20c529c	storage_service: init_server must be called on shard 0 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	a7df2c215f	storage_service: simplify shard 0 sanity checks We need to assert in only 2 places: do_update_pending_ranges, that updates token metadata, and replicate_tm_only, that copies the token metadata to all other shards. Currently we throw errors if this is violated but it should never happen and it's not really recoverable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	1c16bee81d	storage_service: do_replicate_to_all_cores in do_update_pending_ranges Currently update_pending_ranges involves 2 serialized actions: do_update_pending_ranges, and then replicate_to_all_cores. These can be combined by calling do_replicate_to_all_cores directly from do_update_pending_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	d6805348ff	storage_service: get rid of update_pending_ranges_nowait It was introduced in `74b4035611` as part of the fix for #3203. However, the reactor stalls have nothing to do with gossip waiting for update_pending_ranges - they are related to it being synchronous and quadratic in the number of tokens (e.g. get_address_ranges calling calculate_natural_endpoints for every token, then simple_strategy::calculate_natural_endpoints calling get_endpoint for every token). There is nothing special in handle_state_leaving that requires moving update_pending_ranges to the background. We call update_pending_ranges in many other places and wait for it, so if the gossip loop waiting on it were a real problem, it would be evident in many other places. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	b6c1dffe88	storage_service: handle_state_normal: update_pending_ranges earlier Currently _update_pending_ranges_action is called only on shard 0 and only later update_pending_ranges() updates shard 0 again and replicates the result to all shards. There is no need to wait between the two, and call _update_pending_ranges_action again, so just call update_pending_ranges() in the first place. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	aa8bdc2c0f	storage_service: handle_state_bootstrap: update_pending_ranges only after updating host_id so that the updated host_id (on shard 0) will get replicated to all shards via update_pending_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	c2c7baef3b	storage_service: on_change: no need to call replicate_to_all_cores It's already done by each handle_state_* function either by directly calling replicate_to_all_cores or indirectly, via update_pending_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	ebfc4c6f4b	storage_service: join_token_ring: replicate_to_all_cores early Currently the updates to token_metadata are immediately visible on shard 0, but not to other shards until replicate_to_all_cores syncs them. This prepares for converting to shared token_metadata: in the new world the updated token_metadata is not visible until committed to the shared_token_metadata, so commit it here and replicate to all other shards. It is not clear this isn't needed presently too. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Botond Dénes	f5323b29d9	mutation_reader: queue_reader: don't set EOS flag on abort If the consumer happens to check the EOS flag before it hits the exception injected by the abort (by calling fill_buffer()), they can think the stream ended normally and expect it to be valid. However this is not guaranteed when the reader is aborted. To avoid consumers falsely thinking the stream ended normally, don't set the EOS flag on abort at all. Additionally make sure the producer is aborted too on abort. In theory this is not needed as they are the one initiating the abort, but better to be safe than sorry. Fixes: #7411 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201102100732.35132-1-bdenes@scylladb.com>	2020-11-11 13:44:25 +02:00
Pekka Enberg	ba6a2b68d1	cql-pytest/test_keyspace.py: Add test case for double WITH issue Let's add a test case for CASSANDRA-9565, similar to the unit test in Apache Cassandra: https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java#L546 Message-Id: <20201111104251.19932-1-penberg@scylladb.com>	2020-11-11 13:39:57 +02:00
Avi Kivity	5b312a1238	Merge "sstables: make move_to_new_dir idempotent" from Benny " Today, if scylla crashes mid-way in sstable::idempotent-move-sstable or sstable::create_links we may end up in an inconsistent state where it refuses to restart due to the presence of the moved- sstable component files in both the staging directory and main directory. This series hardens scylla against this scenario by: 1. Improving sstable::create_links to identify the replay condition and support it. 2. Modifying the algorithm for moving sstables between directories to never be in a state where we have two valid sstables with the same generation, in both the source and destination directories. Instead, it uses the temporary TOC file as a marker for rolling backwards or forwards, and renames it atomically from the destination directory back to the source directory as a commit point. Before which it is preparing the sstable in the destination dir, and after which it starts the process of deleting the sstable in the source dir. Fixes #7429 Refs #5714 " * tag 'idempotent-move-sstable-v3' of github.com:bhalevy/scylla: sstable: create_links: support for move sstable_directory: support sstables with both TemporaryTOC and TOC sstable: create_links: move automatic sstring variables sstable: create_links: use captured comps sstable: create_links: capture dir by reference sstable: create_links: fix indentation sstable: create_links: no need to roll-back on failure anymore sstable: create_links: support idempotent replay sstable: create_links: cleanup style sstable: create_links: add debug/trace logging sstable: move_to_new_dir: rm TOC last sstable: move_to_new_dir: io check remove calls test: add sstable_move_test	2020-11-11 12:57:39 +02:00
Avi Kivity	017174670b	Update frozen toolchain for python3-urwid-2.1.2 urwid 2.1.0 struggles with some locale settings. 2.1.2 fixes the problem. Fixes #7487.	2020-11-11 11:54:05 +02:00
Nadav Har'El	44e0cb177e	cql-pytest: convert also run-cassandra to Python Previously, test/cql-pytest/run was a Python script, while test/cql-pytest/run-cassandra (to run the tests against Cassandra) was still a shell script - modeled after test/alternator/run. This patch rewrites run-cassandra in Python. A lot of the same code is needed for both run and run-cassandra tools. test/cql-pytest/run was already written in a way that this common code was in separate functions. For example, functions to start a server in a temporary directory, to check when it finishes booting, and to clean up at the end. This patch moves this common code to a new file, "run.py" - and the tools "run" and "run-cassandra" are very short programs which mostly use functions from run.py (run-cassandra also has some unique code to run Cassandra, that no other test runner will need). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201110215210.741753-1-nyh@scylladb.com>	2020-11-11 10:57:21 +02:00
Takuya ASADA	5867af4edd	install.sh: set PATH for relocatable CLI tools in python thunk We currently set PATH for relocatable CLI tools in scylla_util.run() and scylla_util.out(), but it doesn't work for perftune.py, since it's not part of Scylla, does not use scylla_util module. We can set PATH in python thunk instead, it can set PATH for all python scripts. Fixes #7350	2020-11-11 10:27:08 +02:00
Tomasz Grabiec	5fb3650c67	storage_service: Unify token_metadata update paths when replacing a node After full cluster shutdown, the node which is being replaced will not have its STATUS set to NORMAL (bug #6088), so listeners will not update _token_metadata. The bootstrap procedure of replacing node has a workaround for this and calls update_normal_tokens() on token metadata on behalf of the replaced node based on just its TOKENS state obtained in the shadow round. It does this only for the replacing_a_node_with_same_ip case, but not for replacing_a_node_with_diff_ip. As a result, replacing the node with the same ip after full cluster shutdown fails. We can always call update_normal_tokens(). If the cluster didn't crash, token_metadata would get the tokens. Fixes #4325 Message-Id: <1604675972-9398-1-git-send-email-tgrabiec@scylladb.com>	2020-11-11 10:25:56 +02:00
Nadav Har'El	475d8721a5	test: new "cql-pytest" test suite This patch introduces a new way to do functional testing on Scylla, similar to Alternator's test/alternator but for the CQL API: The new tests, in test/cql-pytest, are written in Python (using the pytest framework), and use the standard Python CQL driver to connect to any CQL implementation - be it Scylla, Cassandra, Amazon Keyspaces, or whatever. The use of standard CQL allows the test developer to easily run the same test against both Scylla and Cassandra, to confirm that the behaviour that our test expects from Scylla is really the "correct" (meaning Cassandra- compatible) behavior. A developer can run Scylla or Cassandra manually, and run "pytest" to connect to them (see README.md for more instructions). But even more usefully, this patch also provides two scripts: test/cql-pytest/run and test/cql-pytest/run-cassandra. These scripts automate the task of running Scylla or Cassandra (respectively) in a random IP address and temporary directory, and running the tests against it. The script test/cql-pytest/run is inspired by the existing test run scripts of Alternator and Redis, but rewritten in Python in a way that will make it easy to rewrite - in a future patch - all these other run scripts to use the same common code to safely run a test server in a temporary directory. "run" is extremely quick, taking around two seconds to boot Scylla. "run-cassandra" is slower, taking 13 seconds to boot Cassandra (maybe this can be improved in the future, I still don't know how). The tests themselves take milliseconds. Although the 'run' script runs a single Scylla node, the developer can also bring up any size of Scylla or Cassandra cluster manually and run the tests (with "pytest") against this cluster. 
This new test framework differs from the existing alternatives in the following ways: dtest: dtest focuses on testing correctness of distributed behavior, involving clusters of multiple nodes and often cluster changes during the test. In contrast, cql-pytest focuses on testing the functionality of a large number of small CQL features - which can usually be tested on a single-node cluster. Additionally, dtest is out-of-tree, while cql-pytest is in-tree, making it much easier to add or change tests together with code patches. Finally, dtest tests are notoriously slow. Hundreds of tests in the new framework can finish faster than a single dtest. Slow and out-of-tree tests are difficult to write, and I believe this explains why no developer loves writing dtests and maintainers do not insist on having them. I hope cql-pytest can change that. test/cql: The defining difference between the existing test/cql suite and the new test/cql-pytest is the new framework is programmatic, Python code, not a text file with desired output. Tests written with code allow things like looping, repeating the same test with different parameters. Also, when a test fails, it makes it easier to understand why it failed beyond just the fact that the output changed. Moreover, in some cases, the output changes benignly and cql-pytest may check just the desired features of the output. Beyond this, the current version of test/cql cannot run against Cassandra. test/cql-pytest can. The primary motivation for this new framework was https://github.com/scylladb/scylla/issues/7443 - where we had an esoteric feature (sort order of partitions when an index is added), which can be shown in Cqlsh to have what we think is incorrect behavior, and yet: 1. We didn't catch this bug because we never wrote a test for it, possibly because it is too difficult to contribute tests, and 2. We thought that we knew what Cassandra does in this case, but nobody actually tested it. 
Yes, we can test it manually with cqlsh, but wouldn't everything be better if we could just run the same test that we wrote for Scylla against Cassandra? So one of the tests we add in this patch confirms issue #7443 in Scylla, and that our hunch was correct and Cassandra indeed does not have this problem. I also add a few trivial tests for keyspace create and drop, as additional simple examples. Refs #7443. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201110110301.672148-1-nyh@scylladb.com>	2020-11-10 19:48:23 +02:00
Benny Halevy	bc64ee5410	reloc: add ubsan-suppressions.supp to relocatable package So we can use it to suppress false-positive ubsan error when running scylla in debug mode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201110165214.1467027-1-bhalevy@scylladb.com>	2020-11-10 19:14:27 +02:00
Benny Halevy	f36e5edd50	install.sh: add support for ubsan-suppressions Install ubsan-suppressions.supp into libexec and use it in UBSAN_OPTIONS when running scylla to suppress unwanted ubsan errors. Test: With scylla-ccm fix https://github.com/scylladb/scylla-ccm/pull/278 $ ccm create scylla-reloc-1 -n 1 --scylla --version unstable/master:latest --scylla-core-package-uri=../scylla/build/{debug,dev}/dist/tar/scylla-package.tar.gz $ ccm start Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201110165214.1467027-2-bhalevy@scylladb.com>	2020-11-10 19:14:26 +02:00
Piotr Sarna	e5f2fb2a4d	codeowners: add a couple of Botonds since he's our resident readers specialist. Closes #7585	2020-11-10 18:22:52 +02:00
Avi Kivity	756b14f309	Merge 'cql3: Drop unneeded filtering when continuous clustering-key is selected' from Dejan Mircevski I noticed that we require filtering for continuous clustering key, which is not necessary. I dropped the requirement and made sure the correct data is read from the storage proxy. The corresponding dtest PR: https://github.com/scylladb/scylla-dtest/pull/1727 Tests: unit (dev,debug), dtest (next-gating, cqlpy) Closes #7460 github.com:scylladb/scylla: cql3: Delete some newlines cql3: Drop superfluous ALLOW FILTERING cql3: Drop unneeded filtering for continuous CK	2020-11-10 17:41:00 +02:00
Piotr Sarna	2e544a0c89	storage_proxy: add metrics for too many in-flight hints failures When there are too many in-flight hints, writes start returning overloaded exceptions. We're missing metrics for that, and these could be useful when judging if the system is in overloaded state.	2020-11-10 16:26:18 +01:00
Botond Dénes	7f07b95dd3	utils/chunked_vector: reserve_partial(): better explain how to properly use Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201110130953.435123-1-bdenes@scylladb.com>	2020-11-10 15:45:01 +02:00
Eliran Sinvani	8380ac93c5	build: Make artifacts product aware This commit changes the build file generation and the package creation scripts to be product aware. This will change the relocatable package archives to be named after the product. This commit deals with two main things: 1. Creating the actual Scylla server relocatable with a product prefixed name - which is independent of any other change 2. Expect all other packages to create product prefixed archive - which is dependent upon the actual submodules creating product prefixed archives. If the support is not introduced in the submodules first this will break the package build. Tests: Scylla full build with the original product and a different product name. Closes #7581	2020-11-10 14:38:10 +02:00
Takuya ASADA	f8c7d899b4	dist/debian: fix typo for scylla-server.service filename Currently debian_files_gen.py mistakenly renames scylla-server.service to "scylla-server." on non-standard product name environment such as scylla-enterprise; it should be fixed to use the correct filename. Fixes #7423	2020-11-10 10:38:41 +02:00
Pavel Solodovnikov	2997f6bd2e	cmake: redesign scylla's `CMakeLists.txt` to finally allow full-fledged building This patch introduces many changes to the Scylla `CMakeLists.txt` to enable building Scylla without resorting to pre-building with a previous configure.py build, i.e. cmake script can now be used as a standalone solution to build and execute scylla. Submodules, such as Seastar and Abseil, are also dealt with by importing their CMake scripts directly via `add_subdirectory` calls. Other submodules, such as `libdeflate` now have a custom command to build the library at runtime. There are still a lot of things that are incomplete, though: * Missing auxiliary packaging targets * Unit-tests are not built (First priority to address in the following patches) * Compile and link flags are mostly hardcoded to the values appropriate for the most recent Fedora 33 installation. System libraries should be found via built-in `Find` scripts, compiler and linker flags should be observed and tested by executing feature tests. The current build is aimed to be built by GCC, need to support Clang since we are moving to it. * Utility cmake functions should be moved to a separate "cmake" directory. The script is updated to use the most recent CMake version available in Fedora 33, which is 3.18. Right now this is more of a PoC rather than a full-fledged solution but as far as it's not used widely, we are free to evolve it in a relaxed manner, improving it step by step to achieve feature parity with `configure.py` solution. The value in this patch is that now we are able to use any C++ IDE capable of dealing with CMake solutions and take advantage of their built-in capabilities, such as: * Building a code model to efficiently navigate code. * Find references to symbols. * Use pretty-printers, beautifiers and other tools conveniently. * Run scylla and debug it right from the IDE. 
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201103221619.612294-1-pa.solodovnikov@scylladb.com>	2020-11-10 10:34:27 +02:00
Nadav Har'El	78c598e08e	alternator: add missing TableId field to DescribeTable response DescribeTable should return a UUID "TableId" in its response. We already had it for CreateTable, and now this patch adds it to DescribeTable. The test for this feature is no longer xfail. Moreover, I improved the test to not only check that the TableId field is present - it should also match the documented regular expression (the standard representation of a UUID). Refs #5026 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201104114234.363046-1-nyh@scylladb.com>	2020-11-09 20:21:47 +01:00
Benny Halevy	0af54f3324	sstable: create_links: support for move When moving a sstable between directories, we would like to be able to crash at any point during the algorithm with a clear way to either roll the operation forwards or backwards. To achieve that, define sstable::create_links_common that accepts a `mark_for_removal` flag, implementing the following algorithm: 1. link src.toc to dst.temp_toc. until removed, the destination sstable is marked for removal. 2. link all src components to dst. crashing here will leave dst with both temp_toc and toc. 3. a. if mark_for_removal is unset then just remove dst.temp_toc. this commits the destination sstable and completes create_links. b. if mark_for_removal is set then move dst.temp_toc to src.temp_toc. this will atomically toggle recovery after crash from roll-back to roll-forward. here too, crashing at this point will leave src with both temp_toc and toc. Adjust the unit test for the revised algorithm. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:57:40 +02:00
Benny Halevy	d893cbd918	sstable_directory: support sstables with both TemporaryTOC and TOC Keep descriptors in a map so it could be searched easily by generation. and possibly delete the descriptor, if found, in the presence of a temporary toc component. A following patch will add support to create_links for moving sstables between directories. It is based on keeping a TemporaryTOC file in the destination directory while linking all source components. If scylla crashes here, the destination sstable will have both its TemporaryTOC and TOC components and it needs to be removed to roll the move backwards. Then, create_links will atomically move the TemporaryTOC from the destination back to the source directory, to toggle rolling back to rolling forward by marking the source sstable for removal. If scylla crashes here, the source sstable will have both its TemporaryTOC and TOC components and it needs to be removed to roll the move forward. Add unit test for this case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:57:40 +02:00
Benny Halevy	7c74222037	sstable: create_links: move automatic sstring variables Rather than copy them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:57:40 +02:00
Benny Halevy	9a906d4d69	sstable: create_links: use captured comps Now that all_components() is held by `do_with`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:57:25 +02:00
Benny Halevy	a59911a84c	sstable: create_links: capture dir by reference Now that it's held with `do_with`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:55:43 +02:00
Benny Halevy	07f80e0521	sstable: create_links: fix indentation Previous patch was optimized for reviewability. Now cleanup indentation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:55:32 +02:00
Benny Halevy	6bee63158c	sstable: create_links: no need to roll-back on failure anymore Now that we use `idempotent_link_file` it'll no longer fail with EEXIST in a replay scenario. It may fail on ENOENT, and return an exceptional future. This will be propagated up the stack. Since it may indicate parallel invocation of move_to_new_dir, that deletes the source sstable while this thread links it to the same destination, rolling back by removing the destination links would be dangerous. For any other error, the node is going to be isolated and stop operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:44:55 +02:00
Benny Halevy	65a3b0e51c	sstable: create_links: support idempotent replay Handle the case where create_link is replayed after crashing in the middle. In particular, if we restart when moving sstables from staging to the base dir, right after create_links completes, and right before deleting the source links, we end up with seemingly 2 valid sstables, one still in staging and the other already in the base table directory, both are hard linked to the same inodes. Make create_links idempotent so it can replay the operation safely if crashed and restarted at any point of its operation. Add unit tests for replay after partial create_links that is expected to succeed, and a test for replay when an sstable exists in the destination that is not hard-linked to the source sstable; create_links is expected to fail in this case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:44:42 +02:00
Benny Halevy	f0a57deed7	sstable: create_links: cleanup style Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:44:27 +02:00
Benny Halevy	55f781689a	sstable: create_links: add debug/trace logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:44:11 +02:00
Benny Halevy	884fc07e20	sstable: move_to_new_dir: rm TOC last To facilitate cleanup on crash, first rename the TOC file to TOC.tmp, and keep until all other files are removed, finally remove TOC.tmp. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:44:04 +02:00
Benny Halevy	ca76ebb898	sstable: move_to_new_dir: io check remove calls We need to check these to detect critical errors while removing the source sstable files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:43:38 +02:00
Benny Halevy	818af720d7	test: add sstable_move_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-09 19:43:28 +02:00
Benny Halevy	8bcdf39a18	hints/manager: scan_for_hints_dirs: fix use-after-move This use-after-move was apparently exposed after switching to clang in commit `eb861e68e9`. The directory_entry is required for std::stoi(de.name.c_str()) and later in the catch{} clause. This shows in the node logs as a "Ignore invalid directory" debug log message with an empty name, and caused the hintedhandoff_rebalance_test to fail when hints files aren't rebalanced. Test: unit(dev) DTest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test (dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201106172017.823577-1-bhalevy@scylladb.com>	2020-11-09 16:32:54 +01:00
Takuya ASADA	4410934829	install.sh: show warning nonroot mode when systemd does not support user mode Older distributions such as CentOS7 do not support systemd user mode. On such distributions nonroot mode does not work, so show a warning message and skip running systemctl --user. Fixes #7071	2020-11-09 12:16:35 +02:00
Piotr Wojtczak	72c7f25a29	db: add TransitionalAuthorizer and TransitionalAuthenticator... ... to config descriptions We allow setting the transitional auth as one of the options in scylla.yaml, but don't mention it at all in the field's description. Let's change that. Closes #7565	2020-11-09 10:51:54 +01:00
Gleb Natapov	a01dd636ea	suppress ubsan error in boost::deque::clear() The function is used by raft and fails with ubsan and clang. The ub is harmless. Let's wait for it to be fixed in boost. Message-Id: <20201109090353.GZ3722852@scylladb.com>	2020-11-09 11:25:19 +02:00
Bentsi Magidovich	956b97b2a8	scylla_util.py: fix exception handling in curl Retry mechanism didn't work when URLError happened. For example: urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable> Let's catch URLError instead of HTTP since URLError is a base exception for all exceptions in the urllib module. Fixes: #7569 Closes #7567	2020-11-09 10:20:35 +02:00
Benny Halevy	02f5659f21	sstables mx/writer: clustering_blocks_input_range::next: warn on potentially bad key If _offset falls beyond compound_type->types().size() ignore the extra components instead of accessing out of the types vector range. FIXME: we should validate the thrift key against the schema and reject it in the thrift handler layer. Refs #7568 Test: unit(dev) DTest: cql_tests.py:MiscellaneousCQLTester.cql3_insert_thrift_test (dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201108175738.1006817-1-bhalevy@scylladb.com>	2020-11-08 20:53:14 +02:00
Avi Kivity	6b4a7fa515	Revert "Revert "config: Do not enable repair based node operations by default"" This reverts commit `71d0d58f8c`. Repair based node operations are still not ready and will be re-enabled after more testing and fixes.	2020-11-08 14:09:50 +02:00
Michał Chojnowski	1eb19976b9	database: make changes to durable_writes effective immediately Users can change `durable_writes` anytime with ALTER KEYSPACE. Cassandra reads the value of `durable_writes` every time when applying a mutation, so changes to that setting take effect immediately. That is, mutations are added to the commitlog only when `durable_writes` is `true` at the moment of their application. Scylla reads the value of `durable_writes` only at `keyspace` construction time, so changes to that setting take effect only after Scylla is restarted. This patch fixes the inconsistency. Fixes #3034 Closes #7533	2020-11-06 17:53:22 +01:00
Tomasz Grabiec	894abfa6fc	Merge "raft: miscellaneous fixes" from Kostja This series provides assorted fixes which are a pre-requisite for the joint consensus implementation series which follows. * scylla-dev/raft-misc: raft: fix raft_fsm_test flakiness raft: drop a waiter of snapshoted entry raft: use correct type for node info in add_server() raft: overload operator<< for debugging	2020-11-06 15:34:16 +01:00
Konstantin Osipov	c4bbbac975	raft: fix raft_fsm_test flakiness When election_threshold expires, the current node can become a candidate, in which case it won't switch back to follower state upon vote_request.	2020-11-06 17:06:07 +03:00
Gleb Natapov	552745d3d3	raft: drop a waiter of snapshoted entry An index that is waited can be included in an installed snapshot in which case there is no way to know if the entry was committed or not. Abort such waiters with an appropriate error.	2020-11-06 17:06:07 +03:00
Gleb Natapov	8bab38c6fa	raft: use correct type for node info in add_server()	2020-11-06 17:06:07 +03:00
Alejo Sanchez	2e4977b24c	raft: overload operator<< for debugging Overload operator<< for ostream and print relevant state for server, fsm, log, and typed_uint64 types. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-11-06 17:06:07 +03:00
Tomasz Grabiec	3591e7dffd	Merge "Remove unused args from range_tombstone methods" from Pavel Emelyanov * https://github.com/xemul/scylla/tree/br-range-tombstone-unused-args-2: range_tombstone: Remove unused trim-front arg from .apply() range_tombstone: Undefault argument in .apply range_tombstone: Remove unused schema arg from .set_start	2020-11-06 15:04:15 +01:00
Tomasz Grabiec	6d0d55aa72	Merge "Unglobal query processor instance" from Pavel Emelyanov The query processor is present in the global namespace and is widely accessed with global get(_local)?_query_processor(). There's a long-term task to get rid of this globality and make services and components reference each-other and, for and due-to this, start and stop in specific order. This set makes this for the query processor. The remaining users of it are -- alternator, controllers for client services, schema_tables and sys_dist_ks. All of them except for the schema_tables are fixed just by passing the reference on query processor with small patches. The schema tables accessing qp sit deep inside the paxos code, but can be "fixed" with the qctx thing until the qctx itself is de-globalized. * https://github.com/xemul/scylla/tree/br-rip-global-query-processor: code: RIP global query processor instance cql test env: Keep query processor reference on board system distributed keyspace: Start sharded service erarlier schema_tables: Use qctx to make internal requests transport: Keep sharded query processor reference on controller thrift: Keep sharded query processor reference on controller alternator: Use local query processor reference to get keys alternator: Keep local query processor reference in server	2020-11-06 14:24:41 +01:00
Pavel Emelyanov	bbd7463960	range_tombstone: Remove unused trim-front arg from .apply() The only caller of this method always passes true to it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-06 15:13:05 +03:00
Pavel Emelyanov	787a496caf	range_tombstone: Undefault argument in .apply The only purpose of this change is to compile (git-bisect safety) and thus prove that the next patch is correct. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-06 15:13:05 +03:00
Pavel Emelyanov	3da3d448c8	range_tombstone: Remove unused schema arg from .set_start Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-06 15:13:05 +03:00
Piotr Sarna	b61d4bc8d0	db: degrade view building progress loading error to warning When the view builder cannot read view building progress from an internal CQL table it produces an error message, but that only confuses the user and the test suite -- this situation is entirely recoverable, because the builder simply assumes that there is no progress and the view building should start from scratch. Fixes #7527 Closes #7558	2020-11-06 10:19:11 +02:00
Avi Kivity	512daa75a6	Merge 'repair: Use single writer for all followers' from Asias He repair: Use single writer for all followers Currently, repair master creates one writer for each follower to write rows from follower to sstables. That is RF - 1 writers in total. Each writer creates 1 sstable for the range repaired, usually a vnode range. Those sstables for a given vnode range are disjoint. To reduce the compaction work, we can create one writer for all the followers. This reduces the number of sstables generated by repair significantly to one per vnode range from RF - 1 per vnode range. Fixes #7525 Closes #7528 * github.com:scylladb/scylla: repair: No more vector for _writer_done and friends repair: Use single writer for all followers	2020-11-05 18:45:07 +01:00
Gleb Natapov	e1442282d1	raft: test: do not store data in initializer_list Lifetime rules for initializer_list are weird. Use vector instead. Message-Id: <20201105111309.GT3722852@scylladb.com>	2020-11-05 18:44:50 +01:00
Michał Chojnowski	f6c33f5775	dbuild: export $HOME seen by dbuild, not by $tool The default of DBUILD_TOOL=docker requires passwordless access to docker by the user of dbuild. This is insecure, as any user with unconstrained access to docker is root equivalent. Therefore, users might prefer to run docker as root (e.g. by setting DBUILD_TOOL="sudo docker"). However, `$tool -e HOME` exports HOME as seen by $tool. This breaks dbuild when `$tool` runs docker as another user. `$tool -e HOME="$HOME"` exports HOME as seen by dbuild, which is the intended behaviour. Closes #7555	2020-11-05 18:44:50 +01:00
Michał Chojnowski	8f74c7e162	dbuild: Replace stray use of `docker` with `$tool` Instead of invoking `$tool`, as is done everywhere else in dbuild, kill_it() invoked `docker` explicitly. This was slightly breaking the script for DBUILD_TOOL other than `docker`. Closes #7554	2020-11-05 18:44:49 +01:00
Tomasz Grabiec	fb9b5cae05	sstables: ka/la: Fix abort when next_partition() is called with certain reader state Cleanup compaction is using consume_pausable_in_thread() to skip over disowned partitions, which uses flat_mutation_reader::next_partition(). The implementation of next_partition() for the sstable reader has a bug which may cause the following assertion failure: scylla: sstables/mp_row_consumer.hh:422: row_consumer::proceed sstables::mp_row_consumer_k_l::flush(): Assertion `!_ready' failed. This happens when the sstable reader's buffer gets full when we reach the partition end. The last fragment of the partition won't be pushed into the buffer but will stay in the _ready variable. When next_partition() is called in this state, _ready will not be cleared and the fragment will be carried over to the next partition. This will cause assertion failure when the reader attempts to emit the first fragment of the next partition. The fix is to clear _ready when entering a partition, just like we clear _range_tombstones there. Fixes #7553. Message-Id: <1604534702-12777-1-git-send-email-tgrabiec@scylladb.com>	2020-11-05 18:44:49 +01:00
Nadav Har'El	7ff72b0ba5	Merge 'secondary_index: fix returned rows token ordering' from Piotr Grabowski Fixes returned rows ordering to proper signed token ordering. Before this change, rows were sorted by token, but using unsigned comparison, meaning that negative tokens appeared after positive tokens. Rename `token_column_computation` to `legacy_token_column_computation` and add some comments describing this computation. Added (new) `token_column_computation` which returns token as `long_type`, which is sorted using signed comparison - the correct ordering of tokens. Add new `correct_idx_token_in_secondary_index` feature, which flags that the whole cluster is able to use new `token_column_computation`. Switch token computation in secondary indexes to (new) `token_column_computation`, which fixes the ordering. This column computation type is only set if cluster supports `correct_idx_token_in_secondary_index` feature to make sure that all nodes will be able to compute new `token_column_computation`. Also old indexes will need to be rebuilt to take advantage of this fix, as new token column computation type is only set for new indexes. Fix tests according to new token ordering and add one new test to validate this aspect explicitly. Fixes #7443 Tested manually a scenario when someone created an index on old version of Scylla and then migrated to new Scylla. Old index continued to work properly (but returning in wrong order). Upon dropping and re-creating the index, it still returned the same data, but now in correct order. Closes #7534 * github.com:scylladb/scylla: tests: add token ordering test of indexed selects tests: fix tests according to new token ordering secondary_index: use new token_column_computation feature: add correct_idx_token_in_secondary_index column_computation: add token_column_computation token_column_computation: rename as legacy	2020-11-05 18:44:49 +01:00
Benny Halevy	f93fb55726	repair: repair_writer: do not capture lw_shared_ptr cross-shard The shared_from_this lw_shared_ptr must not be accessed across shards. Capturing it in the lambda passed to mutation_writer::distribute_reader_and_consume_on_shards causes exactly that since the captured lw_shared_ptr is copied on other shards, and ends up in memory corruption as seen in #7535 (probably due to lw_shared_ptr._count going out-of-sync when incremented/decremented in parallel on other shards with no synchronization). This was introduced in `289a08072a`. The writer is not needed in the body of this lambda anyway, so it doesn't need to capture it. It is already held by the continuations until the end of the chain. Fixes #7535 Test: repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test (dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201104142216.125249-1-bhalevy@scylladb.com>	2020-11-05 18:44:49 +01:00
Tomasz Grabiec	dccd47eec6	Merge "make raft clang compatible" from Gleb " Since we are switching to clang due to raft make it actually compile with clang. " tgrabiec: Dropped the patch "raft: compile raft by default" because the replication_test still fails in debug mode: /usr/include/boost/container/deque.hpp:1802:63: runtime error: applying non-zero offset 8 to null pointer * 'raft-clang-v2' of github.com:scylladb/scylla-dev: raft: Use different type to create type dependent statement for static assertion raft: drop use of <ranges> for clang raft: make test compile with clang raft: drop -fcoroutines support from configure.py	2020-11-05 18:42:31 +01:00
Asias He	db28efb28a	repair: No more vector for _writer_done and friends Now that both repair followers and repair master use a single writer, we can get rid of the vector associated with _writer_done and friends. Fixes #7525	2020-11-05 13:28:40 +08:00
Asias He	998b153f86	repair: Use single writer for all followers Currently, repair master creates one writer for each follower to write rows from follower to sstables. That is RF - 1 writers in total. Each writer creates 1 sstable for the range repaired, usually a vnode range. Those sstables for a given vnode range are disjoint. To reduce the compaction work, we can create one writer for all the followers. This reduces the number of sstables generated by repair significantly to one per vnode range from RF - 1 per vnode range. Fixes #7525	2020-11-05 13:28:40 +08:00
Pekka Enberg	edf04cd348	Update tools/python3 submodule * tools/python3 cfa27b3...1763a1a (1): > Relocatable Package: create product prefixed relocatable archive	2020-11-04 14:24:20 +02:00
Pekka Enberg	5519ce2f0e	Update tools/jmx submodule * tools/jmx c51906e...6174a47 (2): > Relocatable Package: create product prefixed relocatable archive > build(deps-dev): bump junit from 4.8.2 to 4.13.1	2020-11-04 14:24:15 +02:00
Avi Kivity	193d1942f2	build: silence gcc ABI interoperability warning on arm A gcc bug [1] caused objects built by different versions of gcc not to interoperate. Gcc helpfully warns when it encounters code that could be affected. Since we build everything with one version, and as that version is far newer than the last version generating incorrect code, we can silence that warning without issue. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 Closes #7495	2020-11-04 13:29:51 +02:00
Tomasz Grabiec	a7837a9a3b	Merge "Enable raft tests" from Kostja Do not run tests which are not built. For that, pass the test list from configure.py to test.py via ninja unit_test_list target. Minor cleanups. * scylla-dev.git/test.py-list: test: enable raft tests test.py: do not run tests which are not built configure.py: add a ninja command to print unit test list test.py: handle ninja mode_list failure configure.py: don't pass modes_list unless it's used	2020-11-04 12:25:04 +01:00
Piotr Grabowski	491987016c	tests: add token ordering test of indexed selects Add new test validating that rows returned from both non-indexed selects and indexed selects return rows sorted in token order (making sure that both positive and negative tokens are present to test if signed comparison order is maintained).	2020-11-04 12:02:42 +01:00
Piotr Grabowski	2bd23fbfa9	tests: fix tests according to new token ordering Fix tests to adhere to new (correct) token ordering of rows when querying tables with secondary indexes.	2020-11-04 12:02:42 +01:00
Piotr Grabowski	2342b386f4	secondary_index: use new token_column_computation Switches token column computation to (new) token_column_computation, which fixes #7443, because new token column will be compared using signed comparisons, not the previous unsigned comparison of CQL bytes type. This column computation type is only set if cluster supports correct_idx_token_in_secondary_index feature to make sure that all nodes will be able to compute (new) token_column_computation. Also old indexes will need to be rebuilt to take advantage of this fix, as new token column computation type is only set for new indexes.	2020-11-04 12:02:42 +01:00
Piotr Grabowski	6624d933c9	feature: add correct_idx_token_in_secondary_index Add new correct_idx_token_in_secondary_index feature, which will be used to determine if all nodes in the cluster support new token_column_computation. This column computation will replace legacy_token_column_computation in secondary indexes, which was incorrect as this column computation produced values that when compared with unsigned comparison (CQL type bytes comparison) resulted in different ordering than token signed comparison. See issue: https://github.com/scylladb/scylla/issues/7443	2020-11-04 12:02:42 +01:00
Piotr Grabowski	9fc2dc59b8	column_computation: add token_column_computation Introduce new token_column_computation class which is intended to replace legacy_token_column_computation. The new column computation returns token as long_type, which means that it will be ordered according to signed comparison (not unsigned comparison of bytes), which is the correct ordering of tokens.	2020-11-04 12:02:42 +01:00
Piotr Grabowski	b1350af951	token_column_computation: rename as legacy Rename token_column_computation to legacy_token_column_computation, as it will be replaced with new column_computation. The reason is that this computation returns bytes, but all tokens in Scylla can now be represented by int64_t. Moreover, returning bytes causes invalid token ordering as bytes comparison is done in unsigned way (not signed as int64_t). See issue: https://github.com/scylladb/scylla/issues/7443	2020-11-04 12:00:18 +01:00
Eliran Sinvani	4c434f3fa4	moving average rate: Keep computed rates at zero until they are meaningful When computing moving average rates too early after startup, the rate can be infinite, simply because the sample interval since the system started is too small to generate meaningful results. Here we check for this situation and keep the rate at 0 if it happens, to signal that there are still no meaningful results. This incident is unlikely to happen since it can happen only during a very small time window after restart, so we add a hint to the compiler to optimize for that in order to have a minimum impact on the normal use case. Fixes #4469	2020-11-04 11:13:59 +02:00
Avi Kivity	8aa842614a	test: gossip_test: configure database memory allocation correctly The memory configuration for the database object was left at zero. This can cause the following chain of failures: - the test is a little slow due to the machine being overloaded, and debug mode - this causes the memtable flush_controller timer to fire before the test completes - the backlog computation callback is called - this calculates the backlog as dirty_memory / total_memory; this is 0.0/0.0, which resolves to NaN - eventually this gets converted to an integer - UBSAN doesn't like the conversion from NaN to integer, and complains Fix by initializing dbcfg.available_memory. Test: gossip_test(debug), 1000 repetitions with concurrency 6 Closes #7544	2020-11-04 09:26:08 +02:00
Calle Wilund	1db9da2353	alternator::streams: Workaround fix for apparent code gen bug in seq_number Fixes #7325 When building with clang on fedora32, calling the string_view constructor of bignum generates broken ID:s (i.e. parsing borks). Creating a temp std::string fixes it. Closes #7542	2020-11-04 09:26:08 +02:00
Benny Halevy	1d199c31f8	storage_service: check_for_endpoint_collision: copy gossip state across preemption point Since `11a8912093`, get_gossip_status returns a std::string_view rather than a sstring. As seen in dtest we may print garbage to the log if we print the string_view after preemption (calling _gossiper.reset_endpoint_state_map().get()) Test: update_cluster_layout_tests:TestUpdateClusterLayout.simple_add_two_nodes_in_parallel_test (dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201103132720.559168-1-bhalevy@scylladb.com>	2020-11-04 09:26:08 +02:00
Konstantin Osipov	507ca98748	test: enable raft tests It's safe to do this since now the tests are only run if they are configured.	2020-11-03 21:30:11 +03:00
Konstantin Osipov	5f90582362	test.py: do not run tests which are not built Use ninja unit_test_list to find out the list of configured tests. If a test is not configured by configure.py, do not try to run it.	2020-11-03 21:30:08 +03:00
Konstantin Osipov	9198e38311	configure.py: add a ninja command to print unit test list test.py needs this list to avoid running tests which are not configured, and hence not built.	2020-11-03 21:27:45 +03:00
Konstantin Osipov	ef9c63a6d9	test.py: handle ninja mode_list failure Print an error message if the subcommand fails. Use a regular expression to match output.	2020-11-03 21:06:17 +03:00
Konstantin Osipov	7fa08496b0	configure.py: don't pass modes_list unless it's used Don't redefine modes_list if it's not used by the ninja file formatter.	2020-11-03 21:02:55 +03:00
Benny Halevy	9d91d38502	SCYLLA-VERSION-GEN: change master version to 4.4.dev Now that scylla-ccm and scylla-dtest conform to PEP-440 version comparison (See https://www.python.org/dev/peps/pep-0440/) we can safely change scylla version on master to be the development branch for the next release. The version order logic is: 4.3.dev is followed by 4.3.rc[i] followed by 4.3.[n] Note that also according to https://blog.jasonantman.com/2014/07/how-yum-and-rpm-compare-versions/ 4.3.dev < 4.3.rc[i] < 4.3.[n] as "dev" < "rc" by alphabetical order and both "dev" and "rc*" < any number, based on the general rule that alphabetical strings compare as less than numbers. Refs scylladb/scylla-machine-image#79 Test: unit Dtest: gating Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201015151153.726637-1-bhalevy@scylladb.com>	2020-11-03 13:42:54 +02:00
Avi Kivity	25e6a9e493	Merge "utils/large_bitset: reserve memory for _storage gently" from Botond " Introduce a gentle (yielding) implementation of reserve for chunked vector and use it when reserving the backing storage vector for large bitset. Large bitset is used by bloom filters, which can be quite large and have been observed to cause stalls when allocating memory for the storage. Fixes: #6974 Tests: unit(dev) " * 'gentle-reserve/v1' of https://github.com/denesb/scylla: utils/large_bitset: use reserve_partial() to reserve _storage utils/chunked_vector: add reserve_partial()	2020-11-03 13:42:54 +02:00
Tomasz Grabiec	5abddc8568	Merge "Testing performance of different collections" from Pavel Emelyanov There's a perf_bptree test that compares B+ tree collection with std::set and std::map ones. There will come more, also the "patterns" to compare are not just "fill with keys" and "drain to empty", so here's the perf_collection test, that measures timings of - fill with keys - drain key by key - empty with .clear() call - full scan with iterator - insert-and-remove of a single element for currently used collections - std::set - std::map - intrusive_set_external_comparator - bplus::tree * https://github.com/xemul/scylla/tree/br-perf-collection-test: test: Generalize perf_bptree into perf_collection perf_collection: Clear collection between iterations perf_collection: Add intrusive_set_external_comparator perf_collection: Add test for single element insertion perf_collection: Add test for destruction with .clear() perf_collection: Add test for full scan time	2020-11-03 13:42:54 +02:00
Gleb Natapov	88a1274583	raft: Use different type to create type dependent statement for static assertion For some reason the one that works for gcc does not work for clang.	2020-11-03 08:49:54 +02:00
Gleb Natapov	b6b51bf17e	raft: drop use of <ranges> for clang	2020-11-03 08:49:54 +02:00
Gleb Natapov	847400ee96	raft: make test compile with clang clang does not allow to return a future<> with co_return and it is more strict with type conversion.	2020-11-03 08:49:54 +02:00
Gleb Natapov	ff18072de8	raft: drop -fcoroutines support from configure.py We switched to clang and it does not have this flag.	2020-11-03 08:49:54 +02:00
Botond Dénes	a08b640fa7	utils/large_bitset: use reserve_partial() to reserve _storage To avoid stalls when reserving memory for a large bloom filter. The filter creation already has a yielding loop for initialization, this patch extends it to reservation of memory too.	2020-11-02 18:03:19 +02:00
Botond Dénes	bb908b1750	utils/chunked_vector: add reserve_partial() A variant of reserve() which allows gentle reserving of memory. This variant will allocate just one chunk at a time. To drive it to completion, one should call it repeatedly with the return value of the previous call, until it returns 0. This variant will be used in the next patch by the large bitset creation code, to avoid stalls when allocating large bloom filters (which are backed by large bitset).	2020-11-02 18:02:01 +02:00
Piotr Wojtczak	caa3c471c0	Validate ascii values when creating from CQL Although the code for it existed already, the validation function hasn't been invoked properly. This change fixes that, adding a validating check when converting from text to specific value type and throwing a marshal exception if some characters are not ASCII. Fixes #5421 Closes #7532	2020-11-02 16:47:32 +02:00
Pavel Emelyanov	364ddab148	test: Do not dump test log onto terminal When unit tests fail, test.py dumps their output on the screen. It is impossible to read this output from the terminal, all the more so as the logs are saved in the testlog/ directory anyway. At the same time the names of the failed tests are all left _before_ these logs, and if the terminal history is not large enough, it becomes quite annoying to find the names out. The proposal is not to spoil the terminal with raw logs -- just names and summaries. Logs themselves are at testlog/$mode/$name_of_the_test.log Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201031154518.22257-1-xemul@scylladb.com>	2020-11-02 15:42:34 +02:00
Tomasz Grabiec	ba42e7fcc5	multishard_mutation_query: Propagate mutation_reader::forwarding flag Otherwise all readers will be created with the default forwarding::yes. This inhibits some optimizations (e.g. results in more sstable read-ahead). It will also be problematic when we introduce mutation sources which don't support forwarding::yes in the future. Message-Id: <1604065206-3034-1-git-send-email-tgrabiec@scylladb.com>	2020-11-02 15:24:36 +02:00
Avi Kivity	eb861e68e9	build: switch to clang as the default compiler Clang brings us working support for coroutines, which are needed for Raft and for code simplification. perf_simple_query as well as full system tests show no significant performance regression. Test: unit(dev, release, debug) Closes #7531	2020-11-02 14:18:13 +02:00
Nadav Har'El	ffbd487c86	Merge 'alternator::streams: Use end-of-record info in get_records' from Calle Wilund Fixes #7496 Since cdc log now has an end-of-batch/record marker that tells us explicitly that we've read the last row of a change, we can use this instead of timestamp checks + limit extra to ensure we have complete records. Note that this does not try to fulfill user query limit exactly. To do this we would need to add a loop and potentially re-query if queried rows are not enough. But that is a separate exercise, and superbly suited for coroutines! Closes #7498 * github.com:scylladb/scylla: alternator::streams: Reduce the query limit depending on cdc opts alternator::streams: Use end-of-record info in get_records	2020-11-02 13:34:00 +02:00
Tomasz Grabiec	2dfc5f1ee5	Merge "Cleanup gossiper endpoint interface" from Benny This series cleans up the gossiper endpoint_state interface marking methods const and const noexcept where possible. To achieve that, endpoint_state::get_status was changed to return a string_view rather than a sstring so it won't need to allocate memory. Also, the get_cluster_name and get_partitioner_name were changed to return a const sstring& rather than sstring so they won't need to allocate memory. The motivation for the series stems from #7339 where an exception in get_host_id within a storage_service notification handler, called from seastar::defer crashed the server. With this series, get_host_id may still throw exceptions on logical error, but not from calling get_application_state_ptr. Refs #7339 Test: unit(dev) * tag 'gossiper-endpoint-noexcept-v2': gossiper: mark trivial methods noexcept gossiper: get_cluster_name, get_partitioner_name: make noexcept gossiper: get_gossip_status: return string_view and make noexcept gms/endpoint_state: mark methods using get_status noexcept gms/endpoint_state: get_status: return string_view and make noexcept gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept gms/endpoint_state: mark trivial methods noexcept gms/heart_beat_state: mark methods noexcept gms/versioned_value: mark trivial methods noexcept gms/version_generator: mark get_next_version noexcept fb_utilities.hh: mark methods noexcept messaging: msg_addr: mark methods noexcept gms/inet_address: mark methods noexcept	2020-11-02 12:30:30 +01:00
Avi Kivity	7a3376907e	Merge 'improvements for GCE image' from Bentsi when logging in to the GCE instance that is created from the GCE image it takes 10 seconds to understand that we are not running on AWS. Also, some unnecessary debug logging messages are printed: ``` bentsi@bentsi-G3-3590:~/devel/scylladb$ ssh -i ~/.ssh/scylla-qa-ec2 bentsi@35.196.8.86 Warning: Permanently added '35.196.8.86' (ECDSA) to the list of known hosts. Last login: Sun Nov 1 22:14:57 2020 from 108.128.125.4 _____ _ _ _____ ____ / ____\| \| \| \| \| __ \\| _ \ \| (___ ___ _ _\| \| \| __ _\| \| \| \| \|_) \| \___ \ / __\| \| \| \| \| \|/ _` \| \| \| \| _ < ____) \| (__\| \|_\| \| \| \| (_\| \| \|__\| \| \|_) \| \|_____/ \___\|\__, \|_\|_\|\__,_\|_____/\|____/ __/ \| \|___/ Version: 666.development-0.20201101.6be9f4938 Nodetool: nodetool help CQL Shell: cqlsh More documentation available at: http://www.scylladb.com/doc/ By default, Scylla sends certain information about this node to a data collection server. For information, see http://www.scylladb.com/privacy/ WARNING:root:Failed to grab http://169.254.169.254/latest/... WARNING:root:Failed to grab http://169.254.169.254/latest/... Initial image configuration failed! To see status, run 'systemctl status scylla-image-setup' [bentsi@artifacts-gce-image-jenkins-db-node-aa57409d-0-1 ~]$ ``` this PR fixes this Closes #7523 * github.com:scylladb/scylla: scylla_util.py: remove unnecessary logging scylla_util.py: make is_aws_instance faster scylla_util.py: added ability to control sleep time between retries in curl()	2020-11-02 12:32:25 +02:00
Piotr Sarna	b66c285f94	schema_tables: fix fixing old secondary index schemas Old secondary index schemas did not have their idx_token column marked as computed, and there already exists code which updates them. Unfortunately, the fix itself contains an error and doesn't fire if computed columns are not yet supported by the whole cluster, which is a very common situation during upgrades. Fixes #7515 Closes #7516	2020-11-02 12:30:20 +02:00
Takuya ASADA	100127bc02	install.sh: allow --packaging with nonroot mode Since scylla-ccm wants to skip systemctl, we need to support --packaging in nonroot mode too. Related: #7187	2020-11-02 12:07:14 +02:00
Calle Wilund	7c8f457bab	alternator::streams: Reduce the query limit depending on cdc opts Avoid querying much more than needed. Since we have exact row markers now, this is more safe to do.	2020-11-02 08:37:27 +00:00
Calle Wilund	c79108edbb	alternator::streams: Use end-of-record info in get_records Fixes #7496 Since cdc log now has an end-of-batch/record marker that tells us explicitly that we've read the last row of a change, we can use this instead of timestamp checks + limit extra to ensure we have complete records. Note that this does not try to fulfill user query limit exactly. To do this we would need to add a loop and potentially re-query if queried rows are not enough. But that is a separate exercise, and superbly suited for coroutines!	2020-11-02 08:35:36 +00:00
Avi Kivity	b6f8bb6b77	tools/toolchain: update maintainer instructions The instructions are updated for multiarch images (images that can be used on x86 and ARM machines). Additionally, - docker is replaced with podman, since that is now used by developers. Docker is still supported for developers, but the image creation instructions are only tested with podman. - added instructions about updating submodules - `--format docker` is removed. It is not necessary with more recent versions of docker. Closes #7521	2020-11-02 10:29:54 +02:00
Avi Kivity	3993498fb4	connection_notifier: prevent link errors due to variables defined in header connection_notifier.hh defines a number of template-specialized variables in a header. This is illegal since you're allowed to define something multiple times if it's a template, but not if it's fully specialized. gcc doesn't care but clang notices and complains. Fix by defining the variables as inline variables, which are allowed to have definitions in multiple translation units. Closes #7519	2020-11-02 10:28:55 +02:00
Avi Kivity	83b3d3d1d1	test: increase timeout to 12000 seconds to account for slow ARM cores Some ARM cores are slow, and trip our current timeout of 3000 seconds in debug mode. Quadrupling the timeout is enough to make debug-mode tests pass on those machines. Since the timeout's role is to catch rare infinite loops in unsupervised testing, increasing the timeout has no ill effect (other than to delay the report of the failure). Closes #7518	2020-11-02 10:28:14 +02:00
Piotr Sarna	ed047d54bf	Merge 'alternator: fix combination of filter and projection' from Nadav The main goal of this series is to fix issue #6951 - a Query (or Scan) with a combination of filtering and projection parameters produced wrong results if the filter needs some attributes which weren't projected. This series also adds new tests for various corner cases of this issue. These new tests also pass after this fix, or still fail because of some other missing feature (namely, nested attributes). These additional tests will be important if we ever want to refactor or optimize this code, because they exercise some rare corner code paths at the intersection of filtering and projection. This series also fixes some additional problems related to this issue, like combining old and new filtering/projection syntaxes (should be forbidden), and even one fix to a wrong comment. Closes #7328 * github.com:scylladb/scylla: alternator test: tests for nested attributes in FilterExpression alternator test: fix comment alternator tests: additional tests for filter+projection combination alternator: forbid combining old and new-style parameters alternator: fix query with both projection and filtering	2020-11-02 07:28:41 +01:00
Bentsi Magidovich	2866f2d65d	scylla_util.py: remove unnecessary logging when calling curl and exception is raised we can see unnecessary log messages that we can't control. For example when used in scylla_login we can see following messages: WARNING:root:Failed to grab http://169.254.169.254/latest/... WARNING:root:Failed to grab http://169.254.169.254/latest/... Initial image configuration failed! To see status, run 'systemctl status scylla-image-setup'	2020-11-02 01:13:44 +03:00
Bentsi Magidovich	a62237f1c6	scylla_util.py: make is_aws_instance faster when used for example in scylla_login we need to understand that we are not running on AWS faster than 10 seconds	2020-11-02 00:11:21 +03:00
Bentsi Magidovich	83a8550a5f	scylla_util.py: added ability to control sleep time between retries in curl()	2020-11-01 22:39:19 +03:00
Avi Kivity	b45c933036	tools: toolchain: update for gcc-10.2.1-6.fc33.x86_64	2020-11-01 19:18:00 +02:00
Avi Kivity	d626563fe3	Update seastar submodule * seastar 57b758c2f9...a62a80ba1d (1): > thread: increase stack size in debug mode	2020-11-01 19:16:59 +02:00
Benny Halevy	e4614d4836	gossiper: mark trivial methods noexcept Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:47 +02:00
Benny Halevy	1ba4c84ae2	gossiper: get_cluster_name, get_partitioner_name: make noexcept These methods can return a const sstring& rather than allocating a sstring. And with that they can be marked noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:29 +02:00
Benny Halevy	11a8912093	gossiper: get_gossip_status: return string_view and make noexcept Change get_gossip_status to return string_view, and with that it can be noexcept now that it doesn't allocate memory via sstring. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	126e486fde	gms/endpoint_state: mark methods using get_status noexcept Now that get_status returns string_view, just compare it with a const char* rather than making a sstring out of it, and consequently, can be marked noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	6b9191b6c2	gms/endpoint_state: get_status: return string_view and make noexcept get_status doesn't need to allocate a sstring, it can just return a std::string_view to the status string, if found. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	232c665bab	gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept Although std::map::find is not guaranteed to be noexcept, it depends on the comparator used, and in this case comparing application_state is noexcept. Therefore, we can safely mark get_application_state_ptr noexcept. is_cql_ready depends on get_application_state_ptr and otherwise handles exceptions from boost::lexical_cast so it can be marked noexcept as well. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	5d8e2c038b	gms/endpoint_state: mark trivial methods noexcept Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	d4c364507e	gms/heart_beat_state: mark methods noexcept Now that get_next_version() is noexcept, update_heart_beat can be noexcept too. All others are trivially noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	68a2920201	gms/versioned_value: mark trivial methods noexcept Also, versioned_value::compare_to() can be marked const. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	c295f521b9	gms/version_generator: mark get_next_version noexcept It is trivially so. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	87c3fd9cd8	fb_utilities.hh: mark methods noexcept Now that gms::inet_address assignment is marked as noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	e28d80ec0c	messaging: msg_addr: mark methods noexcept Based on gms::inet_address. With that, gossiper::get_msg_addr can be marked noexcept (and const while at it). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Benny Halevy	232fc19525	gms/inet_address: mark methods noexcept Based on the corresponding net::inet_address calls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-01 16:46:18 +02:00
Avi Kivity	6be9f49380	cql3: expression: switch from range_bound to interval_bound to avoid clang class template argument deduction woes Clang does not implement P1814R0 (class template argument deduction for alias templates), so it can't deduce the template arguments for range_bound, but it can for interval_bound, so switch to that. Using the modern name rather than the compatibility alias is preferred anyway. Closes #7422	2020-11-01 13:19:44 +02:00
Nadav Har'El	deaa141aea	docs/isolation.md: fix list of IO priority classes In commit `de38091827` the two IO priority classes streaming_read and streaming_write were merged into just one. The document docs/isolation.md leaves a lot to be desired (hint, hint, to anyone reading this and can write content!) but let's at least not have incorrect information there. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201101102220.2943159-1-nyh@scylladb.com>	2020-11-01 12:27:06 +02:00
Avi Kivity	46612fe92b	Merge 'Add debug context to views out of sync' from Piotr Sarna This series adds more context to debugging information in case a view gets out of sync with its base table. A test was conducted manually, by: 1. creating a table with a secondary index 2. manually deleting computed column information from system_schema.computed_columns 3. restarting the target node 4. trying to write to the index Here's what's logged right after the index metadata is loaded from disk: ``` ERROR 2020-10-30 12:30:42,806 [shard 0] view - Column idx_token in view ks.t_c_idx_index was not found in the base table ks.t ERROR 2020-10-30 12:30:42,806 [shard 0] view - Missing idx_token column is caused by an incorrect upgrade of a secondary index. Please recreate index ks.t_c_idx_index to avoid future issues. ``` And here's what's logged during the actual failure - when Scylla notices that there exists a column which is not computed, but it's also not found in the base table: ``` ERROR 2020-10-30 12:31:25,709 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: seastar::internal::backtraced<std::runtime_error> (base_schema(): operation unsupported when initialized only for view reads. Missing column in the base table: idx_token Backtrace: 0x1d14513 0x1d1468b 0x1d1492b 0x109bbad 0x109bc97 0x109bcf4 0x1bc4370 0x1381cd3 0x1389c38 0xaf89bf 0xaf9b20 0xaf1654 0xaf1afe 0xb10525 0xb10ad8 0xb10c3a 0xaaefac 0xabf525 0xabf262 0xac107f 0x1ba8ede 0x1bdf749 0x1be338c 0x1bfe984 0x1ba73fa 0x1ba77a4 0x9ea2c8 /lib64/libc.so.6+0x27041 0x9d11cd -------- seastar::lambda_task<seastar::execution_stage::flush()::{lambda()#1}> ``` Hopefully, this information will make it much easier to solve future problems with out-of-sync views. 
Tests: unit(dev) Fixes #7512 Closes #7513 * github.com:scylladb/scylla: view: add printing missing base column on errors view: simplify creating base-dependent info for reads only view: fix typo: s/dependant/dependent view: add error logs if a view is out of sync with its base	2020-11-01 11:09:58 +02:00
Piotr Wojtczak	2150c0f7a2	cql: Check for timestamp correctness in USING TIMESTAMP statements In certain CQL statements it's possible to provide a custom timestamp via the USING TIMESTAMP clause. Those values are accepted in microseconds, however, there's no limit on the timestamp (apart from type size constraint) and providing a timestamp in a different unit like nanoseconds can lead to creating an entry with a timestamp way ahead in the future, thus compromising the table. To avoid this, this change introduces a sanity check for modification and batch statements that raises an error when a timestamp of more than 3 days into the future is provided. Fixes #5619 Closes #7475	2020-11-01 11:01:24 +02:00
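The sanity check described in this commit can be illustrated with a minimal Python sketch. It is not the Scylla implementation (which is C++); the names `MAX_FUTURE_SKEW_US`, `InvalidRequest` and `check_using_timestamp` are hypothetical, and only the 3-day limit and the microsecond unit come from the commit message.

```python
from datetime import timedelta

# 3 days expressed in microseconds, the unit USING TIMESTAMP expects.
# The constant name is hypothetical; the 3-day limit is from the commit message.
MAX_FUTURE_SKEW_US = int(timedelta(days=3).total_seconds() * 1_000_000)


class InvalidRequest(Exception):
    """Hypothetical stand-in for the error returned to the client."""


def check_using_timestamp(timestamp_us: int, now_us: int) -> None:
    """Reject a user-supplied timestamp that lies too far in the future."""
    if timestamp_us - now_us > MAX_FUTURE_SKEW_US:
        raise InvalidRequest(
            "timestamp is more than 3 days in the future; "
            "was it given in a unit other than microseconds?"
        )
```

A value passed in nanoseconds is about 1000 times larger than the current time in microseconds, so it exceeds the limit and is rejected, while a timestamp close to `now_us` passes.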
Pavel Emelyanov	d045df773f	code: RIP global query processor instance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 18:51:52 +03:00
Pavel Emelyanov	a340caa328	cql test env: Keep query processor reference on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 18:51:52 +03:00
Pavel Emelyanov	8989021dc3	system distributed keyspace: Start sharded service earlier The constructors just set up the references, real start happens in .start() so it is safe to do this early. This helps not carrying migration manager and query processor down the storage service cluster joining code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 18:51:52 +03:00
Pavel Emelyanov	021b905773	schema_tables: Use qctx to make internal requests The query processor global instance is going away. The schema_tables usage of it requires a huge rework to push the qp reference to the needed places. However, those places talk to system keyspace and are thus the users of the "qctx" thing -- the query context for local internal requests. To make cql tests not crash on null qctx pointer, its initialization should come earlier (conforming to the main start sequence). The qctx itself is a global pointer, which waits for its fix too, of course. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 18:50:01 +03:00
Pavel Emelyanov	699074bd48	transport: Keep sharded query processor reference on controller Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 15:44:21 +03:00
Pavel Emelyanov	c887d0df4c	thrift: Keep sharded query processor reference on controller Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 15:44:21 +03:00
Pavel Emelyanov	cf172cf656	alternator: Use local query processor reference to get keys Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 15:44:21 +03:00
Pavel Emelyanov	94a9f22002	alternator: Keep local query processor reference in server Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 15:44:21 +03:00
Piotr Sarna	35887bf88b	view: add printing missing base column on errors When an out-of-sync view is attempted to be used in a write operation, the whole operation needs to be aborted with an error. After this patch, the error contains more context - namely, the missing column.	2020-10-31 12:22:07 +01:00
Piotr Sarna	ef3470fa34	view: simplify creating base-dependent info for reads only The code which created base-dependent info for materialized views can be expressed with fewer branches. Also, the constructor which takes a single parameter is made explicit.	2020-10-31 12:22:07 +01:00
Piotr Sarna	71b28d69b3	view: fix typo: s/dependant/dependent	2020-10-31 12:22:07 +01:00
Piotr Sarna	669e2ada92	view: add error logs if a view is out of sync with its base When Scylla finds out that a materialized view contains columns which are not present in the base table (and they are not computed), it now presents comprehensible errors in the log.	2020-10-31 12:22:07 +01:00
Avi Kivity	1734205315	Update seastar submodule * seastar 6973080cd1...57b758c2f9 (11): > http: handle 'match all' rule correctly > http: add missing HTTP methods > memory: remove unused lambda capture in on_allocation_failure() > Support seastar allocator when seastar::alien is used > Merge "make timer related functions noexcept" from Benny > script: update dependecy packages for centos7/8 > tutorial: add linebreak between sections > doc: add nav for the second last chap > doc: add nav bar at the bottom also > doc: rename add_prologue() to add_nav_to_body() > Wrong name used in an example in mini tutorial.	2020-10-30 09:49:47 +02:00
Avi Kivity	27125a45b2	test: switch lsa-related tests (imr_test and double_decker_test) to seastar framework An upcoming change in Seastar only initializes the Seastar allocator in reactor threads. This causes imr_test and double_decker_test to fail: 1. Those tests rely on LSA working 2. LSA requires the Seastar allocator 3. Seastar is not initialized, so the Seastar allocator is not initialized. Fix by switching to the Seastar test framework, which initializes Seastar. Closes #7486	2020-10-30 08:06:04 +02:00
Avi Kivity	8a8589038c	test: increase quota for tests to 6GB test.py estimates the amount of memory needed per test in order not to overload the machine, but it underestimates badly and so machines with many cores but not a lot of memory fail the tests (in debug mode principally) due to running out of memory. Increase the estimate from 2GB per test to 6GB. Closes #7499	2020-10-30 08:04:40 +02:00
Avi Kivity	24097eee11	test: sstable_3_x_test: reduce stack usage in thread-local storage initialization gcc collects all the initialization code for thread-local storage and puts it in one giant function. In combination with debug mode, this creates a very large stack frame that overflows the stack on aarch64. Work around the problem by placing each initializer expression in its own function, thus reusing the stack. Closes #7509	2020-10-30 08:03:44 +02:00
Piotr Grabowski	e96ef0d629	tests: Cleanup select_statement_utils Add additional comments to select_statement_utils, fix formatting, add missing #pragma once and introduce set_internal_paging_size_guard to set internal_paging in RAII fashion. Closes #7507	2020-10-29 15:25:02 +01:00
Asias He	d47033837a	gossiper: Use dedicated gossip scheduling group Gossip currently runs inside the default (main) scheduling group. It is fine to run inside default scheduling group. From time to time, we see many tasks in main scheduling group and we suspect gossip. It is best we can move gossip to a dedicated scheduling group, so that we can catch bugs that leak tasks to main group more easily. After this patch, we can check: scylla_scheduler_time_spent_on_task_quota_violations_ms{group="gossip",shard="0"} Fixes: #7154 Tests: unit(dev)	2020-10-29 12:53:37 +02:00
Avi Kivity	bd73898a5c	dist: redhat: don't pull in kernel package We require a kernel that is at least 3.10.0-514, because older kernels have an XFS related bug that causes data corruption. However this Requires: clause pulls in a kernel even in Docker installation, where it (and especially the associated firmware) occupies a lot of space. Change to a Conflicts: instead. This prevents installation when the really old kernel is present, but doesn't pull it in for the Docker image. Closes #7502	2020-10-29 12:44:22 +02:00
Piotr Sarna	8c645f74ce	Merge 'select_statement: Fix aggregate results on indexed selects (timeouts fixed) ' from Piotr Grabowski Overview Fixes #7355. Before these changes, there were a few invalid results of aggregates/GROUP BY on tables with secondary indexes (see below). Unfortunately, it still does NOT fix the problem in issue #7043. Although this PR makes progress toward fixing that issue, there is still a bug with `TOKEN(...)` in `WHERE` clauses of indexed selects that is not addressed in this PR. It will be fixed in my next PR. It does NOT fix the problems in issues #7432, #7431 as those are out-of-scope of this PR and do not affect the correctness of results (only return a too large page). GROUP BY (first commit) Before the change, `GROUP BY` `SELECT`s with some `WHERE` restrictions on an indexed column would return invalid results (same grouped column values appearing multiple times): ``` CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)); CREATE INDEX ks_t on ks.t(v); INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3); INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3); SELECT pk FROM ks.t WHERE v=3 GROUP BY pk; pk ---- 1 1 ``` This is fixed by correctly passing `_group_by_cell_indices` to `result_set_builder`. Fixes the third failing example from issue #7355. Paging (second commit) Fixes two issues related to improper paging on indexed `SELECT`s. As those two issues are closely related (fixing one without fixing the other causes invalid results of queries), they are in a single commit (second commit). The first issue is that when using `slice.set_range`, the existing `_row_ranges` (which specify clustering key prefixes) are not taken into account. 
This caused the wrong rows to be included in the result, as the clustering key bound was set to a half-open range: ``` CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c)); CREATE INDEX kst_index ON ks.t(c); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5); SELECT COUNT(*) FROM ks.t WHERE c = 3; count ------- 2 ``` The second commit fixes this issue by properly trimming `row_ranges`. The second fixed problem is related to setting the `paging_state` to `internal_options`. It was improperly set to the value just after reading from index, making the base query start from invalid `paging_state`. The second commit fixes this issue by setting the `paging_state` after both index and base table queries are done. Moreover, the `paging_state` is now set based on `paging_state` of index query and the results of base table query (as base query can return more rows than index query). The second commit fixes the first two failing examples from issue #7355. Tests (fourth commit) Extensively tests queries on tables with secondary indices with aggregates and `GROUP BY`s. Tests three cases that are implemented in `indexed_table_select_statement::do_execute` - `partition_slices`, `whole_partitions` and (non-`partition_slices` and non-`whole_partitions`). As some of the issues found were related to paging, the tests check scenarios where the inserted data is smaller than a page, larger than a page and larger than two pages (and some in-between page boundaries scenarios). I found all those parameters (case of `do_execute`, number of inserted rows) to have an impact on those fixed bugs, therefore the tests validate a large number of those scenarios. Configurable internal_paging_size (third commit) Before this change, internal `page_size` when doing aggregate, `GROUP BY` or nonpaged filtering queries was hard-coded to `DEFAULT_COUNT_PAGE_SIZE` (10,000). 
This change adds new internal_paging_size variable, which is configurable by `set_internal_paging_size` and `reset_internal_paging_size` free functions. This functionality is only meant for testing purposes. Closes #7497 github.com:scylladb/scylla: tests: Add secondary index aggregates tests select_statement: Introduce internal_paging_size select_statement: Fix paging on indexed selects select_statement: Fix GROUP BY on indexed select	2020-10-29 08:30:16 +01:00
Takuya ASADA	fc1c4f2261	scylla_raid_setup: use sysfs to detect existing RAID volume We may not be able to detect an existing RAID volume by device file existence; we should use sysfs instead to make sure it's running. Fixes #7383 Closes #7399	2020-10-29 09:13:55 +02:00
Avi Kivity	17226f2f6c	tools: toolchain: update to Fedora 33 with clang 11 Update the toolchain to Fedora 33 with clang 11 (note the build still uses gcc). The image now creates a /root/.m2/repository directory; without this the tools/jmx build fails on aarch64. Add java-1.8.0-openjdk-devel since that is where javac lives now. Add a JAVA8_HOME environment variable; without this ant is not able to find javac. The toolchain is enabled for x86_64 and aarch64.	2020-10-28 20:21:44 +02:00
Piotr Grabowski	006d4f40d9	tests: Add secondary index aggregates tests Extensively tests queries on tables with secondary indices with aggregates and GROUP BYs. Tests three cases that are implemented in indexed_table_select_statement::do_execute - partition_slices, whole_partitions and (non-partition_slices and non-whole_partitions). As some of the issues found were related to paging, the tests check scenarios where the inserted data is smaller than a page, larger than a page and larger than two pages (and some boundary scenarios).	2020-10-28 17:01:25 +01:00
Piotr Grabowski	4975d55cdc	select_statement: Introduce internal_paging_size Before this change, internal page_size when doing aggregate, GROUP BY or nonpaged filtering queries was hard-coded to DEFAULT_COUNT_PAGE_SIZE. This made testing hard (timeouts in debug build), because the tests had to be large to test cases when there are multiple internal pages. This change adds new internal_paging_size variable, which is configurable by set_internal_paging_size and reset_internal_paging_size free functions. This functionality is only meant for testing purposes.	2020-10-28 17:01:25 +01:00
Piotr Grabowski	b7b5066581	select_statement: Fix paging on indexed selects Fixes two issues related to improper paging on indexed SELECTs. As those two issues are closely related (fixing one without fixing the other causes invalid results of queries), they are in a single commit. The first issue is that when using slice.set_range, the existing _row_ranges (which specify clustering key prefixes) are not taken into account. This caused the wrong rows to be included in the result, as the clustering key bound was set to a half-open range: CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c)); CREATE INDEX kst_index ON ks.t(c); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5); SELECT COUNT(*) FROM ks.t WHERE c = 3; count ------- 2 This change fixes this issue by properly trimming row_ranges. The second fixed problem is related to setting the paging_state to internal_options. It was improperly set just after reading from index, making the base query start from invalid paging_state. This change fixes this issue by setting the paging_state after both index and base table queries are done. Moreover, the paging_state is now set based on paging_state of index query and the results of base table query (as base query can return more rows than index query). Fixes the first two failing examples from issue #7355.	2020-10-28 17:01:25 +01:00
Piotr Grabowski	fb10386017	select_statement: Fix GROUP BY on indexed select Before the change, GROUP BY SELECTs with some WHERE restrictions on an indexed column would return invalid results (same grouped column values appearing multiple times): CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)); CREATE INDEX ks_t on ks.t(v); INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3); INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3); SELECT pk FROM ks.t WHERE v=3 GROUP BY pk; pk ---- 1 1 This is fixed by correctly passing _group_by_cell_indices to result_set_builder. Fixes the third failing example from issue #7355.	2020-10-28 17:01:25 +01:00
Avi Kivity	5ff5d43c7a	Update tools/java submodule * tools/java e97c106047...ad48b44a26 (1): > build: Add generated Thrift sources to multi-Java build	2020-10-28 16:52:25 +02:00
Pavel Emelyanov	b2ce3b197e	allocation_strategy: Fix standard_migrator initialization This is the continuation of `30722b8c8e`, so let me re-cite Rafael: The constructors of these global variables can allocate memory. Since the variables are thread_local, they are initialized at first use. There is nothing we can do if these allocations fail, so use disable_failure_guard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201028140553.21709-1-xemul@scylladb.com>	2020-10-28 16:22:23 +02:00
Asias He	289a08072a	repair: Make repair_writer a shared pointer The future of the fiber that writes data into sstables inside the repair_writer is stored in _writer_done like below: class repair_writer { _writer_done[node_idx] = mutation_writer::distribute_reader_and_consume_on_shards().then([this] { ... }).handle_exception([this] { ... }); } The fiber accesses the repair_writer object in the error handling path. We wait for the _writer_done to finish before we destroy the repair_meta object, which contains the repair_writer object, to avoid the fiber accessing an already freed repair_writer object. To be safer, we can make repair_writer a shared pointer and take a reference in the distribute_reader_and_consume_on_shards code path. Fixes #7406 Closes #7430	2020-10-28 16:22:23 +02:00
Avi Kivity	4b9206a180	install: abort if LD_PRELOAD is set when executing a relocatable binary LD_PRELOAD libraries usually have dependencies in the host system, which they will not have access to in a relocatable environment since we use a different libc. Detect that LD_PRELOAD is in use and if so, abort with an error. Fixes #7493. Closes #7494	2020-10-28 16:22:23 +02:00
Avi Kivity	2a42fc5cde	build: supply linker flags only to the linker, not the compiler Clang complains if it sees linker-only flags when called for compilation, so move the compile-time flags from cxx_ld_flags to cxxflags, and remove cxx_ld_flags from the compiler command line. The linker flags are also passed to Seastar so that the build-id and interpreter hacks still apply to iotune. Closes #7466	2020-10-28 16:22:23 +02:00
Avi Kivity	fc15d0a4be	build: relocatable package: exclude tools/python3 python3 has its own relocatable package, no need to include it in scylla-package.tar.gz. Python has its own relocatable package, so packaging it in scylla-package.ta Closes #7467	2020-10-28 16:22:23 +02:00
Avi Kivity	6eb3ba74e4	Update tools/java submodule * tools/java f2e8666d7e...e97c106047 (1): > Relocatable Package: create product prefixed relocatable archive	2020-10-28 08:47:49 +02:00
Juliusz Stasiewicz	e0176bccab	create_table_statement: Disallow default TTL on counter tables In such attempt `invalid_request_exception` is thrown. Also, simple CQL test is added. Fixes #6879	2020-10-27 22:44:02 +02:00
Nadav Har'El	92b741b4ff	alternator test: more tests for disabled streams and closed shards We already have a test for the behavior of a closed shard and how iterators previously created for it are still valid. In this patch we add to this also checking that the shard id itself, not just the iterator, is still valid. Additionally, although the aforementioned test used a disabled stream to create a closed shard, it was not a complete test for the behavior of a disabled stream, and this patch adds such a test. We check that although the stream is disabled, it is still fully usable (for 24 hours) - its original ARN is still listed on ListStreams, the ARN is still usable, its shards can be listed, all are marked as closed but still fully readable. Both tests pass on DynamoDB, and xfail on Alternator because of issue #7239 - CDC drops the CDC log table as soon as CDC is disabled, so the stream data is lost immediately instead of being retained for 24 hours. Refs #7239 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201006183915.434055-1-nyh@scylladb.com>	2020-10-27 22:44:02 +02:00
Nadav Har'El	a57d4c0092	docs: clean up format of docs/alternator/getting-started.md In https://github.com/scylladb/scylla-docs/pull/3105 it was noted that the Sphinx document parser doesn't like a horizontal line ("---") in the beginning of a section. Since there is no real reason why we must have this horizontal line, let's just remove it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201001151312.261825-1-nyh@scylladb.com>	2020-10-27 22:44:02 +02:00
Avi Kivity	e2a02f15c2	Merge 'transport/system_ks: Add more info to `system.clients`' from Juliusz Stasiewicz This patch fills the following columns in `system.clients` table: * `connection_stage` * `driver_name` * `driver_version` * `protocol_version` It also improves: * `client_type` - distinguishes cql from thrift just in case * `username` - now it displays correct username iff `PasswordAuthenticator` is configured. What is still missing: * SSL params (I'll happily get some advice here) * `hostname` - I didn't find it in tested drivers Refs #6946 Closes #7349 * github.com:scylladb/scylla: transport: Update `connection_stage` in `system.clients` transport: Retrieve driver's name and version from STARTUP message transport: Notify `system.clients` about "protocol_version" transport: On successful authentication add `username` to system.clients	2020-10-27 22:44:02 +02:00
Amnon Heiman	52db99f25f	scyllatop/livedata.py: Safe iteration over metrics This patch changes the code that iterates over the metrics to use a copy of the metrics names to make it safe to remove the metrics from the metrics object. Fixes #7488 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-10-27 22:44:02 +02:00
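The pattern this commit describes (iterating over a copy of the names so that entries can be removed during the loop) can be sketched as follows. This is an illustration only, not the code from `scyllatop/livedata.py`; `prune_absent` and its parameters are hypothetical names.

```python
def prune_absent(metrics: dict, live_names: set) -> None:
    """Remove every metric whose name is no longer reported."""
    # list(metrics) snapshots the keys first. Deleting from the dict while
    # iterating over the dict itself raises
    # "RuntimeError: dictionary changed size during iteration".
    for name in list(metrics):
        if name not in live_names:
            del metrics[name]
```

Copying only the keys keeps the snapshot cheap, since the metric objects themselves are not duplicated.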
Calle Wilund	1bc96a5785	alternator::streams: Make describe_stream use actual log ttl as window Allows QA to bypass the normal hardcoded 24h ttl of data and still get "proper" behaviour w.r.t. available stream set/generations. I.e. can manually change cdc ttl option for alternator table after streams enabled. Should not be exposed, but perhaps useful for testing. Closes #7483	2020-10-26 12:16:36 +02:00
Calle Wilund	4b65d67a1a	partition_version: Change range_tombstones() to return chunked_vector Refs #7364 The number of tombstones can be large. As a stopgap measure to just returning a source range (with keepalive), we can at least alleviate the problem by using a chunked vector. Closes #7433	2020-10-26 11:54:42 +02:00
Benny Halevy	82aabab054	table: get rid of reshuffle_sstables It is unused since `7351db7cab` Refs #6950 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201026074914.34721-1-bhalevy@scylladb.com>	2020-10-26 09:50:21 +02:00
Calle Wilund	46ea8c9b8b	cdc: Add an "end-of-record" column to Fixes #7435 Adds an "eor" (end-of-record) column to cdc log. This is non-null only on last-in-timestamp group rows, i.e. end of a singular source "event". A client can use this as a shortcut to knowing whether or not he has a full cdc "record" for a given source mutation (single row change). Closes #7436	2020-10-26 09:39:27 +02:00
Takuya ASADA	fe2d6765f9	node_exporter_install: upgrade to latest release We currently use an outdated version of node_exporter; let's upgrade to the latest version. Fixes #7427	2020-10-25 13:59:14 +02:00
Etienne Adam	c518c1de1c	redis: remove useless std::move() As remarked during the last review, this commit removes the useless std::move(). Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20201024180447.16799-1-etienne.adam@gmail.com>	2020-10-25 13:17:40 +02:00
Avi Kivity	d5150a94d2	Update abseil submodule from upstream The dynamic_annotations library is now header-only, so it is no longer built. * abseil 2069dc7...1e3d25b (73): > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > fix compile fails with asan and -Wredundant-decls (#801) > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > btree: fix sign-compare warnings (#800) > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Added missing asserts for seq.index() < capacity_ and unified their usage based on has_element(). (#781) > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes 
> Export of internal Abseil changes > Export of internal Abseil changes > fix build on P9 (#739) > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Disable pthread for standalone wasm build support (#721) > Merge branch 'master' of https://github.com/abseil/abseil-cpp into master > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Export of internal Abseil changes > Exclude empty directories (#697)	2020-10-25 12:51:40 +02:00
Dejan Mircevski	40adf38915	cql3/expr: Use Boost concept assert In `bd6855e`, we reverted to Boost ranges and commented out the concept check. But Boost has its own concept check, which this patch enables. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #7471	2020-10-22 17:24:49 +03:00
Benny Halevy	fcca64b4f6	test: imr_test should run automatically Unclear why it was placed in test/manual in commit `1c8736f998` Test: boost/imr_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201022093826.12009-1-bhalevy@scylladb.com>	2020-10-22 12:40:30 +03:00
Nadav Har'El	6740907f3d	Merge 'utf8: don't linearize cells for validation' from Avi Kivity Currently, we linearize large UTF8 cells in order to validate them. This can cause large latency spikes if the cell is large. This series changes UTF8 validation to work on fragmented buffers. This is somewhat tricky since the validation routines are optimized for single-instruction-multiple-data (SIMD) architectures. The unit tests are expanded to cover the new functionality. Fixes #7448. Closes #7449 * github.com:scylladb/scylla: types: don't linearize utf8 for validation test: utf8: add fragmented buffer validation tests utils: utf8: add function to validate fragmented buffers utils: utf8: expose validate_partial() in a header utils: utf8: introduce validate_partial() utils: utf8: extract a function to evaluate a single codepoint	2020-10-21 20:51:15 +03:00
Tomasz Grabiec	158ae99c89	Merge 'view info: preserve integrity by allowing base info for reads only and by initializing base info' from Eliran Sinvani This PR's purpose is to handle schema integrity issues that can arise in races involving materialized views. The possibility of such integrity issues was found in #7420, where a view schema was used for reading without its _base_info member initialized, resulting in a segfault. We handle this by doing 3 things: 1. First guard against using an uninitialized base info - this will be considered as an internal error as it will indicate that there is a path in our code that creates a view schema to be used for reads or writes but is not initializing the base info. 2. We allow the base info to be initialized also from a partially matching base (most likely a newer one than the one used to create the view). 3. We fix the suspected path that creates such a view schema to initialize it. (in migration manager) It is worth mentioning that this PR is a workaround to a probable design flaw in our materialized views which requires the base table's information to be retrieved in the first place instead of just being self contained. Refs #7420 Closes #7469 * github.com:scylladb/scylla: materialized views: add a base table reference if missing view info: support partial match between base and view for only reading from view. view info: guard against null dereference of the base info	2020-10-21 16:21:00 +02:00
Eliran Sinvani	4749c58068	materialized views: add a base table reference if missing schema pointers can be obtained from two distinct entities, one is the database, those schema are obtained from the table objects and the other is from the schema registry. When a schema or a new schema is attached to a table object that represents a base table for views, all of the corresponding attached view schemas are guaranteed to have their base info in sync. However if an older schema is inserted into the registry by the migration manager, i.e. loaded from another node, it will be missing this info. This becomes a problem when this schema is published through the schema registry as it can be obtained for an obsolete read command for example and then eventually cause a segmentation fault by null dereferencing the _base_info ptr. Refs #7420	2020-10-21 16:52:28 +03:00
Eliran Sinvani	70e04c1123	view info: support partial match between base and view for only reading from view. The current implementation of materialized views does not keep the version to which a specific version of a materialized view schema corresponds. This complicates things, especially for old view versions that the schema doesn't support anymore. However, the views, being also independent tables, should allow reading from them as long as they exist even if the base table changed since then. For the reading purpose, we don't need to know the exact composition of view primary key columns that are not part of the base primary key, we only need to know that there are any, and this is a much looser constraint on the schema. We can rely on table invariants such as the fact that pk columns are not going to disappear in a newer version of the table. This means that if we don't find a view column in the base table, it is not a part of the base table primary key. This information is enough for us to perform reads on the view. This commit adds support for being able to rely on such partial information along with a validation that it is not going to be used for writes. If it is, we simply abort since this means that our schema integrity is compromised.	2020-10-21 15:20:43 +03:00
Eliran Sinvani	372051c97d	view info: guard against null dereference of the base info The change's purpose is to guard against segfault that is the result of dereferencing the _base_info member when it is uninitialized. We already know this can happen (#7420). The only purpose of this change is to treat this condition as an internal error, the reason is that it indicates a schema integrity problem. Besides this change, other measures should be taken to ensure that the _base_table member is initialized before calling methods that rely on it. We call the internal_error as a last resort.	2020-10-21 12:12:51 +03:00
Benny Halevy	70219b423f	table: add_sstable: provide strong exception guarantees Do not leave side-effects on exception. Fixes #6658 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201020145429.19426-1-bhalevy@scylladb.com>	2020-10-21 11:40:03 +03:00
Avi Kivity	c0ca54395a	types: don't linearize utf8 for validation Use the new non-linearizing validator, avoiding linearization. Linearization can cause large contiguous memory allocations, which in turn causes latency spikes. Fixes #7448.	2020-10-21 11:14:44 +03:00
Avi Kivity	89f111b03f	test: utf8: add fragmented buffer validation tests Since there are a huge number of variations, we use random testing. Each test case is composed of a random number of valid code points, with a possible invalid code point somewhere. The test case is broken up into a random number of fragments. We test both validation success and error position indicator.	2020-10-21 11:14:44 +03:00
Avi Kivity	91490827c1	utils: utf8: add function to validate fragmented buffers Add a function to validate fragmented buffers. We validate each buffer with SIMD-optimized validate_partial(), then collect the codepoint that spans buffer boundaries (if any) in a temporary buffer, validate that too, and continue.	2020-10-21 11:14:44 +03:00
Avi Kivity	3d1be9286f	utils: utf8: expose validate_partial() in a header Since fragmented buffers are templates, we'll need access to validate_partial() in a header. Move it there.	2020-10-21 11:14:44 +03:00
Avi Kivity	22a0c457e2	utils: utf8: introduce validate_partial() The current validators expect the buffer to contain a full UTF-8 string. This won't be the case for fragmented buffers, since a codepoint can straddle two (or more) buffers. To prepare for that, convert the existing validators to validate_partial(), which returns either an error, or success with an indication of the size of the tail that was not validated and how many bytes it is missing. This is natural since the SIMD validators already cannot process a tail in SIMD mode if it's smaller than the vector size, so only minor rearrangements are needed. In addition, we now have validate_partial() for non-SIMD architectures, since we'll need it for fragmented buffer validation.	2020-10-21 11:14:44 +03:00
Avi Kivity	900699f1b5	utils: utf8: extract a function to evaluate a single codepoint Our SIMD optimized validators cannot process a codepoint that spans multiple buffers, and adapting them to be able to will slow them down. So our strategy is to special-case any codepoint that spans two buffers. To do that, extract an evaluate_codepoint() function from the current validate_naive() function. It returns three values: - if a codepoint was successfully decoded from the buffer, how many bytes were consumed - if not enough bytes were in the buffer, how many more are needed - otherwise, an error happened, so return an indication The new function uses a table to calculate a codepoint's size from its first byte, similar to the SIMD variants. validate_naive() is now implemented in terms of evaluate_codepoint().	2020-10-21 11:14:43 +03:00
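The three-way result described in `900699f1b5` (bytes consumed, bytes still missing, or an error) can be illustrated with a minimal C++ sketch. The names `codepoint_status`, `sequence_length` and `evaluate_codepoint_sketch` are hypothetical and are not taken from the Scylla source; the sketch also uses branches where the commit describes a lookup table, so it should be read as an illustration of the idea rather than the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; the commit does not show the real one.
struct codepoint_status {
    enum class kind { complete, incomplete, invalid };
    kind what;
    // complete: bytes consumed; incomplete: bytes still missing; invalid: 0
    std::size_t bytes;
};

// Length of a UTF-8 sequence derived from its lead byte,
// or 0 if the byte cannot start a well-formed sequence.
inline std::size_t sequence_length(std::uint8_t lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Evaluates the single codepoint starting at p[0]; len must be at least 1.
inline codepoint_status evaluate_codepoint_sketch(const std::uint8_t* p, std::size_t len) {
    assert(len > 0);
    using kind = codepoint_status::kind;
    const std::size_t need = sequence_length(p[0]);
    if (need == 0) {
        return {kind::invalid, 0};
    }
    // Check the continuation bytes that are present, so a bad prefix is
    // reported as invalid even when the sequence is truncated.
    const std::size_t avail = len < need ? len : need;
    for (std::size_t i = 1; i < avail; ++i) {
        if ((p[i] & 0xC0) != 0x80) {
            return {kind::invalid, 0};
        }
    }
    if (avail >= 2) {
        const std::uint8_t b1 = p[1];
        if ((p[0] == 0xE0 && b1 < 0xA0) ||   // overlong 3-byte form
            (p[0] == 0xED && b1 > 0x9F) ||   // UTF-16 surrogate range
            (p[0] == 0xF0 && b1 < 0x90) ||   // overlong 4-byte form
            (p[0] == 0xF4 && b1 > 0x8F)) {   // above U+10FFFF
            return {kind::invalid, 0};
        }
    }
    if (len < need) {
        return {kind::incomplete, need - len};
    }
    return {kind::complete, need};
}
```

A caller validating fragmented buffers could copy the tail of one fragment plus the reported number of missing bytes from the next fragment into a small temporary buffer and evaluate that, which matches the strategy of special-casing a codepoint that spans two buffers.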
Raphael S. Carvalho	6f805bd123	sstable_directory: Fix 50% space requirement for resharding This is a regression caused by `aebd965f0`. After the sstable_directory changes, resharding now waits for all sstables to be exhausted before releasing reference to them, which prevents their resources like disk space and fd from being released. Let's restore the old behavior of incrementally releasing resources, reducing the space requirement significantly. Fixes #7463. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201020140939.118787-1-raphaelsc@scylladb.com>	2020-10-21 09:51:26 +02:00
Raphael S. Carvalho	74d35a2286	compaction: fix debug log for fully expired ssts the log is incorrectly printing actually compacted ssts, instead of fully expired ssts that weren't actually compacted Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201020125053.109615-1-raphaelsc@scylladb.com>	2020-10-20 16:01:28 +03:00
Dmitry Kropachev	62709bab98	dist/docker: Pass extra arguments to Scylla Currently there is no way to pass scylla arguments from docker-entrypoint to scylla, but from time to time it is needed. Example: https://github.com/scylladb/scylla-operator/issues/177 Closes #7458	2020-10-20 09:49:10 +03:00
Piotr Sarna	e5edf30869	Merge 'treewide: adjust for missing aggregate template ... ... type deduction and parenthesized aggregate construction' from Avi Kivity Clang does not implement P0960R3 and P1816R0, so constructions of aggregates (structs with no constructors) have to use braced initialization and cannot use class template argument deduction. This series makes the adjustments. Closes #7456 * github.com:scylladb/scylla: reader_concurrency_semaphore: adjust permit_summary construction for clang schema_tables: adjust altered_schema construction for clang types: adjust validation_visitor construction for clang	2020-10-20 08:52:29 +03:00
Dejan Mircevski	b037b0c10b	cql3: Delete some newlines Makes files shorter while still keeping the lines under 120 columns. Separate from other commits to make review easier. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-10-19 15:40:55 -04:00
Dejan Mircevski	62ea6dcd28	cql3: Drop superfluous ALLOW FILTERING Required no longer, after the last commit. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-10-19 15:38:11 -04:00
Dejan Mircevski	6773563d3d	cql3: Drop unneeded filtering for continuous CK Don't require filtering when a continuous slice of the clustering key is requested, even if partition is unrestricted. The read command we generate will fetch just the selected data; filtering is unnecessary. Some tests needed to update the expected results now that we're not fetching the extra data needed for filtering. (Because tests don't do the final trim to match selectors and assert instead on all the data read.) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-10-19 14:46:43 -04:00
Avi Kivity	cfada6e04d	reader_concurrency_semaphore: adjust permit_summary construction for clang Clang does not implement P0960R3, parenthesized initialization of aggregates, so we have to use brace initialization in permit_summary. As the parenthesized constructor call is done by emplace_back(), we have to do the braced call ourselves.	2020-10-19 14:57:51 +03:00
Avi Kivity	8e386a5f48	schema_tables: adjust altered_schema construction for clang Clang does not implement P0960R3, parenthesized initialization of aggregates, so we have to use brace initialization in altered_schema. As the parenthesized constructor call is done by emplace_back(), we have to do the braced call ourselves.	2020-10-19 14:57:21 +03:00
Avi Kivity	ed6775c585	types: adjust validation_visitor construction for clang Clang does not implement P0960R3, parenthesized initialization of aggregates, so we have to use brace initialization in validation_visitor. It also does not implement class template argument deduction for aggregates (P1816r0), so we have to specify the template parameters explicitly.	2020-10-19 14:53:00 +03:00
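The `emplace_back()` adjustment made in the three commits above can be sketched as follows. The struct `permit_summary_like` and the function `build_summaries` are hypothetical stand-ins, not the real Scylla types; the point is only that `emplace_back(args...)` constructs the element with parentheses, which works for an aggregate only on compilers that implement P0960R3, so the braced construction has to be written out at the call site.

```cpp
#include <string>
#include <vector>

// Hypothetical aggregate: no user-declared constructors.
struct permit_summary_like {
    std::string name;
    int count;
};

inline std::vector<permit_summary_like> build_summaries() {
    std::vector<permit_summary_like> v;
    // v.emplace_back("read", 3);  // needs P0960R3: parenthesized aggregate init
    // Portable form: build the aggregate with braces, then move it in.
    v.emplace_back(permit_summary_like{"read", 3});
    // push_back with a braced initializer list is an equivalent alternative.
    v.push_back({"write", 5});
    return v;
}
```

Both portable forms construct a temporary and move it into the vector, which is usually an acceptable cost for small aggregates like this one.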
Piotr Sarna	ef8815d39e	Merge 'treewide: drop some uses of <ranges> for clang' from Avi Kivity Clang has trouble compiling libstdc++'s `<ranges>`. It is not known whether the problem is in clang or in libstdc++; I filed bugs for both [1] [2]. Meanwhile, we wish to use clang to gain working coroutine support, so drop the failing uses of `<ranges>`. Luckily the changes are simple. [1] https://bugs.llvm.org/show_bug.cgi?id=47509 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97120 Closes #7450 * github.com:scylladb/scylla: test: view_build_test: drop <ranges> test: mutation_reader_test: drop <ranges> cql3: expression: drop <ranges> sstables: leveled_compaction_strategy: drop use of <ranges> utils: to_range(): relax constraint	2020-10-19 09:58:53 +02:00
Tomasz Grabiec	d48f04f25e	migration_manager: Drop the schema version check from the schema pull handler The check was added to support migration from schema tables format v2 to v3. It was needed to handle the rolling upgrade from 2.x to 3.x scylla version. Old nodes wouldn't recognize new schema mutations, so the pull handler in v3 was changed to ignore requests from v2 nodes based on their advertised SCHEMA_TABLES_VERSION gossip state. This started to cause problems after `3b1ff90` (get rid of the seed concept). The bootstrapping node sometimes would hang during boot unable to reach schema agreement. It's relevant that gossip exchanges about new nodes are unidirectional (refs #2862). It's also relevant that pulls are edge-triggered only (refs #7426). If the bootstrapping node (A) is listed as a seed in one of the existing node's (B) configuration then node A can be contacted before it contacts node B. Node A may then send schema pull request to node B before it learns about node A, and node B will assume it's an old node and give an empty response. As a result, node A will end up with an old schema. The fix is to drop the check so that pull handler always responds with the schema. We don't support upgrades from nodes using v2 schema tables format anymore so this should be safe. Fixes #7396 Tests: - manual (ccm) - unit (dev) Message-Id: <1602612578-21258-1-git-send-email-tgrabiec@scylladb.com>	2020-10-19 10:45:23 +03:00
Avi Kivity	3249516f2e	test: view_build_test: drop <ranges> Clang has trouble with some parts of <ranges>. Replace with boost range adaptors for now.	2020-10-19 10:23:31 +03:00
Avi Kivity	1041521eb8	test: mutation_reader_test: drop <ranges> Clang has trouble with some parts of <ranges>. Replace with boost range adaptors for now.	2020-10-19 10:23:31 +03:00
Avi Kivity	bd6855ed62	cql3: expression: drop <ranges> Clang has trouble with some parts of <ranges>. Replace with boost range adaptors for now.	2020-10-19 10:23:30 +03:00
Avi Kivity	951b4d1541	sstables: leveled_compaction_strategy: drop use of <ranges> Clang has trouble with some parts of <ranges>. Replace with iterators for now.	2020-10-18 18:16:37 +03:00
Avi Kivity	f9129fc1f9	utils: to_range(): relax constraint The input range to utils::to_range() should be indeed a range, but clang has trouble compiling <ranges> which causes it to fail. Relax the constraint until this is fixed.	2020-10-18 18:16:30 +03:00
Avi Kivity	dfe4161e65	Revert "SCYLLA-VERSION-GEN: change master version to 4.3.dev" This reverts commit `951fb638a3`. QA was not prepared for it and it breaks their scripts.	2020-10-18 14:21:25 +03:00
Nadav Har'El	4159054baf	Merge 'treewide: don't capture structured bindings in lambdas' from Avi Kivity Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured. Hopefully, most of these lambda captures will be replaced with coroutines. Closes #7445 * github.com:scylladb/scylla: test: mutation_reader_test: don't capture structured bindings in lambdas api: column_family: don't capture structured bindings in lambdas thrift: don't capture structured bindings in lambdas test: partition_data_test: don't capture structured bindings in lambdas test: querier_cache_test: don't capture structured bindings in lambdas test: mutation_test: don't capture structured bindings in lambdas storage_proxy: don't capture structured bindings in lambdas db: hints/manager: don't capture structured bindings in lambdas db: commitlog_replayer: don't capture structured bindings in lambdas cql3: select_statement: don't capture structured bindings in lambdas cql3: statement_restrictions: don't capture structured bindings in lambdas cdc: log: don't capture structured bindings in lambdas	2020-10-18 13:12:11 +03:00
Avi Kivity	6f5ef5a5f5	dht: document incremental partition_range and token_range sharders Closes #6210	2020-10-18 12:24:49 +03:00
Pavel Solodovnikov	aa4c359cff	column_mapping_entry: extract == and != operators Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201016123638.99534-1-pa.solodovnikov@scylladb.com>	2020-10-16 14:59:50 +02:00
Avi Kivity	e6d55e2778	test: mutation_reader_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:25:15 +03:00
Avi Kivity	82f79c0077	api: column_family: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:25:05 +03:00
Avi Kivity	99ee5f6aac	thrift: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:57 +03:00
Avi Kivity	d5e94ab224	test: partition_data_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:45 +03:00
Avi Kivity	77d54410d0	test: querier_cache_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:37 +03:00
Avi Kivity	b406af2556	test: mutation_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:28 +03:00
Avi Kivity	d50f508fa6	storage_proxy: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:19 +03:00
Avi Kivity	cb9a9584ac	db: hints/manager: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:09 +03:00
Avi Kivity	1986a74cc4	db: commitlog_replayer: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:01 +03:00
Avi Kivity	05a24408df	cql3: select_statement: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:23:53 +03:00
Avi Kivity	c2c3f8343e	cql3: statement_restrictions: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:23:33 +03:00
Avi Kivity	d3c0b4c555	cdc: log: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:23:16 +03:00
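The pattern applied across the twelve commits above can be sketched in a few lines of C++. The function `make_printers` is a hypothetical example, not code from the series; it shows the general shape of the workaround, which is to bind the pair members to ordinary named variables before the lambda captures them.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Builds one closure per map entry. Each closure owns copies of the key and
// value, so it remains valid after the map is destroyed.
inline std::vector<std::function<std::string()>>
make_printers(const std::map<std::string, int>& m) {
    std::vector<std::function<std::string()>> out;
    for (const auto& entry : m) {
        // Without P1091R3 support this form is rejected:
        //   for (const auto& [name, value] : m) { out.push_back([name, value] { ... }); }
        // Ordinary variables can always be captured.
        const std::string& name = entry.first;
        const int value = entry.second;
        out.push_back([name, value] { return name + "=" + std::to_string(value); });
    }
    return out;
}
```

Capturing `name` by copy stores a `std::string` inside the closure even though `name` itself is declared as a reference.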
Avi Kivity	f87d4cca68	Merge "docs/debugging.md: add thematic debugging guides" from Botond " Focusing on different aspects of debugging Scylla. Also expand some of the existing segments and fix some small issues around the document. " * 'debugging.md-advanced-guides/v1' of https://github.com/denesb/scylla: docs/debugging.md: add thematic debugging guides docs/debugging.md: tips and tricks: add section about optimized-out variables docs/debugging.md: TLS variables: add missing $ to terminal command docs/debugging.md: TUI: describe how to switch between windows docs/debugging.md: troubleshooting: expand on crash on backtrace	2020-10-16 14:07:45 +03:00
Pekka Enberg	618e5cb1db	Merge 'token_restriction: invalid_request_exception on SELECTs with both normal and token restrictions' from Piotr Grabowski Before this change, invalid query exception on selects with both normal and token restrictions was only thrown when token restriction was after normal restriction. This change adds proper validation when token restriction is before normal restriction. Before the change - does not return error in last query; returns wrong results: ``` cqlsh> CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)); cqlsh> INSERT INTO ks.t(pk) VALUES (1); cqlsh> INSERT INTO ks.t(pk) VALUES (2); cqlsh> INSERT INTO ks.t(pk) VALUES (3); cqlsh> INSERT INTO ks.t(pk) VALUES (4); cqlsh> SELECT pk, token(pk) FROM ks.t WHERE pk = 2 AND token(pk) > 0; InvalidRequest: Error from server: code=2200 [Invalid query] message="Columns "ColumnDefinition{name=pk, type=org.apache.cassandra.db.marshal.Int32Type, kind=PARTITION_KEY, componentIndex=0, droppedAt=-9223372036854775808}" cannot be restricted by both a normal relation and a token relation" cqlsh> SELECT pk, token(pk) FROM ks.t WHERE token(pk) > 0 AND pk = 2; pk \| system.token(pk) ----+--------------------- 3 \| 9010454139840013625 (1 rows) ``` Closes #7441 * github.com:scylladb/scylla: tests: Add token and non-token conjunction tests token_restriction: Add non-token merge exception	2020-10-16 13:09:29 +03:00
Tomasz Grabiec	f893516e55	Merge "lwt: store column_mapping's for each table schema version upon a DDL change" from Pavel Solodovnikov This patch introduces a new system table: `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. schema becomes obsolete when it's being changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is being pulled from a remote in get_schema_definition() at migration_manager.cc and also when schema change is being propagated to system schema tables in do_merge_schema() at schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). In case we failed to persist column mapping after a schema change, missing entries will be recreated on node boot. Later, the information from this table is used in `paxos_state::learn` callback in case we have a mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. Such situation may arise under following circumstances: 1. The previous LWT operation crashed on the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by LWT operation is being altered, so that schema version is now different. Stored proposal now references obsolete schema. 3. LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. When such mismatch happens, prior to that patch the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using obtained column mapping and apply an upgraded mutation instead. * git@github.com:ManManson/scylla.git feature/table_schema_history_v7: lwt: add column_mapping history persistence tests schema: add equality operator for `column_mapping` class lwt: store column_mapping's for each table schema version upon a DDL change schema_tables: extract `fill_column_info` helper frozen_mutation: introduce `unfreeze_upgrading` method	2020-10-15 20:48:29 +02:00
Pavel Solodovnikov	b59ac032c9	lwt: add column_mapping history persistence tests There are two basic tests, which: * Test that column mappings are serialized and deserialized properly on both CREATE TABLE and ALTER TABLE * Column mappings for obsoleted schema versions are updated with a TTL value on schema change Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:25:24 +03:00
Pavel Solodovnikov	81cf11f8a0	schema: add equality operator for `column_mapping` class Add a comparator for column mappings that will be used later in unit-tests to check whether two column mappings match or not. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:24:44 +03:00
Pavel Solodovnikov	055fd3d8ad	lwt: store column_mapping's for each table schema version upon a DDL change This patch introduces a new system table: `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. schema becomes obsolete when it's being changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is being pulled from a remote in get_schema_definition() at migration_manager.cc and also when schema change is being propagated to system schema tables in do_merge_schema() at schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). In case we failed to persist column mapping after a schema change, missing entries will be recreated on node boot. Later, the information from this table is used in `paxos_state::learn` callback in case we have a mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. Such situation may arise under following circumstances: 1. The previous LWT operation crashed on the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by LWT operation is being altered, so that schema version is now different. Stored proposal now references obsolete schema. 3. LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. When such mismatch happens, prior to that patch the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using obtained column mapping and apply an upgraded mutation instead. In case we don't find a column_mapping we just return an error from the learn stage. Tests: unit(dev, debug), dtests(paxos_tests.py:TestPaxos.schema_mismatch_*_test) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:24:30 +03:00
Benny Halevy	951fb638a3	SCYLLA-VERSION-GEN: change master version to 4.3.dev Now that scylla-ccm and scylla-dtest conform to PEP-440 version comparison (See https://www.python.org/dev/peps/pep-0440/) we can safely change scylla version on master to be the development branch for the next release. The version order logic is: 4.3.dev is followed by 4.3.rc[i] followed by 4.3.[n] Note that also according to https://blog.jasonantman.com/2014/07/how-yum-and-rpm-compare-versions/ 4.3.dev < 4.3.rc[i] < 4.3.[n] as "dev" < "rc" by alphabetical order and both "dev" and "rc*" < any number, based on the general rule that alphabetical strings compare as less than numbers. Test: unit Dtest: gating Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201015151153.726637-1-bhalevy@scylladb.com>	2020-10-15 18:32:51 +03:00
Gleb Natapov	30ff874e48	raft: make fsm::become_leader() private Message-Id: <20201015143634.2807731-4-gleb@scylladb.com>	2020-10-15 16:45:55 +02:00
Gleb Natapov	d2e8181852	raft: remove outdated comments in server_impl::add_entry_internal Message-Id: <20201015143634.2807731-3-gleb@scylladb.com>	2020-10-15 16:45:54 +02:00
Gleb Natapov	2f38c05b93	raft: fix apply fiber logging to be more consistent Message-Id: <20201015143634.2807731-2-gleb@scylladb.com>	2020-10-15 16:45:54 +02:00
Botond Dénes	8eb9da397f	docs/debugging.md: add thematic debugging guides Add debugging guides focusing on different aspects of debugging Scylla.	2020-10-15 16:17:39 +03:00
Botond Dénes	0ded715251	docs/debugging.md: tips and tricks: add section about optimized-out variables	2020-10-15 16:17:39 +03:00
Botond Dénes	a2d47738b1	docs/debugging.md: TLS variables: add missing $ to terminal command	2020-10-15 16:17:35 +03:00
Botond Dénes	d99d031c86	docs/debugging.md: TUI: describe how to switch between windows	2020-10-15 16:17:35 +03:00
Botond Dénes	b867142096	docs/debugging.md: troubleshooting: expand on crash on backtrace Describe why this happens and add a link to the GDB bug tracker as well as a workaround on avoiding the crash.	2020-10-15 16:17:29 +03:00
Avi Kivity	8068272b46	build: adjust inlining thresholds for clang too Commit `bc65659a46` adjusted the inlining parameters for gcc. Here we do the same for clang. With this adjustment, clang lags gcc by 3% in throughput (perf_simple_query --smp 1) compared to 20% without it. The value 2500 was derived by binary search. At 5000 compilation of storage_proxy never completes, at 1250 throughput is down by 10%. Closes #7418	2020-10-15 14:09:09 +03:00
Tomasz Grabiec	62d2979888	Merge "raft: snapshot support" from Gleb Support snapshotting for raft. The patch series only concerns itself with raft logic, not how a specific state machine implements take_snapshot() callback. * scylla-dev/raft-snapshots-v2: raft: test: add tests for snapshot functionality raft: preserve trailing raft log entries during snapshotting raft: implement periodic snapshotting of a state machine raft: add snapshot transfer logic	2020-10-15 12:45:30 +02:00
Piotr Grabowski	c8fdb02a13	tests: Add token and non-token conjunction tests Checks for invalid_request_exception in case of trying to run a query with both normal and token relations. Tests both orderings of those relations (normal or token relation first).	2020-10-15 12:32:18 +02:00
Piotr Grabowski	9d1cd2c57b	token_restriction: Add non-token merge exception Add exception that is thrown when merging of token and non-token restrictions is attempted. Before this change only merging non-token and token restriction was validated (WHERE pk = 0 AND token(pk) > 0) and not the other way (WHERE token(pk) > 0 AND pk = 0).	2020-10-15 12:32:18 +02:00
Gleb Natapov	36c67aef8b	raft: test: add tests for snapshot functionality The patch adds two tests; one for snapshot transfer and another for snapshot generation.	2020-10-15 11:50:27 +03:00
Gleb Natapov	7fdfa32dbd	raft: preserve trailing raft log entries during snapshotting This patch allows leaving snapshot_trailing entries in place when a state machine is snapshotted and raft log entries are dropped. Those entries can be used to catch up nodes that are slow without requiring snapshot transfer. The value is part of the configuration and can be changed.	2020-10-15 11:50:27 +03:00
Gleb Natapov	7c1187b7f5	raft: implement periodic snapshotting of a state machine The patch implements periodic taking of a snapshot and trimming of the raft log. In raft the only way the log of already committed entries can be shortened is by taking a snapshot of the state machine and dropping log entries included in the snapshot from the raft log. To keep the log from growing too large, the patch takes the snapshot periodically after applying N entries, where N can be configured by setting the snapshot_threshold value in raft's configuration.	2020-10-15 11:48:44 +03:00
Gleb Natapov	6ca03585f4	raft: add snapshot transfer logic This patch adds the logic that detects that a follower misses data from a snapshot and initiates snapshot transfer in that case. Upon receiving the snapshot the follower stores it locally and applies it to its state machine. The code assumes that the snapshot already exists on a leader.	2020-10-15 11:44:06 +03:00
Avi Kivity	71398f3fb4	Merge "Cleanup sstable writer" from Benny " This series cleans up the legacy and common sstable writer code. metadata_collector::_ancestors were moved to class sstable so that the former can be moved out of sstable into file_writer_impl. Moved setting of replay position and sstable level via sstable_writer_config so that compaction won't need to access the metadata_collector via the sstable. With that, metadata_collector could be moved from class sstable to sstable_writer::writer_impl along with the column_stats. That allowed moving "generic" file_writer methods that were actually k/l format specific into sstable_writer_k_l. Eventually `file_writer` code is moved into sstables/writer.cc and sstable_writer_k_l into sstables/kl/writer.{hh,cc} A bonus cleanup is the ability to get rid of sstable::_correctly_serialize_non_compound_range_tombstones as it's now available to the writers via the writer configuration and not required to be stored in the sstable object. Fixes #3012 Test: unit(dev) " * tag 'cleanup-sstable-writer-v2' of github.com:bhalevy/scylla: sstables: move writer code away to writer.cc sstables: move sstable_writer_k_l away to kl/writer sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones sstables: move writer methods to sstable_writer_k_l sstables: move compaction ancestors to sstable sstables: sstable_writer: optionally set sstable level via config sstables: sstable_writer: optionally set replay position via config sstables: compaction: make_sstable_writer_config sstables: open code update_stats_on_end_of_stream in sstable_writer::consume_end_of_stream sstables: fold components_writer into sstable_writer_k_l sstables: move sstable_writer_k_l definition upwards sstables: components_writer: turn _index into unique_ptr	2020-10-15 10:40:28 +03:00
Benny Halevy	279865e56c	sstables: move writer code away to writer.cc Move `file_writer` code into sstables/writer.cc Fixes #3012 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 23:41:47 +03:00
Benny Halevy	20adb96f62	sstables: move sstable_writer_k_l away to kl/writer Move the sstable_writer_k_l code into sstables/kl/writer.{hh,cc} Refs #3012 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 23:40:56 +03:00
Benny Halevy	8cd4d53643	sstables: mx/writer: fix copy-paste error in reader_semaphore name It was copied from sstables.cc in `6ca0464af5`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201014171651.541232-1-bhalevy@scylladb.com>	2020-10-14 22:17:49 +02:00
Benny Halevy	96cd6adc71	sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones Now it's available to the writers via the writer configuration and not required to be stored in the sstable object. Refs #3012 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 19:53:23 +03:00
Benny Halevy	97a446f9fa	sstables: move writer methods to sstable_writer_k_l They are called solely from the sstable_writer_k_l path. With that, move the metadata collector and column stats to writer_impl. They are now only used by the sstable writers. Refs #3012 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 19:52:17 +03:00
Benny Halevy	e1692bec17	sstables: move compaction ancestors to sstable Compaction needs access to the sstable's ancestors so we need to keep the ancestors for the sstable separately from the metadata collector as the latter is about to be moved to the sstable writer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 19:51:26 +03:00
Benny Halevy	a49a5f36c1	sstables: sstable_writer: optionally set sstable level via config And use compaction::make_sstable_writer_config to pass the compaction's `_sstable_level` to the writer via sstable_writer_config, instead of via the sstable metadata_collector, that is going to move from the sstable to the write_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 19:49:36 +03:00
Benny Halevy	ac3c33ffca	sstables: sstable_writer: optionally set replay position via config And use compaction::make_sstable_writer_config to pass the compaction's replay_position (`_rp`) to the writer via sstable_writer_config, instead of via the sstable metadata_collector, that is going to move from the sstable to the write_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 19:39:46 +03:00
Nadav Har'El	de8ff2f089	docs: some minor cleanups in protocols.md Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201014121626.643743-1-nyh@scylladb.com>	2020-10-14 18:14:00 +03:00
Nadav Har'El	509a41db04	alternator: change name of Alternator's SSL options When Alternator is enabled over HTTPS - by setting the "alternator_https_port" option - it needs to know some SSL-related options, most importantly where to pick up the certificate and key. Before this patch, we used the "server_encryption_options" option for that. However, this was a mistake: Although it sounds like these are the "server's options", in fact prior to Alternator this option was only used when communicating with other servers - i.e., connections between Scylla nodes. For CQL connections with the client, we used a different option - "client_encryption_options". This patch introduces a third option "alternator_encryption_options", which controls only Alternator's HTTPS server. Making it separate from the existing CQL "client_encryption_options" allows both Alternator and CQL to be active at the same time but with different certificates (if the user so wishes). For backward compatibility, we temporarily continue to allow server_encryption_options to control the Alternator HTTPS server if alternator_encryption_options is not specified. However, this generates a warning in the log, urging the user to switch. This temporary workaround should be removed in a future version. This patch also: 1. fixes the test run code (which has an "--https" option to test over https) to use the new name of the option. 2. Adds documentation of the new option in alternator.md and protocols.md - previously the information on how to control the location of the certificate was missing from these documents. Fixes #7204. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930123027.213587-1-nyh@scylladb.com>	2020-10-14 18:13:57 +03:00
Nadav Har'El	4d7c63c50b	docs: in protocols.md, clarify CQL+SSL options and defaults The wording on how CQL with SSL is configured was ambiguous. Clarify the text to explain that by default, it is disabled. We recommend enabling it on port 9142 - but it's not a "default". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201014144938.653311-1-nyh@scylladb.com>	2020-10-14 18:06:59 +03:00
Benny Halevy	e314eb3f78	sstables: compaction: make_sstable_writer_config Consolidate the code to make the sstable_writer_config for sstable writers into a helper method. Following patches will add the ability to set the replay position and sstable level via that config structure. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 18:01:46 +03:00
Benny Halevy	55d73ec2bc	sstables: open code update_stats_on_end_of_stream in sstable_writer::consume_end_of_stream In preparation to moving sstable methods to sstable_writer_k_l as part of #3012. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 17:46:26 +03:00
Benny Halevy	27e3c03ce2	sstables: fold components_writer into sstable_writer_k_l It serves no purpose being a different class but being called by sstable_writer_k_l. Refs #3012. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 17:40:47 +03:00
Benny Halevy	56a6a4ff17	sstables: move sstable_writer_k_l definition upwards To facilitate consolidation of components_writer and some sstable methods into sstable_writer_k_l. Refs #3012. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 17:13:44 +03:00
Benny Halevy	8f239f8f4c	sstables: components_writer: turn _index into unique_ptr In preparation to folding components_writer into sstable_writer_k_l in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 17:10:31 +03:00
Botond Dénes	23df38d867	scylla-gdb.py: add pretty-printer for nonwrapping_interval<dht::ring_position> The patch adds a generic pretty-printer for `nonwrapping_interval<>` (and `nonwrapping_range<>`) and a specific one for `dht::ring_position`. Adding support to clustering and partition ranges is just one more step. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201014054013.606141-1-bdenes@scylladb.com>	2020-10-14 16:45:21 +03:00
Calle Wilund	83339f4bac	Alternator::streams: Make SequenceNumber monotonically growing Fixes #7424 AWS sdk (kinesis) assumes SequenceNumbers are monotonically growing bigints. Since we sort on and use timeuuids, a "raw" bit representation of these will _not_ fulfill the requirement. However, we can "unwrap" the timestamp of uuid msb and give the value as timestamp<<64|lsb, which will ensure sort order == bigint order.	2020-10-14 16:45:21 +03:00
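The "unwrap the timestamp" idea in the commit above can be sketched as follows. This is a minimal illustration, not the Alternator code: it assumes the RFC 4122 version-1 layout for the most significant UUID half, and the names `sequence_number`, `timeuuid_timestamp`, `make_timeuuid_msb` and `to_sequence_number` are hypothetical. The 128-bit value `timestamp<<64|lsb` is modelled as a (high, low) pair, whose lexicographic comparison matches unsigned big-integer order.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical 128-bit value stored as (high word, low word).
// std::pair compares lexicographically, i.e. like an unsigned bigint.
using sequence_number = std::pair<uint64_t, uint64_t>;

// Extract the 60-bit timestamp from the most significant half of a
// version-1 UUID. Assumed layout (RFC 4122), from the top bit down:
// time_low(32) | time_mid(16) | version(4) | time_hi(12).
inline uint64_t timeuuid_timestamp(uint64_t msb) {
    const uint64_t time_low = msb >> 32;
    const uint64_t time_mid = (msb >> 16) & 0xFFFF;
    const uint64_t time_hi  = msb & 0x0FFF;
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Inverse helper, used only to build sample values for experiments.
inline uint64_t make_timeuuid_msb(uint64_t timestamp) {
    const uint64_t time_low = timestamp & 0xFFFFFFFFull;
    const uint64_t time_mid = (timestamp >> 32) & 0xFFFF;
    const uint64_t time_hi  = (timestamp >> 48) & 0x0FFF;
    return (time_low << 32) | (time_mid << 16) | (uint64_t{1} << 12) | time_hi;
}

// timestamp<<64 | lsb, expressed as a pair so no 128-bit type is needed.
inline sequence_number to_sequence_number(uint64_t msb, uint64_t lsb) {
    return {timeuuid_timestamp(msb), lsb};
}
```

Because `time_low` sits in the top bits of the raw value, two UUIDs whose timestamps differ in `time_mid` can compare in the wrong order as raw integers, while the unwrapped pair keeps them in time order.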
Calle Wilund	3f800d68c6	alternator::streams: Ensure shards are reported in string lexical order Fixes #7409 AWS kinesis Java sdk requires/expects shards to be reported in lexical order, and even worse, ignores lastevalshard. Thus not upholding said order will break their stream introspection badly. Added asserts to unit tests. v2: * Added more comments * use unsigned_cmp * unconditional check in streams_test	2020-10-14 16:45:21 +03:00
Avi Kivity	f10debc48c	Update seastar submodule * seastar 35c255dcd...6973080cd (2): > Merge "memory: improve memory diagnostics dumped on allocation failures" from Botond > map_reduce: use get0 rather than get	2020-10-14 16:45:21 +03:00
Benny Halevy	b3f46e9cbf	test: serialized_action_test: add test_serialized_action_exception Tests that the exceptional future returned by the serialized action is propagated to trigger, reproducing #7352. The test fails without the previous patch: "serialized_action: trigger: include also semaphore status to promise" Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 16:45:21 +03:00
Benny Halevy	f3fc81751f	serialized_action: trigger: propagate action error Currently, the serialized_action error is set to a shared_promise, but is not returned to the caller, unless there is an already outstanding action. Note that setting the exception to the promise when no one collected it via the shared_future caused an 'Exceptional future ignored' warning to be issued, as seen in #7352. Fixes #7352 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 16:45:21 +03:00
Benny Halevy	81d2f60df9	serialized_action: trigger: include also semaphore status to promise Currently, if `with_semaphore` returns exceptional future, it is not propagated to the promise, and other waiters that got a shared future will not see that. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 16:45:21 +03:00
Avi Kivity	86bbf1763d	Merge "reader concurrency semaphore: dump permit diagnostics on timeout or queue overflow" from Botond " The reader concurrency semaphore timing out or its queue being overflown are fairly common events both in production and in testing. At the same time it is a hard to diagnose problem that often has a benign cause (especially during testing), but it is equally possible that it points to something serious. So when this error starts to appear in logs, usually we want to investigate and the investigation is lengthy... either involves looking at metrics or coredumps or both. This patch intends to jumpstart this process by dumping a diagnostics on semaphore timeout or queue overflow. The diagnostics is printed to the log with debug level to avoid excessive spamming. It contains a histogram of all the permits associated with the problematic semaphore organized by table, operation and state. Example: DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics: Permits with state admitted, sorted by memory memory count name 3499M 27 ks.test:data-query 3499M 27 total Permits with state waiting, sorted by count count memory name 1 0B ks.test:drain 7650 0B ks.test:data-query 7651 0B total Permits with state registered, sorted by count count memory name 0 0B total Total: permits: 7678, memory: 3499M This allows determining several things at glance: * What are the tables involved * What are the operations involved * Where is the memory This can speed up a follow-up investigation greatly, or it can even be enough on its own to determine that the issue is benign. 
Tests: unit(dev, debug) " * 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow utils: add to_hr_size() reader_concurrency_semaphore: link permits into an intrusive list reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line reader_concurrency_semaphore: move constructors out-of-line reader_concurrency_semaphore: add state to permits reader_concurrency_semaphore: name permits querier_cache_test: test_immediate_evict_on_insert: use two permits multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() multishard_combining_reader: add permit parameter multishard_combining_reader: shard_reader: use multishard reader's permit	2020-10-13 12:44:23 +03:00
Botond Dénes	18454e4a80	reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow The reader concurrency semaphore timing out or its queue being overflown are fairly common events both in production and in testing. At the same time it is a hard to diagnose problem that often has a benign cause (especially during testing), but it is equally possible that it points to something serious. So when this error starts to appear in logs, usually we want to investigate and the investigation is lengthy... either involves looking at metrics or coredumps or both. This patch intends to jumpstart this process by dumping a diagnostics on semaphore timeout or queue overflow. The diagnostics is printed to the log with debug level to avoid excessive spamming. It contains a histogram of all the permits associated with the problematic semaphore organized by table, operation and state. Example: DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics: Permits with state admitted, sorted by memory memory count name 3499M 27 ks.test:data-query 3499M 27 total Permits with state waiting, sorted by count count memory name 1 0B ks.test:drain 7650 0B ks.test:data-query 7651 0B total Permits with state registered, sorted by count count memory name 0 0B total Total: permits: 7678, memory: 3499M This allows determining several things at glance: * What are the tables involved * What are the operations involved * Where is the memory This can speed up a follow-up investigation greatly, or it can even be enough on its own to determine that the issue is benign.	2020-10-13 12:32:14 +03:00
Botond Dénes	0994e8b5e2	utils: add to_hr_size() This utility function converts a potentially large number to a compact representation, composed of at most 4 digits and a letter appropriate to the power of two the number has to be multiplied by to arrive at the original number (with some loss of precision). The different powers of two are the conventional 2 ** (N * 10) variants: * N=0: (B)ytes * N=1: (K)bytes * N=2: (M)bytes * N=3: (G)bytes * N=4: (T)bytes Examples: * 87665 will be converted to 87K * 1024 will be converted to 1K	2020-10-13 12:32:14 +03:00
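A conversion like the one described above could look roughly like this. It is a sketch written from the commit message alone, not the function that was merged: `to_hr_size_sketch` is a hypothetical name, and truncating integer division is an assumption. With truncating division by 1024, 87665 comes out as 85K, so the 87K quoted in the message would need a different rounding or base; the sketch does not try to reproduce it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative only: repeatedly divide by 1024 (2 ** 10) until the value
// drops below 1024 or the largest suffix is reached, then append the suffix.
inline std::string to_hr_size_sketch(uint64_t size) {
    constexpr std::array<char, 5> suffixes = {'B', 'K', 'M', 'G', 'T'};
    std::size_t n = 0;  // power-of-1024 index, N in the commit message
    while (size >= 1024 && n + 1 < suffixes.size()) {
        size /= 1024;   // truncating division: this is the loss of precision
        ++n;
    }
    return std::to_string(size) + suffixes[n];
}
```

For every suffix below T the loop leaves a value under 1024, which keeps the result within four digits plus one letter.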
Botond Dénes	27bbf5566d	reader_concurrency_semaphore: link permits into an intrusive list	2020-10-13 12:32:14 +03:00
Botond Dénes	fdb93ae0fd	reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line Soon we will want to add more logic to this now simple handler, move it out-of-line in preparation.	2020-10-13 12:32:14 +03:00
Botond Dénes	85bfd28f4e	reader_concurrency_semaphore: move constructors out-of-line Soon, the semaphore will have a field that will not have a publicly available definition. Move the constructor out-of-line in preparation.	2020-10-13 12:32:13 +03:00
Botond Dénes	70fa543c31	reader_concurrency_semaphore: add state to permits Instead of a simple boolean, designating whether the permit was already admitted or not, add a proper state field with a value for all the different states the permit can be in. Currently there are three such states: * registered - the permit was created and started accounting resource consumption. * waiting - the permit was queued to wait for admission. * admitted - the permit was successfully admitted. The state will be used for debugging purposes, both during coredump debugging as well as for dumping diagnostics data about permits.	2020-10-13 12:32:13 +03:00
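The three permit states listed above, and the kind of per-state grouping a diagnostics dump needs, can be sketched as below. Only the state names come from the commit message; `permit_info`, `permit_summary` and `summarize` are hypothetical stand-ins and not the semaphore's real types.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// State names as given in the commit message.
enum class permit_state { registered, waiting, admitted };

inline const char* to_string(permit_state s) {
    switch (s) {
    case permit_state::registered: return "registered";
    case permit_state::waiting:    return "waiting";
    case permit_state::admitted:   return "admitted";
    }
    return "unknown";
}

// Hypothetical snapshot of one permit, for illustration only.
struct permit_info {
    std::string name;      // e.g. "ks.test:data-query"
    permit_state state;
    uint64_t memory;       // bytes accounted to this permit
};

struct permit_summary {
    uint64_t count = 0;
    uint64_t memory = 0;
};

// Group permits by (state, name) and accumulate count and memory,
// which is the raw material for a per-state histogram.
inline std::map<std::pair<permit_state, std::string>, permit_summary>
summarize(const std::vector<permit_info>& permits) {
    std::map<std::pair<permit_state, std::string>, permit_summary> out;
    for (const auto& p : permits) {
        auto& s = out[std::make_pair(p.state, p.name)];
        ++s.count;
        s.memory += p.memory;
    }
    return out;
}
```

An explicit state enum, rather than a single "admitted" boolean, is what lets such a summary distinguish permits that are queued from permits that were merely created.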
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is that of the table the read is executed against, and the operation name is some name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kinds of reads, but different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Takuya ASADA	ff129ee030	install.sh: set LC_ALL=en_US.UTF-8 on python3 thunk scylla-python3 segfaults when a non-default locale is specified. As a workaround, we need to set LC_ALL=en_US.UTF-8 on the python3 thunk. Fixes #7408 Closes #7414	2020-10-13 09:38:25 +03:00
Vlad Zolotarov	aec70d9953	cql3/statements/batch_statement.cc: improve batch size warning message Make the warning message clearer: * Include the number of partitions affected by the batch. * Be clear that the warning is about the batch size in bytes. Fixes #7367 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Closes #7417	2020-10-13 09:02:51 +03:00
Avi Kivity	3451579d81	sstables: move component_type formatter to namespace sstables Without this, clang complains that we violate argument dependent lookup rules: note: 'operator<<' should be declared prior to the call site or in namespace 'sstables' std::ostream& operator<<(std::ostream&, const sstables::component_type&); we can't enforce the #include order, but we can easily move it to namespace sstables (where it belongs anyway), so let's do that. gcc is happy either way. Closes #7413	2020-10-12 21:49:25 +02:00
Tomasz Grabiec	29cf7fde03	Merge 'sstables: prepare bound_kind_m formatter for clang' from Avi Kivity bound_kind_m's formatter violates argument dependent lookup rules according to clang, so fix that. Along the way improve the formatter a little. Closes #7412 * git://github.com/avikivity/scylla.git avikivity-bound_kind_m-formatter: sstables: move bound_kind_m formatter to namespace sstables sstables: move bound_kind_m formatter to its natural place sstables: deinline bound_kind_m formatter	2020-10-12 21:47:53 +02:00
Avi Kivity	5065ae835f	sstables: move bound_kind_m formatter to namespace sstables Without this, clang complains that we violate argument dependent lookup rules: note: 'operator<<' should be declared prior to the call site or in namespace 'sstables' std::ostream& operator<<(std::ostream&, const sstables::bound_kind_m&); we can't enforce the #include order, but we can easily move it to namespace sstables (where it belongs anyway), so let's do that. gcc is happy either way.	2020-10-12 20:38:11 +03:00
Avi Kivity	a00fca1a69	sstables: move bound_kind_m formatter to its natural place Move bound_kind_m's formatter to the same header file where it is defined. This prevents cases where the compiler decays the type (an enum) to the underlying integral type because it does not see the formatter declaration, resulting in the wrong output.	2020-10-12 20:36:10 +03:00
Avi Kivity	69c3533d97	sstables: deinline bound_kind_m formatter The formatter is by no means hot code and should not be inlined.	2020-10-12 20:35:08 +03:00
Juliusz Stasiewicz	0251cb9b31	transport: Update `connection_stage` in `system.clients`	2020-10-12 18:44:00 +02:00
Juliusz Stasiewicz	6abe1352ba	transport: Retrieve driver's name and version from STARTUP message	2020-10-12 18:37:19 +02:00
Juliusz Stasiewicz	d2d162ece3	transport: Notify `system.clients` about "protocol_version"	2020-10-12 18:32:00 +02:00
Piotr Dulikowski	77a0f1a153	hints: don't read hint files when it's not allowed to send When there are hint files to be sent and the target endpoint is DOWN, end_point_hints_manager works in the following loop: - It reads the first hint file in the queue, - For each hint in the file it decides that it won't be sent because the target endpoint is DOWN, - After realizing that there are some unsent hints, it decides to retry this operation after sleeping 1 second. This causes the first segment to be wholly read over and over again, with 1 second pauses, until the target endpoint becomes UP or leaves the cluster. This causes unnecessary I/O load in the streaming scheduling group. This patch adds a check which prevents end_point_hints_manager from reading the first hint file at all when it is not allowed to send hints. First observed in #6964 Tests: - unit(dev) - hinted handoff dtests Closes #7407	2020-10-12 19:09:57 +03:00
Botond Dénes	40c5474022	querier_cache_test: test_immediate_evict_on_insert: use two permits The test currently uses a single permit shared between two simulated reads (to wait admission twice). This is not a supported way of using a permit and will stop working soon as we make the states the permit is in more pronounced.	2020-10-12 15:56:56 +03:00
Botond Dénes	307cdf1e0d	multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() Allow the evictable reader managing the underlying reader to pass its own permit to it when creating it, making sure they share the same permit. Note that the two parts can still end up using different permits, when the underlying reader is kept alive between two pages of a paged read and thus keeps using the permit received on the previous page. Also adjust the `reader_context` in multishard_mutation_query.cc to use the passed-in permit instead of creating a new one when creating a new reader.	2020-10-12 15:56:56 +03:00
Botond Dénes	e09ab09fff	multishard_combining_reader: add permit parameter Don't create an own permit, take one as a parameter, like all other readers do, so the permit can be provided by the higher layer, making sure all parts of the logical read use the same permit.	2020-10-12 15:56:56 +03:00
Botond Dénes	600f1c7853	multishard_combining_reader: shard_reader: use multishard reader's permit Don't create a new permit per shard reader, pass down the multishard reader's one to be used by each shard reader. They all belong to the same read, they should use the same permit. Note that despite its name the shard readers are the local representation of a reader living on a remote shard and as such they live on the same shard the multishard combining reader lives on.	2020-10-12 15:56:56 +03:00
Avi Kivity	73718414e3	data/cell: fix value_writer use before definition Clang parses templates more eagerly than gcc, so it fails on some forward-declared templates. In this case, value_writer was forward-declared and then used in data::cell. As it also uses some definitions local to data::cell, it cannot be defined before it as well as after it. To solve the problem, we define it as a nested class so it can use other local definitions, yet be defined before it is used. No code changes. Closes #7401	2020-10-12 13:41:09 +03:00
Avi Kivity	da3e51d7b8	build: use c++20 for all C++ files, not just those that use the seastar flags A few source files (like those generated by antlr) don't build with seastar, and so don't inherit all of its flags. They then use the compiler default dialect, not C++20. With gcc that's just fine, since gcc supports concepts in earlier dialects, but clang requires C++20. Fix by forcing --std=gnu++20 for all files (same as what Seastar chooses). Closes #7392	2020-10-12 13:16:27 +03:00
Avi Kivity	affa234151	types: don't linearize ascii during validation ascii has no inter-byte dependencies and so can be validated fragment by fragment, reducing large contiguous allocations. Fixes #7393. Closes #7394	2020-10-12 13:15:24 +03:00
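The reasoning in the commit above is that a byte is valid ASCII independently of its neighbours, so each fragment can be checked on its own and the result never depends on where the fragment boundaries fall. A minimal sketch, assuming fragments are exposed as a list of `std::string_view` (the real fragmented buffer types are not shown in this log, and both function names are hypothetical):

```cpp
#include <string_view>
#include <vector>

// A single fragment is valid ASCII if every byte has its top bit clear.
inline bool is_ascii_fragment(std::string_view fragment) {
    for (unsigned char c : fragment) {
        if (c >= 0x80) {
            return false;
        }
    }
    return true;
}

// Validate a fragmented value without first copying it into one
// contiguous buffer: no byte depends on any other byte.
inline bool validate_ascii(const std::vector<std::string_view>& fragments) {
    for (std::string_view f : fragments) {
        if (!is_ascii_fragment(f)) {
            return false;
        }
    }
    return true;
}
```

By contrast, an encoding such as UTF-8 has multi-byte sequences that may straddle a fragment boundary, so it cannot be validated this simply.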
Gleb Natapov	9d7c81c1b8	raft: fix boost/raft_fsm_test compilation Message-Id: <20201011063802.GA2628121@scylladb.com>	2020-10-12 12:09:21 +02:00
Takuya ASADA	d5ff82dc61	scylla_setup: skip iotune when developer_mode is enabled When developer mode is automatically enabled in nonroot mode, we should skip iotune since the parameter won't be used. Closes #7327	2020-10-12 11:08:10 +03:00
Botond Dénes	d35b0c06da	configure.py: add space before appending -ffile-prefix-map to user cflags Otherwise, it concatenates it to the last user provided cflag, creating a gibberish flag that gcc will choke on. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201012073523.305271-1-bdenes@scylladb.com>	2020-10-12 10:40:02 +03:00
Nadav Har'El	977da3567f	Merge 'Alternator streams: Fix shard lengths, parenting, expiration, filter useless ones and improve paging' from Calle Wilund The remains of the defunct #7246. Fixes #7344 Fixes #7345 Fixes #7346 Fixes #7347 Shard ID length is now within limits. Shard end sequence number should be set when appropriate. Shard parent is selected a bit more carefully (sorting) Shards are filtered by time to exclude cdc generations we cannot get data from (too old) Shard paging improved Closes #7348 * github.com:scylladb/scylla: test_streams: Add some more sanity asserts alternator::streams: Set dynamodb data TTL explicitly in cdc options alternator::streams: Improve paging and fix parent-child calculation alternator::streams: Remove table from shard_id alternator::streams: Filter our cdc streams older than data/table alternator::error: Add a few dynamo exception types	2020-10-12 09:43:12 +03:00
Avi Kivity	4d6739c2e6	Merge "Use max_concurrent_for_each" from Benny " max_concurrent_for_each was added to seastar for replacing sstable_directory::parallel_for_each_restricted by using more efficient concurrency control that doesn't create unlimited number of continuations. The series replaces the use of sstable_directory::parallel_for_each_restricted with max_concurrent_for_each and exposes the sstable_directory::do_for_each_sstable via a static method. This method is used here by table::snapshot to limit the concurrency of snapshot operations that suffer from the same unbound concurrency problem sstable_directory solved. In addition sstable_directory::_load_semaphore that was used across calls to do_for_each_sstable was replaced by a static per-shard semaphore that caps concurrency across all calls to `do_for_each_sstable` on that shard. This makes sense since the disk is a shared resource. In the future, we may want to have a load semaphore per device rather than a single global one. We should experiment with that. Test: unit(dev) " * tag 'max_concurrent_for_each-v5' of github.com:bhalevy/scylla: table: snapshot: use max_concurrent_for_each sstable_directory: use a external load_semaphore test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency sstables: sstable_directory: use max_concurrent_for_each	2020-10-12 09:43:12 +03:00
Avi Kivity	54386efe9e	build: add libicui18n library for clang The build with clang fails with ld.lld: error: undefined symbol: icu_65::Collator::createInstance(icu_65::Locale const&, UErrorCode&) >>> referenced by like_matcher.cc >>> build/dev/utils/like_matcher.o:(boost::re_detail_106900::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_65::Locale const&)) >>> referenced by like_matcher.cc >>> build/dev/utils/like_matcher.o:(boost::re_detail_106900::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_65::Locale const&)) That symbol lives in libicui18n. It's not clear why clang fails to resolve it and gcc succeeds (after all, both use lld as the linker) but it is easier to add the library than to attempt to figure out the discrepancy. Closes #7391	2020-10-11 22:14:00 +03:00
Avi Kivity	8d3fcdc600	serializer.hh: remove unneeded semicolon after function definition Closes #7390	2020-10-11 22:12:04 +03:00
Avi Kivity	dfffa4dc71	utils: big_decimal: work around clang difficulty with boost::cpp_int(string_view) constructor Clang has some difficulty with the boost::cpp_int constructor from string_view. In fact it is a mess of enable_if<>s so a human would have trouble too. Work around it by converting to std::string. This is bad for performance, but this constructor is not going to be fast in any case. Hopefully a fix will arrive in clang or boost. Closes #7389	2020-10-11 22:09:19 +03:00
Bentsi Magidovich	7be252e929	dist: fix incorrect AWS user-data url We used http://169.254.169.254/latest/meta-data/user-data but the correct one is http://169.254.169.254/latest/user-data Fixes: https://github.com/scylladb/scylla-machine-image/issues/63 Closes #7388	2020-10-11 18:20:54 +03:00
Avi Kivity	00864b26c3	query-result-writer: fix idl definition order related failures with clang Following `ad48d8b43c`, fix a similar problem which popped up with higher inlining thresholds in query-result-writer.hh. Since idl/query depends on idl/keys, it must follow in definition order. Closes #7384	2020-10-11 17:57:12 +03:00
Avi Kivity	1145462a05	cql3: select_statement: fix undefined pointer arithmetic We add std::distance(...) + 1 to a vector iterator, but the vector can be empty, so we're adding a non-zero value to nullptr, which is undefined behavior. Rearrange to perform the limit (std::min()) before adding to the pointer. Found by clang's ubsan. Closes #7377	2020-10-11 17:54:08 +03:00
Avi Kivity	610fa83f28	test: database_test: fix threading confusion database_test contains several instances of calling do_with_cql_test_env() with a function that expects to be called in a thread. This mostly works because there is an internal thread in do_with_cql_test_env(), but is not guaranteed to. Fix by switching to the more appropriate do_with_cql_test_env_thread(). Closes #7333	2020-10-11 17:44:30 +03:00
Avi Kivity	b172e4c2ce	sstables: make index_bound a non-nested struct Due to a longstanding bug in clang[1], the compiler doesn't think that such a class is default-constructible. This causes std::optional<index_bound>::optional() not to compile. Because it depends on open_tt_marker, extract that too. [1] https://stackoverflow.com/questions/47974898/clang-5-stdoptional-instantiation-screws-stdis-constructible-trait-of-the-p Closes #7387	2020-10-11 17:40:01 +03:00
Avi Kivity	58e02c216a	test: sstable_datafile_test: sstable_run_based_compaction_test: prevent use of uninitialized variable observer The variable 'observer' (an std::optional) may be left uninitialized if 'incremental_enabled' is false. However, it is used afterwards with a call to disconnect, accessing garbage. Fix by accessing it via the optional wrapper. A call to optional::reset() destroys the observable, which in turn calls disconnect(). Closes #7380	2020-10-11 17:36:08 +03:00
Avi Kivity	af8fd8c8d8	utils: build_id: fix ubsan false positive on pointer arithmetic get_nt_build_id() constructs a pointer by adding a base and an offset, but if the base happens to be zero, that is undefined under C++ rules (although legal ELF). Fix by performing the addition on integers, and only then casting to a pointer. Closes #7379	2020-10-11 17:23:40 +03:00
Avi Kivity	a36eb586ea	cql3: selection: don't use gcc extension "typeof" typeof is not recognized by clang. Use the modern equivalent "decltype" instead. Closes #7386	2020-10-11 17:21:15 +03:00
Avi Kivity	15ab6a3feb	test: cql_repl: use boost::regex instead of std::regex to avoid stack overflow libstdc++'s std::regex uses recursion[1], with a depth controlled by the input. Together with clang's debug mode, this overflows the stack. Use boost::regex instead, which is immune to the problem. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164 Closes #7378	2020-10-11 17:12:21 +03:00
Avi Kivity	4fd0ba24ea	Update seastar submodule * seastar ebcb3aeec...35c255dcd (1): > append_challenged_posix_file_impl: allow destructing file with no queued work Fixes #7285.	2020-10-11 16:49:03 +03:00
Avi Kivity	7d025b5cf4	utils: log_heap: relax check for clang's sanitizer `b1e78313fe` added a check for ubsan to squelch a false positive, but that check doesn't work with clang. Relax it to check for debug mode, so clang doesn't hit the same false positive as gcc did. Define a SANITIZE macro so we have a reliable way to detect if we're running with a sanitizer. Closes #7372	2020-10-11 16:07:16 +03:00
Avi Kivity	882ed2017a	test: network_topology_strategy_test: fix overflow in d2t() d2t() scales a fraction in the range [0, 1] to the range of a biased token (same as unsigned long). But x86 doesn't support conversion to unsigned, only signed, so this is a truncating conversion. Clang's ubsan correctly warns about it. Fix by reducing the range before converting, and expanding it afterwards. Closes #7376	2020-10-11 16:05:02 +03:00
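"Reducing the range before converting, and expanding it afterwards" can be sketched as below. This is an illustration of the technique and not the test's actual `d2t()`: the name `fraction_to_token` is hypothetical, and the sketch assumes the input lies in [0, 1), so that the intermediate product stays below 2^63 and fits the signed 64-bit range.

```cpp
#include <cstdint>
#include <limits>

// Scale a fraction in [0, 1) to the unsigned 64-bit range without ever
// converting a double that exceeds the signed 64-bit range.
inline uint64_t fraction_to_token(double d) {
    // Half of the unsigned range, 2^63 - 1; as a double it rounds to 2^63.
    constexpr uint64_t half_range = std::numeric_limits<uint64_t>::max() >> 1;
    // Convert within the reduced range, then expand again by doubling.
    return static_cast<uint64_t>(d * static_cast<double>(half_range)) << 1;
}
```

The price is one bit of resolution: every result is even. An input of exactly 1.0 is outside what this sketch handles.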
Avi Kivity	8932c4e919	compaction: allow _max_sstable_size = 0 Some tests (run_based_compaction_test at least) use _max_sstable_size = 0 in order to force one partition per sstable. That triggers an overflow when calculating the expected bloom filter size. The overflow doesn't matter for normal operation, because the result later appears on a divisor, but does trigger a ubsan error. Squelch the error by not dividing by zero here. I tried using _max_sstable_size = 1, but the test failed for other reasons. Closes #7375	2020-10-11 15:43:51 +03:00
Avi Kivity	fc1fcaa11e	lua: expect overflow when selecting lua types When converting a value to its Lua representation, we choose an integer type if it fits. If it doesn't, we fall back to a more expensive type. So we explicitly try to trigger an overflow. However, clang's ubsan doesn't like the overflow, and kills the test. Tell it that the overflow is expected here. Closes #7374	2020-10-11 15:38:07 +03:00
Avi Kivity	6bc6db8037	utils/array-search: document restrictions Our AVX2 implementation cannot load a partial vector, or mask unused elements (that can be done with AVX-512/SVE2), so it has some restrictions. Document them. Closes #7385	2020-10-11 15:19:54 +03:00
Avi Kivity	3e2707c2bf	utils: fragmented_temporary_buffer: don't add to potentially null pointers Offsetting a null pointer is undefined, and clang's ubsan complains. Rearrange the arithmetic so we never offset a null pointer. A function is introduced for the remaining contiguous bytes so it can cast the result to size_t, avoiding a compare-of-different-signedness warning from gcc. Closes #7373	2020-10-11 15:05:15 +03:00
Benny Halevy	d55985bb7d	build: Upgrade to seastar API level 6 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201011105422.818623-2-bhalevy@scylladb.com>	2020-10-11 14:40:32 +03:00
Benny Halevy	064aae8ffa	flush_queue: call_helper: support no variadic futures Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201011105422.818623-1-bhalevy@scylladb.com>	2020-10-11 14:40:32 +03:00
Avi Kivity	4c63723ead	types: tighten digit count requirement on time nanoseconds components When the number of nanosecond digits is greater than 9, the std::pow() expression that corrects the nanosecond value becomes infinite. This is because sstring::length() is unsigned, and so negative values underflow and become large. Following Cassandra, fix by forbidding more than 9 digits of nanosecond precision. Found by clang's ubsan. Closes #7371	2020-10-11 14:13:46 +03:00
Rafael Ávila de Espíndola	a3bd546197	types: Work around a clang thread-local code generation bug (user_type) Following `5d249a8e27`, apply the same fix for user_type_impl. This works around https://bugs.llvm.org/show_bug.cgi?id=47747 Depending on this might be unstable, as the bug can show up at any corner, but this is sufficient right now to get test_user_function_disabled to pass. Closes #7370	2020-10-11 12:36:38 +03:00
Avi Kivity	6fbfff7b31	Update seastar submodule * seastar c62c4a3df...ebcb3aeec (1): > Merge "map_reduce: futurize_invoke reducer" from Benny	2020-10-11 12:17:06 +03:00
Benny Halevy	a0b5529441	flush_queue: use futurator::invoke Attend to the following warning with Seastar_API_LEVEL 5+: ``` ./utils/flush_queue.hh:68:36: warning: ‘static seastar::futurize<T>::type seastar::futurize<T>::apply(Func&&, FuncArgs&& ...) [with Func = test_queue_ordering_random_ops::run_test_case()::<lambda(int)>::<lambda(int)>; FuncArgs = {int}; T = void; seastar::futurize<T>::type = seastar::future<>]’ is deprecated: Use invoke for varargs [-Wdeprecated-declarations] 68 \| return futurator::apply(std::forward<Func>(func), f.get()); ``` Test: flush_queue(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201007112130.474269-1-bhalevy@scylladb.com>	2020-10-11 12:14:17 +03:00
Nadav Har'El	87cfdb69c6	Merge 'cql3: use larger stack for do_with_cql_parser() in debug mode' from Avi Kivity Our cql parser uses large amounts of stack, and can overflow it in debug mode with clang. To prevent this stack overflow, temporarily use a larger (1MB) stack. Closes #7369 * github.com:scylladb/scylla: cql3: use larger stack for do_with_cql_parser() in debug mode cql3: deinline do_with_cql_parser()	2020-10-11 11:29:06 +03:00
Avi Kivity	c41905e986	utils: array-search: deinline, working around clang bug Clang has a bug processing inline ifuncs with intrinsics[1]. Since ifuncs can't be inlined anyway (they are always dispatched via a function pointer that is determined based on the CPU features present), nothing is gained by inlining them. Deinlining therefore reduces compile time and works around the clang bug. [1] https://bugs.llvm.org/show_bug.cgi?id=47691 Closes #7358	2020-10-11 10:29:24 +03:00
Avi Kivity	cb6231d1e2	cql3: use larger stack for do_with_cql_parser() in debug mode Our cql parser uses large amounts of stack, and can overflow it in debug mode with clang. To prevent this stack overflow, temporarily use a larger (1MB) stack. We can't use seastar::thread(), since do_with_cql_parser() does not yield. We can't use std::thread(), since lw_shared_ptr()'s debug mode will scream murder at an lw_shared_ptr used across threads (even though it's perfectly safe in this case). We can't use boost::context2 since that requires the library to be compiled with address sanitizer support, which it isn't on Fedora. So we use a fiber switch using the getcontext() function family. This requires extra annotations for debug mode, which are added.	2020-10-10 00:31:50 +03:00
Avi Kivity	31886bc562	cql3: deinline do_with_cql_parser() The cql parser causes trouble with the sanitizers and clang, since it consumes a large amount of stack space (it does so with gcc too, but does not overflow our 128k stacks). In preparation for working around the problem, deinline it so the hacks need not spread to the entire code base via #include. There is no performance impact from the virtual function, as cql parsing will dominate the call.	2020-10-09 23:49:42 +03:00
Tomasz Grabiec	d2dd2b1ef9	Merge "raft: declarative raft testing" from Alejo Raft tests with declarative structure instead of procedural. * https://github.com/alecco/scylla/tree/raft-ale-tests-03d: raft: log failed test case name raft: test add hasher raft: declarative tests raft: test make app return proper exit int value raft: test add support for disconnected server raft: tests use custom server ids for easier debugging raft: make election_elapsed public for testing raft: test remove unnecessary header raft: fix typo snaphot snapshot	2020-10-09 16:01:52 +02:00
Alejo Sanchez	5d408082b6	raft: log failed test case name Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:47 +02:00
Alejo Sanchez	664b3eddb1	raft: test add hasher Values seen by nodes were so far added but this does not provide a guarantee the order of these values was respected. Use a digest to check output, implicitly checking order. On the other hand, sum or a simple positional checksum like Fletcher's is easier to debug as rolling sum is evident. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:42 +02:00
Alejo Sanchez	670824c6fa	raft: declarative tests For convenience making Raft tests, use declarative structures. Servers are set up and initialized and then updates are processed. For now, updates are just adding entries to leader and change of leader. Updates and leader changes can be specified to run after initial test setup. An example test for 3 nodes, node 0 starting as leader having two entries 0 and 1 for term 1, and with current term 2, then adding 12 entries, changing leader to node 1, and adding 12 more entries. The test will automatically add more entries to the last leader until the test limit of total_values (default 100). {.name = "test_name", .nodes = 3, .initial_term = 2, .initial_states = {{.le = {{1,0},{1,1}}}, .updates = {entries{12},new_leader{1},entries{12}},}, Leader is isolated before change via is_leader returning false. Initial leader (default server 0) will be set with this method, too. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:31 +02:00
Alejo Sanchez	7d4b33d834	raft: test make app return proper exit int value Seastar app returns int result exit value. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:24 +02:00
Alejo Sanchez	093bc8fbb3	raft: test add support for disconnected server Failure detector support of disconnected servers with a global set of addresses. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:02 +02:00
Alejo Sanchez	21d7686766	raft: tests use custom server ids for easier debugging Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:57 +02:00
Alejo Sanchez	9f401c517e	raft: make election_elapsed public for testing Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:52 +02:00
Alejo Sanchez	56683ae689	raft: test remove unnecessary header Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:45 +02:00
Alejo Sanchez	1bff357816	raft: fix typo snaphot snapshot Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:39 +02:00
Pekka Enberg	266d2b6f71	Update tools/jmx submodule * tools/jmx c55f3f2...c51906e (1): > StorageService.java: Use the endpoint for getRangeToEndpointMap	2020-10-08 12:09:24 +03:00
Amnon Heiman	48c3c94aa6	api/storage_service.cc: Add the get_range_to_endpoint_map The get_range_to_endpoint_map method takes a keyspace and returns a map between the token ranges and the endpoint. It is used by some external tools for repair. Token ranges are coded as size-2 arrays; if start or end are empty, they will be added as an empty string. The implementation uses get_range_to_address_map and re-packs it accordingly. The use of stream_range_as_array is to reduce the risk of large allocations and stalls. Relates to scylladb/scylla-jmx#36 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes #7329	2020-10-08 12:09:09 +03:00
Benny Halevy	1ba9e253c4	table: snapshot: use max_concurrent_for_each Tables may have thousands of sstables and a number of component files for each sstable. Using parallel_for_each on all sstables (and parallel_for_each in sstables::create_links for each file) needlessly overloads the system with unbounded number of continuations. Use max_concurrent_for_each and acquire the db sst_dir_semaphore to limit parallelism. Note that although snapshot is called while scylla already loaded the sstable we use the configured initial_sstable_loading_concurrency(). As a future follow-up we may want to define yet another config variable for on-going operations on sstable directories if we see that it warrants a different setting than the initial loading concurrency. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Benny Halevy	57cc5f6ae1	sstable_directory: use a external load_semaphore Although each sstable_directory limits concurrency using max_concurrent_for_each, there could be a large number of calls to do_for_each_sstable running in parallel (e.g per keyspace X per table in the distributed_loader). To cap parallelism across sstable_directory instances and concurrent calls to do_for_each_sstable, start a sharded<semaphore> and pass a shared semaphore& to the sstable_directory:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Benny Halevy	dc46aaa3fd	test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory Use common code to create, start, and stop the sharded<sstable_directory> for each test. This will be used in the next patch for creating a sharded semaphore and passing it to the sstable_directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Takuya ASADA	ec68f67d7e	dist/debian/debian_files_gen.py: don't ignore permission error on shutil.rmtree() shutil.rmtree(ignore_errors=True) was meant to ignore the error when the directory does not exist, but it also ignores permission errors, so we shouldn't use it. Run os.path.exists() before shutil.rmtree() instead. Fixes #7337 Closes #7338	2020-10-08 11:49:10 +03:00
Pekka Enberg	db6bb1ba91	Update tools/java submodule * tools/java 4313155ab6...f2e8666d7e (1): > dist/debian/debian_files_gen.py: don't ignore permission error on shutil.rmtree()	2020-10-08 11:49:01 +03:00
Pekka Enberg	02bf30e9f5	Update tools/jmx submodule * tools/jmx e3a381d...c55f3f2 (1): > dist/debian/debian_files_gen.py: don't ignore permission error on shutil.rmtree()	2020-10-08 11:48:57 +03:00
Pekka Enberg	6c133e36d8	Merge 'build: prepare for clang' from Avi Kivity This series prepares the build system for clang support. It deals with the different sets of warnings accepted by clang and gcc, and with detecting clang 10 as a supported compiler. It's still not possible to build with clang after this, but we're another step closer. Closes #7269 * github.com:scylladb/scylla: build: detect and allow clang 10 as a compiler build: detect availablity of -Wstack-usage= build: disable many clang-specific warnings	2020-10-08 10:16:12 +03:00
Avi Kivity	767e30927c	test: suppress ubsan true-positive on rapidjson rapidjson has a harmless (but true) ubsan violation. It was fixed in `16872af889`. Since rapidjson hasn't released since 2016, we're unlikely to see the fix, so suppress it to prevent the tests failing. In any case the violation is harmless. gcc's ubsan doesn't object to the addition. Closes #7357	2020-10-07 19:27:49 +03:00
Gleb Natapov	0bff15a976	raft: Send multiple entries in one append_entry rpc Send more than one entry in a single append_entry message, but limit one packet's size according to the append_request_threshold parameter. Message-Id: <20201007142602.GA2496906@scylladb.com>	2020-10-07 16:43:33 +02:00
Nadav Har'El	bff6fccc9f	Update seastar submodule Updated for the ability to add group names to SMP service groups (https://github.com/scylladb/seastar/pull/809). * seastar 8c8fd3ed...c62c4a3d (3): > smp service group: add optional group name > dpdk: mark link_ready() function override > Merge "sharded: make start, stop, and invoke_on methods noexcept" from Benny	2020-10-07 15:59:48 +03:00
Nadav Har'El	f30e86395a	Merge 'table: fix race and exception handling in on_compaction_completion()' from Avi Kivity Fix a race condition in on_compaction_completion() that can prevent shutdown, as well as an exception handling error. See individual patches for details. Fixes #7331. Closes #7334 * github.com:scylladb/scylla: table: fix mishandled _sstable_deleted_gate exception in on_compaction_completion table: fix on_compaction_completion corrupting _sstables_compacted_but_not_deleted during self-race	2020-10-07 15:27:59 +03:00
Benny Halevy	f4269e3a04	distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency Although process_upload_dir is not called when initially loading the tables, but rather from storage_service::load_new_sstables, it can use the same sstable_loading_concurrency, rather than constant `4`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-07 14:45:20 +03:00
Benny Halevy	c26c784882	sstables: sstable_directory: use max_concurrent_for_each Use max_concurrent_for_each instead of parallel_for_each in sstable_directory::parallel_for_each_restricted to avoid creating potentially thousands of continuations, one for each sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-07 14:45:20 +03:00
Calle Wilund	349c5ee21a	test_streams: Add some more sanity asserts Checking validity of returned shard sets etc.	2020-10-07 08:43:39 +00:00
Calle Wilund	1ed864ce4c	alternator::streams: Set dynamodb data TTL explicitly in cdc options They should be the same by default, but setting it explicitly protects us from any changing defaults.	2020-10-07 08:43:39 +00:00
Calle Wilund	04deacd7e7	alternator::streams: Improve paging and fix parent-child calculation Fixes #7345 Fixes #7346 Do a more efficient collection skip when doing paging, instead of iterating the full sets. Ensure some semblance of sanity in the parent-child relationship between shards by ensuring token order sorting and finding the apparent previous ID covering the approximate range of new gen. Fix endsequencenumber generation by looking at whether we are last gen or not, instead of the (not filled in) 'expired' column.	2020-10-07 08:43:39 +00:00
Calle Wilund	3cdd7fe191	alternator::streams: Remove table from shard_id Fixes #7344 It is not data really needed, as shard_id:s are not required to be unique across streams, and also because the length limit on shard_id text representation. As a side effect, shard iter instead carries the stream arn.	2020-10-07 08:43:39 +00:00
Pekka Enberg	16ed6fee40	Update tools/jmx submodule * tools/jmx 25bcd76...e3a381d (1): > install.sh: show warning nonroot mode when systemd does not support user mode	2020-10-07 11:39:03 +03:00
Botond Dénes	db56ae695c	types: validate(): linearize values lazily Instead of eagerly linearizing all values as they are passed to validate(), defer linearization to those validators that actually need linearized values. Linearizing large values puts pressure on the memory allocator with large contiguous allocation requests. This is something we are trying to actively avoid, especially if it is not really needed. Turns out the types, whose validators really want linearized values are a minority, as most validators just look at the size of the value, and some like bytes don't need validation at all, while usually having large values. This is achieved by templating the validator struct on the view and using the FragmentedRange concept to treat all passed in views (`bytes_view` and `fragmented_temporary_buffer_view`) uniformly. This patch makes no attempt at converting existing validators to work with fragmented buffers, only trivial cases are converted. The major offenders still left are ascii/utf8 and collections. Fixes: #7318 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201007054524.909420-1-bdenes@scylladb.com>	2020-10-07 11:00:18 +03:00
Piotr Grabowski	369895b80f	transport: Delay NEW_NODE until CQL listen started After adding a new node to the cluster, Scylla sends a NEW_NODE event to CQL clients. Some clients immediately try to connect to the new node, however it fails as the node has not yet started listening to CQL requests. In contrast, Apache Cassandra waits for the new node to start its CQL server before sending NEW_NODE event. In practice this means that NEW_NODE and UP events will be sent "jointly" after new node is UP. This change is implemented in the same manner as in Apache Cassandra code. Fixes #7301. Closes #7306	2020-10-07 09:57:27 +03:00
Rafael Ávila de Espíndola	5d249a8e27	types: Work around a clang thread-local code generation bug This works around https://bugs.llvm.org/show_bug.cgi?id=47747 Depending on this might be unstable, as the bug can show up at any corner, but this is sufficient right now to get test_user_function_disabled to pass. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20201007000713.1503302-1-espindola@scylladb.com>	2020-10-07 09:49:53 +03:00
Calle Wilund	f1ad66218a	alternator::streams: Filter our cdc streams older than data/table Fixes #7347 If cdc stream id:s are older than either table creation or now - 24h we can skip them in describe_stream, to minimize the amount of shards being returned.	2020-10-07 06:13:28 +00:00
Juliusz Stasiewicz	acf0341e9b	transport: On successful authentication add `username` to system.clients The username becomes known in the course of resolving challenges from `PasswordAuthenticator`. That's why username is being set on successful authentication; until then all users are "anonymous". Meanwhile, `AllowAllAuthenticator` (the default) does not request username, so users logged with it will remain as "anonymous" in `system.clients`. Shuffling of code was necessary to unify existing infrastructure for INSERTing entries into `system.clients` with later UPDATEs.	2020-10-06 18:52:46 +02:00
Avi Kivity	4bbcc81cfe	Merge "Use local reference on query_processor in tracing" from Pavel E " There are few places left that call for global query processor instance, the tracing is one of them. The query processor is used mainly in table_helper, so this set mostly shuffles its methods' arguments to deliver the needed reference. At the end the main.cc code is patched to provide the query processor, which is still global and not stopped, and is thus safe to be used anywhere. tests: unit(dev), dtest(cql_tracing:dev) " * 'br-tracing-vs-query-processor' of https://github.com/xemul/scylla: tracing: Keep qp anchor on backend tracing: Push query processor through init methods main: Start tracing in main table_helper: Require local query processor in calls table_helper: Use local qp as setup_table argument table_helper: Use local db variable	2020-10-06 18:04:24 +03:00
Avi Kivity	c6a3fa5a49	Merge "querier_cache: use the querier's permit for memory accounting" from Botond " The querier cache has a memory based eviction mechanism, which starts evicting freshly inserted queriers once their collective memory consumption goes above the configured limit. For determining the memory consumption of individual queriers, the querier cache uses `flat_mutation_reader::buffer_size()`. But we now have a much more comprehensive accounting of the memory used by queriers: the reader permit, which also happens to be available in each querier. So use this to determine the querier's memory consumption instead. Tests: unit(dev) " * 'querier-cache-use-permit-for-memory-accounting/v1' of https://github.com/denesb/scylla: flat_mutation_reader: de-virtualize buffer_size() querier_cache: use the reader permit for memory accounting querier_cache_test: use local semaphore not the test global one reader_permit: add consumed_resources() accessor	2020-10-06 16:52:44 +03:00
Calle Wilund	5081d354be	alternator::error: Add a few dynamo exception types	2020-10-06 12:52:58 +00:00
Pavel Emelyanov	e7f74449a6	tracing: Keep qp anchor on backend The query processor is required in table_helper's used by tracing. Now everything is ready to push the query processor reference from main down to the table helpers. Because of the current initialization sequence it's only possible to have the started query processor at the .start_tracing() time. Earlier, when the sharded<tracing> is started the query processor is not yet started, so tracing keeps a pointer on local query processor. When tracing is stopped, the pointer is null-ed. This is safe (but an assert is put when dereferencing it), because on stop trace writes' gate is closed and the query processor is only used in them. Also there's still a chance that tracing remains started in case of start abort, but this is on-par with the current code -- sharded query processor is not stopped, so the memory is not freed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:45:19 +03:00
Pavel Emelyanov	87f1223965	tracing: Push query processor through init methods The goal is to make tracing keyspace helper reference query processor, so this patch adds the needed arguments through the initialization stack. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:45:12 +03:00
Pavel Emelyanov	b5f136c651	main: Start tracing in main Move the tracing::start_tracing() out of the storage_service::join_cluster. It anyway happens at the end of the join, so the logic is not changed, but it becomes possible to patch tracing further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:44:59 +03:00
Pavel Emelyanov	b18522a7ab	table_helper: Require local query processor in calls Keeping the query processor reference on the table_helper in raii manner seems wasteful, the only user of it -- the trace_keyspace_helper -- has a bunch of helpers on board, each would then keep its own copy for no gain. At the same time the trace_keyspace_helper already gets the query processor for its needs, so it can share one with table_helper-s. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:44:20 +03:00
Pavel Emelyanov	f5d39b9638	table_helper: Use local qp as setup_table argument The goal is to make table_helper API require the query_processor reference and use it where needed. The .setup_table() is private method, and still grabs the query processor reference itself. Since its futures do not reshard, it's safe to carry the query processor reference through. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:44:00 +03:00
Pavel Emelyanov	2f69e90fc9	table_helper: Use local db variable The .setup_keyspace() method already has the db variable in this continuation lambda. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 15:43:54 +03:00
Tomasz Grabiec	46b7ba8809	Merge "Bring memory footprint test back to work" from Pavel Emelyanov The test was broken by recent sstables manager rework. In the middle the sstables::test_env is destroyed without being closed which leads to broken _closing assertion inside ~sstables_manager(). Fix is to use the test_env::do_with helper. tests: perf.memory_footprint * https://github.com/xemul/scylla/tree/br-memory-footprint-test-fix: test/perf/memory_footprint: Fix indentation after previous patch test/perf/memory_footprint: Don't forget to close sstables::test_env after usage	2020-10-06 11:49:03 +02:00
Pavel Emelyanov	8bceb916ea	test/perf/memory_footprint: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:08:09 +03:00
Pavel Emelyanov	3e4de0f748	test/perf/memory_footprint: Don't forget to close sstables::test_env after usage After recent sstables manager rework the sstables::test_env must be .close()d after usage, otherwise the ~sstables_manager() hits the _closing assertion. Do it with the help of .do_with(). The execution context is already seastar::async in this place, so .get() it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:06:35 +03:00
Pavel Emelyanov	8558339c63	perf_collection: Add test for full scan time Scan here means walking the collection forward using iterator. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	7284469b24	perf_collection: Add test for destruction with .clear() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	72ccc43380	perf_collection: Add test for single element insertion In some cases a collection is used to keep several elements, so it's good to know this timing. For example, a mutation_partition keeps a set of rows, if used in cache it can grow large, if used in mutation to apply, it's typically small. Plain replacement of bst into b-tree caused performance degradation of mutation application because b-tree is only better at big sizes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	207e1aa48f	perf_collection: Add intrusive_set_external_comparator This collection is widely used, any replacement should be compared against it to better understand pros-n-cons. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	2d09864627	perf_collection: Clear collection between itartions Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Pavel Emelyanov	c891f274dc	test: Generalize perf_bptree into perf_collection Rename into perf_collection and localize the B+ code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 09:57:37 +03:00
Avi Kivity	0ef85a102f	table: fix mishandled _sstable_deleted_gate exception in on_compaction_completion on_compaction_completion tries to handle a gate_closed_exception, but with_gate() throws rather than creating an exceptional future, so the extra handling is lost. This is relatively benign since it will just fail the compaction, requiring that work to be redone later. Fix by using the safer try_with_gate().	2020-10-06 08:31:28 +03:00
Avi Kivity	a43d5079f3	table: fix on_compaction_completion corrupting _sstables_compacted_but_not_deleted during self-race on_compaction_completion() updates _sstables_compacted_but_not_deleted through a temporary to avoid an exception causing a partial update: 1. copy _sstables_compacted_but_not_deleted to a temporary 2. update temporary 3. do dangerous stuff 4. move temporary to _sstables_compacted_but_not_deleted This is racy when we have parallel compactions, since step 3 yields. We can have two invocations running in parallel, taking snapshots of the same _sstables_compacted_but_not_deleted in step 1, each modifying it in different ways, and only one of them winning the race and assigning in step 4. With the right timing we can end with extra sstables in _sstables_compacted_but_not_deleted. Before `a5369881b3`, this was a benign race (only resulting in deleted file space not being reclaimed until the service is shut down), but afterwards, extra sstable references result in the service refusing to shut down. This was observed in database_test in debug mode, where the race more or less reliably happens for system.truncated. Fix by using a different method to protect _sstables_compacted_but_not_deleted. We unconditionally update it, and also unconditionally fix it up (on success or failure) using seastar::defer(). The fixup includes a call to rebuild_statistics() which must happen every time we touch the sstable list. Fixes #7331.	2020-10-06 08:29:34 +03:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	cd8d10873f	querier_cache: use the reader permit for memory accounting The querier cache has a memory limit it enforces on cached queriers. For determining how much memory each querier uses, it currently uses `flat_mutation_reader::buffer_size()`. However, we now have a much more complete accounting of the memory each read consumes, in the form of the reader permit, which also happens to be handy in the queriers. So use it instead of the not very well maintained `buffer_size()`.	2020-10-06 08:22:56 +03:00
Botond Dénes	f7eea06f61	querier_cache_test: use local semaphore not the test global one In the mutation source, which creates the reader for this test, the global test semaphore's permit was passed to the created reader (`tests::make_permit()`). This caused reader resources to be accounted on the global test semaphore, instead of the local one the test creates. Just forward the permit passed to the mutation sources to the reader to fix this.	2020-10-06 08:22:56 +03:00
Botond Dénes	73a6b97c75	reader_permit: add consumed_resources() accessor That allows querying the amount of resources accounted through this permit, and by extension by this logical read.	2020-10-06 08:18:42 +03:00
Nadav Har'El	421f0c729d	merge: counters: Avoid signed integer overflow Merged patch series by Tomasz Grabiec: UBSAN complains in debug mode when the counter value overflows: counters.hh:184:16: runtime error: signed integer overflow: 1 + 9223372036854775807 cannot be represented in type 'long int' Aborting on shard 0. Overflow is supposed to be supported. Let's silence it by using casts. Fixes #7330. Tests: - build/debug/test/tools/cql_repl --input test/cql/counters_test.cql Tomasz Grabiec (2): counters: Avoid signed integer overflow test: cql: counters: Add tests reproducing signed integer overflow in debug mode counters.hh \| 2 +- test/cql/counters_test.cql \| 9 ++++++++ test/cql/counters_test.result \| 48 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 58 insertions(+), 1 deletion(-)	2020-10-05 21:43:19 +03:00
Tomasz Grabiec	f01ffe063a	test: cql: counters: Add tests reproducing signed integer overflow in debug mode Reproduces #7330	2020-10-05 20:06:34 +02:00
Tomasz Grabiec	d9b9952d7c	counters: Avoid signed integer overflow UBSAN complains in debug mode when the counter value overflows: counters.hh:184:16: runtime error: signed integer overflow: 1 + 9223372036854775807 cannot be represented in type 'long int' Aborting on shard 0. Overflow is supposed to be supported. Let's silence it by using casts. Fixes #7330.	2020-10-05 20:04:09 +02:00
Alejo Sanchez	6b38ecc6e0	raft: Forbid server address 0 as it has special meaning Server address UUID 0 is not a valid server id since there is code that assumes if server_id is 0 the value is not set (e.g _voted_for). Prevent users from manually setting this invalid value. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-05 15:04:46 +02:00
Konstantin Osipov	532343f09e	raft: Fix the bug with not setting the current leader. When AppendEntries/InstallSnapshot term is the same as the current server's, and the current servers' leader is not set, we should assign it to avoid starting election if the current leader becomes idle. Restructure the code accordingly - change candidate state to Follower upon InstallSnapshot.	2020-10-05 15:04:45 +02:00
Gleb Natapov	a9674a197b	raft: Get back probe_sent logic in progress::PROBE state. It was erroneously replaced by the logic based on time which caused us to send one probe per tick, which is not the intention at all. There can be one outstanding probe message but the moment it gets a reply next one should be sent without waiting for a tick.	2020-10-05 15:04:44 +02:00
Avi Kivity	4f30c479f3	Merge "token_metadata cleanup" from Benny " Misc. cleanups and minor optimizations of token_metadata methods in preparation to futurizing parts of the api around update_pending_ranges and abstract_replication_strategy::calculate_natural_endpoints, to prevent reactor stalls on these paths Test: unit(dev) " * 'token_metadata_cleanup' of github.com:bhalevy/scylla: token_metadata: get rid of unused calculate_pending_ranges_for_* methods token_metadata: get rid of clone_after_all_settled token_metadata_impl: remove_endpoint: do not sort tokens token_metadata_impl: always sort_tokens in place	2020-10-05 13:31:59 +03:00
Takuya ASADA	0f786f05fe	install.sh: logging to scylla-server.log when journalctl --user does not work On some environments such as CentOS8, journalctl --user -xe does not work since journald is running in volatile mode. The issue cannot be fixed in non-root mode, so as a workaround we should log to a file instead of the journal. Also added scylla_logrotate to ExecStartPre, which renames the previous log file, since StandardOutput=file:/path/to/file will erase the existing file when the service is restarted. Fixes #7131 Closes #7326	2020-10-05 13:17:27 +03:00
Avi Kivity	d72465531e	build: use consistent version-release strings across submodules Instead of relying on SCYLLA-VERSION-GEN to be consistently updated in each submodule, propagate the top-level product-version-release to all submodules. This reduces the churn required for each release, and makes the release strings consistent (previously, the git hash in each was different). Closes #7268	2020-10-05 12:32:49 +03:00
Nadav Har'El	8e2e2eab7c	alternator test: tests for nested attributes in FilterExpression Alternator does not yet support direct access to nested attributes in expressions (this is issue #5024). But it's still good to have tests covering this feature, to make it easier to check the implementation of this feature when it comes. Until now we did not have tests for using nested attributes in FilterExpression. This patch adds a test for the straightforward case, and also adds tests for the more elaborate combination of FilterExpression and ProjectionExpression. This combination - see issue #6951 - means that some attributes need to be retrieved despite not being projected (because they are needed in a filter). When we support nested attributes there will be special cases when the projected and filtered attributes are parts of the same top-level attribute, so the code will need to handle those cases correctly. As I was working on issue #6951 now, it is a good time to write a test for these special cases, even if nested attributes aren't yet supported - so we don't forget to handle these special cases later. Both new tests pass on DynamoDB, and xfail on Alternator. Refs #5024 (nested attributes) Refs #6951 (FilterExpression with ProjectionExpression) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Nadav Har'El	a403356ade	alternator test: fix comment A comment in test/alternator/test_lsi.py wrongly described the schema of one of the test tables. Fix that comment. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Nadav Har'El	85cc535792	alternator tests: additional tests for filter+projection combination This patch provides two more tests for issue #6951. As this issue was already fixed, the two new tests pass. The two new tests check two special cases which were handled correctly but not yet tested - when the projected attribute is a key attribute of the table or of one of its LSIs. Having these two additional tests will ensure that any future refactoring or optimizations in this area of the code (filtering, projection, and its combination) will not break these special cases. Refs #6951. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Nadav Har'El	2fc3a30b45	alternator: forbid combining old and new-style parameters The DynamoDB API has for the Query and Scan requests two filtering syntaxes - the old (QueryFilter or ScanFilter) and the new (FilterExpression). Also for projection, it has an old syntax (AttributesToGet) and a new one (ProjectionExpression). Combining an old-style and new-style parameter is forbidden by DynamoDB, and should also be forbidden by Alternator. This patch fixes, and removes the "xfails" tag, of two tests: test_query_filter.py::test_query_filter_and_projection_expression test_filter_expression.py::test_filter_expression_and_attributes_to_get Refs #6951 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Nadav Har'El	282742a469	alternator: fix query with both projection and filtering We had a bug when a Query/Scan had both projection (ProjectionExpression or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter). The problem was that projection left only the requested attributes, and the filter might have needed - and not got - additional attributes. The solution in this patch is to add to the generated JSON item also the extra attributes needed by filtering (if any), run the filter on that, and only at the end remove the extra filtering attributes from the item to be returned. The two tests test_query_filter.py::test_query_filter_and_attributes_to_get test_filter_expression.py::test_filter_expression_and_projection_expression which failed before this patch now pass, so we drop their "xfail" tag. Fixes #6951. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Avi Kivity	715d50bc85	Update seastar submodule * seastar 292ba734bc...8c8fd3ed28 (15): > semaphore_units: add return_units and return_all > semaphore_units: release: mark as noexcept > circular_buffer: support non-default-constructible allocators correctly > core/shared_ptr: Expose use count through {lw_}enable_shared_from_this > memory: align allocations to std::max_align_t > util/log: logger::do_log(): go easier on allocations > doc: add link to multipage version of tutorial > doc: fix the output directories of split and tutorial.html > build: do not copy htmlsplit.py to build dir > doc: add "--input" and "--output-dir" options to htmlsplit.py > doc: update split script to use xml.etree.ElementTree > Merge "shared_future: make functions noexcept" from Benny > tutorial: add linebreak between sections > tutorial: format "future<int>" as inline code block > docs: specify HTML language code for tutorial.html	2020-10-04 21:30:27 +03:00
Etienne Adam	46f0354cdb	redis: pass request as a reference This patch changes the way the request object is passed, using a reference instead of temporaries. 'exists' test is passing in debug mode, whereas it was always failing before. Fixes #7261 by ensuring request object is alive for all commands during the whole request duration. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200924202034.30399-1-etienne.adam@gmail.com>	2020-10-04 14:58:00 +03:00
Avi Kivity	5b5b8b3264	lua: be compatible with Lua 5.4's lua_resume() Lua 5.4 added an extra parameter to lua_resume()[1]. The parameter denotes the number of arguments yielded, but our coroutines don't yield any arguments, so we can just ignore it. Define a macro to allow adding extra stuff with Lua 5.4, and use it to supply the extra parameter. [1] https://www.lua.org/manual/5.4/manual.html#8.3 Closes #7324	2020-10-04 14:07:51 +03:00
Nadav Har'El	ad48d8b43c	Merge 'idl: fix definition order related build failures with clang' from Avi Kivity Clang eagerly instantiates templates, apparently with the following algorithm: - if both the declaration and definition are seen at the time of instantiation, instantiate the template - if only the declaration is seen at the time of instantiation, just emit a reference to the template; even if the definition is later seen, it is not instantiated The "reference" in the second case is a relocation entry in the object file that is satisfied at link time by the linker, but if no other object file instantiated the needed template, a link error results. These problems are hard to diagnose but easy to fix. This series fixes all known such issues in the code base. It was tested on gcc as well. Closes #7322 * github.com:scylladb/scylla: query-result-reader: order idl implementations correctly frozen_schema: order idl implementations correctly idl-compiler: generate views after serializers	2020-10-04 11:16:19 +03:00
Takuya ASADA	d611d74905	dist/common/scripts/scylla_setup: force developer mode on nonroot when NOFILE is too low On Ubuntu 16/18 and Debian 9, LimitNOFILE is set to 4096 and cannot be overridden from a user unit. To run scylla-server in such an environment, we need to turn on developer mode and show warnings. Fixes #7133 Closes #7323	2020-10-04 10:16:30 +03:00
Avi Kivity	4b40bc5065	query-result-reader: order idl implementations correctly Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. Fix by ordering the idl implementation files so that definitions come before uses.	2020-10-03 19:56:29 +03:00
Avi Kivity	94fcec99d1	frozen_schema: order idl implementations correctly Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. Fix by ordering the idl implementation files so that definitions come before uses.	2020-10-03 19:56:28 +03:00
Avi Kivity	a99aba9e48	idl-compiler: generate views after serializers Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. In this case, the views use the serializer implementations, but are generated before them. Fix by generating the view implementations after the serializer implementations that they use.	2020-10-03 19:56:25 +03:00
Tomasz Grabiec	40b42393d2	Merge "Raft: disable boost tests, add disable to test.py" from Alejo Add disable option for test configuration. Tests in this list will be disabled for all modes. * alejo/next-disable-raft-tests-01: Raft: disable boost tests for now Tests: add disable to configuration Raft: Remove tests for now	2020-10-02 15:51:13 +02:00
Yaron Kaikov	bec0c15ee9	configure.py: Add version to unified tarball filename Let's add the version and release to unified tarball filename to avoid having to do that in release engineering pipelines, for example. Closes #7317	2020-10-02 15:48:11 +03:00
Alejo Sanchez	bb67d15e2f	Raft: disable boost tests for now Disable raft fsm boost tests until raft is part of build. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 14:03:01 +02:00
Alejo Sanchez	eff7b63c08	Tests: add disable to configuration For suite.yaml add an extra configuration option disable. Tests in this list will be disabled for all modes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 14:01:50 +02:00
Alejo Sanchez	ef170a5088	Raft: Remove tests for now Remove raft C++ tests until raft is included in build process. [tgrabiec]: Fixes test.py failure. Tests are not compiled unless --build-raft is passed to configure.py and we cannot enable it by default yet. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20201002102847.1140775-1-alejo.sanchez@scylladb.com>	2020-10-02 12:42:21 +02:00
Alejo Sanchez	4e26dad3a0	Raft: Remove tests for now Remove raft C++ tests until raft is included in build process. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 12:26:05 +02:00
Tomasz Grabiec	864b2c5736	CMakeLists.txt: Add raft directory to source code directories Needed for IDE integration. Not used for building currently. Message-Id: <1601570008-19666-1-git-send-email-tgrabiec@scylladb.com>	2020-10-01 19:38:39 +03:00
Gleb Natapov	3e8dbb3c09	lwt: do not return unavailable exception from the 'learn' stage Unavailable exception means that operation was not started and it can be retried safely. If lwt fails in the learn stage though it most certainly means that its effect will be observable already. The patch returns timeout exception instead which means uncertainty. Fixes #7258 Message-Id: <20201001130724.GA2283830@scylladb.com>	2020-10-01 17:16:52 +02:00
Tomasz Grabiec	ca7f0c61f0	Merge "raft: initial implementation" from Gleb This is the beginning of raft protocol implementation. It only supports log replication and voter state machine. The main difference between this one and the RFC (besides having voter state machine) is that the approach taken here is to implement raft as a deterministic state machine and move all the IO processing away from the main logic. To do that some changes to the RPC interface were required: all verbs are now one way, meaning that sending a request does not wait for a reply and the reply arrives as a separate message (or not at all, it is safe to drop packets). * scylla-dev/raft-v4: raft: add a short readme file raft: compile raft tests raft: add raft tests raft: Implement log replication and leader election raft: Introduce raft interface header	2020-10-01 17:09:52 +02:00
Konstantin Osipov	9a5f2b87dc	raft: add a short readme file The file has a brief description of the code status, usage and some implementation assumptions.	2020-10-01 14:30:59 +03:00
Gleb Natapov	16cb009ea2	raft: compile raft tests Compilation is not enabled by default as it requires coroutines support and may require special compiler (until distributed one fixes all the bugs related to coroutines). To enable raft tests compilation new configure.py option is added (--build-raft).	2020-10-01 14:30:59 +03:00
Gleb Natapov	4959609589	raft: add raft tests Add tests for currently implemented raft features. replication_test tests replication functionality with various initial log configurations. raft_fsm_test tests voting state machine functionality.	2020-10-01 14:30:59 +03:00
Gleb Natapov	e1ac1a61c9	raft: Implement log replication and leader election This patch introduces partial RAFT implementation. It has only log replication and leader election support. Snapshotting and configuration change along with other, smaller features are not yet implemented. The approach taken by this implementation is to have a deterministic state machine coded in raft::fsm. What makes the FSM deterministic is that it does not do any IO by itself. It only takes an input (which may be a networking message, time tick or new append message), changes its state and produces an output. The output contains the state that has to be persisted, messages that need to be sent and entries that may be applied (in that order). The input and output of the FSM are handled by raft::server class. It uses raft::rpc interface to send and receive messages and raft::storage interface to implement persistence.	2020-10-01 14:30:59 +03:00
Gleb Natapov	c073997431	raft: Introduce raft interface header This commit introduces public raft interfaces. raft::server represents single raft server instance. raft::state_machine represents a user defined state machine. raft::rpc, raft::rpc_client and raft::storage are used to allow implementing custom networking and storage layers. A shared failure detector interface defines keep-alive semantics, required for efficient implementation of thousands of raft groups.	2020-10-01 14:30:59 +03:00
Piotr Dulikowski	bfbf02a657	transport/config: fix cross-shard use of updateable_value Recently, the cql_server_config::max_concurrent_requests field was changed to be an updateable_value, so that it is updated when the corresponding option in Scylla's configuration is live-reloaded. Unfortunately, due to how cql_server is constructed, this caused cql_server instances on all shards to store an updateable_value which pointed to an updateable_value_source on shard 0. Unsynchronized cross-shard memory operations ensue. The fix changes the cql_server_config so that it holds a function which creates an updateable_value appropriate for the given shard. This pattern is similar to another, already existing option in the config: get_service_memory_limiter_semaphore. This fix can be reverted if updateable_value becomes safe to use across shards. Tests: unit(dev) Fixes: #7310	2020-10-01 14:10:56 +03:00
Etienne Adam	98dc0dc03a	redis: only create required keyspaces/tables The 'redis_database_count' already existed, but was not used when initializing the keyspaces. This patch merely uses it. I think it's better that way, it seems cleaner not to create 15 x 5 tables when we use only one redis database. Also change a test to test with a higher max number of databases. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200930210256.4439-1-etienne.adam@gmail.com>	2020-10-01 10:27:03 +03:00
Wojciech Mitros	e79ad38425	tracing: add username to the session table In order to improve observability, add a username field to the system_traces.sessions table. The system table should be changed while upgrading by running the fix_system_distributed_tables.py script. Until the table is updated, the old behaviour is preserved. Fixes #6737.	2020-10-01 04:46:40 +02:00
Nadav Har'El	d73cf589e7	docs: fix typos in docs/alternator/alternator.md Discovered by running a spell-checker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930101046.76710-1-nyh@scylladb.com>	2020-10-01 04:46:40 +02:00
Nadav Har'El	8db01aeeb4	docs: fix typo in alternator/getting-started.md Fix a typo reported by a user. Ran spell-checker to verify there are no other obvious spelling mistakes in that file. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930084304.74776-1-nyh@scylladb.com>	2020-10-01 04:46:40 +02:00
Avi Kivity	701d24a832	Merge 'Enhance max concurrent requests code' from Piotr Sarna This miniseries enhances the code from #7279 by: * adding metrics for shed requests, which will allow to pinpoint the problem if the max concurrent requests threshold is too low * making the error message more comprehensive by pointing at the variable used to set max concurrent requests threshold Example of an enhanced error message: ``` ConnectionException('Failed to initialize new connection to 127.0.0.1: Error from server: code=1001 [Coordinator node overloaded] message="too many in-flight requests (configured via max_concurrent_requests_per_shard): 18"',)}) ``` Closes #7299 * github.com:scylladb/scylla: transport: make _requests_serving param uint32_t transport: make overloaded error message more descriptive transport: add requests_shed metrics	2020-10-01 04:46:40 +02:00
Benny Halevy	5a250f529f	token_metadata: get rid of unused calculate_pending_ranges_for_* methods They are only called internally by token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:16:23 +03:00
Benny Halevy	41e5a3a245	token_metadata: get rid of clone_after_all_settled It's unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:15:11 +03:00
Benny Halevy	105a2f5244	token_metadata_impl: remove_endpoint: do not sort tokens Call sort_tokens at the caller as all call sites from within token_metadata_impl call remove_endpoint for multiple endpoints so the tokens can be re-sorted only once, when done removing all tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:12:32 +03:00
Benny Halevy	86303f4fdd	token_metadata_impl: always sort_tokens in place No need to return the sorted tokens vector as it's always assigned to _sorted_tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:08:56 +03:00
Piotr Sarna	876e9fe51a	transport: make _requests_serving param uint32_t It's not realistic for a shard to have over 4 billion concurrent requests, so this value can be safely represented in 32 bits. Also, since the current concurrency limit is represented in uint32_t, it makes sense for these two to have matching types.	2020-09-30 08:20:52 +02:00
Piotr Sarna	d18f68f1c1	transport: make overloaded error message more descriptive The message now mentions the config variable used to set the limit of max allowed concurrent requests.	2020-09-30 08:20:51 +02:00
Piotr Sarna	792ff3757a	transport: add requests_shed metrics The counter shows a total number of requests shed due to overload.	2020-09-30 08:20:50 +02:00
Avi Kivity	fd1dd0eac7	Merge "Track the memory consumption of reader buffers" from Botond " The last major untracked area of the reader pipeline is the reader buffers. These scale with the number of readers as well as with the size and shape of data, so their memory consumption is unpredictable and varies wildly. For example many small rows will trigger larger buffers allocated within the `circular_buffer<mutation_fragment>`, while few larger rows will consume a lot of external memory. This series covers this area by tracking the memory consumption of both the buffer and its content. This is achieved by passing a tracking allocator to `circular_buffer<mutation_fragment>` so that each allocation it makes is tracked. Additionally, we now track the memory consumption of each and every mutation fragment through its whole lifetime. Initially I contemplated just tracking the `_buffer_size` of `flat_mutation_reader::impl`, but concluded that as our reader trees are typically quite deep, this would result in a lot of unnecessary `signal()`/`consume()` calls, that scales with the number of mutation fragments and hence adds to the already considerable per mutation fragment overhead. The solution chosen in this series is to instead track the memory consumption of the individual mutation fragments, with the observation that these are typically always moved and very rarely copied, so the number of `signal()`/`consume()` calls will be minimal. This additional tracking introduces an interesting dilemma however: readers will now have significant memory on their account even before being admitted. So it may happen that they can prevent their own admission via this memory consumption. To prevent this, memory consumption is only forwarded to the semaphore upon admission. This might be solved when the semaphore is moved to the front -- before the cache. 
Another consequence of this additional, more complete tracking is that evictable readers now consume memory even when the underlying reader is evicted. So it may happen that even though no reader is currently admitted, all memory is consumed from the semaphore. To prevent any such deadlocks, the semaphore now admits a reader unconditionally if no reader is admitted -- that is if all count resources are available. Refs: #4176 Tests: unit(dev, debug, release) " * 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits) test/manual/sstable_scan_footprint_test: run test body in statement sched group test/manual/sstable_scan_footprint_test: move test main code into separate function test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s test/manual/sstable_scan_footprint_test: make clustering row size configurable test/manual/sstable_scan_footprint_test: document sstable related command line arguments mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*() test: simple_schema: add make_static_row() reader_permit: reader_resources: add operator== mutation_fragment: memory_usage(): remove unused schema parameter mutation_fragment: track memory usage through the reader_permit reader_permit: resource_units: add permit() and resources() accessors mutation_fragment: add schema and permit partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment mutation_fragment: remove as_mutable_end_of_partition() mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/ mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/ flat_mutation_reader: make _buffer a tracked buffer mutation_reader: extract the two fill_buffer_result into a single one ...	2020-09-29 16:08:16 +03:00
Pekka Enberg	8f17ca2d1a	scripts/refresh-submodules.sh: Add python3 submodule Message-Id: <20200928075422.377888-1-penberg@scylladb.com>	2020-09-29 16:06:32 +03:00
Yaron Kaikov	d48df44f26	configure.py: build python3, jmx, tools and unified-tar only in relevant dist-{mode} Today, whenever we build scylla in a single mode, we still build jmx, tools and python3 for all of dev, release and debug. Let's make sure we build only in the relevant build mode. Also adding unified-tar to ninja build Closes #7260	2020-09-29 15:41:52 +03:00
Juliusz Stasiewicz	0afa738a8f	tracing: Fix error on slow batches `trace_keyspace_helper::make_slow_query_mutation_data` expected a "query" key in its parameters, which does not appear in case of e.g. batches of prepared statements. This is an example of a failing `record.parameters`: ``` ...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}, {"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}... ``` In such a case Scylla recorded no trace and said: ``` ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No "query" parameter set for a session requesting a slow_query_log record ``` The fix here is to leave query empty if not found. The users can still retrieve the query contents from existing info. Fixes #5843 Closes #7293	2020-09-29 13:24:39 +02:00
Asias He	eedcee7f31	gossip: Reduce unnecessary VIEW_BACKLOG updates The backlog of current and max in VIEW_BACKLOG is not updated but the nodes are updating VIEW_BACKLOG all the time. For example: ``` INFO 2020-03-06 17:13:46,761 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026590,718) INFO 2020-03-06 17:13:46,821 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026531,742) INFO 2020-03-06 17:13:47,765 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027590,721) INFO 2020-03-06 17:13:47,825 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027531,745) INFO 2020-03-06 17:13:48,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028590,726) INFO 2020-03-06 17:13:48,833 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028531,750) INFO 2020-03-06 17:13:49,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029590,729) INFO 2020-03-06 17:13:49,832 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029531,753) ``` The downside of such updates: - Introduces more gossip exchange traffic - Updates system.peers all the time The extra unnecessary gossip traffic is fine for a cluster in good shape but when some of the nodes or shards are loaded, such messages and the 
handling of such messages can make the system even busier. With this patch, VIEW_BACKLOG is updated only when the backlog is really updated. Btw, we can even make the update only when the change of the backlog is greater than a threshold, e.g., 5%, which can reduce the traffic even further. Fixes #5970	2020-09-29 13:37:37 +03:00
Avi Kivity	6fdc8f28a9	Update tools/jmx submodule * tools/jmx 45e4f28...25bcd76 (1): > install.sh: stop using symlinks for systemd units on nonroot mode Fixes #7288.	2020-09-29 13:32:45 +03:00
Takuya ASADA	8504332e17	scylla_setup: skip offline warnings on nonroot mode Since most of the scripts require root privilege, we don't show the offline warning on nonroot mode. Fixes #7286 Closes #7287	2020-09-29 13:30:13 +03:00
Eliran Sinvani	925cdc9ae1	consistency level: fix wrong quorum calculation when RF = 0 We used to calculate the number of endpoints for quorum and local_quorum unconditionally as ((rf / 2) + 1). This formula doesn't take into account the corner case where RF = 0; in this situation quorum should also be 0. This commit adds the missing corner case. Tests: Unit Tests (dev) Fixes #6905 Closes #7296	2020-09-29 13:25:41 +03:00
Avi Kivity	6634cbb190	build: detect and allow clang 10 as a compiler While we don't yet fully support clang as a compiler, this at least allows working on it.	2020-09-29 12:48:46 +03:00
Avi Kivity	8bead32be3	build: detect availability of -Wstack-usage= Detect if the compiler supports -Wstack-usage= and only enable it if supported. Because each mode uses a different threshold, have each mode store the threshold value and merge it into the flags only after we decided we support it. The switch to make it only a warning is made conditional on compiler support.	2020-09-29 12:48:45 +03:00
Takuya ASADA	ba29074c42	install.sh: stop using symlinks for systemd units on nonroot mode In some environments, systemctl enable <service> fails when we use a symlink. So just directly copy systemd units to ~/.config/systemd/user, instead of creating a symlink. Fixes #7288 Closes #7290	2020-09-29 12:20:41 +03:00
Piotr Sarna	9e5ce5a93c	counters: remove unused 1.7.4 counter order code After cleaning up old cluster features (`253a7640e3`) the code for special handling of 1.7.4 counter order was effectively only used in its own tests, so it can be safely removed. Closes #7289	2020-09-29 12:16:58 +03:00
Avi Kivity	57f377e1fe	Merge 'Add max concurrent requests configuration option to coordinator' from Piotr Sarna This series approaches issue #7072 and provides a very simple mechanism for limiting the number of concurrent CQL requests being served on a shard. Once the limit is hit, new requests will be instantly refused and OverloadedException will be returned to the client. This mechanism has many improvement opportunities: * shedding requests gradually instead of having one hard limit, * having more than one limit per different types of queries (reads, writes, schema changes, ...), * not using a preconfigured value at all, and instead figuring out the limit dynamically, * etc. ... and none of these are taken into account in this series, which only adds a very basic configuration variable. The variable can be updated live without a restart - it can be done by updating the .yaml file and triggering a configuration re-read via sending the SIGHUP signal to Scylla. The default value for this parameter is a very large number, which translates to effectively not shedding any requests at all. Refs #7072 Closes #7279 * github.com:scylladb/scylla: transport: make max_concurrent_requests_per_shard reloadable transport: return exceptional future instead of throwing transport,config: add a param for max request concurrency exceptions: make a single-param constructor explicit exceptions: add a constructor based on custom message	2020-09-29 12:14:03 +03:00
Pekka Enberg	1adf2cc848	Revert "scylla_ntp_setup: use chrony on all distributions" This reverts commit `8366d2231d` because it causes the following "scylla_setup" failure on Ubuntu 16.04: Command: 'sudo /usr/lib/scylla/scylla_setup --nic ens5 --disks /dev/nvme0n1 --swap-directory / ' Exit code: 1 Stdout: Setting up libtomcrypt0:amd64 (1.17-7ubuntu0.1) ... Setting up chrony (2.1.1-1ubuntu0.1) ... Creating '_chrony' system user/group for the chronyd daemon… Creating config file /etc/chrony/chrony.conf with new version Processing triggers for libc-bin (2.23-0ubuntu11.2) ... Processing triggers for ureadahead (0.100.0-19.1) ... Processing triggers for systemd (229-4ubuntu21.29) ... 501 Not authorised NTP setup failed. Stderr: chrony.service is not a native service, redirecting to systemd-sysv-install Executing /lib/systemd/systemd-sysv-install enable chrony Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/scylla_ntp_setup", line 63, in <module> run('chronyc makestep') File "/opt/scylladb/scripts/scylla_util.py", line 504, in run return subprocess.run(cmd, stdout=stdout, stderr=stderr, shell=shell, check=exception, env=scylla_env).returncode File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 512, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['chronyc', 'makestep']' returned non-zero exit status 1.	2020-09-29 11:23:23 +03:00
Piotr Sarna	4b856cf62d	transport: make max_concurrent_requests_per_shard reloadable This configuration entry is expected to be used as a quick fix for an overloaded node, so it should be possible to reload this value without having to restart the server.	2020-09-29 10:11:36 +02:00
Piotr Sarna	4da8957461	transport: return exceptional future instead of throwing Throwing bears an additional cost, so it's better to simply construct the error in place and return it.	2020-09-29 10:00:30 +02:00
Piotr Sarna	b4db6d2598	transport,config: add a param for max request concurrency The newly introduced parameter - max_concurrent_requests_per_shard - can be used to limit the number of in-flight requests a single coordinator shard can handle. Each surplus request will be immediately refused by returning OverloadedException error to the client. The default value for this parameter is large enough to never actually shed any requests. Currently, the limit is only applied to CQL requests - other frontends like alternator and redis are not throttled yet.	2020-09-29 09:59:30 +02:00
Botond Dénes	2ee026f26f	test/manual/sstable_scan_footprint_test: run test body in statement sched group So that queries are processed in said scheduling group and thus they use the user read concurrency semaphore.	2020-09-28 11:27:49 +03:00
Botond Dénes	272a54b81c	test/manual/sstable_scan_footprint_test: move test main code into separate function	2020-09-28 11:27:49 +03:00
Botond Dénes	29861b068e	test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s To avoid stalls.	2020-09-28 11:27:49 +03:00
Botond Dénes	daa9fa72f1	test/manual/sstable_scan_footprint_test: make clustering row size configurable So that large-row workloads can be simulated too.	2020-09-28 11:27:49 +03:00
Botond Dénes	2ff326a41a	test/manual/sstable_scan_footprint_test: document sstable related command line arguments	2020-09-28 11:27:49 +03:00
Botond Dénes	ceb308411c	mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*()	2020-09-28 11:27:49 +03:00
Botond Dénes	ceb0b02ee8	test: simple_schema: add make_static_row()	2020-09-28 11:27:49 +03:00
Botond Dénes	63578bf0a7	reader_permit: reader_resources: add operator==	2020-09-28 11:27:49 +03:00
Botond Dénes	256140a033	mutation_fragment: memory_usage(): remove unused schema parameter The memory usage is now maintained and updated on each change to the mutation fragment, so it needs not be recalculated on a call to `memory_usage()`, hence the schema parameter is unused and can be removed.	2020-09-28 11:27:47 +03:00
Botond Dénes	041d71bd6f	mutation_fragment: track memory usage through the reader_permit The memory usage of mutation fragments is now tracked through its lifetime through a reader permit. This was the last major (to my current knowledge) untracked piece of the reader pipeline.	2020-09-28 11:27:29 +03:00
Botond Dénes	52662f17ea	reader_permit: resource_units: add permit() and resources() accessors	2020-09-28 11:27:29 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	54357221f0	partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment It is what its callers want anyway.	2020-09-28 10:53:56 +03:00
Botond Dénes	1e6285d776	mutation_fragment: remove as_mutable_end_of_partition() There is nothing to mutate on a partition_end fragment.	2020-09-28 10:53:56 +03:00
Botond Dénes	5079b9ccf1	mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_mutation_start() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	72a88e0257	mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_range_tombstone() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	4f5ccf82cb	mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_clustering_row() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	f2b9cad4c6	mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_static_row() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	0518571e56	flat_mutation_reader: make _buffer a tracked buffer Via a tracked_allocator. Although the memory allocations made by the _buffer shouldn't dominate the memory consumption of the read itself, they can still be a significant portion that scales with the number of readers in the read.	2020-09-28 10:53:56 +03:00
Botond Dénes	77ea44cb73	mutation_reader: extract the two fill_buffer_result into a single one Currently we have two, nearly identical definitions of said struct. Extract it to a common definition and rename it to `remote_fill_buffer_result`.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Pekka Enberg	068b1e3470	Update tools/python3 submodule * tools/python3 b4e52ee...cfa27b3 (1): > build: support passing product-version-release as a parameter	2020-09-28 10:53:11 +03:00
Piotr Sarna	58ae0c5208	exceptions: make a single-param constructor explicit ... since it's good practice.	2020-09-28 09:16:31 +02:00
Piotr Sarna	b0737542f2	exceptions: add a constructor based on custom message OverloadedException was historically only used when the number of in-flight hints got too high. The other constructor will be useful for using OverloadedException in other scenarios.	2020-09-28 09:16:31 +02:00
Botond Dénes	c1215592da	reader_permit: introduce tracking_allocator This can be used with standard containers and other containers that use the std::allocator interface to track the allocations made by them via a reader_permit.	2020-09-28 08:46:22 +03:00
Botond Dénes	f10abf6e35	reader_permit: reader_resources: add with_memory() factory function To make creating reader resource with just memory more convenient and more readable at the same time.	2020-09-28 08:46:22 +03:00
Botond Dénes	4c8ab10563	reader_permit: only forward resource consumption to semaphore after admission In the next patches we plan to start tracking the memory consumption of the actual allocations made by the circular_buffer<mutation_fragment>, as well as the memory consumed by the mutation fragments. This means that readers will start consuming memory off the permit right after being constructed. Ironically this can prevent the reader from being admitted, due to its own pre-admission memory consumption. To prevent this, hold off on forwarding the memory consumption to the semaphore until the permit is actually admitted.	2020-09-28 08:46:22 +03:00
Botond Dénes	e1eee0dc34	reader_permit: track resource consumed through permit Track all resources consumed through the permit inside the permit. This allows querying how much memory each read is consuming (as there should be one read per permit). Although this might be interesting, especially when debugging OOM cores, the real reason we are doing this is to be able to forward resource consumption to the semaphore only post-admission. More on this in the patch introducing this. Another advantage of tracking resources consumed through the permit is that now we can detect resource leaks in the permit destructor and report them. Even if it is just a case of the holder of the resources wanting to release the resources later, with the permit destroyed it will cause use-after-free.	2020-09-28 08:46:22 +03:00
Botond Dénes	cd953a36fd	reader_permit: move internals to impl In the next patches the reader permit will gain members that are shared across all instances of the same permit. To facilitate this move all internals into an impl class, of which the permit stores a shared pointer. We use a shared_ptr to avoid defining `impl` in the header. This is how the reader permit started in the beginning. We've done a full circle. :)	2020-09-28 08:46:22 +03:00
Botond Dénes	12372731cb	reader_permit: add consume()/signal() And do all consuming and signalling through these methods. These operations will soon be more involved than the simple forwarding they do today, so we want to centralize them to a single method pair.	2020-09-28 08:46:22 +03:00
Botond Dénes	375815e650	reader_permit::resource_units: store permit instead of semaphore In the next patches we want to introduce per-permit resource tracking -- that is, have each permit track the amount of resource consumed through it. For this, we need all consumption to happen through a permit, and not directly with the semaphore.	2020-09-28 08:46:22 +03:00
Botond Dénes	04d83f6678	reader_permit: move resource_units declaration outside the reader_permit class In the next patch we want to store a `reader_permit` instance inside `resource_units` so a full definition of the former must be available.	2020-09-28 08:46:22 +03:00
Botond Dénes	0fe75571d9	reader_concurrency_semaphore: admit one read if no reader is active To ensure progress at all times. This is due to evictable readers, which still hold on to a buffer even when their underlying reader is evicted. As we are introducing buffer and mutation fragment tracking in the next patches, these readers will hold on to memory even in this state, so it may theoretically happen that even though no readers are admitted (all count resources are available) no reader can be admitted due to lack of memory. To prevent such deadlocks we now always admit one reader if all count resources are available.	2020-09-28 08:46:22 +03:00
Botond Dénes	ef0b279c80	reader_concurrency_semaphore: move may_proceed() out-of-line They are only used in the .cc anyway.	2020-09-28 08:46:22 +03:00
Botond Dénes	d692993bdc	mutation_reader_test: test_multishard_combining_reader_non_strictly_monotonic_positions: reset size between buffer fills Current code uses a single counter to produce multiple buffers' worth of data. This uses carry-over from one buffer to the other, which happens to work with the current memory accounting but is very fragile. Account each buffer separately, resetting the counter between them.	2020-09-28 08:46:22 +03:00
Botond Dénes	7e909671f4	view_build_test: test_view_update_generator_deadlock: release semaphore resources The test consumes all resources off the semaphore, leaving just enough to admit a single reader. However this amount is calculated based on the base cost of readers, but as we are going to track reader buffers as well, the amount of memory consumed will be much less predictable. So to make sure background readers can finish during shutdown, release all the consumed resources before leaving scope.	2020-09-28 08:46:22 +03:00
Botond Dénes	122ab1aabd	view_build_test: test_view_update_generator_buffering: fail the test early on exceptions No point in continuing processing the entire buffer once a failure was found. Especially that an early failure might introduce conditions that are not handled in the normal flow-path. We could handle these but there is no point in this added complexity, at this point the test is failed anyway.	2020-09-28 08:46:22 +03:00
Botond Dénes	99388590da	querier_cache_test: test_resources_based_cache_eviction: use semaphore::consume() to drain semaphore It is much more reliable and simple this way, than playing with `reader_permit::wait_for_admission()`.	2020-09-28 08:46:22 +03:00
Botond Dénes	3c73cc2a4e	tests: prepare for permit forwarding consumption post admission Some tests rely on `consume*()` calls on the permit to take effect immediately. Soon this will only be true once the permit has been admitted, so make sure the permit is admitted in these tests.	2020-09-28 08:46:22 +03:00
Botond Dénes	5e5c94b064	test/lib/reader_lifecycle_policy: don't destroy reader context eagerly Currently per-shard reader contexts are cleaned up as soon as the reader itself is destroyed. This causes two problems: * Continuations attached to the reader destroy future might rely on stuff in the context being kept alive -- like the semaphore. * Shard 0's semaphore is special as it will be used to account buffers allocated by the multishard reader itself, so it has to be alive until after all readers are destroyed. This patch changes this so that contexts are destroyed only when the lifecycle policy itself is destroyed.	2020-09-28 08:46:22 +03:00
Takuya ASADA	8366d2231d	scylla_ntp_setup: use chrony on all distributions To simplify scylla_ntp_setup, use chrony on all distributions.	2020-09-27 12:30:02 +03:00
Rafael Ávila de Espíndola	2093efceab	build: Upgrade to seastar API level 5 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200923202424.216444-1-espindola@scylladb.com>	2020-09-26 11:07:49 +03:00
Avi Kivity	36d93f586a	Update seastar submodule * seastar e215023c7...292ba734b (4): > future: Fix move of futures of reference type > doc: fix hyper link to tutorial.html > tutorial: fix formatting of code block > README.md: fix the formatting of table	2020-09-25 21:54:44 +03:00
Tomasz Grabiec	97c99ea9f3	Merge "evictable_reader: validate buffer on reader recreation" from Botond The reader recreation mechanism is a very delicate and error-prone one, as proven by the countless bugs it had. Most of these bugs were related to the recreated reader not continuing the read from the expected position, inserting out-of-order fragments into the stream. This patch adds a defense mechanism against such bugs by validating the start position of the recreated reader. The intent is to prevent corrupt data from getting into the system as well as to help catch these bugs as close to the source as possible. Fixes: #7208 Tests: unit(dev), mutation_reader_test:debug (v4) * botond/evictable-reader-validate-buffer/v5: mutation_reader_test: add unit test for evictable reader self-validation evictable_reader: validate buffer after recreation the underlying evictable_reader: update_next_position(): only use peek'd position on partition boundary mutation_reader_test: add unit test for evictable reader range tombstone trimming evictable_reader: trim range tombstones to the read clustering range position_in_partition_view: add position_in_partition_view before_key() overload flat_mutation_reader: add buffer() accessor	2020-09-25 17:02:51 +02:00
Takuya ASADA	eae2aa58fa	dist/common/scripts: move back get_set_nic_and_disks_config_value to scylla_util.py The function was mistakenly moved to scylla_sysconfig_setup, but it is also referenced from scylla_prepare, so move it back to scylla_util.py. Fixes #7276 Closes #7280	2020-09-25 13:05:43 +03:00
Botond Dénes	076c27318b	mutation_reader_test: add unit test for evictable reader self-validation Add both positive (where the validation should succeed) and negative (where the validation should fail) tests, covering all validation cases.	2020-09-25 12:09:01 +03:00
Botond Dénes	0b0ae18a14	evictable_reader: validate buffer after recreation the underlying The reader recreation mechanism is a very delicate and error-prone one, as proven by the countless bugs it had. Most of these bugs were related to the recreated reader not continuing the read from the expected position, inserting out-of-order fragments into the stream. This patch adds a defense mechanism against such bugs by validating the start position of the recreated reader. Several things are checked: * The partition is the expected one -- the one we were in the middle of or the next if we stopped at partition boundaries. * The partition is in the read range. * The first fragment in the partition is the expected one -- has an equal or larger position than the next expected fragment. * The fragment is in the clustering range as defined by the slice. As these validations are only done on the slow-path of recreating an evicted reader, no performance impact is expected.	2020-09-25 12:09:00 +03:00
Botond Dénes	91020eef73	evictable_reader: update_next_position(): only use peek'd position on partition boundary `evictable_reader::update_next_position()` is used to record the position the reader will continue from, in the next buffer fill. This position is used to create the partition slice when the underlying reader is evicted and has to be recreated. There is an optimization in this method -- if the underlying's buffer is not empty we peek at the first fragment in it and use it as the next position. This is however problematic for buffer validation on reader recreation (introduced in the next patch), because using the next row's position as the next pos will allow for range tombstones to be emitted with before_key(next_pos.key()), which will trigger the validation. Instead of working around this, just drop this optimization for mid-partition positions, it is inconsequential anyway. We keep it for where it is important, when we detect that we are at a partition boundary. In this case we can avoid reading the current partition altogether when recreating the reader.	2020-09-25 12:09:00 +03:00
Botond Dénes	d1b0573e1c	mutation_reader_test: add unit test for evictable reader range tombstone trimming	2020-09-25 12:09:00 +03:00
Botond Dénes	4f2e7a18e2	evictable_reader: trim range tombstones to the read clustering range Currently mutation sources are allowed to emit range tombstones that are out-of the clustering read range if they are relevant to it. For example a read of a clustering range [ck100, +inf), might start with: range_tombstone{start={ck1, -1}, end={ck200, 1}}, clustering_row{ck100} The range tombstone is relevant to the range and the first row of the range so it is emitted as first, but its position (start) is outside the read range. This is normally fine, but it poses a problem for evictable reader. When the underlying reader is evicted and has to be recreated from a certain clustering position, this results in out-of-order mutation fragments being inserted into the middle of the stream. This is not fine anymore as the monotonicity guarantee of the stream is violated. The real solution would be to require all mutation sources to trim range tombstones to their read range, but this is a lot of work. Until that is done, as a workaround we do this trimming in the evictable reader itself.	2020-09-25 12:09:00 +03:00
Botond Dénes	d7d93aef49	position_in_partition_view: add position_in_partition_view before_key() overload	2020-09-25 12:09:00 +03:00
Avi Kivity	f1fcf4f139	Update seastar submodule * seastar 9ae33e67e1...e215023c78 (4): > future: Make futures non variadic > on_internal_error: add noexcept variant > Convert another std::result_of to std::invoke_result > reactor: remove unused declaration abort_on_error()	2020-09-24 20:04:03 +03:00
Tomasz Grabiec	14fdd2f501	Merge "Gossip echo message improvement" from Asias This series improves gossip echo message handling in a loaded cluster. Refs: #7197 * git://github.com/asias/scylla.git gossip_echo_improve_7197: gossiper: Handle echo message on any shard gossiper: Increase echo message timeout gossiper: Remove unused _last_processed_message_at	2020-09-24 15:13:55 +02:00
Pekka Enberg	84a0aca666	configure.py: Rename "mode" to "checkheaders_mode" The "mode" variable name is used everywhere, usually in a loop. Therefore, rename the global "mode" to "checkheaders_mode" so that if your code block happens to be outside of a loop, you don't accidentally use the globally visible "mode" and spend hours debugging why it's always "dev". Spotted by Yaron Kaikov. Message-Id: <20200924112237.315817-1-penberg@scylladb.com>	2020-09-24 15:00:49 +03:00
Nadav Har'El	e1c42f2bb3	scripts/pull_github_pr.sh: show titles of more than 20 patches The script pull_github_pr.sh uses git merge's "--log" option to put in the merge commit the list of titles of the individual patches being merged in. This list is useful when later searching the log for the merge which introduced a specific feature. Unfortunately, "--log" defaults to cutting off the list of commit titles at 20 lines. For most merges involving fewer than 20 commits, this makes no difference. But some merges include more than 20 commits, and get a truncated list, for no good reason. If someone worked hard to create a patch set with 40 patches, the last thing we should be worried about is that the merge commit message will be 20 lines longer. Unfortunately, there appears to be no way to tell "--log" to not limit the length at all. So I chose an arbitrary limit of 1000. I don't think we ever had a patch set in Scylla which exceeded that limit. Yet :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200924114403.817893-1-nyh@scylladb.com>	2020-09-24 14:51:58 +03:00
Piotr Dulikowski	39771967bb	hinted handoff: fix race - decommission vs. endpoint mgr init This patch fixes a race between two methods in hints manager: drain_for and store_hint. The first method is called when a node leaves the cluster, and it 'drains' end point hints manager for that node (sends out all hints for that node). If this method is called when the local node is being decommissioned or removed, it instead drains hints managers for all endpoints. In the case of decommission/remove, drain_for first calls parallel_for_each on all current ep managers and tells them to drain their hints. Then, after all of them complete, _ep_managers.clear() is called. End point hints managers are created lazily and inserted into _ep_managers map the first time a hint is stored for that node. If this happens between parallel_for_each and _ep_managers.clear() described above, the clear operation will destroy the new ep manager without draining it first. This is a bug and will trigger an assert in ep manager's destructor. To solve this, a new flag for the hints manager is added which is set when it drains all ep managers on removenode/decommission, and prevents further hints from being written. Fixes #7257 Closes #7278	2020-09-24 14:51:24 +03:00
Nadav Har'El	a5369881b3	Merge 'sstables: make sstable_manager control the lifetime of the sstables it manages' from Avi Kivity Currently, sstable_manager is used to create sstables, but it loses track of them immediately afterwards. This series makes an sstable's life fully contained within its sstable_manager. The first practical impact (implemented in this series) is that file removal stops being a background job; instead it is tracked by the sstable_manager, so when the sstable_manager is stopped, you know that all of its sstable activity is complete. Later, we can make use of this to track the data size on disk, but this is not implemented here. Closes #7253 * github.com:scylladb/scylla: sstables: remove background_jobs(), await_background_jobs() sstables: make sstables_manager take charge of closing sstables test: test_env: hold sstables_manager with a unique_ptr test: drop test_sstable_manager test: sstables::test_env: take ownership of manager test: broken_sstable_test: prepare for asynchronously closed sstables_manager test: sstable_utils: close test_env after use test: sstable_test: dont leak shared_sstable outside its test_env's lifetime test: sstables::test_env: close self in do_with helpers test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager test: view_build_test: prepare for asynchronously closed sstables_manager test: sstable_resharding_test: prepare for asynchronously closed sstables_manager test: sstable_mutation_test: prepare for asynchronously closed sstables_manager test: sstable_directory_test: prepare for asynchronously closed sstables_manager test: sstable_datafile_test: prepare for asynchronously closed sstables_manager test: sstable_conforms_to_mutation_source_test: remove references to test_sstables_manager test: sstable_3_x_test: remove test_sstables_manager references test: schema_changes_test: drop use of test_sstables_manager mutation_test: adjust for column_family_test_config accepting an sstables_manager 
test: lib: sstable_utils: stop using test_sstables_manager test: sstables test_env: introduce manager() accessor test: sstables test_env: introduce do_with_async_sharded() test: sstables test_env: introduce do_with_async_returning() test: lib: sstable test_env: prepare for life as a sharded<> service test: schema_changes_test: properly close sstables::test_env test: sstable_mutation_test: avoid constructing temporary sstables::test_env test: mutation_reader_test: avoid constructing temporary sstables::test_env test: sstable_3_x_test: avoid constructing temporary sstables::test_env test: lib: test_services: pass sstables_manager to column_family_test_config test: lib: sstables test_env: implement tests_env::manager() test: sstable_test: detemplate write_and_validate_sst() test: sstable_test_env: detemplate do_with_async() test: sstable_datafile_test: drop bad 'return' table: clear sstable set when stopping table: prevent table::stop() race with table::query() database: close sstable_manager:s sstables_manager: introduce a stub close() sstable_directory_test: fix threading confusion in make_sstable_directory_for*() functions test: sstable_datafile_test: reorder table stop in compaction_manager_test test: view_build_test: test_view_update_generator_register_semaphore_unit_leak: do not discard future in timer test: view_build_test: fix threading in test_view_update_generator_register_semaphore_unit_leak view: view_update_generator: drop references to sstables when stopping	2020-09-24 13:54:38 +03:00
Botond Dénes	3bb25eefb6	reader_permit: remove unused release() method Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200924090040.240906-1-bdenes@scylladb.com>	2020-09-24 12:28:00 +03:00
Avi Kivity	d45df0c705	Update tools/java submodule * tools/java 583fc658df...4313155ab6 (1): > enable gettraceprobability	2020-09-24 12:19:29 +03:00
Asias He	88b7587755	gossiper: Handle echo message on any shard Echo message does not need to access gossip internal states, we can run it on all shards and avoid forwarding to shard zero. This makes gossip marking node up more robust when shard zero is loaded. There is an argument that we should make echo message return only when all shards have responded so that all shards are live and responding. However, in a heavily loaded cluster, one shard might be overloaded on multiple nodes in the cluster at the same time. If we require echo response on all shards, we have a chance local node will mark all peer nodes as down. As a result, the whole cluster is down. This is much worse than not excluding a node with a slow shard from a cluster. Refs: #7197	2020-09-24 10:10:54 +08:00
Asias He	c7cb638e95	gossiper: Increase echo message timeout Gossip echo message is used to confirm a node is up. In a heavily loaded slow cluster, a node might take a long time to receive a heart beat update, then the node uses the echo message to confirm the peer node is really up. If the echo message times out too early, the peer node will not be marked as up. This is bad because a live node is marked as down and this could happen on multiple nodes in the cluster, which causes a cluster-wide unavailability issue. In order to prevent multiple nodes from being marked as down, it is better to be conservative and less restrictive on echo message timeout. Note, echo message is not used to detect a node down. Increasing the echo timeout does not have any impact on marking a node down in a timely manner. Refs: #7197	2020-09-24 09:50:09 +08:00
Asias He	173d115a64	gossiper: Remove unused _last_processed_message_at It is not used any more. We can get rid of it. Refs: #7197	2020-09-24 09:48:54 +08:00
Avi Kivity	2bd264ec6a	sstables: remove background_jobs(), await_background_jobs() There are no more users for registering background jobs, so remove the mechanism and the remaining calls.	2020-09-23 20:55:17 +03:00
Avi Kivity	5db96170a5	sstables: make sstables_manager take charge of closing sstables Currently, closing sstables happens from the sstable destructor. This is problematic since a destructor cannot wait for I/O, so we launch the file close process in the background. We therefore lose track of when the closing actually takes place. This patch makes sstables_manager take charge of the close process. Every sstable is linked into one of two intrusive lists in its manager: _active or _undergoing_close. When the reference count of the sstable drops to zero, we move it from _active to _undergoing_close and begin closing the files. sstables_manager remembers all closes and when sstables_manager::close() is called, it waits for all of them to complete. Therefore, sstables_manager::close() allows us to know that all files it manages are closed (and deleted if necessary). The sstables_manager also gains a destructor, which disables move construction.	2020-09-23 20:55:17 +03:00
Avi Kivity	ad8620c289	test: test_env: hold sstables_manager with a unique_ptr sstables_manager should not be movable (since sstables hold a reference to it). A following patch will enforce it. Prepare by using unique_ptr to hold test_env::_manager. Right now, we'll invoke sstables_manager move construction when creating a test_env with do_with(). We could have chosen to update sstables when their sstables_manager is moved, but we get nothing for the complexity.	2020-09-23 20:55:16 +03:00
Avi Kivity	fd61ebb095	test: drop test_sstable_manager With no users left (apart from some variants of column_family_test_config which are removed in this patch), remove it. test_sstable_manager obstructs sstables_manager from taking charge of sstables ownership, since it is a thread-local object. We can't close it, since it will be used in the next test to run.	2020-09-23 20:55:16 +03:00
Avi Kivity	d4c1b62f81	test: sstables::test_env: take ownership of manager Instead of using test_sstables_manager, which we plan to drop, carry our own sstables_manager in test_env, and close it when test_env::stop() is called.	2020-09-23 20:55:16 +03:00
Avi Kivity	4b24e58858	test: broken_sstable_test: prepare for asynchronously closed sstables_manager Instead of using an asynchronously-constructed and destroyed test_env, obtain one using test_env::do_with(), which is prepared for async close.	2020-09-23 20:55:15 +03:00
Avi Kivity	8c3ae648d9	test: sstable_utils: close test_env after use test_env will soon manage its sstable_manager's lifetime, which requires closing, so close the test_env.	2020-09-23 20:55:15 +03:00
Avi Kivity	67a887110d	test: sstable_test: dont leak shared_sstable outside its test_env's lifetime do_write_sst() creates a test_env, creates a shared_sstable using that test_env, and destroys the test_env, and returns the sstable. This works now but will stop working once sstable_manager becomes responsible for sstable lifetime. Fortunately, do_write_sst() has one caller that isn't interested in the return value at all, so fold it into that caller.	2020-09-23 20:55:15 +03:00
Avi Kivity	1dd3079d67	test: sstables::test_env: close self in do_with helpers The test_env::do_with() are convenient for creating a scope containing a test_env. Prepare them for asynchronously closed sstables_manager by closing the test_env after use (which will, in the future, close the embedded sstables_manager).	2020-09-23 20:55:14 +03:00
Avi Kivity	5e292897df	test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager Obtain a test_env using do_with_async_returning(); and pass it to column_family_test_config so it can stop using test_sstables_manager.	2020-09-23 20:55:14 +03:00
Avi Kivity	16e6abfa27	test: view_build_test: prepare for asynchronously closed sstables_manager Stop using test_sstables_manager, which is going away. Instead obtain a managed sstables_manager via cql_test_env.	2020-09-23 20:55:14 +03:00
Avi Kivity	97b36c38e6	test: sstable_resharding_test: prepare for asynchronously closed sstables_manager Close an explicitly-created test_env, and stop using test_sstables_manager which will disappear.	2020-09-23 20:55:13 +03:00
Avi Kivity	a2c4f65c63	test: sstable_mutation_test: prepare for asynchronously closed sstables_manager sstables_manager will soon be closed asynchronously, with a future-returning close() function. To prepare for that, make the following changes: - replace test_sstables_manager with an sstables_manager obtained from test_env - drop unneeded calls to await_background_jobs() These changes allow lifetime management of the sstables_manager used in the tests to be centralized in test_env.	2020-09-23 20:55:13 +03:00
Avi Kivity	f671aa60f3	test: sstable_directory_test: prepare for asynchronously closed sstables_manager sstables_manager will soon be closed asynchronously, with a future-returning close() function. To prepare for that, make the following changes: - acquire a test_env with test_env::do_with() (or the sharded variant) - change the sstable_from_existing_file function to be a functor that works with either cql_test_env or test_env (as this is what individual tests want); drop use of test_sstables_manager - change new_sstable() to accept a test_env instead of using test_sstables_manager - replace test_sstables_manager with an sstables_manager obtained from test_env These changes allow lifetime management of the sstables_manager used in the tests to be centralized in test_env.	2020-09-23 20:55:12 +03:00
Avi Kivity	3976066156	test: sstable_datafile_test: prepare for asynchronously closed sstables_manager sstables_manager will soon be closed asynchronously, with a future-returning close() function. To prepare for that, make the following changes: - replace on-stack test_env with test_env::do_with() - use the variant of column_family_for_tests that accepts an sstables_manager - replace test_sstables_manager with an sstables_manager obtained from test_env These changes allow lifetime management of the sstables_manager used in the tests to be centralized in test_env. Since test_env now calls await_background_jobs on termination, those calls are dropped.	2020-09-23 20:55:12 +03:00
Avi Kivity	d8c82312e0	test: sstable_conforms_to_mutation_source_test: remove references to test_sstables_manager Use the sstables_manager from test_env. Use do_with_async() to create the test_env, to allow for proper closing. Since do_with_async() also takes care of await_background_jobs(), remove that too.	2020-09-23 20:55:12 +03:00
Avi Kivity	28078928ee	test: sstable_3_x_test: remove test_sstables_manager references test_sstables_manager is going away, so replace it by test_env::manager(). column_family_test_config() has an implicit reference to test_sstables_manager, so pass test_env::manager() as a parameter. Calls to await_background_jobs() are removed, since test_env::stop() performs the same task. The large rows tests are special, since they use a custom sstables_manager, so instead of using a test_env, they just close their local sstables_manager.	2020-09-23 20:55:11 +03:00
Avi Kivity	b19b72455b	test: schema_changes_test: drop use of test_sstables_manager It is going away, so get the manager from the test_env object (more accurate anyway).	2020-09-23 20:55:11 +03:00
Avi Kivity	0134e2f436	mutation_test: adjust for column_family_test_config accepting an sstables_manager Acquire a test_env and extract an sstables_manager from that, passing it to column_family_test_config, in preparation for losing the default constructor of column_family_test_config.	2020-09-23 20:55:11 +03:00
Avi Kivity	85087478fc	test: lib: sstable_utils: stop using test_sstables_manager It will be retired soon. Extract the sstable_manager from the sstable itself.	2020-09-23 20:55:10 +03:00
Avi Kivity	f9aa50dcbf	test: sstables test_env: introduce manager() accessor This returns the sstables_manager carried by the test_env. We will soon retire the global test_sstables_manager, so we need to provide access to one.	2020-09-23 20:55:10 +03:00
Avi Kivity	9399f06e86	test: sstables test_env: introduce do_with_async_sharded() Some tests need a test_env across multiple shards. Introduce a variant of do_with_async() that supplies it.	2020-09-23 20:55:10 +03:00
Avi Kivity	a8e7c04fc9	test: sstables test_env: introduce do_with_async_returning() Similar to do_with_async(), but returning a non-void return type. Will be used in test/perf.	2020-09-23 20:55:09 +03:00
Avi Kivity	784d29a75b	test: lib: sstable test_env: prepare for life as a sharded<> service Some tests need a sharded sstables_manager, prepare for that by adding a stop() method and helpers for creating a sharded service. Since test_env doesn't yet contain its own sstable_manager, this can't be used in real life yet.	2020-09-23 20:55:09 +03:00
Avi Kivity	d6bf27be9e	test: schema_changes_test: properly close sstables::test_env sstables::test_env needs to be properly closed (and will soon need it even more). Use test_env::do_with_async() to do that. Removed await_background_jobs(), which is now done by test_env::close().	2020-09-23 20:55:08 +03:00
Avi Kivity	e98e5e0a52	test: sstable_mutation_test: avoid constructing temporary sstables::test_env A test_env contains an sstables_manager, which will soon have a close() method. As such, it can no longer be a temporary. Switch to using test_env::do_with_async(). As a bonus, test_env::do_with_async() performs await_background_jobs() for us, so we can drop it from the call sites.	2020-09-23 20:55:08 +03:00
Avi Kivity	6fd4601cf8	test: mutation_reader_test: avoid constructing temporary sstables::test_env A test_env contains an sstables_manager, which will soon have a close() method. As such, it can no longer be a temporary. Switch to using test_env::do_with_async().	2020-09-23 20:55:08 +03:00
Avi Kivity	15963e1144	test: sstable_3_x_test: avoid constructing temporary sstables::test_env A test_env contains an sstables_manager, which will soon have a close() method. As such, it can no longer be a temporary. Switch to using test_env::do_with_async(). As a bonus, test_env::do_with_async() performs await_background_jobs() for us, so we can drop it from the call sites.	2020-09-23 20:55:07 +03:00
Avi Kivity	0fbdb009d5	test: lib: test_services: pass sstables_manager to column_family_test_config Since we're dropping test_sstables_manager, we'll require callers to pass it to column_family_test_config, so provide overloads that accept it. The original overloads (that don't accept an sstables_manager) remain for the transition period.	2020-09-23 20:55:07 +03:00
Avi Kivity	72c13199d8	test: lib: sstables test_env: implement test_env::manager() Some tests are now referencing the global test_sstables_manager, which we plan to remove. Add test_env::manager() as a way to reference the sstables_manager that the test_env contains.	2020-09-23 20:55:07 +03:00
Avi Kivity	437e131aef	test: sstable_test: detemplate write_and_validate_sst() Reduce code bloat and improve error messages by replacing a template with noncopyable_function<>.	2020-09-23 20:55:06 +03:00
Avi Kivity	956cd9ee8d	test: sstable_test_env: detemplate do_with_async() Reduce code bloat and improve error messages by using noncopyable_function<> instead of a template.	2020-09-23 20:55:06 +03:00
Avi Kivity	1c1a737eda	test: sstable_datafile_test: drop bad 'return' The pattern return function_returning_a_future().get(); is legal, but confusing. It returns an unexpected std::tuple<>. Here, it doesn't do any harm, but if we try to coerce the surrounding code into a signature (void ()), then that will fail. Remove the unneeded and unexpected return.	2020-09-23 20:55:06 +03:00
Avi Kivity	88ea02bfeb	table: clear sstable set when stopping Drop references to a table's sstables when stopping it, so that the sstable_manager can start deleting it. This includes staging sstables. Although the table is no longer in use at this time, maintain cache synchronity by calling row_cache::invalidate() (this also has the benefit of avoiding a stall in row_cache's destructor). We also refresh the cache's view of the sstable set to drop the cache's references.	2020-09-23 20:55:05 +03:00
Avi Kivity	9932e6a899	table: prevent table::stop() race with table::query() Take the gate in table::query() so that stop() waits for queries. The gate is already waited for in table::stop(). This allows us to know we are no longer using the table's sstables in table::stop().	2020-09-23 20:55:05 +03:00
Avi Kivity	9f886f303c	database: close sstable_manager:s The database class owns two sstable_manager:s - one for user sstables and one for system sstables. Now that they have a close() method, call it.	2020-09-23 20:55:05 +03:00
Avi Kivity	a90a511d36	sstables_manager: introduce a stub close() sstables_manager is going to take charge of its sstables lifetimes, so it will need a close() to wait until sstables are deleted. This patch adds sstables_manager::close() so that the surrounding infrastructure can be wired to call it. Once that's done, we can make it do the waiting.	2020-09-23 20:55:04 +03:00
Avi Kivity	0de2c55f95	sstable_directory_test: fix threading confusion in make_sstable_directory_for() functions The make_sstable_directory_for() functions run in a thread, and call functions that run in a thread, but return a future. This more or less works but is a dangerous construct that can fail. Fix by returning a regular value.	2020-09-23 20:55:04 +03:00
Avi Kivity	c27c2a06bb	test: sstable_datafile_test: reorder table stop in compaction_manager_test Stopping a table will soon close its sstables; so the next check will fail as the number of sstables for the table will be zero. Reorder the stop() call to make it safe. We don't need the stop() for the check, since the previous loop made sure compactions completed.	2020-09-23 20:55:03 +03:00
Avi Kivity	fd1c201ed4	test: view_build_test: test_view_update_generator_register_semaphore_unit_leak: do not discard future in timer test_view_update_generator_register_semaphore_unit_leak creates a continuation chain inside a timer, but does not wait for it. This can result in part of the chain being executed after its captures have been destroyed. This is unlikely to happen since the timer fires only if the test fails, and tests never fail (at least in the way that one expects). Fix by waiting for that future to complete before exiting the thread.	2020-09-23 20:55:03 +03:00
Avi Kivity	33c9563dc9	test: view_build_test: fix threading in test_view_update_generator_register_semaphore_unit_leak test_view_update_generator_register_semaphore_unit_leak uses a thread function in do_with_cql_env(), even though the latter doesn't promise a thread and accepts a regular function-returning-a-future. It happens to work because the function happens to be called in a thread, but this isn't guaranteed. Switch to do_with_cql_env_thread(), which guarantees a thread context.	2020-09-23 20:55:03 +03:00
Avi Kivity	844b675520	view: view_update_generator: drop references to sstables when stopping sstable_manager will soon wait for all sstables under its control to be deleted (if so marked), but that can't happen if someone is holding on to references to those sstables. To allow sstables_manager::stop() to work, drop remaining queued work when terminating.	2020-09-23 20:55:02 +03:00
Nadav Har'El	a2cc599a2a	scripts/pull_github_pr.sh: some nicer messages The script scripts/pull_github_pr.sh begins by fetching some information from github, which can cause a noticeable wait that the user doesn't understand - so in this patch we add a couple of messages on what is happening in the beginning of the script. Moreover, if an invalid pull-request number is given, the script used to give mysterious errors when incorrect commands ran using the name "null" - in this patch we recognize this case and print a clear "Not Found" error message. Finally, the PR_REPO variable was never used, so this patch removes it. Message-Id: <20200923151905.674565-1-nyh@scylladb.com>	2020-09-23 20:53:23 +03:00
Avi Kivity	a63a00b0ea	scripts/pull_pr: don't pollute local branch namespace Currently, scripts/pull_pr pollutes the local branch namespace by creating a branch and never deleting it. This can be avoided by using FETCH_HEAD, a temporary name automatically assigned by git to fetches with no destination.	2020-09-23 15:47:51 +03:00
Avi Kivity	d3588d72c7	Merge "Per semaphore read metrics" from Botond " Currently all logical read operations are counted in a single pair of metrics (successful/failed) located in the `database::db_stats`. This prevents observing the number of reads executed against the user/system read semaphores. This distinction is especially interesting since `0c6bbc84c` which selects the semaphore for each read based on the scheduling group it is running under. This mini series moves these counters into the semaphore and updates the exported metrics accordingly, the `total_reads` and `total_reads_failed` now have a user/system label, just like the other semaphore dependent metrics. Tests: manual(checked that new metric works) " * 'per-semaphore-read-metrics/v2' of https://github.com/denesb/scylla: database: move total_reads* metrics to the concurrency semaphore database: setup_metrics(): split the registering database metrics in two reader_concurrency_semaphore: add non-const stats accessor reader_concurrency_semaphore: s/inactive_read_stats/stats/	2020-09-23 15:25:18 +03:00
Botond Dénes	d7e794e565	database: move total_reads* metrics to the concurrency semaphore	2020-09-23 14:10:24 +03:00
Botond Dénes	32ff524454	database: setup_metrics(): split the registering database metrics in two Currently all "database" metrics are registered in a single call to `metric_groups::add_group()`. As all the metrics to-be-registered are passed in a single initializer list, this blows up the stack size, to the point that adding a single new metric causes it to exceed the currently configured max-stack-size of 13696 bytes. To reduce stack usage, split the single call in two, roughly in the middle. While we could try to come up with some logical grouping of metrics and do much arranging and code-movement I think we might as well just split into two arbitrary groups, containing roughly the same amount of metrics.	2020-09-23 14:06:20 +03:00
Pekka Enberg	9a19c028e4	scylla_kernel_check: Switch to os.makedirs() function Commit `8e1f7d4fc7` ("dist/common/scripts: drop makedirs(), use os.makedirs()") dropped the "makedirs()" function, but forgot to convert the caller in scylla_kernel_check to os.makedirs(). Message-Id: <20200923104510.230244-1-penberg@scylladb.com>	2020-09-23 13:54:43 +03:00
Botond Dénes	593232be0a	reader_concurrency_semaphore: add non-const stats accessor In the next patch we will add externally updated stats, which need a non-const reference to the stats member.	2020-09-23 13:11:55 +03:00
Botond Dénes	c18756ce9a	reader_concurrency_semaphore: s/inactive_read_stats/stats/ In preparation for non-inactive read stats being added to the semaphore, rename its existing stats struct and member to a more generic name. Fields whose names only made sense in the context of the old name are adjusted accordingly.	2020-09-23 13:11:55 +03:00
Pekka Enberg	bc13e596fe	Update tools/java submodule * tools/java d0cfef38d2...583fc658df (1): > build: support passing product-version-release as a parameter	2020-09-23 12:58:34 +03:00
Pekka Enberg	b0447f3245	Update tools/jmx submodule * tools/jmx 6795a22...45e4f28 (1): > build: support passing product-version-release as a parameter	2020-09-23 12:58:30 +03:00
Nadav Har'El	4c2e026e04	alternator streams: fix NextShardIterator for closed shard As the test test_streams_closed_read confirmed, when a stream shard is closed, GetRecords should not return a NextShardIterator at all. Before this patch we wrongly returned an empty string for it. Before this patch, several Alternator Stream tests (in test_streams.py) failed when running against a multi-node Scylla cluster. The reason is as follows: As a multi-node cluster boots and more and more nodes enter the cluster, the cluster changes its mind about the token ownership, and therefore the list of stream shards changes. By the time we have the full cluster, a bunch of shards were created and closed without any data yet. All the tests will see these closed shards, and need to understand them. The fetch_more() utility function correctly assumed that a closed shard does not return a NextShardIterator, and got confused by the empty string we used to return. Now that closed shards can return responses without NextShardIterator, we also needed to fix in this patch a couple of tests which wrongly assumed this can't happen. These tests did not fail on DynamoDB because unlike in Scylla, DynamoDB does not have any closed shards in normal tests which do not specifically cause them (only test_streams_closed_read). We also need to fix test_streams_closed_read to get rid of an unnecessary assumption: It currently assumes that when the very last item in a closed shard is read, the end-of-shard is immediately signaled (i.e., NextShardIterator is not returned). Although DynamoDB does in fact do this, it is also perfectly legal for Alternator's implementation to return the last item with a new NextShardIterator - and only when the client reads from that iterator, we finally signal the end of the shard. Fixes #7237. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200922082529.511199-1-nyh@scylladb.com>	2020-09-23 09:25:10 +02:00
Avi Kivity	0852f33988	build: disable many clang-specific warnings While many of these warnings are important, they can be addressed at a lower priority. In any case it will be easier to enforce the warnings if/when we switch to clang.	2020-09-22 23:09:48 +03:00
Tomasz Grabiec	a22645b7dd	Merge "Unfriend rows_entry, cache_tracker and mutation_partition" from Pavel Emelyanov The classes touch private data of each other for no real reason. Putting the interaction behind API makes it easier to track the usage. * xemul/br-unfriends-in-row-cache-2: row cache: Unfriend classes from each other rows_entry: Move container/hooks types declarations rows_entry: Simplify LRU unlink mutation_partition: Define .replace_with method for rows_entry mutation_partition: Use rows_entry::apply_monotonically	2020-09-22 21:18:14 +02:00
Nadav Har'El	73cb9e3f61	merge: Fix some issues found by clang Merged pull request https://github.com/scylladb/scylla/pull/7264 by Avi Kivity (29 commits): This series fixes issues found by clang. Most are real issues that gcc just doesn't find, a few are due to clang lagging behind on some C++ updates. See individual explanations in patches. The series is not sufficient to build with clang; it just addresses the simple problems. Two larger problems remain: clang isn't able to compile std::ranges (not clear yet whether this is a libstdc++ problem or a clang problem) and clang can't capture structured binding variables (due to lagging behind on the standard). The motivation for building with clang is gaining access to a working implementation of coroutines and modules. This series compiles with gcc and the unit tests pass.	2020-09-22 21:42:28 +03:00
Botond Dénes	a0107ba1c6	reader_permit: reader_resources: make true RAII class Currently in all cases we first deduct the to-be-consumed resources, then construct the `reader_resources` class to protect it (release it on destruction). This is error prone as it relies on no exception being thrown while constructing the `reader_resources`. Albeit the `reader_resources` constructor is `noexcept` right now this might change in the future and as the call sites relying on this are disconnected from the declaration, the one modifying them might not notice. To make this safe going forward, make the `reader_resources` a true RAII class, consuming the units in its constructor and releasing them in its destructor. Fixes: #7256 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com>	2020-09-22 18:13:35 +03:00
Avi Kivity	31a5378a82	utils: utf8: avoid harmless integer overflow 240 doesn't fit in char without overflow, so cast it explicitly to avoid a clang warning.	2020-09-22 17:24:33 +03:00
Avi Kivity	e12c72ad55	utils: multiprecision_int: disambiguate operator templates by adding overloads We have templates for multiprecision_int for both sides of the operator, for example: template <typename T> bool operator==(const T& x) const and template <typename T> friend bool operator==(const T& x, const multiprecision_int& y) Clang considers them equally satisfying when both operands are multiprecision_int, so provide a disambiguating overload.	2020-09-22 17:24:33 +03:00
Avi Kivity	d1c049b202	utils: error_injection: remove forward-declared function returning auto Clang dislikes forward-declared functions returning auto, so declare the type up front. Functions returning auto are a readability problem anyway. To solve a circular dependency problem (get_local_injector() -> error_injection<> -> get_local_injector()), which is further compounded by problems in using template specializations before they are defined (which is forbidden), the storage for get_local_injector() was moved to error_injection<>, and get_local_injector() is just an accessor. After this, error_injection<> does not depend on get_local_injector().	2020-09-22 17:24:33 +03:00
Avi Kivity	765e632626	utils: bptree: remove redundant and possibly wrong friend declaration Clang complains about befriending a constructor. It's possibly correct. In any case it's redundant, so remove it.	2020-09-22 17:24:33 +03:00
Avi Kivity	c7105019b2	utils: bptree: add missing typename for clang Clang does not implement p0634r3, so we must add more typenames.	2020-09-22 17:24:33 +03:00
Avi Kivity	0d25ea5a67	utils: bloom_calculations: avoid gratuitous conversion to double The conversion to double evokes a complaint about precision loss from clang, and is unneeded anyway, so use integral types throughout.	2020-09-22 17:24:33 +03:00
Avi Kivity	4c93ec8351	utils: updateable_value: fix nullptr_t name nullptr_t's full name is std::nullptr_t. gcc somehow allows plain nullptr_t, but that's not correct. Clang rejects it. Use std::nullptr_t.	2020-09-22 17:24:33 +03:00
Avi Kivity	3570533e8f	tracing: fix nullptr_t name nullptr_t's full name is std::nullptr_t. gcc somehow allows plain nullptr_t, but that's not correct. Clang rejects it. Use std::nullptr_t.	2020-09-22 17:24:33 +03:00
Avi Kivity	dba07440c9	test: sstable_directory_test: make new_sstable() not a template new_sstable is defined as a template, and later used in a context that requires an object. Somehow gcc uses an instantiation with an empty template parameter list, but I don't think it's right, and clang refuses. Since the template is gratuitous anyway, just make it a regular function.	2020-09-22 17:24:33 +03:00
Avi Kivity	70ea785cc7	test: cql_query_test: don't use std::pow() in constexpr context std::pow() is not constexpr, and clang correctly refuses to assign its result in constexpr context. Add a constexpr replacement.	2020-09-22 17:24:25 +03:00
Nadav Har'El	c1e8d077a4	alternator test: add test for behavior of closed stream shards This patch adds a test, test_streams_closed_read, which reproduces two issues in Alternator Streams, regarding the behavior of closed stream shards: Refs #7239: After streaming is disabled, the stream should still be readable, it's just that all its shards are now "closed". Refs #7237: When reaching the end of a closed shard, NextShardIterator should be missing. Not set to an empty string as we do today. The test passes on DynamoDB, and xfails on Alternator, and should continue to do so until both issues are fixed. This patch changes the implementation of the disable_stream() function. This function was never actually used by the existing code, and now that I wanted to use it, I discovered it didn't work as expected and had to fix it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200915134643.236273-1-nyh@scylladb.com>	2020-09-22 10:18:01 +02:00
Pavel Emelyanov	a75b048616	gossiper: Unregister verbs if shadow round aborts start The gossiper verbs are registered in two places -- start_gossiping and do_shadow_round(). And unregistered in one -- stop_gossiping iff the start took place. Respectively, there's a chance that after a shadow round scylla exits without starting gossiping thus leaving verbs armed. Fix by unregistering verbs on stop if they are still registered. fixes: #7262 tests: manual(start, abort start after shadow round), unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200921140357.24495-1-xemul@scylladb.com>	2020-09-22 10:18:01 +02:00
Pavel Emelyanov	550fc734d9	query_pager: Fix continuation handling for noop visitor Before updating the _last_[cp]key (for subsequent .fetch_page()) the pager checks 'if the pager is not exhausted OR the result has data'. The check seems broken: if the pager is not exhausted, but the result is empty the call for keys will unconditionally try to reference the last element from empty vector. The not exhausted condition for empty result can happen if the short_read is set, which, in turn, unconditionally happens upon meeting partition end when visiting the partition with result builder. The correct check should be 'if the pager is not exhausted AND the result has data': the _last_[pc]key-s should be taken for continuation (not exhausted), but can only be taken if the result is not empty (has data). fixes: #7263 tests: unit(dev), but tests don't trigger this corner case Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200921124329.21209-1-xemul@scylladb.com>	2020-09-22 10:18:01 +02:00
Pavel Emelyanov	13281c2d79	results-view: Abort early if messing with empty vector The .get_last_partition_and_clustering_key() method gets the last partition from the on-board vector of partitions. The vector in question is assumed not to be empty, but if this assumption breaks, the result will look like memory corruption (docs say that calling back() on an empty vector results in undefined behavior). tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200921122948.20585-1-xemul@scylladb.com>	2020-09-22 10:18:01 +02:00
Pekka Enberg	ea8e545e4e	Update tools/java submodule * tools/java 2e2c056c07...d0cfef38d2 (1): > sstableloader: Support range boundary tombstones	2020-09-22 10:18:01 +02:00
Ivan Prisyazhnyy	f4412029f4	docs/docker-hub.md: add quickstart section with --smp 1 Also provide formula to calculate proper value for aio-max-nr. Closes #7252	2020-09-22 10:18:01 +02:00
Ivan Prisyazhnyy	59463d6e5f	docs/docker-hub.md: add jmx doc Describe flags that allow overriding JMX service startup for docker container. Closes #7250	2020-09-22 10:18:01 +02:00
Takuya ASADA	f243ccfa89	scylla_cpuscaling_setup: add Install section for scylla-cpupower.service The Install section is required for enabling/disabling services. Fixes #7230	2020-09-22 10:18:01 +02:00
Avi Kivity	bf8c8d292a	test: cql_query_test: disambiguate single-element initializer_list for clang Clang has a hard time dealing with single-element initializer lists. In this case, adding an explicit conversion allows it to match the initializer_list<data_value> parameter.	2020-09-21 20:16:11 +03:00
Avi Kivity	11835f7aa6	test: avoid using literal suffix 'd' There is no literal suffix 'd', yet we use it for double-precision floats. Clang rightly complains, so remove it.	2020-09-21 16:32:53 +03:00
Avi Kivity	d19c6c0d98	sstables: size_tiered_backlog_tracker: avoid assignment of non-constexpr expression to constexpr object std::log() is not constexpr, so it cannot be assigned to a constexpr object. Make it non-constexpr and automatic. The optimizer still figures out that it's constant and optimizes it. Found by clang. Apparently gcc only checks the expression is constant, not constexpr.	2020-09-21 16:32:53 +03:00
Avi Kivity	a155b2bced	sstables: leveled_manifest: prevent benign precision loss warning Casting from the maximum int64_t to double loses precision, because int64_t has 64 bits of precision while double has only 53. Clang warns about it. Since it's not a real problem here, add an explicit cast to silence the warning.	2020-09-21 16:32:53 +03:00
Avi Kivity	aa7426bde6	sstables: index_reader: make 'index_bound' public index_reader::index_bound must be constructible by non-friend classes since it's used in std::optional (which isn't anyone's friend). This now works in gcc because gcc's inter-template access checking is broken, but clang correctly rejects it.	2020-09-21 16:32:53 +03:00
Avi Kivity	bd42bdd6b5	sstables: index_reader: disambiguate promoted_index_blocks_reader "state" type and data member promoted_index_blocks_reader has a data member called "state", and a type member called "state". Somehow gcc manages to disambiguate the two when used, but clang doesn't. I believe clang is correct here, one member should subsume the other. Change the type member to have a different name to disambiguate the two.	2020-09-21 16:32:53 +03:00
Avi Kivity	422a7e07a3	timestamp_based_splitting_writer: supply a parameter to std::out_of_range constructor std::out_of_range does not have a default constructor, yet gcc somehow accepts a no-argument construction. Clang (correctly) doesn't, so add a parameter.	2020-09-21 16:32:53 +03:00
Avi Kivity	a0ffcabd66	view: use nonwrapping_interval instead of nonwrapping_range to avoid clang deduction failure We use class template argument deduction (CTAD) in a few places, but it appears not to work for alias templates in clang. While it looks like a clang bug, using the class name is an improvement, so let's do that.	2020-09-21 16:32:53 +03:00
Avi Kivity	933bc7bd99	cql3: select_statement: fix incorrect implicit conversion of bool_class to bool bool_class only has explicit conversion to bool, so an assignment such as bool x = bool_class<foo>(true); ought to fail. Somehow gcc allows it, but I believe clang is correct in disallowing it. Fix by using 'auto' to avoid the conversion.	2020-09-21 16:32:53 +03:00
Avi Kivity	ef20afea7c	counters: unconfuse clang in counter_cell_builder::inserter_iterator Clang gets confused in this operator=() implementation. Frankly, I can't see why. But adding this-> helps it.	2020-09-21 16:32:53 +03:00
Avi Kivity	186c6cef57	cdc: sprinkle parentheses in EntryContainer concept Due to a bug, clang does not decay a type to a reference, failing the concept evaluation on correct input. Add parentheses to force it to decay the type.	2020-09-21 16:32:53 +03:00
Avi Kivity	30f2b3ba2f	bytes: define constructor for fmt_hex clang 10 does not implement p0960r3, so we must define a constructor for fmt_hex.	2020-09-21 16:32:53 +03:00
Avi Kivity	388dcf126c	atomic_cell.hh: forward-declare atomic_cell_or_collection atomic_cell_or_collection is also declared as a friend class further down, and gcc appears to inject this friend declaration into the global namespace. Clang appears not to, and so complains when atomic_cell_or_collection is mentioned in the declaration of merge_column(). Add a forward declaration in the global namespace to satisfy clang.	2020-09-21 16:32:53 +03:00
Avi Kivity	cc3c9ba03a	alternator/streams: don't use non-existent std::ostringstream::view() We call ostringstream::view(), but that member doesn't exist. It works because it is guarded by an #ifdef and the guard isn't satisfied, but if it is (as with clang) it doesn't compile. Remove it.	2020-09-21 16:32:10 +03:00
Avi Kivity	2d33a3f73c	alternator/base64: fix harmless integer overflow We assign 255 to an int8_t, but 255 isn't representable as an int8_t. Change to the bitwise equivalent but representable -1. Found by clang.	2020-09-21 16:32:10 +03:00
Avi Kivity	cf3e779180	alternator/base64: fix misuse of strlen() in constexpr context base64_chars() calls strlen() from a static_assert, but strlen() isn't (and can't be) constexpr. gcc somehow allows it, but clang rightfully complains. Fix by using a character array and sizeof, instead of a pointer and strlen().	2020-09-21 16:32:10 +03:00
Avi Kivity	ee980ee32f	hashers: convert illegal constraint to static_assert The constraint on cryptopp_hasher<>::impl is illegal, since it's not on the base template. Convert it to a static_assert. We could have moved it to the base template, but that would have undone the work to push all the implementation details in .cc and reduce #include load. Found by clang.	2020-09-21 16:32:10 +03:00
Avi Kivity	c5312618d0	hashers: relax noexcept requirement from CryptoPP Update() functions While we need CryptoPP Update() functions not to throw, they aren't marked noexcept. Since there is no reason for them to throw, we'll just hope they don't, and relax the requirement. Found by clang. Apparently gcc didn't bother to check the constraint here.	2020-09-21 16:32:10 +03:00
Avi Kivity	22781ab7e3	hashers: add missing typename in Hashers concept Found by clang. Likely due to clang not implementing p0634r3, not a gcc bug.	2020-09-21 16:31:40 +03:00
Botond Dénes	ab59e7c725	flat_mutation_reader: add buffer() accessor To allow outsiders to inspect the contents of the reader's buffer.	2020-09-21 13:33:42 +03:00
Etienne Adam	208a721253	redis: add hexists command Add the HEXISTS command, which returns 1 if the key/field of a hash exists and 0 otherwise. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200917200259.338-1-etienne.adam@gmail.com>	2020-09-21 12:32:33 +03:00
Avi Kivity	75e72d18d2	Merge 'Simplify scylla_util.py' from Takuya ASADA scylla_util.py has grown large; some code is unused and some is used by only one script. Drop unnecessary things, move non-common functions to the caller scripts, and keep scylla_util.py simple. Closes #7102 * syuu1228-refactor_scylla_util: scylla_util.py: de-duplicate code on parse_scylla_dirs_with_default() and get_scylla_dirs() scylla_util.py: remove rmtree() and redhat_version() since these are unused dist/common/scripts: drop makedirs(), use os.makedirs() dist/common/scripts: drop hex2list.py since it's no longer used dist/common/scripts: drop is_systemd() since we no longer support non-systemd environment dist/common/scripts: drop dist_name() and dist_ver() dist/common/scripts: move functions that are only called from single file	2020-09-19 21:01:24 +03:00
Takuya ASADA	48223022f7	scylla_util.py: de-duplicate code on parse_scylla_dirs_with_default() and get_scylla_dirs() Seems like parse_scylla_dirs_with_default() and get_scylla_dirs() shares most of the code, de-duplicate it. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-09-20 00:50:37 +09:00
Takuya ASADA	f8321bc66a	scylla_util.py: remove rmtree() and redhat_version() since these are unused	2020-09-20 00:50:05 +09:00
Takuya ASADA	8e1f7d4fc7	dist/common/scripts: drop makedirs(), use os.makedirs() Since os.makedirs() has exist_ok option, no need to create wrapper function.	2020-09-20 00:48:06 +09:00
Takuya ASADA	85f76e80b4	dist/common/scripts: drop hex2list.py since it's no longer used Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-09-20 00:45:25 +09:00
Takuya ASADA	0f5c83f73d	dist/common/scripts: drop is_systemd() since we no longer support non-systemd environment	2020-09-20 00:45:02 +09:00
Takuya ASADA	82701dc5ed	dist/common/scripts: drop dist_name() and dist_ver() It can be replaced with distro.name() and distro.version(). Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-09-20 00:42:27 +09:00
Takuya ASADA	79d8192dc7	dist/common/scripts: move functions that are only called from single file scylla_util.py is a library of functions common to the setup scripts; it should not include functions private to a single file. So move all those functions to the caller file. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-09-20 00:42:19 +09:00
Avi Kivity	4e7ad448f4	Update seastar submodule * seastar dc06cd1f0f...9ae33e67e1 (9): > expiring_fifo: mark methods noexcept > chunked_fifo: mark methods noexcept > circular_buffer: support stateful allocators > net/posix-stack: fix sockaddr forward reference > rpc: fix LZ4_DECODER_RING_BUFFER_SIZE not defined > futures_test: fix max_concurrent_for_each concepts error > core: limit memory size for each process to 64GB > core/reactor_backend: kill unused nr_retry > Merge "IO tracing" from Pavel E	2020-09-19 16:40:48 +03:00
Avi Kivity	311b6b827c	build: unified tarball: reduce log spam Don't print out the names of all files archived, there are too many of them. Closes #7191	2020-09-18 15:00:59 +03:00
Pekka Enberg	db272ba799	Update tools/java submodule * tools/java 6c1c484140...2e2c056c07 (1): > sstableloader: Add verbose message if sstable/file cannot be opened	2020-09-17 17:19:41 +03:00
Pekka Enberg	9650b8d4b5	Update tools/java submodule * tools/java b0114f64bc...6c1c484140 (1): > sstableloader: fix generating CQL statements	2020-09-17 12:25:08 +03:00
Avi Kivity	8722cb97ae	Merge "storage_service: set_tables_autocompaction: run_with_api_lock" from Benny " Based on https://github.com/scylladb/scylla/issues/7199, it looks like storage_service::set_tables_autocompaction may be called on shards other than 0. Use run_with_api_lock to both serialize the action and to check _initialized on shard 0. Fixes #7199 Test: unit(dev), compaction_test:TestCompaction_with_SizeTieredCompactionStrategy.disable_autocompaction_nodetool_test " * tag 'set_tables_autocompaction-v1' of github.com:bhalevy/scylla: storage_service: set_tables_autocompaction: fixup indentation storage_service: set_tables_autocompaction: run_with_api_lock storage_service: set_tables_autocompaction: use do_with to hold on to args storage_service: set_tables_autocompaction: log message in info level	2020-09-17 12:21:06 +03:00
Avi Kivity	12ca7f7ace	Merge 'dist/common/scripts: skip internet access on offline installation' from Takuya ASADA We need to skip internet access on offline installation. To do this we need the following changes: - prevent running yum/apt for each script - set default "NO" for scripts that require package installation - set default "NO" for scripts that require internet access, such as NTP See #7153 Closes #7224 * syuu1228-offline_setup: dist/common/scripts: skip internet access on offline installation scylla_ntp_setup: use shutil.which() to look up command	2020-09-17 12:14:20 +03:00
Avi Kivity	809a13d0f4	Merge 'Gce image support' from Bentsi - Add necessary changes to `scylla_util.py` in order to support Scylla Machine Image in GCE. - Fixes and improvements for `curl` function. Closes #7080 * bentsi-gce-image-support: scylla_util.py: added GCE instance/image support scylla_util.py: make max_retries as a curl function argument scylla_util.py: adding timeout to curl function scylla_util.py: styling fixes to curl function scylla_util.py: change default value for headers argument in curl function	2020-09-17 12:09:19 +03:00
Pekka Enberg	d6bf424127	configure.py: Build scylla-unified-package.tar.gz to build/<mode>/dist/tar Let's build scylla-unified-package.tar.gz in build/<mode>/dist/tar for symmetry. The old location is still kept for backward compatibility for now. Also document the new official artifact location. Message-Id: <20200917071131.126098-1-penberg@scylladb.com>	2020-09-17 11:01:02 +03:00
Avi Kivity	e43d6d1460	Merge "Unregister all RPC verbs" from Pavel E " ... and make sure nothing is left. With the help of fresh seastar this can be done quickly. Before doing this check -- unregister remaining verbs in repair and storage_service and fix tests not to register verbs, because they are all local. tests: unit(dev), manual " * 'br-messaging-service-stop-all' of https://github.com/xemul/scylla: messaging_service: Report still registered services as errors repair: Move CHECKSUM_RANGE verb into repair/ repair: Toss messaging init/uninit calls storage_service: Uninit RPC verbs test: Do not init messaging verbs	2020-09-17 10:59:10 +03:00
Nadav Har'El	b81e3d9a4e	alternator, doc: link to the separate repository on load balancing Add in docs/alternator/alternator.md a link to the external repository devoted to Alternator load balancing instructions, example, and code - https://github.com/scylladb/alternator-load-balancing/. Fixes #5030. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200917061328.371557-1-nyh@scylladb.com>	2020-09-17 09:56:29 +03:00
Pavel Emelyanov	2fde6bbfe7	messaging_service: Report still registered services as errors On stop -- unregister the CLIENT_ID verb, which is registered in the constructor, then check for any remaining ones. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-17 09:52:57 +03:00
Pavel Emelyanov	9a15ebfe6a	repair: Move CHECKSUM_RANGE verb into repair/ The verb is sent by repair code, so it should be registered in the same place, not in main. Also -- the verb should be unregistered on stop. The global messaging service instance is made similarly to the row-level one, as there's no ready to use repair service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-17 09:52:48 +03:00
Pavel Emelyanov	d5769346d7	repair: Toss messaging init/uninit calls The goal is to make it possible to reg/unreg not only row-level verbs. While at it -- equip the init call with sharded<database>& argument, it will be needed by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-17 09:52:48 +03:00
Pavel Emelyanov	949a258809	storage_service: Uninit RPC verbs The service does this on stop, which is never called, so do it separately. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-17 09:52:45 +03:00
Pavel Emelyanov	2d45d71413	test: Do not init messaging verbs The CQL tests do not use networking, so there is no need to register any verbs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-17 09:51:33 +03:00
Avi Kivity	a81b731a1a	scripts: pull_pr.sh: resolve pull request user name Convert the github handle to a real name. Closes #7247	2020-09-17 08:27:00 +02:00
Pekka Enberg	f6f9f832ee	test.py: Add "--list" option to show a list of tests This patch adds a "--list" option to test.py that shows a list of tests instead of executing them. This is useful for people and scripts, which need to discover the tests that will be run. For example, when Jenkins needs to store failing tests, it can use "test.py --list" to figure out what to archive. Message-Id: <20200916135714.89350-1-penberg@scylladb.com>	2020-09-16 16:02:48 +02:00
Benny Halevy	f207cff73d	token_metadata: set_pending_ranges: prep new interval_map out of line And move-assign to _pending_ranges_interval_map[keyspace_name] only when done. This is more efficient since there's no need to look up _pending_ranges_interval_map[keyspace_name] for every insert to the interval_map. And it is exception safe in case we run out of memory mid-way. Refs #7220 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200916115059.788606-1-bhalevy@scylladb.com>	2020-09-16 15:28:42 +03:00
Avi Kivity	81844fb476	Update tools/java submodule * tools/java 2d49ded77b...b0114f64bc (1): > Merge "dist: do not install build dependencies on build script" from Takuya Fixes #7219.	2020-09-16 12:52:07 +03:00
Bentsi Magidovich	12e018ad04	scylla_util.py: added GCE instance/image support	2020-09-16 11:32:57 +03:00
Nadav Har'El	5e8bdf6877	alternator: fix corruption of PutItem operation in case of contention This patch fixes a bug noted in issue #7218 - where PutItem operations sometimes lose part of the item's data - some attributes were lost, and the name of other attributes replaced by empty strings. The problem happened when the write-isolation policy was LWT and there was contention of writes to the same partition (not necessarily the same item). To use CAS (a.k.a. LWT), Alternator builds an alternator::rmw_operation object with an apply() function which takes the old contents of the item (if needed) and a timestamp, and builds a mutation that the CAS should apply. In the case of the PutItem operation, we wrongly assumed that apply() will be called only once - so as an optimization the strings saved in the put_item_operation were moved into the returned mutation. But this optimization is wrong - when there is contention, apply() may be called again when the change proposed by the previous one was not accepted by the Paxos protocol. The fix is to change the one place where put_item_operation moved strings out of the saved operations into the mutations, to be a copy. But to prevent this sort of bug from recurring in future code, this patch enlists the compiler to help us verify that it can't happen: The apply() function is marked "const" - it can use the information in the operation to build the mutation, but it can never modify this information or move things out of it, so it will be fine to call this function twice. The single output field that apply() does write (_return_attributes) is marked "mutable" to allow the const apply() to write to it anyway. Because apply() might be called twice, it is important that if some apply() implementation sometimes sets _return_attributes, then it must always set it (even if to the default, empty, value) on every call to apply(). 
The const apply() means that the compiler verifies for us that I didn't forget to fix additional wrong std::move()s. Additionally, a test I wrote to easily reproduce issue #7218 (which I will submit as a dtest later) passes after this fix. Fixes #7218. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200916064906.333420-1-nyh@scylladb.com>	2020-09-16 10:30:19 +02:00
Bentsi Magidovich	5aef44fc82	scylla_util.py: make max_retries as a curl function argument	2020-09-16 11:25:01 +03:00
Bentsi Magidovich	1f31e976cc	scylla_util.py: adding timeout to curl function	2020-09-16 11:25:01 +03:00
Bentsi Magidovich	f5d97afaa2	scylla_util.py: styling fixes to curl function - rename deprecated logging.warn to logging.warning - remove redundant round brackets in the if statement	2020-09-16 11:25:01 +03:00
Bentsi Magidovich	a24ec2686f	scylla_util.py: change default value for headers argument in curl function - It was set to {}, which is incorrect and can lead to unexpected behavior https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments - Order of the arguments changed to a more convenient one	2020-09-16 11:25:01 +03:00
Avi Kivity	4456645f97	scripts: pull_pr.sh: auto-close pull request after merge Add a "Closes #$PR_NUM" annotation at the end of the commit message to tell github to close the pull request, preventing manual work and/or dangling pull requests. Closes #7245	2020-09-16 10:23:34 +02:00
Avi Kivity	253a7640e3	Merge 'Clean up old cluster features' from Piotr Sarna " This series follows the suggestion from https://github.com/scylladb/scylla/pull/7203#issuecomment-689499773 discussion and deprecates a number of cluster features. The deprecation does not remove any features from the strings sent via gossip to other nodes, but it removes all checks for these features from code, assuming that the checks are always true. This assumption is quite safe for features introduced over 2 years ago, because the official upgrade path only allows upgrading from a previous official release, and these feature bits were introduced many release cycles ago. All deprecated features were picked from a `git blame` output which indicated that they come from 2018: ```git `e46537b7d3` 2016-05-31 11:44:17 +0200 RANGE_TOMBSTONES_FEATURE = "RANGE_TOMBSTONES"; `85c092c56c` 2016-07-11 10:59:40 +0100 LARGE_PARTITIONS_FEATURE = "LARGE_PARTITIONS"; `02bc0d2ab3` 2016-12-09 22:09:30 +0100 MATERIALIZED_VIEWS_FEATURE = "MATERIALIZED_VIEWS"; `67ca6959bd` 2017-01-30 19:50:13 +0000 COUNTERS_FEATURE = "COUNTERS"; `815c91a1b8` 2017-04-12 10:14:38 +0300 INDEXES_FEATURE = "INDEXES"; `d2a2a6d471` 2017-08-03 10:53:22 +0300 DIGEST_MULTIPARTITION_READ_FEATURE = "DIGEST_MULTIPARTITION_READ"; `ecd2bf128b` 2017-09-01 09:55:02 +0100 CORRECT_COUNTER_ORDER_FEATURE = "CORRECT_COUNTER_ORDER"; `713d75fd51` 2017-09-14 19:15:41 +0200 SCHEMA_TABLES_V3 = "SCHEMA_TABLES_V3"; `2f513514cc` 2017-11-29 11:57:09 +0000 CORRECT_NON_COMPOUND_RANGE_TOMBSTONES = "CORRECT_NON_COMPOUND_RANGE_TOMBSTONES"; `0be3bd383b` 2017-12-04 13:55:36 +0200 WRITE_FAILURE_REPLY_FEATURE = "WRITE_FAILURE_REPLY"; `0bab3e59c2` 2017-11-30 00:16:34 +0000 XXHASH_FEATURE = "XXHASH"; `fbc97626c4` 2018-01-14 21:28:58 -0500 ROLES_FEATURE = "ROLES"; `802be72ca6` 2018-03-18 06:25:52 +0100 LA_SSTABLE_FEATURE = "LA_SSTABLE_FORMAT"; `71e22fe981` 2018-05-25 10:37:54 +0800 STREAM_WITH_RPC_STREAM = "STREAM_WITH_RPC_STREAM"; ``` Tests: unit(dev) 
manual(verifying with cqlsh that the feature strings are indeed still set) " Closes #7234. * psarna-clean_up_features: gms: add comments for deprecated features gms: remove unused feature bits streaming: drop checks for RPC stream support roles: drop checks for roles schema support service: drop checks for xxhash support service: drop checks for write failure reply support sstables: drop checks for non-compound range tombstones support service: drop checks for v3 schema support repair: drop checks for large partitions support service: drop checks for digest multipartition read support sstables: drop checks for correct counter order support cql3: drop checks for materialized views support cql3: drop checks for counters support cql3: drop checks for indexing support	2020-09-16 10:53:25 +03:00
Avi Kivity	888fde59f8	Update tools/jmx submodule * tools/jmx d3096f3...6795a22 (1): > Merge "dist: do not install build dependencies on build script" from Takuya Ref #7219.	2020-09-16 10:30:33 +03:00
Takuya ASADA	db9e6f50f3	dist/common/scripts: skip internet access on offline installation We need to skip internet access on offline installation. To do this we need the following changes: - prevent running yum/apt for each script - set default "NO" for scripts that require package installation - set default "NO" for scripts that require internet access, such as NTP See #7153 Fixes #7182	2020-09-16 10:05:20 +09:00
Takuya ASADA	ca8f0ff588	scylla_ntp_setup: use shutil.which() to look up command The directory where the command is installed may differ between distributions; we can abstract the difference using shutil.which(). The script also becomes simpler than when passing a full path to os.path.exists().	2020-09-16 10:04:23 +09:00
Avi Kivity	9421cfded4	reconcilable_result_builder: don't aggravate out-of-memory condition during recovery Consider an unpaged query that consumes all of available memory, despite `fea5067dfa` which limits them (perhaps the user raised the limit, or this is a system query). Eventually we will see a bad_alloc which will abort the query and destroy this reconcilable_result_builder. During destruction, we first destroy _memory_accounter, and then _result. Destroying _memory_accounter resumes some continuations which can then allocate memory synchronously when increasing the task queue to accommodate them. We will then crash. Had we not crashed, we would immediately afterwards release _result, freeing all the memory that we would ever need. Fix by making _result the last member, so it is freed first. Fixes #7240.	2020-09-15 19:53:05 +02:00
Pavel Solodovnikov	6e10f2b530	schema_registry: make grace period configurable Introduce new database config option `schema_registry_grace_period` describing the amount of time in seconds after which unused schema versions will be cleaned up from the schema registry cache. Default value is 1 second, the same value as was hardcoded before. Tests: unit(debug) Refs: #7225 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200915131957.446455-1-pa.solodovnikov@scylladb.com>	2020-09-15 17:53:27 +02:00
Tomasz Grabiec	c9e1694c58	Merge "Some optimizations on cache entry lookup" from Pavel Emelyanov The set contains 3 small optimizations: - avoid copying of partition key on lookup path - reduce number of args carried around when creating a new entry - save one partition key comparison on reader creation Plus related satellite cleanups. * https://github.com/xemul/scylla/tree/br-row-cache-less-copies: row_cache: Revive do_find_or_create_entry concepts populating reader: Do not copy decorated key too early populating reader: Less allocator switching on population populating reader: Fix indentation after previous patch row_cache: Move missing entry creation into helper test: Lookup an existing entry with its own helper row_cache: Do not copy partition tombstone when creating cache entry row_cache: Kill incomplete_tag row_cache: Save one key compare on direct hit	2020-09-15 17:49:47 +02:00
Avi Kivity	7cf4c450cd	Update seastar submodule * seastar 8933f76d33...dc06cd1f0f (3): > lz4_fragmented_compressor: Fix buffer requirements Fixes #6925 > net: tls: Added feature to register callback for TLS verification > alien: be compatible use API Level 5	2020-09-15 17:33:24 +03:00
Avi Kivity	64ebb9c052	Merge 'Remove _pending_ranges and _pending_ranges_map in token_metadata' from Asias " This PR removes _pending_ranges and _pending_ranges_map in token_metadata. This removal makes copying of token_metadata faster and reduces the chance to cause reactor stall. Refs: #7220 " * asias-token_metadata_replication_config_less_maps: token_metadata: Remove _pending_ranges token_metadata: Get rid of unused _pending_ranges_map	2020-09-15 17:16:35 +03:00
Piotr Sarna	7c8728dd73	Merge 'Add progress metrics for replace decommission removenode' from Asias. This series follows "repair: Add progress metrics for node ops #6842" and adds the metrics for the remaining node operations, i.e., replace, decommission and removenode. Fixes #1244, #6733 * asias-repair_progress_metrics_replace_decomm_removenode: repair: Add progress metrics for removenode ops repair: Add progress metrics for decommission ops repair: Add progress metrics for replace ops	2020-09-15 12:19:11 +02:00
Benny Halevy	0dc45529c8	abstract_replication_strategy: get_ranges_in_thread: copy _token_metadata if func may yield Change `94995acedb` added yielding to abstract_replication_strategy::do_get_ranges. And `07e253542d` used get_ranges_in_thread in compaction_manager. However, there is nothing to prevent token_metadata, and in particular its `_sorted_tokens` from changing while iterating over them in do_get_ranges if the latter yields. Therefore copy the replication strategy `_token_metadata` in `get_ranges_in_thread(inet_address ep)`. If the caller provides `token_metadata` to get_ranges_in_thread, then the caller must make sure that we can safely yield while accessing token_metadata (like in `do_rebuild_replace_with_repair`). Fixes #7044 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200915074555.431088-1-bhalevy@scylladb.com>	2020-09-15 11:33:55 +03:00
Asias He	c38ec98c6e	token_metadata: Remove _pending_ranges - Remove get_pending_ranges and introduce has_pending_ranges, since the caller only needs to know if there is a pending range for the keyspace and the node. - Remove print_pending_ranges which is only used in logging. If we really want to log the new pending token ranges, we can log when we set the new pending token ranges. This removal of _pending_ranges makes copying of token_metadata faster and reduces the chance to cause reactor stall. Refs: #7220	2020-09-15 16:27:50 +08:00
Avi Kivity	19ffc9455d	Merge "Don't expose exact collection from range_tombstone_list" from Pavel E " The range_tombstone_list provides an abstraction to work with sorted list of range tombstones with methods to add/retrieve them. However, there's a tombstones() method that just returns modifiable reference to the used collection (boost::intrusive_set) which makes it hard to track the exact usage of it. This set encapsulates the collection of range tombstones inside the mentioned ..._list class. tests: unit(dev) " * 'br-range-tombstone-encapsulate-collection' of https://github.com/xemul/scylla: range_tombstone_list: Do not expose internal collection range_tombstone_list: Introduce and use pop-and-lock helper range_tombstone_list: Introduce and use pop_as<>() flat_mutation_reader: Use range_tombstone_list begin/end API repair: Mark some partition_hasher methods noexcept hashers: Mark hash updates noexcept	2020-09-15 10:09:15 +02:00
Botond Dénes	3c3b63c2b7	scylla-gdb.py: histogram: don't use shared default argument The histogram constructor has a `counts` parameter defaulted to `defaultdict(int)`. Due to how default argument values work in python -- the same value is passed to all invocations -- this results in all histogram instances sharing the same underlying counts dict. Solve it the way this is usually solved -- default the parameter to `None` and when it is `None` create a new instance of `defaultdict(int)` local to the histogram instance under construction. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200908142355.1263568-1-bdenes@scylladb.com>	2020-09-15 10:09:15 +02:00
Botond Dénes	c1bb648f90	scylla-gdb.py: managed_bytes_printer: print blobs in hex format Currently blobs are converted to python bytes objects and printed by simply converting them to string. This results in hard to read blobs as the bytes' __str__() attempts to interpret the data as a printable string. This patch changes this to use bytes.hex() which prints blobs in hex format. This is much more readable and it is also the format that scylla uses when printing blobs. Also the conversion to bytes is made more efficient by using gdb's gdb.inferior.read_memory() function to read the data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200911085439.1461882-1-bdenes@scylladb.com>	2020-09-15 10:09:15 +02:00
Tomasz Grabiec	1f6c4f945e	mutation_partition: Fix typo drien -> driven Message-Id: <1600103287-4948-1-git-send-email-tgrabiec@scylladb.com>	2020-09-15 10:09:15 +02:00
Asias He	d38506fbf0	token_metadata: Get rid of unused _pending_ranges_map It is not used anymore. The size of _pending_ranges_map is O(number of keyspaces). It can be very big when we have lots of keyspaces. Refs: #7220	2020-09-15 14:47:00 +08:00
Pavel Solodovnikov	e02301890b	schema_tables: extract `fill_column_info` helper The patch extracts a little helper function that populates a schema_mutation with various column information. Will be used in a subsequent patch to serialize column mappings. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-09-15 05:38:21 +03:00
Pavel Solodovnikov	778230f8b8	frozen_mutation: introduce `unfreeze_upgrading` method This helper function is similar to the ordinary `unfreeze` of `frozen_mutation` but in addition to the schema_ptr supplies a custom column_mapping which is being used when upgrading the mutation. Needed for a subsequent patch regarding column mappings history. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-09-15 05:26:44 +03:00
Takuya ASADA	5f541fbdc5	scylla_setup: drop hugepages package installation The hugepages and libhugetlbfs-bin packages are only required for DPDK mode, and installing them unconditionally causes an error in offline mode, so drop the installation. Fixes #7182	2020-09-14 17:05:09 +03:00
Botond Dénes	192bcc5811	data/cell: don't overshoot target allocation sizes data::cell targets 8KB as its maximum allocation size to avoid pressuring the allocator. This 8KB target is used for internal storage -- values small enough to be stored inside the cell itself -- as well as for external storage. Externally stored values use 8KB fragment sizes. The problem is that only the size of the data itself was considered when making the allocations. For example when allocating the fragments (chunks) for external storage, each fragment stored 8KB of data. But fragments have overhead, they have next and back pointers. This resulted in an 8KB + 2 * sizeof(void*) allocation. IMR uses the allocation strategy mechanism, which works with aligned allocations. As the seastar allocator only guarantees aligned allocations for power of two sizes, it ends up allocating a 16KB slot. This results in the mutation fragment using almost twice as much memory as would be required. This is a huge waste. This patch fixes the problem by considering the overhead of both internal and external storage, ensuring allocations are 8KB or less. Fixes: #6043 Tests: unit(debug, dev, release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200910171359.1438029-1-bdenes@scylladb.com>	2020-09-14 14:21:46 +03:00
Piotr Sarna	d85a32ce70	mutation_partition: use proper hasher in row hashing Instead of using the default hasher, hashing specializations should use the hasher type they were specialized for. It's not a correctness issue now because the default hasher (xx_hasher) is compatible with its predecessor (legacy_xx_hasher_without_null_digest), but it's better to be future-proof and use the correct type in case we ever change the default hasher in a backward-incompatible way. Message-Id: <c84ce569d12d9b4f247fb2717efa10dc2dabd75b.1600074632.git.sarna@scylladb.com>	2020-09-14 14:17:36 +03:00
Piotr Sarna	dd085b146a	gms: add comments for deprecated features Features which are propagated to other nodes via gossip, but are assumed to be supported in the code, are now marked with comments.	2020-09-14 12:59:19 +02:00
Benny Halevy	0410e11213	storage_service: set_tables_autocompaction: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-14 13:55:28 +03:00
Benny Halevy	39cf84f291	storage_service: set_tables_autocompaction: run_with_api_lock Based on https://github.com/scylladb/scylla/issues/7199, it looks like storage_service::set_tables_autocompaction may be called on shards other than 0. Use run_with_api_lock to both serialize the action and to check _initialized on shard 0. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-14 13:53:08 +03:00
Benny Halevy	1ca02c756f	storage_service: set_tables_autocompaction: use do_with to hold on to args In preparation for calling is_initialized() which may yield. Plus, the way the tables vector is currently captured is inefficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-14 13:45:00 +03:00
Benny Halevy	e85a6c4853	storage_service: set_tables_autocompaction: log message in info level This is rare enough and important for the operator to be logged in info level. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-14 13:38:59 +03:00
Piotr Sarna	defe6f49df	gms: remove unused feature bits Checks for features introduced over 2 years ago were removed in previous commits, so all that is left is removing the feature bits themselves. Note that the feature strings are still sent to other nodes just to be double sure, but the new code assumes that all these features are implicitly enabled.	2020-09-14 12:35:28 +02:00
Piotr Sarna	f7a7931377	streaming: drop checks for RPC stream support Streaming with RPC stream has been supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:18:13 +02:00
Pavel Emelyanov	dff8aebe58	partition_snapshot_reader: Do not fill buffer in constructor The reader fills up the buffer upon construction, which is not what other readers do, and is considered to be a waste of cycles, as the reader can be dropped early. Refs #1671 test: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200910171134.11287-2-xemul@scylladb.com>	2020-09-14 12:18:03 +02:00
Piotr Sarna	d1480a5260	roles: drop checks for roles schema support Roles are supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:17:26 +02:00
Piotr Sarna	e19e86a6e7	service: drop checks for xxhash support xxhash algorithm is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:13:03 +02:00
Piotr Sarna	f05bf78716	service: drop checks for write failure reply support Write failure reply is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:11:42 +02:00
Piotr Sarna	16b4b86697	sstables: drop checks for non-compound range tombstones support Correct non-compound range tombstones are supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:09:51 +02:00
Piotr Sarna	cc57f7b154	service: drop checks for v3 schema support Schema v3 is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:07:51 +02:00
Piotr Sarna	9e6098a422	repair: drop checks for large partitions support Large partitions are supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:07:20 +02:00
Piotr Sarna	854a44ff9b	service: drop checks for digest multipartition read support Digest multipartition read is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:06:32 +02:00
Piotr Sarna	f8ed1b5b67	sstables: drop checks for correct counter order support Correct counter order is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:05:11 +02:00
Piotr Sarna	18bd710dca	cql3: drop checks for materialized views support Views are supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:03:52 +02:00
Piotr Sarna	720d17a9c7	cql3: drop checks for counters support Counters are supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:03:41 +02:00
Piotr Sarna	7ba7d35aad	cql3: drop checks for indexing support Indexing is supported for over 2 years and upgrades are only allowed from versions which already have the support, so the checks are hereby dropped.	2020-09-14 12:03:37 +02:00
Avi Kivity	dcaf4ea4dd	Merge "Fix race in schema version recalculation leading to stale schema version in gossip" from Tomasz " Migration manager installs several cluster feature change listeners. The listeners will call update_schema_version_and_announce() when cluster features are enabled, which does this: return update_schema_version(proxy, features).then([] (utils::UUID uuid) { return announce_schema_version(uuid); }); It first updates the schema version and then publishes it via gossip in announce_schema_version(). It is possible that the announce_schema_version() part of the first schema change will be deferred and will execute after the other four calls to update_schema_version_and_announce(). It will install the old schema version in gossip instead of the more recent one. The fix is to serialize schema digest calculation and publishing. Refs #7200 This problem also brought my attention to initialization code, which could be prone to the same problem. The storage service computes gossiper states before it starts the gossiper. Among them, node's schema version. There are two problems with that. First is that computing the schema version and publishing it is not atomic, so is not safe against concurrent schema changes or schema version recalculations. It will not exclude with recalculate_schema_version() calls, and we could end up with the old (and incorrect) schema version being advertised in gossip. Second problem is that we should not allow the database layer to call into the gossiper layer before it is fully initialized, as this may produce undefined behavior. Maybe we're not doing concurrent schema changes/recalculations now, but it is easy to imagine that this could change for whatever reason in the future. The solution for both problems is to break the cyclic dependency between the database layer and the storage_service layer by having the database layer not use the gossiper at all. 
The database layer publishes schema version inside the database class and allows installing listeners on changes. The storage_service layer asks the database layer for the current version when it initializes, and only after that installs a listener which will update the gossiper. Tests: - unit (dev) - manual (3 node ccm) " * tag 'fix-schema-digest-calculation-race-v1' of github.com:tgrabiec/scylla: db, schema: Hide update_schema_version_and_announce() db, storage_service: Do not call into gossiper from the database layer db: Make schema version observable utils: updateable_value_source: Introduce as_observable() schema: Fix race in schema version recalculation leading to stale schema version in gossip	2020-09-14 12:37:46 +03:00
Etienne Adam	f3ce5f0cbb	redis: remove lambda in command_factory This follows the patch removing the commands classes, and removes unnecessary lambdas. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200914071651.28802-1-etienne.adam@gmail.com>	2020-09-14 11:30:20 +03:00
Avi Kivity	05229d1d31	Merge "Add unified tarball to build "dist" target" from Pekka " This pull request fixes unified relocatable package dependency issues in other build modes than release, and then adds unified tarball to the "dist" build target. Fixes #6949 " * 'penberg/build/unified-to-dist/v1' of github.com:penberg/scylla: configure.py: Build unified tarball as part of "dist" target unified/build_unified: Use build/<mode>/dist/tar for dependency tarballs configure.py: Use build/<mode>/dist/tar for unified tarball dependencies	2020-09-14 11:29:28 +03:00
Etienne Adam	bd82b4fc03	redis: remove commands classes This patch is a proposal for the removal of the redis classes describing the commands. 'prepare' and 'execute' class functions have been merged into a function with the name of the command. Note: 'command_factory' still needs to be simplified. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200913183315.9437-1-etienne.adam@gmail.com>	2020-09-14 11:29:28 +03:00
Avi Kivity	764866ed02	Update tools/jmx submodule * tools/jmx 8d92e54...d3096f3 (1): > dist: debian: fix detection of debuild	2020-09-13 16:26:53 +03:00
Raphael S. Carvalho	6a7409ef4c	scylla-gdb: Fix scylla tasks it's failing as so: Python Exception <class 'TypeError'> unsupported operand type(s) for +: 'int' and 'str': it's a regression caused by `e4d06a3bbf`. _mask() should use the ref stored in the ctor to dereference _impl. Fixes #7058. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200908154342.26264-1-raphaelsc@scylladb.com>	2020-09-13 16:25:27 +03:00
Avi Kivity	a8f40b3e89	Update seastar submodule * seastar 52f0f38994...8933f76d33 (10): > future-util: add max_concurrent_for_each > rwlock: define rwlock_for_read, rwlock_for_write, as classes, not structs > loopback_socket: mark get_sockopt, set_sockopt as override > native-stack: mark get_sockopt, set_sockopt as override > treewide: remove unused lambda captures > future: prevent spurious unused-lambda-capture warning in future<>::then > future: make futurize<> a friend of future_state_base > future: fix uninitialized constexpr in is_tuple_effectively_trivially_move_constructible_and_destructible > net: expose hidden method from parent class > future: s/std::result_of_t/std::invoke_result_t/	2020-09-13 16:19:09 +03:00
Takuya ASADA	233d0fc0e5	unified: don't proceed offline install when openjdk is not available Currently, we run the openjdk existence check after the scylla main program is installed. We should do it before installing anything.	2020-09-13 12:39:05 +03:00
Pavel Emelyanov	bf4063d78e	row cache: Unfriend classes from each other Now cache_tracker, mutation_partition and rows_entry do not need to be friends. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:35:51 +03:00
Pavel Emelyanov	7a1265a338	rows_entry: Move container/hooks types declarations Define container types near the containing elements' hook members, so that they could be private without the need to friend classes with each other. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:35:51 +03:00
Pavel Emelyanov	7ed1e18a13	rows_entry: Simplify LRU unlink The cache_tracker tries to access private member of the rows_entry to unlink it, but the lru_type is auto_unlink and can unlink itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:35:51 +03:00
Pavel Emelyanov	7f2c6aed50	mutation_partition: Define .replace_with method for rows_entry The one is needed to hide the guts of rows_entry from mutation_partition. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:35:51 +03:00
Pavel Emelyanov	a946326daf	mutation_partition: Use rows_entry::apply_monotonically There is no need in touching the private member of rows_entry, as it exposes a method for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-11 16:13:10 +03:00
Tomasz Grabiec	691009bc1e	db, schema: Hide update_schema_version_and_announce()	2020-09-11 14:42:48 +02:00
Tomasz Grabiec	9f58dcc705	db, storage_service: Do not call into gossiper from the database layer The storage service computes gossiper states before it starts the gossiper. Among them, node's schema version. There are two problems with that. First is that computing the schema version and publishing it is not atomic, so is not safe against concurrent schema changes or schema version recalculations. It will not exclude with recalculate_schema_version() calls, and we could end up with the old (and incorrect) schema version being advertised in gossip. Second problem is that we should not allow the database layer to call into the gossiper layer before it is fully initialized, as this may produce undefined behavior. The solution for both problems is to break the cyclic dependency between the database layer and the storage_service layer by having the database layer not use the gossiper at all. The database layer publishes schema version inside the database class and allows installing listeners on changes. The storage_service layer asks the database layer for the current version when it initializes, and only after that installs a listener which will update the gossiper. This also allows us to drop unsafe functions like update_schema_version().	2020-09-11 14:42:41 +02:00
Tomasz Grabiec	ad0b674b13	db: Make schema version observable	2020-09-11 14:42:41 +02:00
Tomasz Grabiec	fed89ee23e	utils: updateable_value_source: Introduce as_observable()	2020-09-11 14:42:41 +02:00
Tomasz Grabiec	1a57d641d1	schema: Fix race in schema version recalculation leading to stale schema version in gossip Migration manager installs several feature change listeners: if (this_shard_id() == 0) { _feature_listeners.push_back(_feat.cluster_supports_view_virtual_columns().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_digest_insensitive_to_expiry().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_cdc().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_per_table_partitioners().when_enabled(update_schema)); } They will call update_schema_version_and_announce() when features are enabled, which does this: return update_schema_version(proxy, features).then([] (utils::UUID uuid) { return announce_schema_version(uuid); }); So it first updates the schema version and then publishes it via gossip in announce_schema_version(). It is possible that the announce_schema_version() part of the first schema change will be deferred and will execute after the other four calls to update_schema_version_and_announce(). It will install the old schema version in gossip instead of the more recent one. The fix is to serialize schema digest calculation and publishing. Refs #7200	2020-09-11 14:40:28 +02:00
Botond Dénes	e4798d9551	scylla-gdb.py: add scylla schema command To pretty print a schema. Example: (gdb) scylla schema $s (schema*) 0x604009352380 ks="scylla_bench" cf="test" id=a3eadd80-f2a7-11ea-853c-000000000004 version=47e0bf13-6cc8-3421-93c6-a9fe169b1689 partition key: byte_order_equal=true byte_order_comparable=false is_reversed=false "org.apache.cassandra.db.marshal.LongType" clustering key: byte_order_equal=true byte_order_comparable=false is_reversed=true "org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.LongType)" columns: column_kind::partition_key id=0 ordinal_id=0 "pk" "org.apache.cassandra.db.marshal.LongType" is_atomic=true is_counter=false column_kind::clustering_key id=0 ordinal_id=1 "ck" "org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.LongType)" is_atomic=true is_counter=false column_kind::regular_column id=0 ordinal_id=2 "v" "org.apache.cassandra.db.marshal.BytesType" is_atomic=true is_counter=false To preserve easy inspection of schema objects the printer is a command, not a pretty-printer. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200911100039.1467905-1-bdenes@scylladb.com>	2020-09-11 13:38:20 +02:00
Botond Dénes	4d7e2bf117	scylla-gdb.py: add pretty-printer for bytes Reusing the sstring pretty-printer. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200911094846.1466600-1-bdenes@scylladb.com>	2020-09-11 13:36:32 +02:00
Botond Dénes	d49c87ff47	scylla-gdb.py: don't use the string display hint for UUIDs It causes gdb to print UUIDs like this: "a3eadd80-f2a7-11ea-853c-", '0' <repeats 11 times>, "4" This is quite hard to read, let's drop the string display hint, so they are displayed like this: a3eadd80-f2a7-11ea-853c-000000000004 Much better. Also technically UUID is a 128 bit integer anyway, not a string. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200911090135.1463099-1-bdenes@scylladb.com>	2020-09-11 13:19:49 +02:00
Pekka Enberg	b50063e84a	configure.py: Build unified tarball as part of "dist" target Let's include the unified tarball as part of the "dist" target. Fixes #6949	2020-09-11 12:38:47 +03:00
Pekka Enberg	138e723e56	unified/build_unified: Use build/<mode>/dist/tar for dependency tarballs The build_unified.sh script has the same bug as configure.py had: it looks for the python tarball in build/<mode>/scylla-python3-package.tar.gz, but it's never generated there. Fix up the problem by using build/<mode>/dist/tar location for all dependency tarballs.	2020-09-11 12:37:44 +03:00
Pekka Enberg	1af17f56f9	configure.py: Use build/<mode>/dist/tar for unified tarball dependencies The build target for scylla-unified-package.tar.gz incorrectly depends on "build/<mode>/scylla-python3-package.tar.gz", which is never generated. Instead, the package is either generated in "build/release/scylla-python3-package.tar.gz" (for legacy reasons) or "build/<mode>/dist/tar/scylla-python3-package.tar.gz". This issue causes building the unified package in other modes to fail. To solve the problem, let's switch to using the "build/<mode>/dist/tar" locations for unified tarball dependencies, which is the correct place to use anyway.	2020-09-11 11:57:58 +03:00
Nadav Har'El	3322328b21	alternator test: fix two tests that failed in HTTPS mode When the test suite is run with Scylla serving in HTTPS mode, using test/alternator/run --https, two Alternator Streams tests failed. With this patch fixing a bug in the test, the tests pass. The bug was in the is_local_java() function which was supposed to detect DynamoDB Local (which behaves in some things differently from the real DynamoDB). When that detection code makes an HTTPS request and does not disable checking the server's certificate (which on Alternator is self-signed), the request fails - but not in the way that the code expected. So we need to fix the is_local_java() to allow the failure mode of the self-signed certificate. Anyway, this case is not DynamoDB Local so the detection function would return false. Fixes #7214 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200910194738.125263-1-nyh@scylladb.com>	2020-09-11 08:10:13 +02:00
Nadav Har'El	02ee0483b2	alternator test: add reproducing tests for several issues This patch adds regression tests for four recently-fixed issues which did not yet have tests: Refs #7157 (LatestStreamArn) Refs #7158 (SequenceNumber should be numeric) Refs #7162 (LatestStreamLabel) Refs #7163 (StreamSpecification) I verified that all the new tests failed before these issues were fixed, but now pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200907155334.562844-1-nyh@scylladb.com>	2020-09-10 17:36:23 +02:00
Avi Kivity	0e03c979d2	Merge 'Fix ignoring cells after null in appending hash' from Piotr Sarna " This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible	2020-09-10 15:35:38 +03:00
Piotr Sarna	fe5cd846b5	test: extend mutation_test for NULL values The test is extended for another possible corner case: [1, NULL, 2] vs [1, 2, NULL] should have different digests. Also, a check for legacy behavior is added.	2020-09-10 13:16:44 +02:00
Paweł Dziepak	287d0371fa	tests/mutation: add reproducer for #4567	2020-09-10 13:16:44 +02:00
Piotr Sarna	21a77612b3	gms: add a cluster feature for fixed hashing The new hashing routine which properly takes null cells into account is now enabled if the whole cluster is aware of it.	2020-09-10 13:16:44 +02:00
Piotr Sarna	7b329f7102	digest: add null values to row digest With the new hashing routine, null values are taken into account when computing row digest. Previous behavior had a regression which stopped computing the hash after the first null value is encountered, but the original behavior was also prone to errors - e.g. row [1, NULL, 2] was not distinguishable from [1, 2, NULL], because their hashes were identical. This hashing is not yet active - it will only be used after the next commit introduces a proper cluster feature for it.	2020-09-10 13:16:44 +02:00
Piotr Sarna	5ffd929eaa	mutation_partition: fix formatting	2020-09-10 12:20:32 +02:00
Paweł Dziepak	6f46010235	appending_hash<row>: make publicly visible appending_hash<row> specialisation is declared and defined in a .cc file which means it cannot have a dedicated unit test. This patch moves the declaration to the corresponding .hh file.	2020-09-10 12:20:32 +02:00
Avi Kivity	d55a8148ed	tools: toolchain: update for gnutls-3.6.15 GNUTLS-SA-2020-09-04 / GNUTLS-SA-2020-09-04. Fixes #7212.	2020-09-10 12:52:00 +03:00
Raphael S. Carvalho	86b9ea6fb2	storage_service: Fix use-after-free when calculating effective ownership Use-after-free happens because we take a ref to keyspace_name, which is stack allocated, and ceases to exist after the next deferring action. Fixes #7209. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200909210741.104397-1-raphaelsc@scylladb.com>	2020-09-10 11:33:50 +03:00
Dejan Mircevski	9d02f10c71	cql3: Fix NULL reference in get_column_defs_for_filtering There was a typo in get_column_defs_for_filtering(): it checked the wrong pointer before dereferencing. Add a test exposing the NULL dereference and fix the typo. Tests: unit (dev) Fixes #7198. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-09-10 08:45:07 +02:00
Asias He	3ba6e3d264	storage_service: Fix a TOKENS update race for replace operation In commit `7d86a3b208` (storage_service: Make replacing node take writes), application state of TOKENS of the replacing node is added into gossip and propagated to the cluster after the initial start of gossip service. This can cause a race below 1. The replacing node replaces the old dead node with the same ip address 2. The replacing node starts gossip without application state of the TOKENS 3. Other nodes in the cluster replace the application states of old dead node's version with the new replacing node's version 4. replacing node dies 5. replace operation is performed again, the TOKENS application state is not present and replace operation fails. To fix, we can always add TOKENS application state when the gossip service starts. Fixes: #7166 Backports: 4.1 and 4.2	2020-09-09 15:24:21 +02:00
Tomasz Grabiec	de6aa668f5	Merge "Do not reimplement deletable_row in clustering_row" from Pavel Emelyanov The clustering_row class looks as a decorated deletable_row, but re-implements all its logic (and members). Embed the deletable_row into clustering_row and keep the non-static row logic in one class instead of two.	2020-09-09 14:27:18 +02:00
Takuya ASADA	59a6e08cb9	Add support passing python3 dependencies from main repo to scylla-python3 script We don't want to update scylla-python3 submodule for every python3 dependency update, bring python3 package list to python3-dependencies.txt, pass it on package building time. See #6702 See scylladb/scylla-python3#6 [avi: add * tools/python3 19a9cd3...b4e52ee (1): > Allow specify package dependency list by --packages to maintain bisectability]	2020-09-08 23:39:34 +03:00
Avi Kivity	291117ea9c	Update seastar submodule * seastar 4ff91c4c3a...52f0f38994 (4): > rpc: Return actual chosen compressor in server response - not all avail Fixes #6925. > net/tls: fix compilation guards around sec_param(). > future: improved printing of seastar::nested_exception > merge: http: fix issues with the request parser and testing	2020-09-08 23:39:34 +03:00
Takuya ASADA	e1b15ba09e	dist/common/scripts: abort scylla_prepare with better error message When configuration files for perftune contain an invalid parameter, scylla_prepare may produce a traceback because error handling is not enough. Throw all errors from create_perftune_conf(), catch them in scylla_prepare, and print a user-understandable error. Fixes #6847	2020-09-08 23:39:34 +03:00
Pavel Emelyanov	4e264b9e4f	clustering_row: Do not re-implement deletable_row The clustering_row is deletable_row + clustering_key, all its internals work exactly as the relevant deletable_row's ones. The similar relation is between static_row and row, and the former wraps the latter, so here's the same trick for the non-static row classes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-08 22:21:15 +03:00
Pavel Emelyanov	ca148acbf9	deletable_row: Do not mess with clustering_row The deletable_row accepts clustering_row in constructor and .apply() method. The next patch will make clustering_row embed the deletable_row inside, so those two methods will violate layering and should be fixed in advance. The fix is in providing a clustering_row method to convert itself into a deletable_row. There are two places that need this: mutation_fragment_applier and partition_snapshot_row_cursor. Both methods pass temporary clustering_row value, so the method in question is also move-converter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-08 22:18:15 +03:00
Kamil Braun	42fb4fe37c	cdc: fix deadlock inside check_and_repair_cdc_streams check_and_repair_cdc_streams, in case it decides to create a new CDC generation, updates the STATUS application state so that other nodes gossiped with pick up the generation change. The node which runs check_and_repair_cdc_streams also learns about a generation change: the STATUS update causes a notification change. This happens during add_local_application_state call which caused the STATUS update; it would lead to calling handle_cdc_generation, which detects a generation change and calls add_local_application_state with the new generation's timestamp. Thus, we get a recursive add_local_application_state call. Unfortunately, the function takes a lock before doing on_change notifications, so we get a deadlock. This commit prevents the deadlock. We update the local variable which stores the generation timestamp before updating STATUS, so handle_cdc_generation won't consider the observed generation to be new, hence it won't perform the recursive add_local_application_state call.	2020-09-08 16:33:47 +03:00
Avi Kivity	c075539fea	Merge 'storage_proxy: add a separate smp_group for hints' from Eliran Hints writes are handled by storage_proxy in the exact same way regular writes are, which in turn means that the same smp service group is used for both. The problem is that it can lead to a priority inversion where writes of the lower priority kind occupies a lot of the semaphores units making the higher priority writes wait for an empty slot. This series adds a separate smp group for hints as well as a field to pass the correct smp group to mutate_locally functions, and then uses this field to properly classify the writes. Fixes #7177 * eliransin-hint_priority_inversion: Storage proxy: use hints smp group in mutate locally Storage proxy: add a dedicated smp group for hints	2020-09-08 16:12:37 +03:00
Wojciech Mitros	66e8214606	cql: Forbid adding new fields to UDTs used in partition key columns Changing a user type may allow adding apparently duplicate rows to tables where this type is used in a partitioning key. Fix by checking all types of existing partitioning columns before allowing to add new fields to the type. Fixes #6941	2020-09-08 16:08:07 +03:00
Avi Kivity	7ac59dcc98	lsa: decay reserves The log-structured allocator (LSA) reserves memory when performing operations, since its operations are performed with reclaiming disabled and if it runs out, it cannot evict cache to gain more. The amount of memory to reserve is remembered across calls so that it does not have to repeat the fail/increase-reserve/retry cycle for every operation. However, we currently lack decaying the amount to reserve. This means that if a single operation increased the reserve in the distant past, all current operations also require this large reserve. Large reserves are expensive since they can cause large amounts of cache to be evicted. This patch adds reserve decay. The time-to-decay is inversely proportional to reserve size: 10GB/reserve. This means that a 20MB reserve is halved after 500 operations (10GB/20MB) while a 20kB reserve is halved after 500,000 operations (10GB/20kB). So large, expensive reserves are decayed quickly while small, inexpensive reserves are decayed slowly to reduce the risk of allocation failures and exceptions. A unit test is added. Fixes #325.	2020-09-08 15:59:25 +03:00
Takuya ASADA	4deb245198	scylla_ntp_setup: don't install ntpd package when it already exists Don't install the ntpd package when it already exists. Related to #7153	2020-09-08 13:59:04 +03:00
Etienne Adam	63a1a4cbb9	redis: add hgetall and hdel commands This patch adds support for 2 hash commands HDEL and HGETALL. Internally it introduces the hashes_result_builder class to read hashes and store them in a std::map. Other changes: - one exception return string was fixed - tests now use pytest.raises Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200907202528.4985-1-etienne.adam@gmail.com>	2020-09-08 11:59:52 +03:00
Eliran Sinvani	933b44f676	Storage proxy: use hints smp group in mutate locally We are using mutate_locally to handle hint mutations that arrived through RPC. The current implementation makes no distinction whether the mutation came through hint verb or a mutation verb resulting in using the same smp group for both. This commit adds the ability to reference different smp group in mutate_locally private calls and makes the handlers pass the correct smp group to mutate_locally.	2020-09-08 10:03:50 +03:00
Benny Halevy	f8d9e81bdb	types: time_point_to_string: prevent overflow of nanoseconds Due to #7175, microseconds are stored in a db_clock::time_point as if they were milliseconds. std::chrono::duration_cast<std::chrono::nanoseconds> may cause overflow and end up with invalid/negative nanos. This change specializes time_point_to_string to std::chrono::milliseconds since it's currently only called to print db_clock::time_point and uses boost::posix_time::milliseconds to print the count. This would generate an exception in today's time stamps and the output will look like: 1599493018559873 milliseconds (Year is out of valid range: 1400..9999) instead of: 1799-07-16T19:57:52.175010 It is preferable to print the numeric value annotated as out of valid range than to print a bogus date in the past. Test: unit(dev), commitlog_test:TestCommitLog.test_mixed_mode_commitlog_same_partition_smp_1 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200907162845.147477-1-bhalevy@scylladb.com>	2020-09-08 10:02:02 +03:00
Pavel Emelyanov	b9a4a06381	range_tombstone_list: Do not expose internal collection Now all work with the list is described as API calls, it's finally possible to stop exposing the boost::set outside the class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	f19b85b61d	range_tombstone_list: Introduce and use pop-and-lock helper There's an optimization in flat_mutation_reader_from_mutations that folds the list from left-to-right in linear time. In case of currently used boost::set the .unlink_leftmost_without_rebalance helper is used, so wrap this exception with a method of the range_tombstone_list. This is the last place where callers need to mess with the exact internal collection. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	a89c7198c2	range_tombstone_list: Introduce and use pop_as<>() The method extracts an element from the list, constructs a desired object from it and frees. This is common usage of range_tombstone_list. Having a helper helps encapsulating the exact collection inside the class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	27912375b2	flat_mutation_reader: Use range_tombstone_list begin/end API The goal is to stop revealing the exact collection from the range_tombstone_list, so make use of existing begin/end methods and extend with rbegin() where needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	f19ade31ee	repair: Mark some partition_hasher methods noexcept The next patch will change the way range tombstones are fed into hasher. To make sure the codeflow doesn't become exception-unsafe, mark the relevant methods as non-throwing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	5adb8e555c	hashers: Mark hash updates noexcept All those methods end up with library calls, whose code is not marked noexcept, but is such according to code itself or docs. The primary goal is to make some repair partition_hasher methods noexcept (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Nadav Har'El	f76c519c1f	merge: Alternator streams - fix table description and sequence number Merged pull request https://github.com/scylladb/scylla/pull/7160 By Calle Wilund: Stream descriptions, as returned from create, update and describe stream was missing "latest" stream arn. Shard descriptions sequence number (for us timeuuid:s) were formatted wrongly. The spec states they should be numeric only. Both these defects break Kinesis operations. alternator: Set CDC delta to keys only for alternator streams alternator: Include stream spec in desc for create/update/describe alternator: Include LatestStreamLabel in resulting desc for create/update table alternator: Make "StreamLabel" an iso8601 timestamp alternator: Alloc BILLING_MODE in update_table cdc: Add setter for delta mode alternator: Fix sequence number range using wrong format alternator: Include stream arn in table description if enabled	2020-09-07 18:26:21 +03:00
Piotr Grabowski	ffd8c8c505	utf8: Print invalid UTF-8 character position Add new validate_with_error_position function which returns -1 if data is a valid UTF-8 string or otherwise a byte position of first invalid character. The position is added to exception messages of all UTF-8 parsing errors in Scylla. validate_with_error_position is done in two passes in order to preserve the same performance in common case when the string is valid.	2020-09-07 18:11:21 +03:00
Piotr Grabowski	462d12f555	db: Propagate enable_cache to system keyspaces Make enable_cache configuration option also affect caching of system keyspaces. Fixes #2909.	2020-09-07 17:54:46 +03:00
Calle Wilund	7224ae6d38	alternator: Set CDC delta to keys only for alternator streams Fixes #7190 Since we don't use any delta value when translating cdc -> streams it is wasteful to write these to the log table, esp. since we already write big fat pre- and post images.	2020-09-07 14:27:54 +00:00
Calle Wilund	f7bb0baba7	alternator: Include stream spec in desc for create/update/describe Fixes #7163 If enabled, the resulting table description should include a StreamDescription object with the appropriate members describing current stream settings.	2020-09-07 14:26:21 +00:00
Calle Wilund	e6266d5652	alternator: Include LatestStreamLabel in resulting desc for create/update table Fixes #7162 Same value as 'StreamLabel' in the currently active stream (cdc log) if enabled.	2020-09-07 14:24:48 +00:00
Calle Wilund	fa68493d64	alternator: Make "StreamLabel" an iso8601 timestamp Fixes #7164 See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableDescription.html StreamLabel: A timestamp, in ISO 8601 format, for this stream Scylla tables do not have a timestamp as such, but the UUID for a given schema is a timeuuid, so we can misuse this to fake a creation timestamp.	2020-09-07 14:24:00 +00:00
Calle Wilund	f16792aad0	alternator: Allow BILLING_MODE in update_table While it does not do anything, we want something to update for testing (dynamo python libs refuse empty update).	2020-09-07 14:15:17 +00:00
Calle Wilund	d29d676955	cdc: Add setter for delta mode	2020-09-07 14:14:04 +00:00
Botond Dénes	c01af1d9d2	tests/boost/multishard_mutation_query_test: remove last BOOST_REQUIRE* macros Previous patches removed those `BOOST_REQUIRE` macros that could be invoked from shards other than 0. The reason is that said macros are not thread-safe, so calling them from multiple shards produces mangled output to stdout as well as the XML report file. It was assumed that only these invocations -- from a non-0 shard -- are problematic, but it turns out even these can race with seastar log messages emitted from other shards. This patch removes all such macros, replacing them with the thread safe `require` functions from `test/lib/test_utils.hh`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200907125309.1199104-1-bdenes@scylladb.com>	2020-09-07 17:07:26 +03:00
Eliran Sinvani	342fc07bd6	Storage proxy: add a dedicated smp group for hints Hints and regular writes currently use the same cross shard operation semaphore, which can lead to priority inversion, making cross shard writes wait for cross shard hints. This commit adds an smp_service_group for hints and adds its usage in the mutate_hint function.	2020-09-07 15:46:12 +03:00
Calle Wilund	a7d021ee57	alternator: Fix sequence number range using wrong format Fixes #7158 A stream's shard description has a sequence range describing start/end (if available) of the shard. This is specified as being "numeric only". Alternator incorrectly used UUID here, which breaks kinesis. v2: * Fix uint128_t parsing from string. bmp::number constructor accepted sstring, but did not interpret it as std::string/chars. Weird results.	2020-09-07 12:01:22 +00:00
Pekka Enberg	1ed9a336a5	Update tools/jmx submodule * tools/jmx 12ab6aa...8d92e54 (1): > Merge 'JMX footprint work' from Calle Fixes scylladb/scylla-jmx#133 Fixes scylladb/scylla-jmx#134	2020-09-07 13:56:47 +03:00
Benny Halevy	0c474b1c01	types: time_point_to_string: handle errors from boost::posix_time::to_iso_extended_string As seen in https://github.com/scylladb/scylla/issues/7175, `1e676cd845` that was merged in `bc77939ada` exposed a preexisting problem in time_point_to_string where it tried printing a timestamp that was in microseconds (taken from an api::timestamp_type instead of db_clock::time_point) and hit `boost::wrapexcept<boost::gregorian::bad_year> (Year is out of valid range: 1400..9999)` If hit, this patch will print the offending time_stamp in nanoseconds and the error message. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200907083303.33229-1-bhalevy@scylladb.com>	2020-09-07 11:36:43 +03:00
Calle Wilund	f5c79d15a8	alternator: Include stream arn in table description if enabled Fixes #7157 When creating/altering/describing a table, if streams are enabled, the "latest active" stream arn should be included as LatestStreamArn. Not doing so breaks java kinesis.	2020-09-07 08:16:11 +00:00
Pekka Enberg	e4266ead98	Improve build documentation This improves the build documentation beyond just packaging: - Explain how the configure.py step works - Explain how to build just executables and tests (for development) - Explain how to build for specific build mode if you didn't specify a build mode in configure.py step - Fix build artifact locations, missing .debs, and add executables and tests Message-Id: <20200904084443.495137-1-penberg@iki.fi>	2020-09-07 10:51:31 +03:00
Benny Halevy	66ce3a4c25	types: time_point_to_string: do not assume tp is in milliseconds T& tp may have a period other than milliseconds. Cast the time_point duration to nanoseconds (or microseconds if boost doesn't support it) so it is printed in the best possible resolution. Note that we presume that the time_point epoch is the Unix epoch of 1970-01-01, but the c++ standard doesn't guarantee that. See https://github.com/scylladb/scylla/issues/5498 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200906171106.690872-1-bhalevy@scylladb.com>	2020-09-07 10:44:52 +03:00
Pekka Enberg	478e831d4f	Update tools/jmx submodule * tools/jmx d5d1efd...12ab6aa (1): > Merge "Fix JMX startup after offline installation" from Amos Fixes: scylladb/scylla#7098 Fixes: scylladb/scylla-jmx#129	2020-09-07 09:42:26 +03:00
Piotr Jastrzebski	4499a37eae	docs: Improve protocol-extensions documentation Documentation states that `SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK` is a 32-bit integer that represents a bit mask. What it fails to mention is that it's an unsigned value and in fact it takes the value 2147483648. This is problematic for clients in languages that don't have unsigned types (like Java). This patch improves the documentation to make it clear that `SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK` is represented by an unsigned value. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7166b736461ae6f3d8ffdf5733e810a82aa02abc.1599382184.git.piotr@scylladb.com>	2020-09-06 13:35:12 +03:00
Dejan Mircevski	a127f5615b	cql3: Simplify pk test in statement_restrictions statement_restrictions::process_partition_key_restrictions() was checking has_unrestricted_components(), whereas just an empty() check suffices there, because has_unrestricted_components() is implicitly checked five lines down by needs_filtering(). The replacement check is cheaper and simpler to understand. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-09-06 12:39:36 +03:00
Dejan Mircevski	df3ea2443b	cql3: Drop all uses_function methods No one seems to call them except for other uses_function methods. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-09-04 17:27:30 +02:00
Piotr Grabowski	561753fe71	mutation: Improve log print of mutations Changes format of mutation, mutation_partition log messages to more human-readable. Fixes #826.	2020-09-04 16:33:25 +02:00
Tomasz Grabiec	bcdcf06ec7	Merge "lwt: for each statement in cas_request provide a row in CAS result set" from Pavel Solodovnikov Previously batch statement result set included rows for only those updates which have a prefetch data present (i.e. there was an "old" (pre-existing) row for a key). Also, these rows were sorted not in the order in which statements appear in the batch, but in the order of updated clustering keys. If we have a batch which updates a few non-existent keys, then it's impossible to figure out which update inserted a new key by looking at the query response. Not only because the responses may not correspond to the order of statements in the batch, but even some rows may not show up in the result set at all. Please see #7113 on Github for detailed description of the problem: https://github.com/scylladb/scylla/issues/7113 The patch set proposes the following fix: For conditional batch statements the result set now always includes a row for each LWT statement, in the same order in which individual statements appear in the batch. This way we can always tell which update did actually insert a new key or update the existing one. Technically, the following changes were made: * `update_parameters::prefetch_data::row::is_in_cas_result_set` member removed as well as the supporting code in `cas_request::applies_to` which iterated through cas updates and marked individual `prefetch_data` rows as "need to be in cas result set". * `cas_request::applies_to` substantially simplified since it doesn't do anything more than checking `stmt.applies_to()` in short-circuiting manner. * `modification_statement::build_cas_result_set` method moved to `cas_request`. This allows to easily iterate through individual `cas_row_update` instances and preserve the order of the rows in the result set. 
* A little helper `cas_request::find_old_row` is introduced to find a row in `prefetch_data` based on the (pk, ck) combination obtained from the current `cas_request` and a given `cas_row_update`. * A few tests for the issue #7113 are written, other lwt-batch-related tests adjusted accordingly.	2020-09-04 16:09:45 +02:00
Pavel Solodovnikov	92fd515186	lwt: for each statement in cas_request provide a row in CAS result set Previously batch statement result set included rows for only those updates which have a prefetch data present (i.e. there was an "old" (pre-existing) row for a key). Also, these rows were sorted not in the order in which statements appear in the batch, but in the order of updated clustering keys. If we have a batch which updates a few non-existent keys, then it's impossible to figure out which update inserted a new key by looking at the query response. Not only because the responses may not correspond to the order of statements in the batch, but even some rows may not show up in the result set at all. The patch proposes the following fix: For conditional batch statements the result set now always includes a row for each LWT statement, in the same order in which individual statements appear in the batch. This way we can always tell which update did actually insert a new key or update the existing one. `update_parameters::prefetch_data::row::is_in_cas_result_set` member variable was removed as well as supporting code in `cas_request::applies_to` which iterated through cas updates and marked individual `prefetch_data` rows as "need to be in cas result set". Instead now `cas_request::applies_to` is significantly simplified since it doesn't do anything more than checking `stmt.applies_to()` in short-circuiting manner. A few tests for the issue are written, other lwt-batch-related tests were adjusted accordingly to include rows in result set for each statement inside conditional batches. Tests: unit(dev, debug) Co-authored-by: Konstantin Osipov <kostja@scylladb.com> Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-09-04 13:13:26 +03:00
Pavel Solodovnikov	feaf2b6320	cas_request: move `modification_statement::build_cas_result_set` to `cas_request` This is just a plain move of the code from `modification_statement` to `cas_request` without changes in the logic, which will further help to refactor `build_cas_result_set` behavior to include a row for each LWT statement and order rows in the order of statements in a batch. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-09-04 12:25:06 +03:00
Pavel Solodovnikov	0f0ff73a58	cas_request: extract `find_old_row` helper function Factor out little helper function which finds a pre-existing row for a given `cas_row_update` (matching the primary key). Used in `cas_request::applies_to`. Will be used in a subsequent patch to move `modification_statement::build_cas_result_set` into `cas_request`. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-09-04 12:09:31 +03:00
Pavel Emelyanov	fabf849fcb	row_cache: Save one key compare on direct hit The partitions_type::lower_bound() method can return a hint that saves info about the "lower-ness of the bound", in particular when the search key is found, this can be guessed from the hint without comparison. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	ada174c932	row_cache: Kill incomplete_tag The incomplete entry is created in one place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	240b966695	row_cache: Do not copy partition tombstone when creating cache entry The row_cache::find_or_create is only used to put (or touch) an entry in cache having the partition_start mutation at hand. Thus, there's no point in carrying key reference and tombstone value through the calls, just the partition_start reference is enough. Since the new cache entry is created incomplete, rename the creation method to reflect this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	84a6d439ad	test: Lookup an existing entry with its own helper The only caller of find_or_create() in tests works on already existing (.populate()-d) entry, so patch this place for explicitness and for the sake of the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	3f33a71c0c	row_cache: Move missing entry creation into helper No functional changes, just move the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	4662082748	populating reader: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	e680bdc59c	populating reader: Less allocator switching on population Now when the key for new partition is copied inside do_find_or_create_entry we may call this function without allocator set, as it sets the allocator inside. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	449f9e1218	populating reader: Do not copy decorated key too early When the missing partition is created in cache the decorated key is copied from the ring position view too early -- to do the lookup. However, the read context has already entered the partition and already has the decorated key on board, so for lookup we can use the reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Pavel Emelyanov	5a29e17a5f	row_cache: Revive do_find_or_create_entry concepts Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-03 21:13:21 +03:00
Botond Dénes	e88b8a9a07	docs/debugging.md: document how TLS variables work We use a lot of TLS variables yet GDB is not of much help when working with these. So in this patch I document where they are located in memory, how to calculate the address of a known TLS variable and how to find (identify) one given an address. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200903131802.1068288-1-bdenes@scylladb.com>	2020-09-03 17:41:15 +02:00
Avi Kivity	bc77939ada	Update seastar submodule * seastar 7f7cf0f232...4ff91c4c3a (47): > core/reactor: complete_timers(): restore previous scheduling group Fixes #7117. > future: Drop spurious 'mutable' from a lambda > future: Don't put a std::tuple in future_state > future: Prepare for changing the future_state storage type > future: Add future_state::get0() > when_all: Replace another untuple with get0 > when_all: Use get0 instead of untuple > rpc: Don't assume that a future stores a std::tuple > future: Add a futurize_base > testing: Stop boost from installing its own signal handler > future: Define futurize::value_type with future::value_type > future: Move futurize down the file > Merge "Put logging onto {fmt} rails" from Pavel E > Merge "Make future_state non variadic" from Rafael > sharded: propagate sharded instances stop error > log: Fix misprint in docs > future: Use destroy_at in destructor > future: Add static_assert rejecting variadic future_state > futures_test: Drop variadic test > when_all: Drop a superfluous use of futurize::from_tuples > everywhere: Use future::get0 when appropriate > net: upgrade sockopt comments to doxygen > iostream: document the read_exactly() function > net:tls: fix clang-10 compilation duration cast > future: Move continuation_base_from_future to future.hh > repeat: Drop unnecessary make_tuple > shared_future: Don't use future_state_type > future: Add a static_assert against variadic futures > future: Delete warn_variadic_future > rpc_demo: Don't use a variadic future > rpc_test: Don't use a variadic future > futures_test: Don't use a variadic future > future: Move disable_failure_guard to promise::schedule > net: add an interface for custom socket options > posix: add one more setsockopt overload > Merge "Simplify pollfn and its inheritants" from Pavel E > util:std-compat.hh: add forward declaration of std::pmr for clang-10 > rpc: Add protocol::has_handlers() helper > Add a Seastar_DEBUG_ALLOCATIONS build option > 
futures_test: Add a test for futures of references > future: Simplify destruction of future_state > Use detect_stack_use_after_return=1 > repeat: Fix indentation > repeat: Delete try/catch > repeat: Simplify loop > Avoid call to std::exception_ptr's destructor from a .hh > file: Add missing include	2020-09-03 15:56:12 +03:00
Avi Kivity	64c7c81bac	Merge "Update log messages to {fmt} rules" from Pavel E " Before seastar is updated with the {fmt} engine under the logging hood, some changes are to be made in scylla to conform to {fmt} standards. Compilation and tests checked against both -- old (current) and new seastar-s. tests: unit(dev), manual " * 'br-logging-update' of https://github.com/xemul/scylla: code: Force formatting of pointer in .debug and .trace code: Format { and } as {fmt} needs streaming: Do not reveal raw pointer in info message mp_row_consumer: Provide hex-formatting wrapper for bytes_view heat_load_balance: Include fmt/ranges.h	2020-09-03 15:10:09 +03:00
Nadav Har'El	1d06da18fc	alternator test: test for the TRIM_HORIZON stream iterator This patch adds a test for the TRIM_HORIZON option of GetShardIterator in Alternator Streams. This option asks to fetch again all the available history in this shard stream. We had an implementation for it, but not a test - so this patch adds one. The test passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200830131458.381350-1-nyh@scylladb.com>	2020-09-02 18:37:06 +02:00
Nadav Har'El	3d4183863a	alternator test: add tests for sequence-number based iterators Alternator Streams already support the AT_SEQUENCE_NUMBER and AFTER_SEQUENCE_NUMBER options for iterators. These options allow to replay a stream of changes from a known position or after that known position. However, we never had a test verifying that these features actually work as intended, beyond just checking syntax. Having such tests is important because recently we changed the implementation of these iterators, but didn't have a test verifying that they still work. So in this patch we add such tests. The tests pass (as usual, on both Alternator and DynamoDB). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200830115817.380075-1-nyh@scylladb.com>	2020-09-02 18:36:59 +02:00
Nadav Har'El	c879b23b82	alternator test: add another passing Alternator Streams test We had a test, test_streams_last_result, that verifies that after reading from an Alternator Stream the last event, reading again will find nothing. But we didn't actually have a test which checks that if at that point a new event does arrive, we can read it. This test checks this case, and it passes (we don't have a bug there, but it's good as a regression test for NextShardIterator). This test also verifies that after reading an event for a particular key on a specific stream "shard", the next event for the same key will arrive on the same shard. This test passes on both Alternator and DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200830105744.378790-1-nyh@scylladb.com>	2020-09-02 18:36:50 +02:00
Raphael S. Carvalho	adf576f769	compaction_manager: export method that returns if table has ongoing compaction A compaction strategy, that supports parallel compaction, may want to know if the table has compaction running on its behalf before making a decision. For example, a size-tiered-like strategy may not want to trigger a behavior, like cross-tier compaction, when there's ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200901134306.23961-1-raphaelsc@scylladb.com>	2020-09-02 16:46:49 +03:00
Botond Dénes	90042746bf	scylla-gdb.py: scylla_sstables::filename(): add md format support Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200902090929.879377-1-bdenes@scylladb.com>	2020-09-02 16:12:17 +03:00
Dejan Mircevski	0c73ac107d	cql3: Drop get_partition_key_unrestricted_components Not used anywhere. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-09-02 08:14:54 +03:00
Botond Dénes	b3f00685ec	scylla-gdb.py: scylla memory: better summary of semaphore memory usage If available, use the recently added `reader_concurrency_semaphore::_initial_resources` to calculate the amount of memory used out of the initially configured amount. If not available, the summary falls back to the previous mode of just printing the remaining amount of memory. Example: Replica: Read Concurrency Semaphores: user sstable reads: 11/100, 263621214/ 42949672 B, queued: 847 streaming sstable reads: 0/ 10, 0/ 42949672 B, queued: 0 system sstable reads: 1/ 10, 251584/ 42949672 B, queued: 0 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200901091452.806419-1-bdenes@scylladb.com>	2020-09-01 16:26:57 +02:00
Nadav Har'El	52f92b886b	alternator streams: fix bug returning the same change again This patch fixes a bug which caused sporadic failures of the Alternator test - test_streams.py::test_streams_last_result. The GetRecords operation reads from an Alternator Streams shard and then returns an "iterator" from where to continue reading next time. Because we obviously don't want to read the same change again, we "incremented" the current position, to start at the incremented position on the next read. Unfortunately, the implementation of the increment() function wasn't quite right. The position in the CDC log is a timeuuid, which has a really bizarre comparison function (see compare_visitor in types.cc). In particular the least-significant bytes of the UUID are compared as signed bytes. This means that if the last byte of the UUID was 127 and increment() increased it to 128, the result was wrong, because the comparison function later treated it as a signed byte, where 128 is lower than 127, not higher! The result was that with 1/256 probability (whenever the last byte of the position was 127) we would return an item twice. This was reproduced (with 1/256 probability) by the test test_streams_last_result, as reported in issue #7004. The fix in this patch is to drop the increment() and replace it by a flag whether an iterator is inclusive of the threshold (>=) or exclusive (>). The internal representation of the iterator has a boolean flag "inclusive", and the string representation uses the prefixes "I" or "i" to indicate an inclusive or exclusive range, respectively - whereas before this patch we always used the prefix "I". Although increment() could have been fixed to work correctly, the result would have been ugly because of the weirdness of the timeuuid comparison function. increment() would also require extensive new unit-tests: we were lucky that the high-level functional tests caught a 1 in 256 error, but they would not have caught rarer errors (e.g., 1 in 2^32). 
Furthermore, I am looking at Alternator as the first "user" of CDC, and seeing how complicated and error-prone increment() is, we should not recommend to users to use this technique - they should use exclusive (>) range queries instead. Fixes #7004. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200901102718.435227-1-nyh@scylladb.com>	2020-09-01 12:28:39 +02:00
Avi Kivity	37352a73b8	Update tools/python3 submodule * tools/python3 f89ade5...19a9cd3 (1): > dist: redhat: reduce log spam from unpacking sources when building rpm	2020-09-01 12:36:24 +03:00
Pavel Emelyanov	86897aa040	partition_version: Remove dead code The rows_iterator is no longer in use since `70c72773` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200831191208.18418-1-xemul@scylladb.com>	2020-09-01 10:19:47 +03:00
Raphael S. Carvalho	7f7f366cb5	compaction: add debug msg to inform the amount of expired ssts skipped by compaction this information is useful when debugging compaction issues that involve fully expired ssts. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200828140401.96440-1-raphaelsc@scylladb.com>	2020-08-31 17:18:47 +03:00
Amos Kong	5785947e28	unified/install.sh: set default python3/sysconfdir smartly Users can set python3 and sysconfdir from cmdline of install.sh according to the install mode (root or nonroot) and distro type. It's helpful to correct the default python3/sysconfdir, otherwise setup scripts or scylla-jmx doesn't work. Fixes #7130 Signed-off-by: Amos Kong <amos@scylladb.com>	2020-08-31 15:54:51 +03:00
Amos Kong	83d7454787	install.sh: clean tmp scylla.yaml after installation If the install.sh is executed by sudo, then the tmp scylla.yaml is owned by root. It's difficult to overwrite it by non-privileged users. Signed-off-by: Amos Kong <amos@scylladb.com>	2020-08-31 15:54:51 +03:00
Kamil Braun	ff78a3c332	cdc: rename CDC description tables... again Commit `a6ad70d3da` changed the format of stream IDs: the lower 8 bytes were previously generated randomly, now some of them have semantics. In particular, the least significant byte contains a version (stream IDs might evolve with further releases). This is a backward-incompatible change: the code won't properly handle stream IDs with all lower 8 bytes generated randomly. To protect us from subtle bugs, the code has an assertion that checks the stream ID's version. This means that if an experimental user used CDC before the change and then upgraded, they might hit the assertion when a node attempts to retrieve a CDC generation with old stream IDs from the CDC description tables and then decode it. In effect, the user won't even be able to start a node. Similarly as with the case described in `d89b7a0548`, the simplest fix is to rename the tables. This fix must get merged in before CDC goes out of experimental. Now, if the user upgrades their cluster from a pre-rename version, the node will simply complain that it can't obtain the CDC generation instead of preventing the cluster from working. The user will be able to use CDC after running checkAndRepairCDCStreams. Since a new table is added to the system_distributed keyspace, the cluster's schema has changed, so sstables and digests need to be regenerated for schema_digest_test.	2020-08-31 11:33:14 +03:00
Nadav Har'El	f3dfb6e011	merge: cdc: Remove post-filterings for keys-only/off cdc delta generation Merged pull request https://github.com/scylladb/scylla/pull/7121 By Calle Wilund: Refs #7095 Fixes #7128 CDC delta!=full both relied on post-filtering to remove generated log row and/or cells. This is inefficient. Instead, simply check if the data should be created in the visitors. Also removed delta_mode=off mode. cdc: Remove post-filterings for keys-only/off cdc delta generation cdc: Remove cdc delta_mode::off	2020-08-31 11:22:09 +03:00
Calle Wilund	70a282ced2	cdc: Remove post-filterings for keys-only/off cdc delta generation Refs #7095 CDC delta!=full both relied on post-filtering to remove generated log row and/or cells. This is inefficient. Instead, simply check if the data should be created in the visitors. v2: * Fixed delta logs rows created (empty) even when delta == off v3: * Killed delta == off v4: * Move checks into (const) member var(s)	2020-08-31 07:59:43 +00:00
Calle Wilund	78236c015a	cdc: Remove cdc delta_mode::off Fixes #7128 CDC logs are not useful without at least delta_mode==keys, since pre/post image data has no info on _what_ was actually done to base table in source mutation.	2020-08-31 07:59:40 +00:00
Asias He	8b4530a643	repair: Add progress metrics for removenode ops The following metric is added: scylla_node_maintenance_operations_removenode_finished_percentage{shard="0",type="gauge"} 0.650000 It is the finished percentage of the removenode operation so far. Fixes #1244, #6733	2020-08-31 14:43:39 +08:00
Asias He	25e03233f1	repair: Add progress metrics for decommission ops The following metric is added: scylla_node_maintenance_operations_decommission_finished_percentage{shard="0",type="gauge"} 0.650000 It is the finished percentage of the decommission operation so far. Fixes #1244, #6733	2020-08-31 14:43:39 +08:00
Asias He	80cb157669	repair: Add progress metrics for replace ops The following metric is added: scylla_node_maintenance_operations_replace_finished_percentage{shard="0",type="gauge"} 0.650000 It is the finished percentage of the replace operation so far. Fixes #1244, #6733	2020-08-31 14:03:05 +08:00
Etienne Adam	19683d04c6	redis: add hget and hset commands The hget and hset commands use hashes internally, so they do not use the existing write_strings() function. Limitations: - hset only supports 3 params, instead of the multiple field/value list that is available in official redis-server. - hset should return 0 when the key and field already exist, but I am not sure it's possible to retrieve this information without doing read-before-write, which would not be atomic. I factorized a bit the query_* functions to reduce duplication, but I am not 100% sure of the naming, it may still be a bit confusing between the schema used (strings, hashes) and the returned format (currently only string but array should come later with hgetall). Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200830190128.18534-1-etienne.adam@gmail.com>	2020-08-30 22:05:41 +03:00
Takuya ASADA	f1255cb2d0	unified: add uninstall.sh Provide an uninstaller for offline & nonroot installation. Fixes #7076	2020-08-29 20:55:06 +03:00
Botond Dénes	f063dc22af	scylla-gdb: add scylla compaction-tasks command Summarize the compaction_manager::task instances. Useful for detecting compaction related problems. Example: (gdb) scylla compaction-task 2116 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_postimage_scylla_cdc_log" 769 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_scylla_cdc_log" 750 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_preimage_postimage_scylla_cdc_log" 731 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_preimage_scylla_cdc_log" 293 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table" 286 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_preimage" 230 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_postimage" 58 type=sstables::compaction_type::Compaction, running=false, "cdc_test"."test_table_preimage_postimage" 4 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_postimage_scylla_cdc_log" 2 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table" 2 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_preimage_postimage_scylla_cdc_log" 2 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_preimage" 1 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_preimage_postimage" 1 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_scylla_cdc_log" 1 type=sstables::compaction_type::Compaction, running=true , "cdc_test"."test_table_preimage_scylla_cdc_log" Total: 5246 instances of compaction_manager::task Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200828135030.689188-1-bdenes@scylladb.com>	2020-08-28 16:00:14 +02:00
Botond Dénes	727e9be342	scylla-gdb.py: scylla sstables: add --histogram option Allowing to print a summary of per-table sstables. Example: (gdb) scylla sstables --histogram 2103 "cdc_test"."test_table_postimage_scylla_cdc_log" 751 "cdc_test"."test_table_preimage_postimage_scylla_cdc_log" 734 "cdc_test"."test_table_preimage_scylla_cdc_log" 723 "cdc_test"."test_table_scylla_cdc_log" 285 "cdc_test"."test_table" 164 "cdc_test"."test_table_postimage" 150 "cdc_test"."test_table_preimage" 55 "cdc_test"."test_table_preimage_postimage" 1 "system"."clients" 1 "system"."compaction_history" 1 "system_auth"."roles" 1 "system"."peers" total (shard-local): count=4969, data_file=171953448, in_memory=19195136 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200828091848.673398-1-bdenes@scylladb.com>	2020-08-28 11:36:37 +02:00
Pavel Solodovnikov	88ba184247	paxos: use schema_registry when applying accepted proposal if there is schema mismatch Try to look up and use schema from the local schema_registry in case when we have a schema mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. When such situation happens the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With the patch we are able to mitigate these cases as long as the referenced schema is still present in the node cache (e.g. it didn't restart/crash or the cache entry is not too old to be evicted). Tests: unit(dev, debug), dtest(paxos_tests.schema_mismatch_*_test) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200827150844.624017-1-pa.solodovnikov@scylladb.com>	2020-08-27 19:04:09 +02:00
Amnon Heiman	68b3ed1c9a	storage_service.cc: get_natural_endpoints should translate key The get_natural_endpoints method returns the list of nodes holding a key. There is a variation of the method that gets the key as a string; the current implementation just casts the string to bytes_view, which will not work. Instead, this patch changes the implementation to use from_nodetool_style_string to translate the key (in a nodetool-like format) to a token. Fixes #7134 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-08-27 18:25:15 +03:00
Rafael Ávila de Espíndola	d18af34205	everywhere: Use future::get0 when appropriate This works with current seastar and clears most of the way for updating to a version that doesn't use std::tuple in futures. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200826231947.1145890-1-espindola@scylladb.com>	2020-08-27 15:05:51 +03:00
Nadav Har'El	1da3af5420	alternator test: enable a passing test After issue #7107 was fixed (regarding the correctness of OldImage and NewImage in Alternator Streams) we forgot to remove the "xfail" tag from one of the tests for this issue. This test now passes, as expected, so in this patch we remove the xfail tag. Refs #7107 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200827103054.186555-1-nyh@scylladb.com>	2020-08-27 14:15:32 +03:00
Nadav Har'El	0faf91f254	docs: fix typo in alternator/getting-started.md alternator/getting-started.md had a missing grave accent (`) character, resulting in messed up rendering of the involved paragraph. Add the missing quote. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200827110920.187328-1-nyh@scylladb.com>	2020-08-27 14:11:00 +03:00
Piotr Sarna	ca9422ca73	Merge 'Fix view_builder lockup and crash on shutdown' from Pavel The lockup: When view_builder starts all shards at some point get to a barrier waiting for each other to pass. If any shard misses this checkpoint, all others stuck forever. As this barrier lives inside the _started future, which in turn is waited on stop, the stop stucks as well. Reasons to miss the barrier -- exception in the middle of the fun^w start or explicit abort request while waiting for the schema agreement. Fix the "exception" case by unlocking the barrier promise with exception and fix the "abort request" case by turning it into an exception. The bug can be reproduced by hands if making one shard never see the schema agreement and continue looping until the abort request. The crash: If the background start up fails, then the _started future is resolved into exception. The view_builder::stop then turns this future into a real exception caught-and-rethrown by main.cc. This seems wrong that a failure in a background fiber aborts the regular shutdown that may proceed otherwise. tests: unit(dev), manual start-stop branch: https://github.com/xemul/scylla/tree/br-view-builder-shutdown-fix-3 fixes: #7077 Patch #5 leaves the seastar::async() in the 1-st phase of the start() although can also be tuned not to produce a thread. However, there's one more (painless) issue with the _sem usage, so this change appears too large for the part of the bug-fix and will come as a followup. 
* 'br-view-builder-shutdown-fix-3' of git://github.com/xemul/scylla: view_builder: Add comment about builder instances life-times view_builder: Do sleep abortable view_builder: Wakeup barrier on exception view_builder: Always resolve started future to success view_builder: Re-futurize start view_builder: Split calculate_shard_build_step into two view_builder: Populate the view_builder_init_state view_builder: Fix indentation after previous patch view_builder: Introduce view_builder_init_state	2020-08-27 11:51:46 +02:00
Nadav Har'El	95afadfe21	merge: alternator_streams: Include keys in OldImage/NewImage Merged pull request https://github.com/scylladb/scylla/pull/7063 By Calle Wilund: Fixes #6935 DynamoDB streams for some reason duplicate the record keys into both the "Keys" and "OldImage"/"NewImage" sub-objects when doing GetRecords. But only if there is other data to include. This patch appends the pk/ck parts into old/new image iff we had any record data. Updated to handle keys-only updates, and distinguish creating vs. updating rows. Changes cdc to not generate preimage for non-existent/deleted rows, and also fixes missing operations/ttls in keys-only delta mode. alternator_streams: Include keys in OldImage/NewImage cdc: Do not generate pre/post image for non-existent rows	2020-08-27 11:23:35 +03:00
Pekka Enberg	0f1b54fa6e	Update tools/java submodule * tools/java d6c0ad1e2e...2d49ded77b (1): > sstableloader: remove wrong check that breaks range tombstones	2020-08-27 09:05:34 +03:00
Calle Wilund	678ecc7469	alternator_streams: Include keys in OldImage/NewImage Fixes #6935 Fixes #7107 DynamoDB streams for some reason duplicate the record keys into both the "Keys" and "OldImage"/"NewImage" sub-objects when doing GetRecords. This patch appends the pk/ck parts into old/new image, and also removes the previous restrictions on image generation since cdc now generates more consistent pre/post image data.	2020-08-26 18:14:09 +00:00
Calle Wilund	e50911e5b0	cdc: Do not generate pre/post image for non-existent rows Fixes #7119 Fixes #7120 If the preimage select came up empty, i.e. the row did not exist, either because it was never created or because it was deleted, we should not bother creating a log preimage row for it, especially since it makes the cdc log harder to interpret. If an operation in a cdc batch did a row delete (ranged, ck, etc), do not generate postimage data, since the row no longer exists. Note that we differentiate deleting all (non-pk/ck) columns from an actual row delete.	2020-08-26 18:14:09 +00:00
Pavel Emelyanov	812eed27fe	code: Force formatting of pointer in .debug and .trace ... and tests. Printing a pointer in logs is considered to be a bad practice, so the proposal is to keep this explicit (with fmt::ptr) and allow it for .debug and .trace cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:11 +03:00
Pavel Emelyanov	366b4e8a8f	code: Format { and } as {fmt} needs There are two places that want to print "{<text>}" strings, but do not format the curly braces the {fmt}-way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:11 +03:00
Pavel Emelyanov	78f2193956	streaming: Do not reveal raw pointer in info message Showing raw pointer values in logs is not considered to be good practice. However, for debugging/tracing this might be helpful. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:11 +03:00
Pavel Emelyanov	50e3a30dae	mp_row_consumer: Provide hex-formatting wrapper for bytes_view By default {fmt} doesn't know how to format this type (although it's a basic_string_view instantiated), and even providing formatter/operator<< does not help -- it anyway hits an earlier assertion in args mapper about the disallowance of character types mixing. The hex-wrapper with own operator<< solves the problem. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:11 +03:00
Pavel Emelyanov	fe33e3ed78	heat_load_balance: Include fmt/ranges.h To provide vector<> formatter for {fmt} Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:08 +03:00
Avi Kivity	3daa49f098	Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz " The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table so is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get schema version mismatch here after the base table was altered. That's because the view schema did not change when the base table was altered. Another problem was that view building was using the current table's schema to interpret the fragments and invoke view building. That's incorrect for two reasons. First, fragments generated by a reader must be accessed only using the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded view ptrs may no longer match the current base table schema, which is used to generate the view updates. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update. 
Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Fixes #7061. Tests: - unit (dev) - [v1] manual (reproduced using scylla binary and cqlsh) " * tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla: db: view: Refactor view_info::initialize_base_dependent_fields() tests: mv: Test dropping columns from base table db: view: Fix incorrect schema access during view building after base table schema changes schema: Call on_internal_error() when out of range id is passed to column_at() db: views: Fix undefined behavior on base table schema changes db: views: Introduce has_base_non_pk_columns_in_view_pk()	2020-08-26 17:37:52 +03:00
Pavel Emelyanov	cf1cb4d145	view_builder: Add comment about builder instances life-times The barrier passing is tricky and deserves a description about objects' life-times. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	643c431ce4	view_builder: Do sleep abortable If one shard delays in seeing the schema agreement and returns on abort request, other shards may get stuck waiting for it on the status read barrier. Luckily with the previous patch the barrier is exception-proof, so we may abort the waiting loop with exception and handle the lock-up. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	c36bbc37c9	view_builder: Wakeup barrier on exception If an exception pops up during the view_builder::start while some shards wait for the status-read barrier, these shards are not woken up, thus causing the shutdown to get stuck. Fix this by setting exception on the barrier promise, resolving all pending and on-going futures. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	8f8ed625ab	view_builder: Always resolve started future to success If the view builder background start fails, the _started future resolves to exceptional state. In turn, stopping the view builder keeps this state through .finally() and aborts the shutdown very early, while it may and should proceed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	60e21bb59a	view_builder: Re-futurize start Step two turning the view_builder::start() into a chain of lambdas -- rewrite (most of) the seastar::async()'s lambda into a more "classical" form. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	77c7d94f85	view_builder: Split calculate_shard_build_step into two The calculate_shard_build_step() has a cross-shard barrier in the middle and passing the barrier is broken wrt exceptions that may happen before it. The intention is to prepare this barrier passing for exception handling by turning the view_builder::start() into a dedicated continuation lambda. Step one in this campaign -- split the calculate_shard_build_step() into steps called by view_builder::start(): - before the barrier - barrier - after the barrier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:38 +03:00
Pavel Emelyanov	fe0326b75b	view_builder: Populate the view_builder_init_state Keep the internal calculate_shard_build_step()'s stuff on the init helper struct, as the method in question is about to be split into a chain of continuation lambdas. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:56:35 +03:00
Pavel Emelyanov	2d2d04c6b7	view_builder: Fix indentation after previous patch No functional changes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:46:36 +03:00
Pavel Emelyanov	d0393d92a2	view_builder: Introduce view_builder_init_state This is the helper initialization struct that will carry the needed objects across continuation lambdas. The indentation in ::start() will be fixed in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 15:45:15 +03:00
Avi Kivity	7416b3c34b	Merge 'scylla-gdb.py: Add scylla repairs command' from Asias " This series adds scylla repairs command to help debug repair. Fixes #7103 " * asias-repair_help_debug_scylla_repairs_cmd: scylla-gdb.py: Add scylla repairs command repair: Add repair_state to track repair states scylla-gdb.py: Print the pointers of elements in boost_intrusive_list_printer scylla-gdb.py: Add printer for gms::inet_address scylla-gdb.py: Fix a typo in boost_intrusive_list repair: Fix the incorrect comments for _all_nodes repair: Add row_level_repair object pointer in repair_meta repair: Add counter for reads issued and finished for repair_reader	2020-08-26 13:57:31 +03:00
Avi Kivity	2b308a973f	Merge 'Move temporaries to value view' from Piotr S " Issue https://github.com/scylladb/scylla/issues/7019 describes a problem of an ever-growing map of temporary values stored in query_options. In order to mitigate this kind of problems, the storage for temporary values is moved from an external data structure to the value views itself. This way, the temporary lives only as long as it's accessible and is automatically destroyed once a request finishes. The downside is that each temporary is now allocated separately, while previously they were bundled in a single byte stream. Tests: unit(dev) Fixes https://github.com/scylladb/scylla/issues/7019 " * psarna-move_temporaries_to_value_view: cql3: remove query_options::linearize and _temporaries cql3: remove make_temporary helper function cql3: store temporaries in-place instead of in query_options cql3: add temporary_value to value view cql3: allow moving data out of raw_value cql3: split values.hh into a .cc file	2020-08-26 13:19:17 +03:00
Benny Halevy	f5ffd5fc5f	sstables: Fix reactor stall in sstables::seal_summary() With relatively big summaries, reactor can be stalled for a couple of milliseconds. This patch: a. allocates positions upfront to avoid excessive reallocation. b. returns a future from seal_summary() and uses `seastar::do_for_each` to iterate over the summary entries so the loop can yield if necessary. Fixes #7108. Based on 2470aad5a389dfd32621737d2c17c7e319437692 by Raphael S. Carvalho <raphaelsc@scylladb.com> Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200826091337.28530-1-bhalevy@scylladb.com>	2020-08-26 12:18:05 +03:00
Botond Dénes	6ee36eeeb2	scylla-gdb.py: scylla memory: update w.r.t. moved per-sched group data Per scheduling-group data was moved from the task queues to a separate data member in the reactor itself. Update `scylla memory` to use the new location to get the per scheduling group data for the storage proxy stats. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200825140256.309299-1-bdenes@scylladb.com>	2020-08-26 11:17:08 +02:00
Avi Kivity	6ff12b7f79	repair: apply_rows_on_follower(): remove copy of repair_rows list We copy a list, which was reported to generate a 15ms stall. This is easily fixed by moving it instead, which is safe since this is the last use of the variable. Fixes #7115.	2020-08-26 11:52:39 +03:00
Benny Halevy	78a44dda57	sstables: avoid double close in file_writer destructor If file_writer::close() fails to close the output stream closing will be retried in file_writer::~file_writer, leading to: ``` include/seastar/core/future.hh:1892: seastar::future<T ...> seastar::promise<T>::get_future() [with T = {}]: Assertion `!this->_future && this->_state && !this->_task' failed. ``` as seen in https://github.com/scylladb/scylla/issues/7085 Fixes #7085 Test: unit(dev), database_test with injected error in posix_file_impl::close() Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200826062456.661708-1-bhalevy@scylladb.com>	2020-08-26 11:33:23 +03:00
Nadav Har'El	d4b452002a	alternator test: tests for NewImage feature of Alternator Streams This patch adds tests for the "NewImage" attribute in Alternator Streams in NEW_IMAGE and NEW_AND_OLD_IMAGES mode. It reproduces issue #7107, that items' key attributes are missing in the NewImage. It also verifies the risky corner cases where the new item is "empty" and NewImage should include just the key, vs. the case where the item is deleted, so NewImage should be missing. This test currently passes on AWS DynamoDB, and xfails on Alternator. Refs #7107. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200825113857.106489-1-nyh@scylladb.com>	2020-08-26 11:33:23 +03:00
Nadav Har'El	868194cd17	redis: fix another use-after-free crash in "exists" command Never trust Occam's Razor - it turns out that the use-after-free bug in the "exists" command was caused by two separate bugs. We fixed one in commit `9636a33993`, but there is a second one fixed in this patch. The problem fixed here was that a "service_permit" object, which is designed to be copied around from place to place (it contains a shared pointer, so is cheap to copy), was saved by reference, and the reference was to a function argument and was destroyed prematurely. This time I tested many times that test_strings.py passes on both dev and debug builds. Note that test/run/redis still fails in a debug build, but due to a different problem. Fixes #6469 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200825183313.120331-1-nyh@scylladb.com>	2020-08-26 11:33:23 +03:00
Nadav Har'El	8e06734893	redis test: add default host and port test/redis/README.md suggests that when running "pytest" the default is to connect to a local redis on localhost:6379. This default was recently lost when options were added to use a different host and port. It's still good to have the default suggested in README.md. It also makes it easier to run the tests against the standard redis, which by default runs on localhost:6379 - by just running "pytest". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200825195143.124429-1-nyh@scylladb.com>	2020-08-26 11:33:23 +03:00
Rafael Ávila de Espíndola	0f9ad5151c	auth: Inline standard_role_manager_name into only use This is just a leftover cleanup I found in my git repo while rebasing. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200819184911.60687-1-espindola@scylladb.com>	2020-08-26 11:33:23 +03:00
Rafael Ávila de Espíndola	5fcfbd76a9	sstables: Delete duplicated code For some reason date_tiered_compaction_strategy had its own identical copy of get_value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200819211509.106594-1-espindola@scylladb.com>	2020-08-26 11:33:23 +03:00
Piotr Sarna	7055297649	cql3: remove query_options::linearize and _temporaries query_options::linearize was the only user of _temporaries helper attribute, and it turns out that this function is never used, and is therefore removed.	2020-08-26 09:45:49 +02:00
Piotr Sarna	c0a7eda2a8	cql3: remove make_temporary helper function Since temporary values will no longer be stored inside query options, the helper function is removed altogether.	2020-08-26 09:45:49 +02:00
Piotr Sarna	70b09dcdf1	cql3: store temporaries in-place instead of in query_options As a first step towards removing _temporaries from query options altogether, all usages of query_options::make_temporary are removed.	2020-08-26 09:45:49 +02:00
Piotr Sarna	ddd36de0ff	cql3: add temporary_value to value view When a value_view needs to store a temporarily instantiated object, it can use the new variant field. The temporary value will live only as long as the view itself.	2020-08-26 09:45:48 +02:00
Piotr Sarna	94a258d06c	cql3: allow moving data out of raw_value in order to be able to elide copying when transferring data from raw_value.	2020-08-26 09:35:53 +02:00
Piotr Sarna	a4b07955c5	cql3: split values.hh into a .cc file Some bigger functions are moved out-of-line. The .cc file is going to be needed in next patches, which allow creating a temporary value for a view.	2020-08-26 09:29:07 +02:00
Asias He	f30895ad22	scylla-gdb.py: Add scylla repairs command This command lists all the active repair_meta objects for both repair master and repair follower. For example: (gdb) scylla repairs (repair_meta) for masters: addr = 0x600005abf830, table = myks2.standard1, ip = 127.0.0.1, states = ['127.0.0.1->repair_state::get_sync_boundary_started', '127.0.0.3->repair_state::get_sync_boundary_finished'], repair_meta = { db = @0x7fffe538c9f0, _messaging = @0x7fffe538ca90, _cf = @0x6000066f0000, .... (repair_meta) for masters: addr = 0x60000521f830, table = myks2.standard1, ip = 127.0.0.1, states = ['127.0.0.1->repair_state::get_sync_boundary_started', '127.0.0.2->repair_state::get_sync_boundary_started'], repair_meta = { _db = @0x7fffe538c9f0, _messaging = @0x7fffe538ca90, _cf = @0x6000066f0000, .... (repair_meta*) for follower: addr = 0x60000432a808, table = myks2.standard1, ip = 127.0.0.1, states = ['127.0.0.1->repair_state::get_sync_boundary_started', '127.0.0.2->repair_state::unknown'], repair_meta = { db = @0x7fffe538c9f0, messaging = @0x7fffe538ca90, _cf = @0x6000066f0000, Fixes #7103	2020-08-26 11:19:25 +08:00
Asias He	ab57cea783	repair: Add repair_state to track repair states Use repair_state to track the major state of repair from the beginning to the end of repair. With this patch, we can easily know at which state both the repair master and followers are. It is very helpful when debugging a repair hang issue. Refs #7103	2020-08-26 11:19:25 +08:00
Asias He	77c2e69e22	scylla-gdb.py: Print the pointers of elements in boost_intrusive_list_printer Sometimes it is helpful to print the pointers of the object in the list. For example: (gdb) p debug::repair_meta_for_masters._repair_metas $1 = boost::intrusive::list of size 3 = [0x6000051df830, 0x60000221f830, 0x60000473f830] = [@0x6000051df830={ _db = @0x7fffe538c9f0, _messaging = @0x7fffe538ca90, _cf = @0x6000066f0000, _schema = { _p = 0x600006568700 }, _range = { ... (gdb) p debug::repair_meta_for_followers._repair_metas $2 = boost::intrusive::list of size 3 = [0x60000081a808, 0x60000432b008, 0x60000432a808] = [@0x60000081a808={ _db = @0x7fffe538c9f0, _messaging = @0x7fffe538ca90, _cf = @0x6000066f0000, _schema = { _p = 0x600006568700 }, ... Refs #7103	2020-08-26 11:14:17 +08:00
Asias He	2b65f80271	scylla-gdb.py: Add printer for gms::inet_address We need this to print the address of the peer nodes in repair. Refs #7103	2020-08-26 10:12:07 +08:00
Asias He	0433f1060f	scylla-gdb.py: Fix a typo in boost_intrusive_list It is boost_intrusive_list not b0ost_intrusive_list. Refs #7103	2020-08-26 10:12:07 +08:00
Asias He	9ee86bb5a0	repair: Fix the incorrect comments for _all_nodes The _all_nodes field contains both the peer nodes and the node itself. Refs #7103	2020-08-26 10:12:07 +08:00
Asias He	656ff93d49	repair: Add row_level_repair object pointer in repair_meta It is helpful to track back the row_level_repair object for repair master when debugging. Refs #7103	2020-08-26 10:12:07 +08:00
Asias He	283c3dae0a	repair: Add counter for reads issued and finished for repair_reader It is helpful to check whether the reader blocks forever when debugging a repair hang. Refs #7103	2020-08-26 10:12:07 +08:00
Pekka Enberg	f7c5c48df6	Update tools/jmx submodule * tools/jmx be8f1ac...d5d1efd (1): > dist/debian: Remove conflict tag for Java 11	2020-08-25 15:46:51 +03:00
Rafael Ávila de Espíndola	8204801b7f	build: Add a --enable-seastar-debug-allocations This enables the corresponding seastar option. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200824153137.19683-1-espindola@scylladb.com>	2020-08-25 14:54:10 +03:00
Asias He	6cadf4e4fa	gossip: Apply state for local node in shadow round We saw errors in killed_wiped_node_cannot_join_test dtest: Aug 2020 10:30:43 [node4] Missing: ['A node with address 127.0.76.4 already exists, cancelling join']: The test does: n1, n2, n3, n4 wipe data on n4 start n4 again with the same ip address Without this patch, n4 will bootstrap into the cluster with new tokens. We should prevent n4 from bootstrapping because there is an existing node in the cluster. In shadow round, the local node should apply the application state of the node with the same ip address. This is useful to detect a node trying to bootstrap with the same IP address as an existing node. Tests: bootstrap_test.py Fixes: #7073	2020-08-25 12:53:59 +03:00
Calle Wilund	5ed3d6892d	cdc: Remove stored (postimage) data when doing row delete Fixes #6900 Clustered range deletes did not clear out the "row_states" data associated with affected rows (might be many). Adds a sweep through and erases relevant data. Since we do pre- and postimage in "order", this should only affect postimage.	2020-08-25 12:27:18 +03:00
Pekka Enberg	3a78593481	configure.py: Fix test repeat and timeout options Fix the default number of test repeats to 1, which it was before (spotted by Nadav). Also, prefix the options so that they become "--test-repeat" and "--test-timeout" (spotted by Avi). Message-Id: <20200825081456.197210-1-penberg@scylladb.com>	2020-08-25 11:26:46 +03:00
Dejan Mircevski	cbf8186a12	cql3/expr: Drop make_column_op() Instantiating binary_operator directly is more readable. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-25 11:10:36 +03:00
Asias He	e86881be99	repair: Print repair reason in repair stats log It is useful to distinguish if the repair is a regular repair or used for node operations. In addition, log the keyspace and tables that are repaired. Fixes #7086	2020-08-25 11:05:47 +03:00
Piotr Jastrzebski	f01ce1458f	cdc: Preserve metadata columns when getting only keys for delta Fixes #7095 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-08-25 10:41:54 +03:00
Yaron Kaikov	02abcade27	ninja test: add --repeat and --timeout parameters Adding missing parameters to ninja test: --repeat: number of times to repeat each test, default = 3 --timeout: set total timeout in sec for each test	2020-08-25 10:41:54 +03:00
Raphael S. Carvalho	1c29f0a43d	cql3/statements: verify that counter column cannot be added into non-counter table A check, to validate that counter column cannot be added into non-counter table, is missing for alter table statement. Validation is performed when building new schema, but it's limited to checking that a schema will not contain both counter and non-counter columns. Due to lack of validation, the added counter column could be incorrectly persisted to the schema, but this results in a crash when setting the new schema to its table. On restart, it can be confirmed that the schema change was indeed persisted when describing the table. This problem is fixed by doing proper validation for the alter table statement, which consists of making sure a new counter column cannot be added to a non-counter table. The test cdc_disallow_cdc_for_counters_test is adjusted because one of its tests was built on the assumption that counter column can be added into a non-counter table. Fixes #7065. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200824155709.34743-1-raphaelsc@scylladb.com>	2020-08-25 10:41:54 +03:00
Avi Kivity	5ec2eae247	build: add python3-redis to dependencies Needed for redis tests.	2020-08-24 20:03:50 +03:00
Avi Kivity	316d7f74ab	Merge 'Do verify package for offline' from Takuya To verify an offline installed image with do_verify_package() in scylla_setup, we introduce an is_offline() function that works just like is_nonroot(), but for installs done without --nonroot. * syuu1228-do_verify_package_for_offline: scylla_setup: verify package correctly on offline install scylla_util.py: implement is_offline() to detect offline installed image	2020-08-24 15:49:55 +03:00
Takuya ASADA	cb221ac393	scylla_setup: verify package correctly on offline install do_verify_package was written only for .rpm/.deb and does not work correctly for offline install (including nonroot). We should check file existence for the environment, not package existence using rpm/dpkg. Fixes #7075	2020-08-24 20:10:36 +09:00
Takuya ASADA	c71e5f244a	scylla_util.py: implement is_offline() to detect offline installed image Like is_nonroot(), detect offline installed image using install.sh.	2020-08-24 20:10:36 +09:00
Nadav Har'El	9636a33993	redis: fix use-after-free crash in "exists" command A missing "&" caused the key stored in a long-living command to be copied and the copy quickly freed - and then used after freed. This caused the test test_strings.py::test_exists_multiple_existent_key for this feature to frequently crash. Fixes #6469 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200823190141.88816-1-nyh@scylladb.com>	2020-08-24 11:41:43 +03:00
Asias He	fefa35987b	storage_service: Avoid updating tokens in system.peers for nodes to be removed Consider: 1) Start n1,n2,n3 2) Stop n3 3) Start n4 to replace n3 but list n4 as seed node 4) Node n4 finishes replacing operation 5) Restart n2 6) Run SELECT * from system.peers on node or node 1. cqlsh> SELECT * from system.peers ; peer\| data_center \| host_id\| preferred_ip \| rack \| release_version \| rpc_address \| schema_version\| supported_features\| tokens 127.0.0.3 \|null \|null \| null \| null \| null \|null \|null \|null \| {'-90410082611643223', '5874059110445936121'} The replaced old node 127.0.0.3 shows in system.peers. (Note, since commit `399d79fc6f` (init: do not allow replace-address for seeds), step 3 will be rejected. Assume we use a version without it) The problem is that n2 sees n3 is in gossip status of SHUTDOWN after restart. The storage_service::handle_state_normal callback is called for 127.0.0.3. Since n4 is using different token as n3 (seed node does not bootstrap so it uses new tokens instead of tokens of n3 which is being replaced), so owned_tokens will be set. We see logs like: [shard 0] storage_service - handle_state_normal: New node 127.0.0.3 at token 5874059110445936121 [shard 0] storage_service - Host ID collision for cbec60e5-4060-428e-8d40-9db154572df7 between 127.0.0.4 and 127.0.0.3; ignored 127.0.0.3 As a result, db::system_keyspace::update_tokens will be called to write to system.peers for 127.0.0.3 wrongly. if (!owned_tokens.empty()) { db::system_keyspace::update_tokens(endpoint, owned_tokens) } To fix, we should skip calling db::system_keyspace::update_tokens if the nodes is present in endpoints_to_remove. Refs: #4652 Refs: #6397	2020-08-24 10:06:37 +02:00
Takuya ASADA	fe8679a6ee	test/redis: make redis tests runnable from test.py Just like test/alternator, make redis-test runnable from test.py. For this we move the redis tests into a subdirectory of tests/, and create a script to run them: tests/redis/run. These tests currently fail, so we did not yet modify test.py to actually run them automatically. Fixes #6331	2020-08-23 20:31:45 +03:00
Avi Kivity	907b775523	Merge "Free compaction from storage service" from Pavel E " There's last call for global storage service left in compaction code, it comes from cleanup_compaction to get local token ranges for filtering. The call in question is a pure wrapper over database, so this set just makes use of the database where it's already available (perform_cleanup) and adds it where it's needed (perform_sstable_upgrade). tests: unit(dev), nodetool upgradesstables " * 'br-remove-ss-from-compaction-3' of https://github.com/xemul/scylla: storage_service: Remove get_local_ranges helper compaction: Use database from options to get local ranges compaction: Keep database reference on upgrade options compaction: Keep database reference on cleanup options db: Factor out get_local_ranges helper	2020-08-23 17:58:32 +03:00
Piotr Dulikowski	b111fa98ca	hinted handoff: use default timeout for sending orphaned hints This patch causes orphaned hints (hints that were written towards a node that is no longer their replica) to be sent with a default write timeout. This is what is currently done for non-orphaned hints. Previously, the timeout was hardcoded to one hour. This could cause a long delay while shutting down, as the hints manager waits until all ongoing hint sending operations finish before stopping itself. Fixes: #7051	2020-08-23 11:50:27 +03:00
Botond Dénes	0a8cc4c2b5	db/size_estimates_virtual_reader: remove redundant _schema member This reader was probably created in ancient times, when readers didn't yet have a _schema member of their own. But now that they do, it is not necessary to store the schema in the reader implementation, there is one available in the parent class. While at it also move the schema into the class when calling the constructor. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200821070358.33937-1-bdenes@scylladb.com>	2020-08-22 20:47:49 +03:00
Botond Dénes	4944e050e3	mutation_reader: make_combined_reader(): return empty reader when combining 0 readers Avoid creating all the combining machinery when we know there is no data to be had. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200821045602.13096-1-bdenes@scylladb.com>	2020-08-22 20:47:49 +03:00
Avi Kivity	0dcb16c061	Merge "Constify access to token_metadata" from Benny " We keep references to locator::token_metadata in many places. Most of them are for read-only access and only a few want to modify the token_metadata. Recently, in `94995acedb`, we added yielding loops that access token_metadata in order to avoid cpu stalls. To make that possible we need to make sure the token_metadata object they are traversing won't change mid-loop. This series is a first step in ensuring the serialization of updates to shared token metadata to reading it. Test: unit(dev) Dtest: bootstrap_test:TestBootstrap.start_stop_test{,_node}, update_cluster_layout_tests.py -a next-gating(dev) " * tag 'constify-token-metadata-access-v2' of github.com:bhalevy/scylla: api/http_context: keep a const sharded<locator::token_metadata>& gossiper: keep a const token_metadata& storage_service: separate get_mutable_token_metadata range_streamer: keep a const token_metadata& storage_proxy: delete unused get_restricted_ranges declaration storage_proxy: keep a const token_metadata& storage_proxy: get rid of mutable get_token_metadata getter database: keep const token_metadata& database: keyspace_metadata: pass const locator::token_metadata& around everywhere_replication_strategy: move methods out of line replication_strategy: keep a const token_metadata& abstract_replication_strategy: get_ranges: accept const token_metadata& token_metadata: rename calculate_pending_ranges to update_pending_ranges token_metadata: mark const methods token_ranges: pending_endpoints_for: return empty vector if keyspace not found token_ranges: get_pending_ranges: return empty vector if keyspace not found token_ranges: get rid of unused get_pending_ranges variant replication_strategy: calculate_natural_endpoints: make token_metadata& param const token_metadata: add get_datacenter_racks() const variant	2020-08-22 20:47:45 +03:00
Pavel Emelyanov	b3274c83e1	storage_service: Remove get_local_ranges helper It's no longer in real use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Pavel Emelyanov	171822cff8	compaction: Use database from options to get local ranges The cleanup compaction wants to keep local tokens on-board and gets them from storage_service.get_local_ranges(). This method is the wrapper around database.get_keyspace_local_ranges() created in previous patch, the live database reference is already available on the descriptor's options, so we can short-cut the call. This allows removing the last explicit call for global storage_service instance from compaction code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Pavel Emelyanov	8333fed8aa	compaction: Keep database reference on upgrade options The only place that creates them is the API upgrade_sstables call. The created options object doesn't over-survive the returned future, so it's safe to keep this reference there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Pavel Emelyanov	a6e6856e1f	compaction: Keep database reference on cleanup options The database is available at both places that create the options -- tests and API perform_cleanup call. Options object doesn't over-survive the returned future, so it's safe to keep the reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Pavel Emelyanov	06f4828b93	db: Factor out get_local_ranges helper Storage service and repair code have identical helpers to get local ranges for keyspace. Move this helper's code onto database, later it will be reused by one more place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Benny Halevy	436babdb3d	api/http_context: keep a const sharded<locator::token_metadata>& It has no need of changing token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	573142d4c4	gossiper: keep a const token_metadata& gossiper has no need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	569f2830c1	range_streamer: keep a const token_metadata& range_streamer doesn't need to modify token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	2c61383215	storage_proxy: delete unused get_restricted_ranges declaration Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	c8390da5f9	storage_proxy: keep a const token_metadata& storage_proxy doesn't need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	dfa5f8ff1e	storage_proxy: get rid of mutable get_token_metadata getter We'd like to strictly control who can modify token metadata and nobody currently needs a mutable reference to storage_proxy::_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	dd6d771331	database: keep const token_metadata& No need to modify token_metadata from database code. Also, get rid of mutable get_token_metadata variant. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	8b5c32c7a8	database: keyspace_metadata: pass const locator::token_metadata& around No need to modify token_metadata on this path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	e4e4b269c7	everywhere_replication_strategy: move methods out of line Move methods depending on token_metadata to source file so we can avoid including token_metadata.hh in header files where possible. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	4dba81cb92	replication_strategy: keep a const token_metadata& replication strategies don't need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	2fd59e8bba	abstract_replication_strategy: get_ranges: accept const token_metadata& Now that calculate_natural_endpoints can be passed a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	8b63523fb7	token_metadata: rename calculate_pending_ranges to update_pending_ranges Since it sets the token_metadata_impl's pending ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	22275e579e	token_metadata: mark const methods Many token_metadata methods do not modify the object and can be marked as const. The motivation is to better control who may modify token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:21 +03:00
Botond Dénes	5e9a7d2608	row_cache: remove unnecessary includes of partition_snapshot_reader.hh Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200820124447.2561477-1-bdenes@scylladb.com>	2020-08-20 15:19:42 +02:00
Tomasz Grabiec	c44455d514	Merge "Miscellaneous schema code cleanups" from Rafael	2020-08-20 15:19:42 +02:00
Rafael Ávila de Espíndola	33669bd21d	commitlog: Use try_with_gate Now that we have try_with_gate we can use it instead of futurize_invoke and with_gate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200819191334.74108-1-espindola@scylladb.com>	2020-08-20 15:19:42 +02:00
Benny Halevy	65d89512d0	token_ranges: pending_endpoints_for: return empty vector if keyspace not found Rather than creating a bogus empty entry. With that, it can be marked as const. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:16:14 +03:00
Benny Halevy	ca61c2797a	token_ranges: get_pending_ranges: return empty vector if keyspace not found Rather than creating a bogus empty entry. With that, it can be marked as const. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:14:44 +03:00
Tomasz Grabiec	cf12b5e537	db: view: Refactor view_info::initialize_base_dependent_fields() It is no longer called once for a given view_info, so the name "initialize" is not appropriate. This patch splits the "initialize" method into the "make" part, which makes a new base_info object, and the "set" part, which changes the current base_info object attached to the view.	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	617ccc5408	tests: mv: Test dropping columns from base table Reproduces #7061.	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	f8df214836	db: view: Fix incorrect schema access during view building after base table schema changes The view building process was accessing mutation fragments using current table's schema. This is not correct, fragments must be accessed using the schema of the generating reader. This could lead to undefined behavior when the column set of the base table changes. out_of_range exceptions could be observed, or data in the view ending up in the wrong column. Refs #7061. The fix has two parts. First, we always use the reader's schema to access fragments generated by the reader. Second, when calling populate_views() we upgrade the fragment-wrapping reader's schema to the base table schema so that it matches the base table schema of view_and_base snapshots passed to populate_views().	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	d64d60f576	schema: Call on_internal_error() when out of range id is passed to column_at() Improves debuggability because backtrace is attached. Before, plain std::out_of_range exception was thrown.	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	3a6ec9933c	db: views: Fix undefined behavior on base table schema changes The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table so is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get schema version mismatch here after the base table was altered. That's because the view schema does not change when the base table is altered. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update. Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Refs #7061.	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	dc18117b82	db: views: Introduce has_base_non_pk_columns_in_view_pk() In preparation for pushing _base_non_pk_columns_in_view_pk deeper.	2020-08-20 14:53:07 +02:00
Pekka Enberg	10b2c23e19	configure.py: Fix build, check, and test targets when build mode is defined When user defines a build mode with configure.py, the build, check, and test targets fail as follows: ./configure.py --mode=dev && ninja build ninja: error: 'debug-build', needed by 'build', missing and no known rule to make it Fix the issue by making the targets depend on build targets for specified build modes, not all available modes. Message-Id: <20200813105639.1641090-1-penberg@scylladb.com>	2020-08-20 15:08:06 +03:00
Benny Halevy	23a0625998	token_ranges: get rid of unused get_pending_ranges variant Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 14:46:53 +03:00
Benny Halevy	b4f76cbb8a	replication_strategy: calculate_natural_endpoints: make token_metadata& param const No replication strategy needs to change token_metadata when calculating natural endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 14:38:45 +03:00
Benny Halevy	78f40cac8d	token_metadata: add get_datacenter_racks() const variant Needed for passing a const token_metadata& to calculate_natural_endpoints methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 14:38:45 +03:00
Nadav Har'El	a1453303f8	alternator test: test for OLD_IMAGE of an empty item We already have a test, test_streams.py::test_streams_updateitem_old_image, for issue #6935: It tests that the OLD_IMAGE in Alternator Streams should contain the item's key. However, this test was missing one corner case, which the first solution for this issue handled incorrectly. So in this patch we add a test for this corner case, test_streams_updateitem_old_image_empty_item: This corner case is about the item existing but being empty, i.e., having just the key but no other attribute. In this case, OLD_IMAGE should return that empty item - including its key. Not nothing. As usual, this test passes on DynamoDB and xfails on Alternator, and the "xfail" mark will be removed when issue #6935 is fixed. Refs #6935. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200819155229.34475-1-nyh@scylladb.com>	2020-08-20 13:22:40 +02:00
Piotr Jastrzebski	49fd17a4ef	cql3: Improve error messages for markers binding When binding a prepared statement, it is possible that the values being bound are not correct. Unfortunately, before this patch, the error message only said which type got a wrong value. This was not very helpful because there could be multiple columns with the same type in the table. We also support collections, so sometimes the error said that there is a wrong value for a type T but the affected column was actually of type collection<T>. This patch adds information about the name of the column that got the wrong value so that it's easier to find and fix the problem. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <90b70a7e5144d7cb3e53876271187e43fd8eee25.1597832048.git.piotr@scylladb.com>	2020-08-20 12:51:36 +03:00
Avi Kivity	392e24d199	Merge "Unglobal messaging service" from Pavel E " The messaging service is (as many other services) present in the global namespace and is widely accessed from where needed with global get(_local)?_messaging_service() calls. There's a long-term task to get rid of this globality and make services and components reference each-other and, for and due-to this, start and stop in specific order. This set makes this for the messaging service. The service is very low level and doesn't depend on anything. It's used by gossiper, streaming, repair, migration manager, storage proxy, storage service and API. According to these dependencies the set consists of several parts: patches 1-9 are preparatory, they encapsulate messaging service init/fini stuff in its own module and decouple it from the db::config patch 10-12 introduce local service reference in main and set its init/fini calls at the early stage so that this reference can later be passed to those depending on it patches 13-42 replace global referencing of messaging service from other subsystems with local references initialized from main. patch 43 finalizes tests. patch 44 wraps things up with removing global messaging service instance along with get(_local)?_messaging_service calls. The service's stopping part is deliberately left incomplete (as it is now), the sharded service remains alive, only the instance's stop() method is called (and is empty for a while). Since the messaging service's users still do not stop cleanly, its instances should better continue leaking on exit. Once (if) the seastar gets the helper rpc::has_handlers() method merged the messaging_service::stop() will be able to check if all the verbs had been unregistered (spoiler: not yet, more fixes to come). For debugging purposes the pointer on now-local messaging service instance is kept in service::debug namespace. 
tests: unit(dev) dtest(dev: simple_boot_shutdown, repair, update_cluster_layout) manual start-stop " * 'br-unglobal-messaging-service-2' of https://github.com/xemul/scylla: (44 commits) messaging_service: Unglobal messaging service instance tests: Use own instances of messaging_service storage_service: Use local messaging reference storage_service: Keep reference on sharded messaging service migration_manager: Add messaging service as argument to get_schema_definition migration_manager: Use local messaging reference in simple cases migration_manager: Keep reference on messaging migration_manager: Make push_schema_mutation private non-static method migration_manager: Move get_schema_version verb handling from proxy repair: Stop using global messaging_service references repair: Keep sharded messaging service reference on repair_meta repair: Keep sharded messaging service reference on repair_info repair: Keep reference on messaging in row-level code repair: Keep sharded messaging service in API repair: Unset API endpoints on stop repair: Setup API endpoints in separate helper repair: Push the sharded<messaging_service> reference down to sync_data_using_repair repair: Use existing sharded db reference repair: Mark repair.cc local functions as static streaming: Keep messaging service on send_info ...	2020-08-20 12:20:36 +03:00
Rafael Ávila de Espíndola	f0e4e5b85a	schema: Make some functions static This just make it easier to see that they are file local helpers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-19 14:05:31 -07:00
Rafael Ávila de Espíndola	6363716799	schema: Pass an rvalue to set_compaction_strategy_options This produces less code and makes sure every caller moves the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-19 14:02:35 -07:00
Rafael Ávila de Espíndola	527c1ab546	schema: Move set_compaction_strategy_options out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-19 14:02:13 -07:00
Pavel Emelyanov	623f61e63e	messaging_service: Unglobal messaging service instance Remove the global messaging_service, keep it on the main stack. But also store a pointer on it in debug namespace for debugging. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	ee41645a1a	tests: Use own instances of messaging_service The global one is going away, no core code uses it, so all tests can be safely switched to use their own instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	a6f8f450ba	storage_service: Use local messaging reference All the places that are (or, after the previous patches, have become) users of the global messaging service are storage service methods, so they can access the local reference on the messaging service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	4ea3c2797c	storage_service: Keep reference on sharded messaging service It is a bit of a step backward in the storage-service decomposition campaign, but... Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	24eaf827c0	migration_manager: Add messaging service as argument to get_schema_definition There are 4 places that call this helper: - storage proxy. Callers are rpc verb handlers and already have the proxy at hand from which they can get the messaging service instance - repair. There's local-global messaging instance at hand, and the caller is in verb handler too - streaming. The caller is verb handler, which is unregistered on stop, so the messaging service instance can be captured - migration manager itself. The caller already uses "this", so the messaging service instance can be obtained from it The better approach would be to make get_schema_definition be the method of migration_manager, but the manager is stopped for real on shutdown, thus referencing it from the callers might not be safe and needs revisiting. At the same time the messaging service is always alive, so using its reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	2a4c0fa280	migration_manager: Use local messaging reference in simple cases Most of those places are non-static migration_manager methods, plus one place where the local service instance is already at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	6c49127d04	migration_manager: Keep reference on messaging That's another user of messaging service, init it with private reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	abb1dd608f	migration_manager: Make push_schema_mutation private non-static method The local migration manager instance is already available at caller, so we can call a method on it. This is to facilitate next patching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	56aa514cd9	migration_manager: Move get_schema_version verb handling from proxy The user of this verb is migration manager, so the handler must be it as well. The handler code now explicitly gets the global proxy. This call is safe, as proxy is not stopped nowadays. In the future we'll need to revisit the relation between migration - proxy - stats anyway. The use of local migration manager is safe, as it happens in verb handler which is unregistered and is waited to be completed on migration manager stop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	704880d564	repair: Stop using global messaging_service references Now all the users of messaging service have the needed reference. Again, the messaging service is not really stopped at the end, so its usage is safe regardless of whether repair stuff itself leaks on stop or not. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	d7e90dbfa9	repair: Keep sharded messaging service reference on repair_meta The reference comes from repair_info and storage_service calls, both had been already patched for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	285648620b	repair: Keep sharded messaging service reference on repair_info This reference comes from the API that already has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	74494bac87	repair: Keep reference on messaging in row-level code The row-level repair keeps its statics for needed services, same as the streaming does. Treat the messaging service the same way to stop using the global one in the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	8b4820b520	repair: Keep sharded messaging service in API The reference will be needed in repair_start, so prepare one in advance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	126dac8ad1	repair: Unset API endpoints on stop This unset is the roll-back of the corresponding _set-s. The messaging service will be (already is, but implicitly) used in repair API callbacks, so make sure they are unset before the messaging service is stopped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	fe2c479c04	repair: Setup API endpoints in separate helper There will be the unset part soon, this is the preparation. No functional changes in api/storage_server.cc, just move the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	45c31eadb3	repair: Push the sharded<messaging_service> reference down to sync_data_using_repair This function needs the messaging service inside, but the closest place where it can get one from is the storage_service API handlers. Temporarily move the call for global messaging service into storage service, its turn for this cleanup will come later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	6b0f4d5c8d	repair: Use existing sharded db reference The db.invoke_on_all's lambda tries to get the sharded db reference via the global storage service. This can be done in a much nicer way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	3d2e3203f7	repair: Mark repair.cc local functions as static Just a cleanup to facilitate code reading. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	d2c475f27c	streaming: Keep messaging service on send_info And use it in send_mutation_fragments. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	a6888e3ce3	streaming: Keep reference on messaging Streaming uses messaging, init it with its own reference. Nowadays the whole streaming subsystem uses global static references on the needed services. This is not nice, but still better than just using code-wide globals, so treat the messaging service here the same way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	163d615dc3	streaming: Use local ms() on ::start This is just a cleanup to avoid explicit global call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	528e4455b9	storage_proxy: Use _proxy in paxos_response_handler methods The proxy pointer is non-null (and is already used in these methods), so it should be safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	d397d7e734	storage_proxy: Pass proxy into forward_fn lambda of handle_write It is alive there, so it is safe to pass one to lambda. Once in forward_fn, it can be used to get messaging from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	e5c10ee3e0	storage_proxy: Use reference on messaging in simple cases Most of the places that need messaging service in proxy already use storage_proxy instance, so it is safe to get the local messaging from it too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	24cb1b781f	storage_proxy: Keep reference on messaging The proxy is another user of messaging, so keep the reference on it. Its real usage will come in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	4ea63b2211	gossiper: Share the messaging service with snitch And make snitch use gossiper's messaging, not global Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	65bd54604d	gossiper: Use messaging service by reference Gossiper needs messaging service, the messaging is started before the gossiper, so we can push the former reference into it. Gossiper is not stopped for real, neither the messaging service is, so the memory usage is still safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Botond Dénes	6ad80f0adb	test/lib/cql_test_env: set debug::db pointer To allow using scylla-gdb.py scripts for debugging tests. These scripts expect a valid database pointer in `debug::db`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200819145632.2423462-1-bdenes@scylladb.com>	2020-08-19 19:13:05 +03:00
Raphael S. Carvalho	a0e0195a77	sstables: Avoid excessive reallocations when creating sharding metadata Let's reserve space for sharding metadata in advance, to avoid excessive allocations in create_sharding_metadata(). With the default ignore_msb_bits=12, it was observed that the # of reallocations is frequently 11-12. With ignore_msb_bits=16, the number can easily go up to 50. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200814210250.39361-1-raphaelsc@scylladb.com>	2020-08-19 17:58:29 +03:00
Nadav Har'El	2e1499ee93	merge: cdc: introduce a ,,change visitor'' and ,,inspect mutation'' abstractions Merged pull request https://github.com/scylladb/scylla/pull/6978 by Kamil Braun: These abstractions are used for walking over mutations created by a write coordinator, deconstructing them into ,,atomic'' pieces (,,changes''), and consuming these pieces. Read the big comment in cdc/change_visitor.hh for more details. 4 big functions were rewritten to use the new abstractions. tests: unit (dev build) all dtests from cdc_tests.py, except cdc_tests.py:TestCdc.cluster_reduction_with_cdc_test, have passed (on a dev build). The test that fails also fails on master. Part of #5945. cdc: rewrite process_changes using inspect_mutation cdc: move some functions out of `cdc::transformer` cdc: rewrite extract_changes using inspect_mutation cdc: rewrite should_split using inspect_mutation cdc: rewrite find_timestamp using inspect_mutation cdc: introduce a ,,change visitor'' abstraction	2020-08-19 17:19:01 +03:00
Avi Kivity	6f986df458	Merge "Fix TWCS compaction aggressiveness due to data segregation" from Raphael " After the data segregation feature, anything that causes out-of-order writes, like read repair, can result in small updates to past time windows. This causes compaction to be very aggressive because whenever a past time window is updated like that, that time window is recompacted into a single SSTable. Users expect that once a window is closed, it will no longer be written to, but that has changed since the introduction of the data segregation feature. We didn't anticipate the write amplification issues that the feature would cause. To fix this problem, let's perform size-tiered compaction on the windows that are no longer active and were updated because data was segregated. The current behavior where the last active window is merged into one file is kept. But thereafter, that same window will only be compacted using STCS. Fixes #6928. " * 'fix_twcs_agressiveness_after_data_segregation_v2' of github.com:raphaelsc/scylla: compaction/twcs: improve further debug messages compaction/twcs: Improve debug log which shows all windows test: Check that TWCS properly performs size-tiered compaction on past windows compaction/twcs: Make task estimation take into account the size-tiered behavior compaction/stcs: Export static function that estimates pending tasks compaction/stcs: Make get_buckets() static compact/twcs: Perform size-tiered compaction on past time windows compaction/twcs: Make strategy easier to extend by removing duplicated knowledge compaction/twcs: Make newest_bucket() non-static compaction/twcs: Move TWCS implementation into source file	2020-08-19 17:19:01 +03:00
Avi Kivity	f6b66456fd	Update seastar submodule Contains patch from Rafael to fix up includes. * seastar c872c3408c...7f7cf0f232 (9): > future: Consider result_unavailable invalid in future_state_base::ignore() > future: Consider result_unavailable invalid in future_state_base::valid() > Merge "future-util: split header" from Benny > docs: corrected some text and code-examples in streaming-rpc docs > future: Reduce nesting in future::then > demos: coroutines: include std-compat.hh > sstring: mark str() and methods using it as noexcept > tls: Add an assert > future: fix coroutine compilation	2020-08-19 17:18:57 +03:00
Pavel Emelyanov	dc0918e255	tests: Keep local reference on global messaging Some tests directly reference the global messaging service. For the sake of simpler patching wrap this global reference with a local one. Once the global messaging service goes away tests will get their own instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	b895c2971a	api: Use local reference to messaging_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	d477bd562d	api: Unregister messaging endpoints on stop API is one of the subsystems that work with messaging service. To keep the dependencies correct the related API stuff should be stopped before the messaging service stops. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	78298ec776	init: Use local messaging reference in main There are a few places that initialize db and system_ks and need the messaging service. Pass the reference to it from main instead of using the global helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	878c50b9ad	main: Keep reference on global messaging service This is the preparation for moving the message service to main -- keep a reference and eventually pass one to subsystems depending on messaging. Once they are ready, the reference will be turned into an instance. For now only push the reference into the messaging service init/exit itself, other subsystems will be patched next. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	bdfb77492f	init: The messaging_service::stop is back (not really) Introduce back the .stop() method that will be used to really stop the service. For now do not do sharded::stop, as its users are not yet stopping, so this prevents use-after-free on messaging service. For now the .stop() is empty, but will be in charge of checking if all the other users had unregistered their handlers from rpc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	15998e20ce	init: Move messaging service init up the main() The messaging service is a low-level one which doesn't need other services, so it can be started first. Nowadays it's indeed started before most of its users but one -- the gossiper. In current code gossiper doesn't do anything with messaging service until it starts, but very soon this dependency will be expressed in terms of a reference from gossiper to messaging_service, thus by the time the latter starts, the former should already exist. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	c28aeaee2e	messaging_service: Move initialization to messaging/ Now the init_messaging_service() only deals with messaging service and related internal stuff, so it can sit in its own module. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	41eee249d7	init: RIP init_scheduling_config This struct is nowadays only used to transport arguments from db::config to messaging_service::scheduling_config, things get simpler if we drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	ef6c75a732	init: Call init_messaging_service with its config only This makes the messaging service configuration completely independent from the db config. Next step would be to move the messaging service init code into its module. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	5b169e8d16	messaging_service: Construct using config This is the continuation of the previous patch -- change the primary constructor to work with config. This, in turn, will decouple the messaging service from database::config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	304a414e39	messaging_service: Introduce and use config This service constructor uses and copies many simple values, it would be much simpler to group them on config. It also helps the next patches to simplify the messaging service initialization and to keep the defaults (for testing) in one place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	f7d99b4a06	init: Split messaging service and gossiper initialization The init_ms_fd_gossiper function initializes two services, but effectively consists of two independent parts, so declare them as such. The duplication of listen address resolution will go away soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	1c8ea817cd	messaging_service: Rename stop() to shutdown() On today's stop() the messaging service is not really stopped as other services still (may) use it and have registered handlers in it. Inside the .stop() only the rpc servers are brought down, so the better name for this method would be shutdown(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	e6fb2b58fc	messaging_service: Cleanup visibility of stopping methods Just a cleanup. These internal stoppers must be private, also there are too many public specifiers in the class description around them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Emelyanov	0601e9354d	init: Remove unused lonely future from init_ms_fd_gossiper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Rafael Ávila de Espíndola	56724d084d	sstables: Move date_tiered_compaction_strategy_options::date_tiered_compaction_strategy_options out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200812232915.442564-6-espindola@scylladb.com>	2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola	07b3ead752	sstables: Move size_tiered_compaction_strategy_options::size_tiered_compaction_strategy_options out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200812232915.442564-5-espindola@scylladb.com>	2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola	7b3946fa0e	sstables: Move compaction_strategy_impl::compaction_strategy_impl out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200812232915.442564-4-espindola@scylladb.com>	2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola	9ba765fe6f	sstables: Move compaction_strategy_impl::get_value out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200812232915.442564-3-espindola@scylladb.com>	2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola	06b15aa7e3	sstables: Move time_window_compaction_strategy_options' constructors to a .cc These are not trivial and not hot. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200812232915.442564-2-espindola@scylladb.com>	2020-08-19 11:34:13 +03:00
Raphael S. Carvalho	d601f78b4b	compaction/twcs: improve further debug messages Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	086f277584	compaction/twcs: Improve debug log which shows all windows The current log prints one log entry for each window, it doesn't print the # of SSTs in the bucket, and the now information is copied across all the window entries. Previously, it looked like this: [shard 0] compaction - Key 1597331160000000, now 1597331160000000 [shard 0] compaction - Key 1597331100000000, now 1597331160000000 [shard 0] compaction - Key 1597331040000000, now 1597331160000000 [shard 0] compaction - Key 1597330980000000, now 1597331160000000 This made it harder to group all windows which reflect the state of the strategy at a given time. Now, it looks as follows: [shard 0] compaction - time_window_compaction_strategy::newest_bucket: now 1597331160000000 buckets = { key=1597331160000000, size=1 key=1597331100000000, size=2 key=1597331040000000, size=1 key=1597330980000000, size=1 } Also the level of this log is changed from debug to trace, given that now it's compressed and only printed once. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	3be1420083	test: Check that TWCS properly performs size-tiered compaction on past windows Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	96436312be	compaction/twcs: Make task estimation take into account the size-tiered behavior The task estimation was not taking into account that TWCS does size-tiered on the windows, and it only added 1 to the estimation when there could be more tasks than that depending on the amount of SSTables in all the existing size tiers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	d287b1c198	compaction/stcs: Export static function that estimates pending tasks That will be useful for allowing other compaction strategies that use STCS to properly estimate the pending tasks. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:09 -03:00
Raphael S. Carvalho	b62737fd05	compaction/stcs: Make get_buckets() static STCS will export a static function to estimate pending tasks, and it relies on get_buckets() being static too. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-18 15:14:07 -03:00
Botond Dénes	48550eaae4	scylla-gdb.py: add find_vptrs_of_type() helper functions One of the most common pieces of code I write when investigating coredumps is finding all objects of a certain type and creating a human readable report of certain properties of these objects. This usually involves retrieving all objects with a vptr with `find_vptrs()` and matching their type to some pattern. I found myself writing this boilerplate over and over again, so in this patch I introduce a convenience method to avoid repeating it in the future. Message-Id: <20200818145247.2358116-1-bdenes@scylladb.com>	2020-08-18 17:20:06 +02:00
Botond Dénes	74ffafc8a7	scylla-gdb.py: scylla fiber: add actual return to early return scylla_fiber._walk() has an early return condition on the passed-in pointer actually being a task pointer. The problem is that the actual return statement was missing and only an error was logged. This resulted in execution continuing and further weird errors being printed due to the code not knowing how to handle the bad pointer. Message-Id: <20200818144902.2357289-1-bdenes@scylladb.com>	2020-08-18 17:17:25 +02:00
Botond Dénes	f3af6ff221	scylla-gdb.py: scylla fiber: add new FQ name of thread_wake_task thread_wake_task was moved into an anonymous namespace, add this new fully qualified name to the task name white-list. Leave the old name for backward compatibility. While at it, also add `seastar::thread_context` which is also a task object, for better seastar thread support. Message-Id: <20200818142206.2354921-1-bdenes@scylladb.com>	2020-08-18 16:48:01 +02:00
Botond Dénes	ece638fb3f	scylla-gdb.py: collection_element(): add std::tuple support Accessing the element of a tuple from the gdb command line is a nightmare, add support to collection_element() retrieving one of its elements to make this easier. Message-Id: <20200818141123.2351892-1-bdenes@scylladb.com>	2020-08-18 16:48:01 +02:00
Botond Dénes	077dc7c021	scylla-gdb.py: boost_intrusive_list: add __len__() operator Message-Id: <20200818141340.2352666-1-bdenes@scylladb.com>	2020-08-18 16:48:01 +02:00
Dejan Mircevski	fb6c011b52	everywhere: Insert space after `switch` Quoth @avikivity: "switch is not a function, and we celebrate that by putting a space after it like other control-flow keywords." https://github.com/scylladb/scylla/pull/7052#discussion_r471932710 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-18 14:31:04 +03:00
Botond Dénes	78f94ba36a	table: get_sstables_by_partition_key(): don't make a copy of selected sstables Currently we assign the reference to the vector of selected sstables to `auto sst`. This makes a copy and we pass this local variable to `do_for_each()`, which will result in a use-after-free if the latter defers. Fix by not making a copy and instead just keep the reference. Fixes: #7060 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200818091241.2341332-1-bdenes@scylladb.com>	2020-08-18 14:20:31 +03:00
Avi Kivity	ecb2bdad54	Merge 'Replace operator_type with an enum' from Dejan " operator_type is awkward because it's not copyable or assignable. Replace it with a new enum class. Tests: unit(dev) " * dekimir-operator-type: cql3: Drop operator_type entirely cql3: Drop operator_type from the parser cql3/expr: Replace operator_type with an enum	2020-08-18 13:45:20 +03:00
Dejan Mircevski	1aa326c93b	cql3: Drop operator_type entirely Since no live code uses it anymore, it can be safely removed. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-18 12:27:01 +02:00
Dejan Mircevski	d97605f4f8	cql3: Drop operator_type from the parser Replace operator_type with the nicer-behaved oper_t in CQL parser and, consequently, in the relation hierarchy and column_condition. After this, no references to operator_type remain in live code. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-18 12:27:00 +02:00
Dejan Mircevski	71c921111d	cql3/expr: Replace operator_type with an enum operator_type is awkward because it's not copyable or assignable. Replace it in expression representation with a new enum class, oper_t. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-18 12:27:00 +02:00
Avi Kivity	b0ae9d0c7d	Update tools/python3 submodule * tools/python3 196be5a...f89ade5 (1): > reloc: cleanup deb builddir	2020-08-18 13:10:01 +03:00
Pekka Enberg	385ad5755b	configure.py: Move tarballs to build/<mode>/dist/tar As suggested by Avi, let's move the tarballs from "build/dist/<mode>/tar" to "build/<mode>/dist/tar" to retain the symmetry of different build modes, and make the tarballs easier to discover. While at it, let's document the new tarball locations. Message-Id: <20200818100427.1876968-1-penberg@scylladb.com>	2020-08-18 13:07:52 +03:00
Botond Dénes	22a6493716	view_update_generator: fix race between registering and processing sstables `fea83f6` introduced a race between processing (and hence removing) sstables from `_sstables_with_tables` and registering new ones. This manifested in sstables that were added concurrently with processing a batch for the same sstables being dropped and the semaphore units associated with them not returned. This resulted in repairs being blocked indefinitely as the units of the semaphore were effectively leaked. This patch fixes this by moving the contents of `_sstables_with_tables` to a local variable before starting the processing. A unit test reproducing the problem is also added. Fixes: #6892 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200817160913.2296444-1-bdenes@scylladb.com>	2020-08-18 10:22:35 +03:00
Takuya ASADA	352a136ae2	scylla-python3: move scylla-python3 to separated repository Except for scylla-python3, each scylla package has its own git repository, with the same package script filename and the same build directory structure. To keep the python3 files in the scylla repo, we created 'python3' directories in multiple locations, made '-python3' suffixed files, and nested the build directory deeper so as not to conflict with the scylla-server package build. We should move all scylla-python3 related files to a new repository, scylla-python3. To keep compatibility with the current Jenkins script, provide packages in the build/ directory for now. Fixes #6751	2020-08-18 09:34:08 +03:00
Raphael S. Carvalho	f9f0be9ac8	compact/twcs: Perform size-tiered compaction on past time windows After the data segregation feature, anything that causes out-of-order writes, like read repair, can result in small updates to past time windows. This causes compaction to be very aggressive because whenever a past time window is updated like that, that time window is recompacted into a single SSTable. Users expect that once a window is closed, it will no longer be written to, but that has changed since the introduction of the data segregation feature. We didn't anticipate the write amplification issues that the feature would cause. To fix this problem, let's perform size-tiered compaction on the windows that are no longer active and were updated because data was segregated. The current behavior where the last active window is merged into one file is kept. But thereafter, that same window will only be compacted using STCS. Fixes #6928. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-17 12:29:34 -03:00
Raphael S. Carvalho	820b47e9a3	compaction/twcs: Make strategy easier to extend by removing duplicated knowledge TWCS is hard to extend because its knowledge on what to do with a window bucket is duplicated in two functions. Let's remove this duplication by placing the knowledge into a single function. This is important for the coming change that will perform size-tiered instead of major on windows that are no longer active. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-17 12:29:34 -03:00
Raphael S. Carvalho	f2b588cfc4	compaction/twcs: Make newest_bucket() non-static To fix #6928, newest_bucket() will have to access the class fields. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-17 12:29:34 -03:00
Raphael S. Carvalho	b95359314d	compaction/twcs: Move TWCS implementation into source file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-08-17 12:29:34 -03:00
Pavel Solodovnikov	9aa4712270	lwt: introduce `paxos_grace_seconds` per-table option to set paxos ttl Previously system.paxos TTL was set as max(3h, gc_grace_seconds). Introduce a new per-table option named `paxos_grace_seconds` to set the number of seconds used as the TTL for data in paxos tables when using LWT queries against the base table. Default value is equal to `DEFAULT_GC_GRACE_SECONDS`, which is 10 days. This change makes it easy to test various issues related to paxos TTL. Fixes #6284 Tests: unit (dev, debug) Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>	2020-08-17 16:44:14 +02:00
Kamil Braun	0d3779e3e6	cdc: rewrite process_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	9067f1a4e2	cdc: move some functions out of `cdc::transformer` Preparing them to be used outside of `transformer`.	2020-08-17 15:51:33 +02:00
Kamil Braun	4533f62f54	cdc: rewrite extract_changes using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	e9192a6108	cdc: rewrite should_split using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	ee87f4026e	cdc: rewrite find_timestamp using inspect_mutation	2020-08-17 15:51:33 +02:00
Kamil Braun	694714796f	cdc: introduce a ,,change visitor'' abstraction This is an abstraction for walking over mutations created by a write coordinator, deconstructing them into ,,atomic'' pieces (,,changes''), and consuming these pieces. Read the big comment in cdc/change_visitor.hh for more details.	2020-08-17 15:51:30 +02:00
Nadav Har'El	4c73d43153	Alternator: allow CreateTable with SSESpecification explicitly disabled While Alternator doesn't yet support creating a table with different "server-side encryption" (a.k.a. encryption-at-rest) parameters, the SSESpecification option with Enabled=false should still be allowed, as it is just the default, and means exactly the same as would a missing SSESpecification. This patch also adds a test for this case, which failed on Alternator before this patch. Fixes #7031. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200812205853.173846-1-nyh@scylladb.com>	2020-08-17 13:48:52 +02:00
Nadav Har'El	159a966949	alternator test, streams: add test LSI key attributes in OldImage This patch adds a test that attributes which serve as a key for a secondary index still appear in the OldImage in an Alternator Stream. This is a special case, because although usually Alternator attributes are saved as map elements, not stand-alone Scylla columns, in the special case of secondary-index keys they are saved as actual Scylla columns in the base table. And it turns out we produce wrong results in this case: CDC's "preimage" does not currently include these columns if they didn't change, while DynamoDB requires that all columns, not just the changed ones, appear in OldImage. So the test added in this patch xfails on Alternator (and as usual, passes on DynamoDB). Refs #7030. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200812144656.148315-1-nyh@scylladb.com>	2020-08-17 13:46:53 +02:00
Pekka Enberg	d6354cb507	dbuild: Use host $USER and $HOME in Podman container The "user.home" system property in JVM does not use the "HOME" environment variable. This breaks Ant and Maven builds with Podman, which attempts to look up the local Maven repository in "/root/.m2" when building tools, for example: build.xml:757: /root/.m2/repository does not exist. To fix the issue, let's bind-mount an /etc/passwd file, which contains host username for UID 0, which ensures that Podman container $USER and $HOME are the same as on the host. Message-Id: <20200817085720.1756807-1-penberg@scylladb.com>	2020-08-17 13:46:28 +03:00
Avi Kivity	f4dbe3e65e	Update tools/jmx submodule * tools/jmx c5ed831...be8f1ac (1): > dist/common/systemd: set WorkingDirectory to get heap dump correctly	2020-08-17 09:54:59 +03:00
Avi Kivity	3b1ff90a1a	Merge "Get rid of seed concept in gossip" from Asias " gossip: Get rid of seed concept The concept of seed and the different behaviour between seed nodes and non seed nodes generate a lot of confusion, complication and error for users. For example, how to add a seed node into a cluster, how to promote a non seed node to a seed node, how to choose seed nodes in multiple DC setup, edit config files for seeds, why seed node does not bootstrap. If we remove the concept of seed, it will get much easier for users. After this series, seed config option is only used once when a new node joins a cluster. Major changes: Seed nodes are only used as the initial contact point nodes. Seed nodes now perform bootstrap. The only exception is the first node in the cluster. The unsafe auto_bootstrap option is now ignored. Gossip shadow round now talks to all nodes instead of just seed nodes. Refs: #6845 Tests: update_cluster_layout_tests.py + manual test " * 'gossip_no_seed_v2' of github.com:asias/scylla: gossip: Get rid of seed concept gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb gossip: Add do_apply_state_locally helper gossip: Do not talk to seed node explicitly gossip: Talk to live endpoints in a shuffled fashion	2020-08-17 09:50:51 +03:00
Avi Kivity	5356e8319d	Merge 'Support building packages on non-x86 platform' from Takuya " Allow users to build unofficial packages for non-x86 platform. " * syuu1228-aarch64_packaging_fix: dist/debian: allow building non-amd64 .deb configure.py: disable DPDK by default on non-x86_64 platform	2020-08-17 08:26:17 +03:00
Takuya ASADA	c73e945cf6	dist/debian: allow building non-amd64 .deb Allow building .deb on any architecture, not only amd64.	2020-08-17 14:16:24 +09:00
Takuya ASADA	06079f0656	configure.py: disable DPDK by default on non-x86_64 platform Since configure.py without option fails on some non-x86 architectures such as ARM64, we should disable it on such architectures.	2020-08-17 14:16:24 +09:00
Asias He	d0b3f3dfe8	gossip: Get rid of seed concept The concept of seed and the different behaviour between seed nodes and non seed nodes generate a lot of confusion, complication and error for users. For example, how to add a seed node into a cluster, how to promote a non seed node to a seed node, how to choose seed nodes in multiple DC setup, edit config files for seeds, why seed node does not bootstrap. If we remove the concept of seed, it will get much easier for users. After this series, seed config option is only used once when a new node joins a cluster. Major changes: - Seed nodes are only used as the initial contact point nodes. - Seed nodes now perform bootstrap. The only exception is the first node in the cluster. - The unsafe auto_bootstrap option is now ignored. - Gossip shadow round now attempts to talk to all nodes instead of just seed nodes. Manual test: - bootstrap n1, n2, n3 (n1 and n2 are listed as seed, check only n1 will skip bootstrap, n2 and n3 will bootstrap) - shutdown n1, n2, n3 - start n2 (check non seed node can boot) - start n1 (check n1 talks to both n2 and n3) - start n3 (check n3 talks to both n1 and n3) Upgrade/Downgrade test: - Initialize cluster Start 3 node with n1, n2, n3 using old version n1 and n2 are listed as seed - Test upgrade starting from seed nodes Rolling restart n1 using new version Rolling restart n2 using new version Rolling restart n3 using new version - Test downgrade to old version Rolling restart n1 using old version Rolling restart n2 using old version Rolling restart n3 using old version - Test upgrade starting from non seed nodes Rolling restart n3 using new version Rolling restart n2 using new version Rolling restart n1 using new version Notes on upgrade procedure: There is no special procedure needed to upgrade to Scylla without seed concept. Rolling upgrade node one by one is good enough. Fixes: #6845 Tests: ./test.py + update_cluster_layout_tests.py + manual test	2020-08-17 10:35:16 +08:00
Takuya ASADA	75c2362c95	dist/debian: disable debuginfo compression on .deb Since older binutils on some distributions is not able to handle compressed debuginfo generated on Fedora, we need to disable it. However, the Debian packager forces debuginfo compression since debian/compat = 9, so we have to uncompress it after it is compressed automatically. Fixes #6982	2020-08-16 18:13:29 +03:00
Avi Kivity	125795bda5	Merge " Build tarballs to build/dist/<mode>/tar directory" from Pekka " This patch series changes the build system to build all tarballs to build/dist/<mode>/tar directory. For example, running: ./tools/toolchain/dbuild ./configure.py --mode=dev && ./tools/toolchain/dbuild ninja-build dist-tar produces the following tarballs in build/dist/dev/tar: $ ls -1 build/dist/dev/tar/ scylla-jmx-package.tar.gz scylla-package.tar.gz scylla-python3-package.tar.gz scylla-tools-package.tar.gz This makes it easy to locate release tarballs for humans and scripts. To preserve backward compatibility, the tarballs are also retained in their original locations. Once release engineering infrastructure has been adjusted to use the new locations, we can drop the duplicate copies. " * 'penberg/build-dist-tar/v1' of github.com:penberg/scylla: configure.py: Copy tarballs to build/dist/<mode>/tar directory configure.py: Add "dist-<component>-tar" targets reloc/python3: Add "--builddir" to build_deb.sh configure.py: Use copy-on-write copies when possible	2020-08-16 17:55:35 +03:00
Avi Kivity	061ec49a6c	Merge "Improve error reporting on invalid internal schema access" from Tomasz " Contains several fixes which improve debuggability in situations where too large column ids are passed to column definition loop methods. " * 'schema-range-check-fix' of github.com:tgrabiec/scylla: schema: Add table name and schema version to error messages schema: Use on_internal_error() for range check errors schema: Fix off-by-one in column range check schema: Make range checks for regular and static columns the same as for clustering columns	2020-08-16 17:48:48 +03:00
Raphael S. Carvalho	81ec49c82f	sstables/sstable_set: rename method to retrieve sstable runs select() is too generic for the method that retrieves sstable runs, and it has a completely different meaning than the former select method used to select sstables based on token range. Let's give it a more descriptive name. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200811193401.22749-1-raphaelsc@scylladb.com>	2020-08-16 17:41:16 +03:00
Raphael S. Carvalho	b07920dd1f	sstables: Fix remove_by_toc_name() on temporary toc Regression caused by `55cf219c97`. remove_by_toc_name() must work both for a sealed sstable with toc, and also a partial sstable with tmp toc, so dirname() should be called conditionally, depending on the state of the sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200813160612.101117-1-raphaelsc@scylladb.com>	2020-08-16 17:35:55 +03:00
Raphael S. Carvalho	7d7f9e1c54	sstables/LCS: increase per-level overlapping tolerance in reshape LCS can have its overlapping invariant broken after operations that can proceed in parallel to regular compaction like cleanup. That's because there could be two compactions in parallel placing data in overlapping token ranges of a given level > 0. After reshape, the whole table will be rewritten, on restart, if a given level has more than (fan_out*2)=20 overlaps. That may sound like enough, but that's not taking into account the exponential growth in # of SSTables per level, so 20 overlaps may sound like a lot for level 2 which can afford 100 sstables, but it's only 2% of level 3, and 0.2% of level 4. So let's change the overlapping tolerance from the constant of fan_out*2 to 10% of level limit on # of SSTables, or fan_out, whichever is higher. Refs #6938. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200810154510.32794-1-raphaelsc@scylladb.com>	2020-08-16 17:33:48 +03:00
Raphael S. Carvalho	11df96718a	compaction: Prevent non-regular compaction from picking compacting SSTables After `8014c7124`, cleanup can potentially pick a compacting SSTable. Upgrade and scrub can also pick a compacting SSTable. The problem is that table::candidates_for_compaction() was badly named. It misleads the user into thinking that the SSTables returned are perfect candidates for compaction, but the manager still needs to filter out the compacting SSTables from the returned set. So it's being renamed. When the same SSTable is compacted in parallel, the strategy invariant can be broken like overlapping being introduced in LCS, and also some deletion failures as more than one compaction process would try to delete the same files. Let's fix scrub, cleanup and upgrade by calling the manager function which gets the correct candidates for compaction. Fixes #6938. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>	2020-08-16 17:31:03 +03:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple other header files - most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Piotr Jastrzebski	01ea159fde	codebase wide: use try_emplace when appropriate C++17 introduced try_emplace for maps to replace a pattern: if(element not in a map) { map.emplace(...) } try_emplace is more efficient and results in more concise code. This commit introduces usage of try_emplace where it's appropriate. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>	2020-08-16 14:41:09 +03:00
Pekka Enberg	39400f58fb	build_unified.sh: Compress generated tarball Fixes #7039 Message-Id: <20200813124921.1648028-1-penberg@scylladb.com>	2020-08-16 14:41:01 +03:00
Dejan Mircevski	edf91e9e06	test: Restore a case in user_types_test This testcase was temporarily commented out in `37ebe52`, because it relied on buggy (#6369) behaviour fixed by that commit. Specifically, it expected a NULL comparison to match a NULL cell value. We now bring it back, with corrected result expectation. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-16 13:49:55 +03:00
Pavel Emelyanov	319c9dda92	scylla-gdb: Fix netw command The _clients is std::vector, it doesn't have _M_elems. Luckily there's std_vector() class for it. The seastar::rpc::server::_conns is unordered_map, not unordered_set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200814070858.32383-1-xemul@scylladb.com>	2020-08-16 11:41:02 +03:00
Asias He	c76296e97e	scylla-gdb.py: Add boost_intrusive_list_printer It is needed to print the boost::intrusive::list which is used by repair_meta_for_masters in repair. Fixes #7037 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Asias He <asias@scylladb.com>	2020-08-15 20:26:02 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does so in a more unified way. This commit replaces all occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Konstantin Osipov	6d7393b0df	build: Make it possible to opt-out from building the packages. Message-Id: <20200812162845.852515-1-kostja@scylladb.com>	2020-08-15 20:26:02 +03:00
Tomasz Grabiec	db1c8c439a	schema: Add table name and schema version to error messages	2020-08-14 14:35:09 +02:00
Tomasz Grabiec	817c2e0508	schema: Use on_internal_error() for range check errors	2020-08-14 14:35:09 +02:00
Tomasz Grabiec	43d503102b	schema: Fix off-by-one in column range check We'd fail in std::vector::at() instead. Let's catch all invalid accesses, as intended.	2020-08-14 14:34:51 +02:00
Tomasz Grabiec	b41f2c719b	schema: Make range checks for regular and static columns the same as for clustering columns	2020-08-14 14:34:51 +02:00
Pekka Enberg	3c1db2fb87	configure.py: Copy tarballs to build/dist/<mode>/tar directory	2020-08-14 13:06:13 +03:00
Pekka Enberg	5a2d271df8	configure.py: Add "dist-<component>-tar" targets	2020-08-14 13:06:13 +03:00
Pekka Enberg	e4685020ba	reloc/python3: Add "--builddir" to build_deb.sh Add a "--builddir" command line option to build_deb.sh script of Python 3 so that we can use it to control artifact build location.	2020-08-14 13:06:13 +03:00
Pekka Enberg	7adae6b04a	configure.py: Use copy-on-write copies when possible Pass the "--reflink=auto" command line option to "cp" to use copy-on-write copies whenever the filesystem supports it to reduce disk space usage.	2020-08-14 13:06:13 +03:00
Asias He	e6ceec1685	gossip: Fix race between shutdown message handler and apply_state_locally 1. The node1 is shutdown 2. The node1 sends shutdown message to node2 3. The node2 receives gossip shutdown message but the handler yields 4. The node1 is restarted 5. The node1 sends new gossip endpoint_state to node2, node2 applies the state in apply_state_locally and calls gossiper::handle_major_state_change and then calls gossiper::mark_alive 6. The shutdown message handler in step 3 resumes and sets status of node1 to SHUTDOWN 7. The gossiper::mark_alive fiber in step 5 resumes and calls gossiper::real_mark_alive, node2 will skip to mark node1 as alive because the status of node1 is SHUTDOWN. As a result, node1 is alive but it is not marked as UP by node2. To fix, we serialize the two operations. Fixes #7032	2020-08-13 11:06:04 +03:00
Nadav Har'El	ee7291aa88	merge: CDC: allow "full" preimage in logs Merged pull request https://github.com/scylladb/scylla/pull/7028 By Calle Wilund: Changes the "preimage" option from binary true/false to on/off/full (accepting true/false, and using old style notation for normal to string - for upgrade reasons), where "full" will force us to include all columns in pre image log rows. Adds small test (just adding the case to preimage test). Uses the feature in alternator Fixes #7030 alternator: Set "preimage" to "full" for streams cdc_test: Do small test of "full" cdc: Make pre image optionally "full" (include all columns)	2020-08-12 23:19:46 +03:00
Calle Wilund	730c5ea283	alternator: Set "preimage" to "full" for streams Fixes #7030 Dynamo/alternator streams old image data is supposed to contain the full old value blob (all keys/values). Setting preimage=full ensures we get even those properties that have separate columns if they are not part of an actual modification.	2020-08-12 16:05:00 +00:00
Calle Wilund	8cc5076033	cdc_test: Do small test of "full" Not a huge test change, but at least verifies it works.	2020-08-12 16:04:52 +00:00
Calle Wilund	2eb4522fef	cdc: Make pre image optionally "full" (include all columns) Makes the "preimage" option for cdc non-binary, i.e. it can now be "true"/"on", "false"/"off" or "full". The first two behave as before; the last includes all columns in the pre image.	2020-08-12 16:03:06 +00:00
Avi Kivity	79851d6216	Update tools/java submodule * tools/java f2c7cf8d8d...d6c0ad1e2e (3): > sstableloader: Preserve droppedColumns in column rename handling > Revert "reloc: Build relocatable package without Maven" > reloc: Build relocatable package without Maven	2020-08-12 16:58:45 +03:00
Takuya ASADA	7cccb018b8	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6991	2020-08-12 15:43:17 +03:00
Nadav Har'El	8135647906	merge: Add metrics to semaphores Merged pull request https://github.com/scylladb/scylla/pull/7018 by Piotr Sarna: This series addresses various issues with metrics and semaphores - it mainly adds missing metrics, which makes it possible to see the length of the queues attached to the semaphores. In the case of view building and view update generation, metrics were not present in these services at all, so a first, basic implementation is added. More precise semaphore metrics would ease the testing and development of load shedding and admission control. view_builder: add metrics db, view: add view update generator metrics hints: track resource_manager sending queue length hints: add drain queue length to metrics table: add metrics for sstable deletion semaphore database: remove unused semaphore	2020-08-12 12:39:59 +03:00
Botond Dénes	4cfab59eb1	scylla-gdb.py: find_db(): don't return current shard's database for shard=0 The `shard` parameter of `find_db()` is optional and is defaulted to `None`. When missing, the current shard's database instance is returned. The problem is that the if condition checking this uses `not shard`, which also evaluates to `True` if `shard == 0`, resulting in returning the current shard's database instance for shard 0. Change the condition to `shard is None` to avoid this. Fixes: #7016 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200812091546.1704016-1-bdenes@scylladb.com>	2020-08-12 12:22:46 +03:00
Avi Kivity	736863c385	Merge "repair: Add progress metrics for node ops" from Asias " This series adds progress metrics for the node operations. Metrics for bootstrap and rebuild progress are added as a starter. I will add more for the remaining operations after getting feedback. With this the Scylla Monitor and Scylla Manager can know the progress of the bootstrap and other node operations. E.g., scylla_node_ops_bootstrap_nr_ranges_finished{shard="0",type="derive"} 50 scylla_node_ops_bootstrap_nr_ranges_total{shard="0",type="derive"} 1040 Fixes #1244, #6733 " * 'repair_progress_metrics_v3' of github.com:asias/scylla: repair: Add progress metrics for repair ops repair: Add progress metrics for rebuild ops repair: Add progress metrics for bootstrap ops	2020-08-12 11:42:14 +03:00
Avi Kivity	8853eddaf6	Merge 'repair: Track repair_meta created on both repair follower and master' from Asias " It is pretty hard to find the repair_meta object when debugging a core. This patch makes it easier by putting the repair_meta objects created by both repair follower and master into a map. Fixes #7009 " * asias-repair_make_debug_eaiser_track_all_repair_metas: repair: Add repair_meta_tracker to track repair_meta for followers and masters repair: Move thread local object _repair_metas out of the function	2020-08-12 11:01:32 +03:00
Botond Dénes	1d48442ae7	test/lib/mutation_source_test: test-monotonic-positions: test the reader-under-test Instead of always testing `flat_mutation_reader_from_mutations()`. Tests: unit(dev, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200812073406.1681250-1-bdenes@scylladb.com>	2020-08-12 10:52:26 +03:00
Avi Kivity	24aa03a13c	Merge "Move some test code out of line" (sstable_run_based_compaction_strategy_for_test) from Rafael * 'espindola/move-out-of-line' of https://github.com/espindola/scylla: test: Move code in sstable_run_based_compaction_strategy_for_tests.hh out of line test: Drop ifdef now that we always use c++20 test: Move sstable_run_based_compaction_strategy_for_tests.hh to test/lib	2020-08-12 10:46:40 +03:00
Asias He	e9a520a22b	repair: Add repair_meta_tracker to track repair_meta for followers and masters It is pretty hard to find the repair_meta object when debugging a core. This patch makes it easier by putting the repair_meta objects created by both repair follower and master into a boost intrusive list. Fixes #7009	2020-08-12 15:44:22 +08:00
Asias He	58f4c730b0	repair: Move thread local object _repair_metas out of the function It is a lot of pain to access _repair_metas when debugging. Refs #7009	2020-08-12 11:23:18 +08:00
Rafael Ávila de Espíndola	aa2476d7ac	test: Move code in sstable_run_based_compaction_strategy_for_tests.hh out of line Most of this is virtual and it is all test code. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-11 11:49:49 -07:00
Rafael Ávila de Espíndola	ef6a52a407	test: Drop ifdef now that we always use c++20 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-11 11:49:20 -07:00
Rafael Ávila de Espíndola	bd2f9fc685	test: Move sstable_run_based_compaction_strategy_for_tests.hh to test/lib This is in preparation to moving the code to a .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-11 11:48:41 -07:00
Avi Kivity	f158d056e8	Update seastar submodule * seastar e615054c75...c872c3408c (5): > future-util: Pass a rvalue reference to repeat > tutorial: service_loop: do not return handle_connection future > future-util: Drop redundant make_tuple call > future-util: Pass an rvalue reference to the repeater constructor > allow move assign empty expiring_fifo	2020-08-11 19:53:36 +03:00
Benny Halevy	6deba1d0b4	test: cql_query_test: test_cache_bypass: use table stats test is currently flaky since system reads can happen in the background and disturb the global row cache stats. Use the table's row_cache stats instead. Fixes #6773 Test: cql_query_test.test_cache_bypass(dev, debug) Credit-to: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200811140521.421813-1-bhalevy@scylladb.com>	2020-08-11 19:52:16 +03:00
Piotr Sarna	5086a5ca32	view_builder: add metrics The view builder service lacked metrics, so a basic set of them is added.	2020-08-11 17:43:53 +02:00
Piotr Sarna	e4d78b60ff	db, view: add view update generator metrics The view update generator completely lacked metrics, so a basic set of them is now exposed.	2020-08-11 17:43:53 +02:00
Piotr Sarna	180a1505fd	hints: track resource_manager sending queue length The number of tasks waiting for a hint to be sent is now tracked.	2020-08-11 17:43:53 +02:00
Piotr Sarna	58a9fa7d2e	hints: add drain queue length to metrics The number of tasks waiting for a drain is now tracked.	2020-08-11 17:43:53 +02:00
Piotr Sarna	8b56b24737	table: add metrics for sstable deletion semaphore It's now possible to read the number of tasks waiting on the sstable deletion semaphore.	2020-08-11 17:43:53 +02:00
Benny Halevy	13f437157a	compaction_manager: register_compacting_sstables: allocate before registering sstables make all required allocations in advance to merging sstables into _compacting_sstables so it should not throw after registering some sstables, but not all. Test: database_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200811132440.416945-1-bhalevy@scylladb.com>	2020-08-11 18:14:58 +03:00
Botond Dénes	4ab4619341	auth: common: separate distributed query timeout for debug builds Currently when running against a debug build, our integration test suite suffers from a ton of timeout related error logs, caused by auth queries timing out. This causes spurious test failures due to the unexpected error messages in the log. This patch increases the timeout for internal distributed auth queries in debug mode, to give the slow debug builds more headroom to meet the timeout. Refs: #6548 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200811145757.1593350-1-bdenes@scylladb.com>	2020-08-11 18:07:53 +03:00
Avi Kivity	58104d17e0	Merge 'transport: Allow user to disable unencrypted native transport' from Pekka " Let users disable the unencrypted native transport too by setting the port to zero in the scylla.yaml configuration file. Fixes #6997 " * penberg-penberg/native-transport-disable: docs/protocol: Document CQL protocol port configuration options transport: Allow user to disable unencrypted native transport	2020-08-11 16:30:52 +03:00
Avi Kivity	d36601a838	Merge 'Make commitlog respect disk limit better' from Calle " Refs #6148 Separates disk usage into two cases: Allocated and used. Since we use both reserve and recycled segments, both which are not actually filled with anything at the point of waiting. Also refuses to recycle segments or increase reserve size if our current disk footprint exceeds threshold. And finally uses some initial heuristics to determine when we should suggest flushing, based on disk limit, segment size, and current usage. Right now, when we only have a half segment left before hitting used == max. Some initial tests show an improved adherence to limit though it will still be exceeded, because we do _not_ force waiting for segments to become cleared or similar if we need to add data, thus slow flushing can still make usage create extra segments. We will however attempt to shrink disk usage when load is lighter. Somewhat unclear how much this impacts performance with tight limits, and how much this matters. " * elcallio-calle/commitlog_size: commitlog: Make commitlog respect disk limit better commitlog: Demote buffer write log messages to trace	2020-08-11 15:03:32 +03:00
Dejan Mircevski	013893b08d	auth: Drop needless role-manager check The service constructor included a check ensuring that only standard_role_manager can be used with password_authenticator. But after `00f7bc6`, password_authenticator does not depend on any action of standard_role_manager. All queries to meta::roles_table in password_authenticator seem self-contained: the table is created at the start if missing, and salted_hash is CRUDed independently of any other columns bar the primary key role_col_name. NOTE: a nonstandard role manager may not delete a role's row in meta::roles_table when that role is dropped. This will result in successful authentication for that non-existing role. But the clients call check_user_can_login() after such authentication, which in turn calls role_manager::exists(role). Any correctly implemented role manager will then return false, and authentication_exception will be thrown. Therefore, no dependencies exist on the role-manager behaviour, other than it being self-consistent. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-11 14:56:18 +03:00
Avi Kivity	4547949420	Merge "Fix repair stalls in get_sync_boundary and apply_rows_on_master_in_thread" from Asias " This patch set fixes stalls in repair that are caused by std::list merge and clear operations during test_latency_read_with_nemesis test. Fixes #6940 Fixes #6975 Fixes #6976 " * 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla: repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower repair: Use clear_gently in get_sync_boundary to avoid stall utils: Add clear_gently repair: Use merge_to_gently to merge two lists utils: Add merge_to_gently	2020-08-11 14:52:23 +03:00
Botond Dénes	db5926134a	sstables: sstable_mutation_reader: read_partition(): include more information in exception Resolve the FIXME to help investigating related issues and include the position of the consumer in the error message. Refs: #6529 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200811111101.1576222-1-bdenes@scylladb.com>	2020-08-11 14:52:04 +03:00
Asias He	c65ad02fcd	repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower The row_diff list in apply_rows_on_master_in_thread and apply_rows_on_follower can be large. Modify do_apply_rows to remove the row from the list when the row is consumed to avoid stall when the list is destroyed. Fixes #6975	2020-08-11 19:37:47 +08:00
Asias He	9f4b3a5fa6	repair: Use clear_gently in get_sync_boundary to avoid stall The _row_buf and _working_row_buf list can be large. Use clear_gently helper to avoid stalls. Fixes #6940	2020-08-11 19:37:47 +08:00
Asias He	3e8c4a6788	utils: Add clear_gently A helper to clear a list without stall. Refs #6975 Refs #6940	2020-08-11 19:37:47 +08:00
Calle Wilund	ed86e870ee	docs/cdc.md: Add short explanation of stream ID bit composition Bit layout, sort order and field usage of CDC stream ids.	2020-08-11 14:09:45 +03:00
Avi Kivity	41a75f2b99	Merge "make do_io_check path noexcept" from Benny " Make do_io_check and the io_check functions that call it noexcept. Up to sstable_write_io_check and sstable_touch_directory_io_check. Tests: unit (dev) " * tag 'io-check-noexcept-v1' of github.com:bhalevy/scylla: ssstable: io_check functions: make noexcept utils: do_io_check: adjust indentation utils: io_check: make noexcept for future-returning functions	2020-08-11 13:41:20 +03:00
Calle Wilund	5d044ab74e	commitlog: Make commitlog respect disk limit better Refs #6148 Separates disk usage into two cases: Allocated and used. Since we use both reserve and recycled segments, both which are not actually filled with anything at the point of waiting. Also refuses to recycle segments or increase reserve size if our current disk footprint exceeds threshold. And finally uses some initial heuristics to determine when we should suggest flushing, based on disk limit, segment size, and current usage. Right now, when we only have a half segment left before hitting used == max. Some initial tests show an improved adherence to limit though it will still be exceeded, because we do _not_ force waiting for segments to become cleared or similar if we need to add data, thus slow flushing can still make usage create extra segments. We will however attempt to shrink disk usage when load is lighter. Somewhat unclear how much this impacts performance with tight limits, and how much this matters. v2: * Add some comments/explanations v3: * Made disk footprint subtract happen post delete (non-optimistic)	2020-08-11 10:40:56 +00:00
Avi Kivity	3530e80ce1	Merge "Support md format" from Benny " This series adds support for the "md" sstable format. Support is based on the following: * do not use clustering based filtering in the presence of static row, tombstones. * Disabling min/max column names in the metadata for formats older than "md". * When updating the metadata, reset and disable min/max in the presence of range tombstones (like Cassandra does and until we process them accurately). * Fix the way we maintain min/max column names by: keeping whole clustering key prefixes as min/max rather than calculating min/max independently for each component, like Cassandra does in the "md" format. Fixes #4442 Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug) md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1 " * tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits) config: enable_sstables_md_format by default test: cql_query_test: add test_clustering_filtering unit tests table: filter_sstable_for_reader: allow clustering filtering md-format sstables table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results table: filter_sstable_for_reader: adjust to md-format table: filter_sstable_for_reader: include non-scylla sstables with tombstones table: filter_sstable_for_reader: do not filter if static column is requested table: filter_sstable_for_reader: refactor clustering filtering conditional expression features: add MD_SSTABLE_FORMAT cluster feature config: add enable_sstables_md_format database: add set_format_by_config test: sstable_3_x_test: test both mc and md versions test: Add support for the "md" format sstables: mx/writer: use version from sstable for write calls sstables: mx/writer: update_min_max_components for partition tombstone sstables: metadata_collector: support min_max_components for range tombstones sstable: validate_min_max_metadata: drop outdated logic sstables: rename mc folder to mx 
sstables: may_contain_rows: always true for old formats sstables: add may_contain_rows ...	2020-08-11 13:29:11 +03:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Avi Kivity	55cf219c97	Merge "sstable: close files on error" from Benny " Make sure to close sstable files also on error paths. Refs #5509 Fixes #6448 Tests: unit (dev) " * tag 'sstable-close-files-on-error-v6' of github.com:bhalevy/scylla: sstable: file_writer: auto-close in destructor sstable: file_writer: add optional filename member sstable: add make_component_file_writer sstable: remove_by_toc_name: accept std::string_view sstable: remove_by_toc_name: always close file and input stream sstable: delete_sstables: delete outdated FIXME comment sstable: remove_by_toc_name: drop error_handler parameter sstable: remove_by_toc_name: make static sstable: read_toc: always close file sstable: mark read_toc and methods calling it noexcept sstable: read_toc: get rid of file_path sstable: open_data, create_data: set member only on success. sstable: open_file: mark as noexcept sstable: new_sstable_component_file: make noexcept sstable: new_sstable_component_file: close file on failure sstable: rename_new_sstable_component_file: do not pass file sstable: open_sstable_component_file_non_checked: mark as noexcept sstable: open_integrity_checked_file_dma: make noexcept sstable: open_integrity_checked_file_dma: close file on failure	2020-08-11 13:28:50 +03:00
Pekka Enberg	4a02e0c3c0	docs/protocol: Document CQL protocol port configuration options	2020-08-11 13:15:24 +03:00
Pekka Enberg	e401a26701	transport: Allow user to disable unencrypted native transport Let users disable the unencrypted native transport too by setting the port to zero in the scylla.yaml configuration file. Fixes #6997	2020-08-11 13:15:17 +03:00
Asias He	97d47bffa5	repair: Add progress metrics for repair ops The following metric is added: scylla_node_maintenance_operations_repair_finished_percentage{shard="0",type="gauge"} 0.650000 It is the finished percentage across all ongoing repair operations. When all ongoing repair operations finish, the percentage stays at 100%. Fixes #1244, #6733	2020-08-11 18:15:10 +08:00
Botond Dénes	b11d181413	scylla-gdb.py: restore python2 compatibility Although python2 should be a distant memory by now, the reality is that we still need to debug scylla on platforms that still have no python3 available (centos7), so we need to keep scylla-gdb.py python2 compatible. Refs: #7014 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200811093753.1567689-1-bdenes@scylladb.com>	2020-08-11 12:55:42 +03:00
Nadav Har'El	796ad24f37	docs: correct typo in maintainers.md maintainers.md contains a very helpful explanation of how to backport Seastar fixes to old branches of Scylla, but has a tiny typo, which this patch corrects. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200811095350.77146-1-nyh@scylladb.com>	2020-08-11 12:54:41 +03:00
Takuya ASADA	6fbbe836c1	scylla_raid_setup: use mdadm.service on older Debian variants Older Debian variants do not have mdmonitor.service, so we should use mdadm.service instead. Fixes #7000	2020-08-11 12:52:24 +03:00
Calle Wilund	a6ad70d3da	cdc:stream_id: Encode format version + vnode grouping/index in id Fixes #6948 Changes the stream_id format from <token:64>:<rand:64> to <token:64>:<rand:38><index:22><version:4> The code will attempt to assert version match when presented with a stored id (i.e. construct from bytes). This means that ID:s created by previous (experimental) versions will break. Moves the ID encoding fully into the ID class, and makes the code path private for the topology generation code path. Removes some superfluous accessors but adds accessors for token, version and index. (For alternator etc).	2020-08-11 12:48:04 +03:00
Calle Wilund	9167d1ac76	commitlog: Demote buffer write log messages to trace Because they become very plentiful and annoying when one tries to analyze segment behaviour. More so in batch mode.	2020-08-11 09:18:23 +00:00
Piotr Sarna	3b8fd11fa3	database: remove unused semaphore A semaphore for limiting the number of loaded sstables is completely unused, so it can be removed.	2020-08-11 09:48:12 +02:00
Asias He	53fee789f0	repair: Use merge_to_gently to merge two lists During a performance test, test_latency_read_with_nemesis during manager repair, it experienced a stall of 73 ms: ``` (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >::operator=(repair_row const&) at /usr/include/c++/9/bits/stl_iterator.h:515 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move<false, false, std::bidirectional_iterator_tag>::__copy_m<std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:312 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move_a<false, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:404 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move_a2<false, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:440 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::copy<std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > 
>(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:474 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__merge<std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}> >(std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>) at /usr/include/c++/9/bits/stl_algo.h:4923 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::merge<std::_List_iterator<repair_row>, 
std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>(std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}) at /usr/include/c++/9/bits/stl_algo.h:5018 (inlined by) repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int) at ./repair/row_level.cc:1242 repair_meta::get_row_diff_source_op(seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int, seastar::rpc::sink<repair_hash_with_cmd>&, seastar::rpc::source<repair_row_on_wire_with_cmd>&) at ./repair/row_level.cc:1608 
repair_meta::get_row_diff_with_rpc_stream(std::unordered_set<repair_hash, std::hash<repair_hash>, std::equal_to<repair_hash>, std::allocator<repair_hash> >, seastar::bool_class<needs_all_rows_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int) at ./repair/row_level.cc:1674 row_level_repair::get_missing_rows_from_follower_nodes(repair_meta&) at ./repair/row_level.cc:2413 ``` The problem was that when std::merge() ran out of one range, it copied the second range. To fix, use the new merge_to_gently helper. Fixes #6976	2020-08-11 10:37:34 +08:00
Asias He	0bf0019eeb	utils: Add merge_to_gently This helper is similar to std::merge but it runs inside a thread and does not stall. Refs #6976	2020-08-11 10:37:34 +08:00
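The idea behind a non-stalling merge can be sketched in plain C++ without Seastar. This is only an illustration under stated assumptions: `merge_gently` and the `maybe_yield` callback are hypothetical names, and the actual signature of `merge_to_gently` is not shown in this log. The point is that the tail of the remaining input is drained one element at a time, with a yield opportunity after each element, instead of in one long copy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>

// Hypothetical illustration, not the Scylla helper: merge two sorted lists
// into `out`, calling `maybe_yield` after every element that is moved.
// Elements are spliced (relinked), so nothing is copied.
template <typename T, typename Less>
void merge_gently(std::list<T>& a, std::list<T>& b, std::list<T>& out,
                  Less less, const std::function<void()>& maybe_yield) {
    // Main merge loop: take from `b` only when its head is strictly smaller,
    // which keeps the merge stable with respect to `a`.
    while (!a.empty() && !b.empty()) {
        std::list<T>& src = less(b.front(), a.front()) ? b : a;
        out.splice(out.end(), src, src.begin());
        maybe_yield();
    }
    // Drain whichever input still has elements, again one at a time.
    // This is the step where a bulk copy could otherwise run for a long time.
    std::list<T>& rest = a.empty() ? b : a;
    while (!rest.empty()) {
        out.splice(out.end(), rest, rest.begin());
        maybe_yield();
    }
}
```

In a cooperative scheduler the callback would be the place to give up the CPU; here it is just a hook, so a caller can count invocations or pass an empty lambda.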
Benny Halevy	e2340d0684	config: enable_sstables_md_format by default Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	0d85ceaf37	test: cql_query_test: add test_clustering_filtering unit tests Add unit tests reproducing https://github.com/scylladb/scylla/issues/3552 with clustering-key filtering enabled. enable_sstables_md_format option is set to true as clustering-key filtering is enabled only for md-format sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	7cfca519cb	table: filter_sstable_for_reader: allow clustering filtering md-format sstables Now that it is safe to filter md format sstable by min/max column names we can remove the `filtering_broken` variable that disabled filtering in `19b76bf75b` to fix #4442. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	ab67629ea6	table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results To prevent https://github.com/scylladb/scylla/issues/3552 we want to ensure that in any case that the partition exists in any sstable, we emit partition_start/end, even when returning no rows. In the first filtering pass, filter_sstable_for_reader_by_pk filters the input sstables based on the partition key, and num_sstables is set to the size of the sstables list after the first filtering pass. An empty sstables list at this stage means there are indeed no sstables with the required partition so returning an empty result will leave the cache in the desired state. Otherwise, we filter again, using filter_sstable_for_reader_by_ck, and examine the list of the remaining readers. If num_readers != num_sstables, we know that some sstables were filtered by clustering key, so we append a flat_mutation_reader_from_mutations to the list of readers and return a combined reader as before. This will ensure that we will always have partition_start/end mutations for the queried partition, even if the filtered readers emit no rows. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	a672747da3	table: filter_sstable_for_reader: adjust to md-format With the md sstable format, min/max column names in the metadata now track clustering rows (with or without row tombstones), range tombstones, and partition tombstones (that are reflected with empty min/max column names - indicating the full range). As such, min and max column names may be of different lengths due to range tombstones and potentially short clustering key prefixes with compact storage, so the current matching algorithm must be changed to take this into account. To determine if a slice range overlaps the min/max range we are using position_range::overlaps. sstable::clustering_components_ranges was renamed to position_range as it now holds a single position_range rather than a vector of bytes_view ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:30 +03:00
Benny Halevy	90d0fea7df	table: filter_sstable_for_reader: include non-scylla sstables with tombstones Move contains_rows from table code to sstable::may_contain_rows since its implementation now has too specific knowledge of sstable internals. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	2a57ec8c3d	table: filter_sstable_for_reader: do not filter if static column is requested Static rows aren't reflected in the sstable min/max clustering keys metadata. Since we don't have any indication in the metadata that the sstable stores static rows, we must read all sstables if a static column is requested. Refs #3553 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	2fed3f472c	table: filter_sstable_for_reader: refactor clustering filtering conditional expression We're about to drop `filtering_broken` in a future patch when clustering filtering can be supported for md-format sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	e8d7744040	features: add MD_SSTABLE_FORMAT cluster feature Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	65239a6e50	config: add enable_sstables_md_format MD format is disabled by default at this point. The option extends enable_sstables_mc_format so that both are needed to be set for supporting the md format. The MD_FORMAT cluster feature will be added in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	8e0e2c8a48	database: add set_format_by_config This is required for test applications that may select a sstable format different than the default mc format, like perf_fast_forward. These apps don't use the gossip-based sstables_format_selector to set the format based on the cluster feature and so they need to rely on the db config. Call set_format_by_config in single_node_cql_env::do_with. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	d77ceba498	test: sstable_3_x_test: test both mc and md versions Run the test cases that write sstables using both the mc and md versions. Note that we can still compare the resulting Data, Index, Digest, and Filter components with the prepared mc sstables we have since these haven't changed in md. We take special consideration around validating min/max column names that are now calculated using a revised algorithm in the md format. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Pekka Enberg	3168be3483	test: Add support for the "md" format Test also the md format in all_sstable_versions. Add pre-computed md-sstable files generated using Cassandra version 3.11.7 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	e44ec45ab9	sstables: mx/writer: use version from sstable for write calls Rather than using a constant sstable_version_types::mc. In preparation to supporting sstable_version_types::md. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	bd4383a842	sstables: mx/writer: update_min_max_components for partition tombstone Partition tombstones represent an implicit clustering range that is unbound on both sides, so reflect that in min/max column names metadata using empty clustering key prefixes. If we don't do that, when using the sstable for filtering, we have no other way of distinguishing range tombstones from partition tombstones given the sstable metadata and we would need to include any sstable with tombstones, even if those are range tombstones, for which we can do a better filtering job, using the sstable min/max column names metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	68acae5873	sstables: metadata_collector: support min_max_components for range tombstones We essentially treat min/max column names as range bounds with min as incl_start and max as incl_end. By generating a bound_view for min/max column names on the fly, we can correctly track and compare also short clustering key prefixes that may be used as bounds for range tombstones. Extend the sstable_tombstone_metadata_check unit test to cover these cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	34fb95dacf	sstable: validate_min_max_metadata: drop outdated logic The following checks were introduced in `0a5af61176` to deal with a bug in min max metadata generation of our own, from a time when only ka / la were supported. This is no longer relevant now that we'll consider min_max_column_names only for sstable format > mc (in sstable::may_contain_rows). We choose not to clear_incorrect_min_max_column_names from older versions here as this disturbs sstable unit tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	12393c5ec2	sstables: rename mc folder to mx Prepare for supporting the md format. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	7139fb92e6	sstables: may_contain_rows: always true for old formats The min/max column names metadata can be trusted only starting with the md format, so just always return `true` for older sstable formats. Note that we could achieve that by clearing the min/max metadata in set_clustering_components_ranges but we choose not to do so since it disturbs sstable unit tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	200d8d41d9	sstables: add may_contain_rows Move the logic from table to sstable as it will contain intimate knowledge of the sstable min/max column names validity for md format. Also, get rid of the sstable::clustering_components_ranges() method as the member is used only internally by the sstable code now. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Pekka Enberg	a37eaaa022	sstables: Add support for the "md" format enum value Add the sstable_version_types::md enum value and logically extend sstable_version_types comparisons to cover also the > sstable_version_types::mc cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	7de004d42a	sstables: version: delete unused is_latest_supported predicate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	025b74e20e	sstables: metadata_collector: use empty key to represent full min/max range Instead of keeping the `_has_min_max_clustering_keys` flag, just store an empty key for `_{min,max}_clustering_key` to represent the full range. These will never be narrowed down and will be encoded as empty min/max column names as if they weren't set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Benny Halevy	9f114d821a	sstables: keep whole clustering_key_prefix as min/max_column_names Currently we compare each min/max component independently. This may lead to suboptimal, inclusive clustering ranges that do not indicate any actual key we encountered. For example: ['a', 2], ['b', 1] will lead to min=['a', 1], max=['b', 2] instead of the keys themselves. This change keeps the min or max keys as a whole. It considers shorter clustering prefixes (that are possible with compact storage) as range tombstone bounds, so that a shorter key is considered less than the minimum if the latter has a common prefix, and greater than the maximum if the latter has a common prefix. Extend the min_max_clustering_key_test to test for this case. Previously {"a", "2"}, {"b", "1"} clustering keys would erroneously end up with min={"a", "1"} max={"b", "2"} while we want them to be min={"a", "2"} max={"b", "1"}. Adjust sstable_3_x_test to ignore original mc sstables that were previously computed with different min/max column names. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:03 +03:00
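The difference between the two approaches in the commit above can be reproduced with a small sketch. It is a simplified model, not the sstable code: `key`, `bounds`, `componentwise_min_max` and `whole_key_min_max` are hypothetical names, keys are plain vectors of strings compared lexicographically, every key is assumed to have the same number of components, and the special handling of shorter prefixes is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using key = std::vector<std::string>;  // stand-in for a clustering key
using bounds = std::pair<key, key>;    // {min, max}

// Per-component tracking: each position is minimised and maximised on its
// own, so the result may be a key that was never seen in the input.
// Assumes `keys` is non-empty and all keys have equal length.
bounds componentwise_min_max(const std::vector<key>& keys) {
    key mn = keys.front();
    key mx = keys.front();
    for (const key& k : keys) {
        for (std::size_t i = 0; i < k.size(); ++i) {
            mn[i] = std::min(mn[i], k[i]);
            mx[i] = std::max(mx[i], k[i]);
        }
    }
    return {mn, mx};
}

// Whole-key tracking: keep the smallest and largest key as a unit, using
// lexicographic comparison. Assumes `keys` is non-empty.
bounds whole_key_min_max(const std::vector<key>& keys) {
    auto mm = std::minmax_element(keys.begin(), keys.end());
    return {*mm.first, *mm.second};
}
```

For the keys {"a", "2"} and {"b", "1"}, the first function yields min={"a", "1"}, max={"b", "2"}, while the second yields the keys themselves, matching the example in the commit message.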
Benny Halevy	707b098f44	sstables: metadata_collector: construct with schema Pass the sstable schema to the metadata_collector constructor. Note that the long term plan is to move metadata_collector to the sstable writer but this requires a bigger change to get rid of the dependencies on it in the legacy writer code in class sstable methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:52:43 +03:00
Benny Halevy	c9cade833c	sstables: metadata_collector: make only for write path make a metadata_collector only when writing the sstable, no need to make one when reading. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:51:12 +03:00
Rafael Ávila de Espíndola	74db08165d	tests: Convert to using memory::with_allocation_failures Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200805155143.122396-1-espindola@scylladb.com>	2020-08-10 18:37:42 +03:00
Piotr Jastrzebski	52ec0c683e	codebase wide: replace erase + remove_if with erase_if C++20 introduced std::erase_if which simplifies removal of elements from the collection. Previously the code pattern looked like: <collection>.erase( std::remove_if(<collection>.begin(), <collection>.end(), <predicate>), <collection>.end()); In C++20 the same can be expressed with: std::erase_if(<collection>, <predicate>); This commit replaces all the occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>	2020-08-10 18:17:38 +03:00
Calle Wilund	9620755c7f	database: Do not assert on replay positions if truncate does not flush Fixes #6995 In `c2c6c71` the assert on replay positions in flushed sstables discarded by truncate was broken, by the fact that we no longer flush all sstables unless auto snapshot is enabled. This means the low_mark assertion does not hold, because we maybe/probably never got around to creating the sstables that would hold said mark. Note that the (old) change to not create sstables and then just delete them is in itself good. But in that case we should not try to verify the rp mark.	2020-08-10 18:17:38 +03:00
Avi Kivity	f9aea94c5c	Merge 'add out of box configs for GCP VMs with nvmes' from Lubos " not recommended setups will still run iotune fixes #6631 " * tarzanek-gcp-iosetup: scylla_io_setup: Supported GCP VMs with NVMEs get out of box I/O configs scylla_util.py: add support for gcp instances scylla_util.py: support http headers in curl function scylla_io_setup: refactor iotune run to a function	2020-08-10 18:17:38 +03:00
Avi Kivity	188c832e3d	Merge 'scylla_swap_setup improvement' from Takuya " As I described at https://github.com/scylladb/scylla/issues/6973#issuecomment-669705374, we need to avoid filling the disk in scylla_swap_setup, and we should also allow manual configuration of the swap size. This PR provides the following: - use psutil to get memtotal and disk free, since it provides better APIs - calculate swap size in bytes to avoid errors in low-memory environments - prevent using more than 50% of disk space when the swap size is auto-configured, abort setup when almost no disk space is available (less than 2GB) - add --swap-size option to specify swap size both on scylla_swap_setup and scylla_setup - add interactive swap size prompt on scylla_setup Fixes #6947 Related #6973 Related scylladb/scylla-machine-image#48 " * syuu1228-scylla_swap_setup_improves: scylla_setup: add swap size interactive prompt on swap setup scylla_swap_setup: add --swap-size option to specify swap size scylla_swap_setup: limit swapfile size to half of diskspace scylla_swap_setup: calculate in bytes instead of GB scylla_swap_setup: use psutil to get memtotal and disk free	2020-08-10 18:17:38 +03:00
Botond Dénes	1e7cf27f1f	scylla-gdb.py: scylla find: add option to include freed objects Sometimes (e.g. when investigating a suspected heap use-after-free) it is useful to include dead objects in the search results. This patch adds a new option to scylla find to enable just that. Also make sure we save and print the offset of the address in the object for dead objects, just like we do for live ones. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200810091202.1401231-1-bdenes@scylladb.com>	2020-08-10 18:17:38 +03:00
Takuya ASADA	48944adc72	scylla_setup: add swap size interactive prompt on swap setup Fixes #6947	2020-08-10 13:53:20 +03:00
Takuya ASADA	e9f688b0e8	scylla_swap_setup: add --swap-size option to specify swap size Add --swap-size option to allow user to customize swap size.	2020-08-10 13:53:20 +03:00
Takuya ASADA	1fa0886ac0	scylla_swap_setup: limit swapfile size to half of diskspace We should not fill the entire disk space with the swapfile; it's safer to limit swap size to 50% of diskspace. Also, if 50% of diskspace <= 1GB, abort setup since it's out of disk space.	2020-08-10 13:53:20 +03:00
Takuya ASADA	7f5c8d6553	scylla_swap_setup: calculate in bytes instead of GB Converting memory & disk sizes to an int value of N gigabytes was too coarse; it became problematic in low memory size / low disk size environments, such as some types of EC2 instances. We should calculate in bytes.	2020-08-10 13:53:19 +03:00
Takuya ASADA	b21bed701b	scylla_swap_setup: use psutil to get memtotal and disk free To get better API of memory & disk statistics, switch to psutil. With the library we don't need to parse /proc/meminfo. [avi: regenerate tools/toolchain/image for new python3-psutils package]	2020-08-10 13:50:09 +03:00
Benny Halevy	25c1a16f8e	sstables: move column_name_helper to metadata_collector.cc It is used only for updating the metadata_collector {min,max}_column_names. Implement metadata_collector::do_update_min_max_components in sstables/metadata_collector.cc that will be used to host some other metadata_collector methods in following patches that need not be implemented in the header file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 13:27:26 +03:00
Asias He	e3c2d08f4f	repair: Add progress metrics for rebuild ops The following metric is added: scylla_node_maintenance_operations_rebuild_finished_percentage{shard="0",type="gauge"} 0.650000 It reports how much of the rebuild operation has finished so far. Fixes #1244, #6733	2020-08-10 15:45:37 +08:00
Asias He	b23f65d1d9	repair: Add progress metrics for bootstrap ops The following metric is added: scylla_node_maintenance_operations_bootstrap_finished_percentage{shard="0",type="gauge"} 0.850000 It reports how much of the bootstrap operation has finished so far. Fixes #1244, #6733	2020-08-10 15:45:37 +08:00
Benny Halevy	60873d2360	sstable: file_writer: auto-close in destructor Otherwise we may trip the following assert(_closing_state == state::closed); in ~append_challenged_posix_file_impl when the output_stream is destructed. Example stack trace: non-virtual thunk to seastar::append_challenged_posix_file_impl::~append_challenged_posix_file_impl() at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/include/seastar/core/future.hh:944 seastar::shared_ptr_count_for<checked_file_impl>::~shared_ptr_count_for() at crtstuff.c:? seastar::shared_ptr<seastar::file_impl>::~shared_ptr() at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/include/seastar/core/future.hh:944 (inlined by) seastar::file::~file() at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/include/seastar/core/file.hh:155 (inlined by) seastar::file_data_sink_impl::~file_data_sink_impl() at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/fstream.cc:312 (inlined by) seastar::file_data_sink_impl::~file_data_sink_impl() at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/fstream.cc:312 seastar::output_stream<char>::~output_stream() at crtstuff.c:? sstables::sstable::write_crc(sstables::checksum const&) [clone .cold] at sstables.cc:? sstables::mc::writer::close_data_writer() at crtstuff.c:? sstables::mc::writer::consume_end_of_stream() at crtstuff.c:? sstables::sstable::write_components(flat_mutation_reader, unsigned long, seastar::lw_shared_ptr<schema const>, sstables::sstable_writer_config const&, encoding_stats, seastar::io_priority_class const&)::{lambda()#1}::operator()() at sstables.cc:? Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 13:58:13 +03:00
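The auto-close-in-destructor pattern described in the commit above can be sketched in plain, synchronous C++. This is a toy model under stated assumptions: `toy_writer` is a hypothetical class, closing is modelled as appending to a log, and the real `file_writer` deals with asynchronous Seastar streams that this sketch does not attempt to reproduce.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a writer whose underlying stream must be closed
// before it is destroyed.
class toy_writer {
    std::vector<std::string>* _log;  // records close events for inspection
    bool _closed = false;
public:
    explicit toy_writer(std::vector<std::string>& log) : _log(&log) {}

    // Non-copyable, so exactly one object owns the "stream".
    toy_writer(const toy_writer&) = delete;
    toy_writer& operator=(const toy_writer&) = delete;

    // Idempotent close: a second call does nothing.
    void close() {
        if (!_closed) {
            _closed = true;
            _log->push_back("closed");
        }
    }

    bool closed() const { return _closed; }

    // Auto-close: the object never reaches the end of its destructor with
    // the stream still open, even on an early-exit or error path.
    ~toy_writer() { close(); }
};
```

Making `close()` idempotent lets callers close explicitly on the success path while the destructor covers every other path.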
Benny Halevy	d277ec2ab9	sstable: file_writer: add optional filename member To be used for reporting errors when failing to close the output stream. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 13:58:13 +03:00
Benny Halevy	c5feeb7723	sstable: add make_component_file_writer Unify common code for file creation and file_writer construction for sstable components. It is defined as noexcept based on `new_sstable_component_file` and makes sure the file is closed on error by using `file_writer::make` that guarantees that. Will be used for auto-closing the writer as a RAII object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 13:58:13 +03:00
Benny Halevy	34bf9ae5ed	sstable: remove_by_toc_name: accept std::string_view Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 13:57:48 +03:00
Benny Halevy	aacf69358a	sstable: remove_by_toc_name: always close file and input stream Get rid of seastar::async. Use seastar::with_file to make sure the opened file is always closed and move in.close() into a .finally continuation. While at it, make remove_by_toc_name noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 13:54:15 +03:00
Avi Kivity	8e87c52747	Update seastar submodule * seastar eb452a22a0...e615054c75 (13): > memory: fix small aligned free memory corruption Fixes #6831 > logger: specify log methods as noexcept > logger: mark trivial methods as noexcept > everywhere: Replace boost::optional with std::optional > httpd: fix Expect header case sensitivity > rpc: Avoid excessive casting in has_handler() helper > test: Increase capacity of fair-queue unit test case > file_io_test: Simplify a test with memory::with_allocation_failures > Merge 'Add HTTP/1.1 100 Continue' from Wojciech Fixes #6844 > future: Use "if constexpr" > future: Drop code that was avoiding a gcc 8 warning > file: specify alignment get methods noexcept > merge: Specify abort_source subscription handlers as noexcept	2020-08-09 13:17:22 +03:00
Piotr Sarna	29e2dc242a	row_cache: add tracing In order to improve tracing for the read path, cache is now also actively adding basic trace information. Example: select * from t where token(p) >= 42 and token(p) < 112; activity \| timestamp \| source \| source_elapsed \| client -----------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:10:34.694000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:10:34.694307 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:10:34.694377 \| 127.0.0.1 \| 70 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:10:34.694425 \| 127.0.0.1 \| 118 \| 127.0.0.1 Start querying token range [{42, start}, {112, start}] [shard 0] \| 2020-08-07 13:10:34.694432 \| 127.0.0.1 \| 125 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:10:34.694446 \| 127.0.0.1 \| 139 \| 127.0.0.1 Scanning cache for range [{42, start}, {112, start}] and slice {(-inf, +inf)} [shard 0] \| 2020-08-07 13:10:34.694454 \| 127.0.0.1 \| 147 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:10:34.694494 \| 127.0.0.1 \| 187 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:10:34.694520 \| 127.0.0.1 \| 213 \| 127.0.0.1 Request complete \| 2020-08-07 13:10:34.694221 \| 127.0.0.1 \| 221 \| 127.0.0.1 Example with cache miss: select * from t where p = 7; activity \| timestamp \| source \| source_elapsed \| client -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:25:04.363000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:25:04.363310 \| 127.0.0.1 \| -- \| 
127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:25:04.363384 \| 127.0.0.1 \| 74 \| 127.0.0.1 Creating read executor for token 1634052884888577606 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2020-08-07 13:25:04.363450 \| 127.0.0.1 \| 139 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:25:04.363455 \| 127.0.0.1 \| 145 \| 127.0.0.1 Start querying singular range {{1634052884888577606, pk{000400000007}}} [shard 0] \| 2020-08-07 13:25:04.363461 \| 127.0.0.1 \| 151 \| 127.0.0.1 Querying cache for range {{1634052884888577606, pk{000400000007}}} and slice {(-inf, +inf)} [shard 0] \| 2020-08-07 13:25:04.363490 \| 127.0.0.1 \| 180 \| 127.0.0.1 Range {{1634052884888577606, pk{000400000007}}} not found in cache [shard 0] \| 2020-08-07 13:25:04.363494 \| 127.0.0.1 \| 183 \| 127.0.0.1 Reading key {{1634052884888577606, pk{000400000007}}} from sstable /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db [shard 0] \| 2020-08-07 13:25:04.363522 \| 127.0.0.1 \| 211 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Index.db: scheduling bulk DMA read of size 16 at offset 0 [shard 0] \| 2020-08-07 13:25:04.363546 \| 127.0.0.1 \| 235 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Index.db: finished bulk DMA read of size 16 at offset 0, successfully read 16 bytes [shard 0] \| 2020-08-07 13:25:04.364406 \| 127.0.0.1 \| 1095 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db: scheduling bulk DMA read of size 56 at offset 0 [shard 0] \| 2020-08-07 13:25:04.364445 \| 127.0.0.1 \| 1134 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-f7b7a9b0d89f11eab650000000000000/mc-1-big-Data.db: finished bulk DMA read of size 56 at offset 0, successfully read 56 bytes [shard 0] \| 2020-08-07 13:25:04.364599 \| 127.0.0.1 \| 1288 \| 127.0.0.1 
Querying is done [shard 0] \| 2020-08-07 13:25:04.364685 \| 127.0.0.1 \| 1375 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:25:04.364719 \| 127.0.0.1 \| 1408 \| 127.0.0.1 Request complete \| 2020-08-07 13:25:04.364421 \| 127.0.0.1 \| 1421 \| 127.0.0.1 Example without cache for verification: select * from t where token(p) >= 42 and token(p) < 112 bypass cache; activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:11:16.122000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:11:16.122657 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:11:16.122742 \| 127.0.0.1 \| 85 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:11:16.122806 \| 127.0.0.1 \| 149 \| 127.0.0.1 Start querying token range [{42, start}, {112, start}] [shard 0] \| 2020-08-07 13:11:16.122814 \| 127.0.0.1 \| 158 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:11:16.122829 \| 127.0.0.1 \| 172 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:11:16.122895 \| 127.0.0.1 \| 239 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:11:16.122928 \| 127.0.0.1 \| 271 \| 127.0.0.1 Request complete \| 2020-08-07 13:11:16.122280 \| 127.0.0.1 \| 280 \| 127.0.0.1 Message-Id: <3b31584c13f23f84af35660d0aa73ba56c30cf13.1596799589.git.sarna@scylladb.com>	2020-08-09 12:53:04 +03:00
Piotr Sarna	71bb277cbc	multishard_mutation_query: fix a typo in variable name s/allwoed/allowed Message-Id: <eedb62b1f13ebf4ab1e6e92642a77fab32379d73.1596799589.git.sarna@scylladb.com>	2020-08-09 12:52:40 +03:00
Piotr Sarna	5e8247fd8c	storage_proxy: make tracing more specific wrt. token ranges Until now, only singular ranges were present in tracing, and, what's more, their tracing message suggested that the range is not singular: Start querying the token range that starts with (...) This commit makes the message more specific and also provides a corresponding tracing message to non-singular ranges. Example for a singular range: activity \| timestamp \| source \| source_elapsed \| client ----------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:11:55.479000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:11:55.479616 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:11:55.479695 \| 127.0.0.1 \| 80 \| 127.0.0.1 Creating read executor for token -7160136740246525330 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2020-08-07 13:11:55.479747 \| 127.0.0.1 \| 132 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:11:55.479752 \| 127.0.0.1 \| 137 \| 127.0.0.1 Start querying singular range {{-7160136740246525330, pk{00040000002a}}} [shard 0] \| 2020-08-07 13:11:55.479758 \| 127.0.0.1 \| 143 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:11:55.479816 \| 127.0.0.1 \| 201 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:11:55.479844 \| 127.0.0.1 \| 229 \| 127.0.0.1 Request complete \| 2020-08-07 13:11:55.479238 \| 127.0.0.1 \| 238 \| 127.0.0.1 Example for nonsingular range: activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:13:47.189000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 
0] \| 2020-08-07 13:13:47.189259 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:13:47.189346 \| 127.0.0.1 \| 87 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:13:47.189412 \| 127.0.0.1 \| 153 \| 127.0.0.1 Start querying token range [{7, end}, {42, end}] [shard 0] \| 2020-08-07 13:13:47.189421 \| 127.0.0.1 \| 162 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:13:47.189436 \| 127.0.0.1 \| 177 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:13:47.189495 \| 127.0.0.1 \| 236 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:13:47.189526 \| 127.0.0.1 \| 268 \| 127.0.0.1 Request complete \| 2020-08-07 13:13:47.189276 \| 127.0.0.1 \| 276 \| 127.0.0.1 Message-Id: <82f1a8680fc8383cd7e6c7b283de94e5b71a52ab.1596799589.git.sarna@scylladb.com>	2020-08-09 12:52:08 +03:00
Benny Halevy	c4d023d622	sstable: delete_sstables: delete outdated FIXME comment delete_sstables is used for replaying pending_delete logs since `043673b236`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:33:58 +03:00
Benny Halevy	d0bb180e53	sstable: remove_by_toc_name: drop error_handler parameter It's now always called with the default one: sstable_write_error_handler. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	78595303f9	sstable: remove_by_toc_name: make static It's not called outside of sstables code anymore. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	55826deb05	sstable: read_toc: always close file Use utils::with_file helper to always close the file new_sstable_component_file opens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	69f7454d88	sstable: mark read_toc and methods calling it noexcept read_toc can be marked as noexcept now that new_sstable_component_file is. With that, other methods that call it can be marked noexcept too. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	3444abcb8f	sstable: read_toc: get rid of file_path In preparation for closing the file in all paths, get rid of the file_path sstring and just recompute it as needed on error paths using the this->filename method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	c2c18bc708	sstable: open_data, create_data: set member only on success. Now that sstable::open_file is noexcept, if any of open or create of data/index doesn't succeed, we don't set the respective sstable member and return the failure. When destroyed, the sstable destructor will close any file (data or index), that we managed to open. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	803aadf89f	sstable: open_file: mark as noexcept Now that new_sstable_component_file is noexcept, open_file can be specified as noexcept too. This makes error handling in create/open sstable data and index files easier using when_all_succeed(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	155f06b0d5	sstable: new_sstable_component_file: make noexcept Try/catch any exception in the function body and return it as an exceptional future. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	9544b98787	sstable: new_sstable_component_file: close file on failure Use with_file_close_on_error to make sure any files we open and/or wrap are closed on failure. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	cad5c31141	sstable: rename_new_sstable_component_file: do not pass file Currently the function is handed over a `file` that it just passes through on success. Let its single caller do that to simplify its error handling. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	881e32d0fe	sstable: open_sstable_component_file_non_checked: mark as noexcept Now that open_integrity_checked_file_dma (and open_file_dma) are noexcept, open_sstable_component_file_non_checked can be noexcept too. Also, get a std::string_view name instead of a const sstring& to match both open_integrity_checked_file_dma and open_file_dma name arg. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	472013de27	sstable: open_integrity_checked_file_dma: make noexcept Convert to accepting std::string_view name. Move the sstring allocation to make_integrity_checked_file that may already throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Benny Halevy	3ad9503a3f	sstable: open_integrity_checked_file_dma: close file on failure Use seastar::with_file_close_on_failure to make sure the file we open is closed on failure of the continuation, as make_integrity_checked_file may throw from ::make_shared. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-09 12:04:36 +03:00
Dejan Mircevski	8cae61ee6b	cql3: Move #include from .hh to .cc restrictions.hh included fmt/ostream.h, which is expensive due to its transitive #includes. Replace it with fmt/core.h, which transitively includes only standard C++ headers. As requested by #5763 feedback: https://github.com/scylladb/scylla/pull/5763#discussion_r443210634 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-08 21:37:08 +03:00
Dejan Mircevski	df20854963	cql3: Move expressions to their own namespace Move the classes representing CQL expressions (and utility functions on them) from the `restrictions` namespace to a new namespace `expr`. Most of the restriction.hh content was moved verbatim to expression.hh. Similarly, all expression-related code was moved from statement_restrictions.cc verbatim to expression.cc. As suggested in #5763 feedback https://github.com/scylladb/scylla/pull/5763#discussion_r443210498 Tests: dev (unit) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-08 21:03:26 +03:00
Lubos Kosco	c203b6bb1f	scylla_io_setup: Supported GCP VMs with NVMEs get out of box I/O configs Iotune isn't run for supported/recommended GCP instances anymore; we now set the I/O properties based on GCP tables or our measurements (where applicable). Not recommended/supported setups will still run iotune. Fixes #6631	2020-08-08 11:05:28 +02:00
Lubos Kosco	125a84d7c5	scylla_util.py: add support for gcp instances GCP VMs with NVMEs that are recommended can be recognized now. Detection is done using resolving of internal GCP metadata server. We recommend 2 cpu instances at least. For more than 16 disks we mandate at least 32 cpus. 50:1 disk to ram ratio also has to be kept. Instances that use NVMEs as root disks are also considered as unsupported. Supported instances for NVMEs are n1, n2, n2d, c2 and m1-megamem-96. All others are unsupported now.	2020-08-08 11:04:37 +02:00
Lubos Kosco	97e3ab739b	scylla_util.py: support http headers in curl function	2020-08-08 10:58:59 +02:00
Lubos Kosco	0c5dbb4c4f	scylla_io_setup: refactor iotune run to a function	2020-08-08 10:57:31 +02:00
Nadav Har'El	f8291500cf	alternator test: test for ConditionExpression on key columns While answering a stackoverflow question on how to create an item but only if we don't already have an item with the same key, I realized that we never actually tested that ConditionExpressions works on key columns: all the tests we had in test_condition_expression.py had conditions on non-key attributes. So in this patch we add two tests with a condition on the key attribute. Most examples of conditions on the key attributes would be silly, but in these two tests we demonstrate how a test on key attributes can be useful to solve the above need of creating an item if no such item exists yet. We demonstrate two ways to do this using a condition on the key - using either the "<>" (not equal) operator, or the "attribute_not_exists()" function. These tests pass - we don't have a bug in this. But it's nice to have a test that confirms that we don't (and don't regress in that area). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200806200322.1568103-1-nyh@scylladb.com>	2020-08-07 08:05:48 +02:00
Benny Halevy	6d66d5099a	sstable: io_check functions: make noexcept Now that do_io_check is noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-06 19:38:41 +03:00
Avi Kivity	1072acb215	Update tools/java and tools/jmx submodules * tools/java aa7898d771...f2c7cf8d8d (2): > dist: debian: support non-x86 > NodeProbe: get all histogram values in a single call * tools/jmx 626fd75...c5ed831 (1): > dist: debian: support non-x86	2020-08-06 19:17:14 +03:00
Avi Kivity	ba4d7e8523	Merge "Teach B+ to use AVX for keys search" from Pavel E " The current implementation of B+ benefits from using SIMD instruction in intra-nodes keys search. This set adds this functionality. The general idea behind the implementation is in "asking" the less comparator if it is the plain "<" and allows for key simplification to do this natural comparison. If it does, the search key is simplified to int64_t, the node's array of keys is cast to an array of integers, then both are fed into avx-optimized searcher. The searcher should work on nodes that are not filled with keys. For performance the "unused" keys are set to int64_t minimum, the search loop compares them too (!) and adjusts the result index by node size. This needs some care in the maybe_key{} wrapper. fixes: #186 tests: unit(dev) " * 'br-bptree-avx-b' of https://github.com/xemul/scylla: utils: AVX searcher bptree: Special intra-node key search when possible bptree: Add lesses to maybe_key template token: Restrict TokenCarrier concept with noexcept	2020-08-06 19:14:46 +03:00
Tomasz Grabiec	bfd129cffe	thrift: Fix crash on unsorted column names in SlicePredicate The column names in SlicePredicate can be passed in arbitrary order. We converted them to clustering ranges in read_command preserving the original order. As a result, the clustering ranges in read command may appear out of order. This violates storage engine's assumptions and leads to undefined behavior. It was seen manifesting as a SIGSEGV or an abort in sstable reader when executing a get_slice() thrift verb: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed. Fixes #6486. Tests: - added a new dtest to thrift_tests.py which reproduces the problem Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>	2020-08-06 19:13:22 +03:00
Benny Halevy	e33fc10638	utils: do_io_check: adjust indentation was broken by the previous patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-06 19:01:18 +03:00
Benny Halevy	fd5b2672c1	utils: io_check: make noexcept for future-returning functions Use futurize_apply to handle any exception the passed function may throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-06 19:01:17 +03:00
Rafael Ávila de Espíndola	b1315a2120	cql_test_env: Delay starting the compaction manager In case of an initialization failure after db.get_compaction_manager().enable(); But before stop_database, we would never stop the compaction manager and it would assert during destruction. I am trying to add a test for this using the memory failure injector, but that will require fixing other crashes first. Found while debugging #6831. Refs #6831. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200805181840.196064-1-espindola@scylladb.com>	2020-08-06 16:07:16 +03:00
Pavel Emelyanov	7c20e3ed05	utils: AVX searcher With all the preparations made so far it's now possible to implement the avx-powered search in an array. The array to search in has both -- capacity and size, so searching in it needs to take allocated, but unused tail into account. Two options for that -- limit the number of comparisons "by hand" or keep minimal and impossible value in this tail, scan "capacity" elements, then correct the result with "size" value. The latter approach is up to 50% faster than any (tried) attempt to do the former one. The run-time selection of the array search code is done with the gnu target attribute. It's available since gcc 4.8. For AVX-less platforms the default linear scanner is used. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-06 15:41:31 +03:00
Pavel Emelyanov	35a22ac48a	bptree: Special intra-node key search when possible If the key type is int64_t and the less-comparator is "natural" (i.e. it's literally 'a < b') we may use the SIMD instructions to search for the key on a node. Before doing so, the maybe_key and the searcher should be prepared for that, in particular: 1. maybe_key should set unused keys to the minimal value 2. the searcher for this case should call the gt() helper with primitive types -- int64_t search key and array of int64_t values To tell to B+ code that the key-less pair is such the less-er should define the simplify_key() method converting search keys to int64_t-s. This searcher is selected automatically, if any mismatch happens it silently falls back to default one. Thus also add a static assertion to the row-cache to mitigate this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-06 15:41:31 +03:00
Pavel Emelyanov	14f0cdb779	bptree: Add lesses to maybe_key template The way maybe_key works will be in-sync with the intra-node searching code and will require to know what the Less type is, so prepare for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-06 15:41:31 +03:00
Pavel Emelyanov	61d4a73ed0	token: Restrict TokenCarrier concept with noexcept The <search-key>.token() is noexcept currently and will have to be explicitly such for future optimized key searcher, so restrict the constraint and update the related classes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-06 15:41:31 +03:00
Dejan Mircevski	54749112f9	cql3: Rewrite bounds_ck_symmetrically for deep conjunctions As suggested by #5763 feedback: https://github.com/scylladb/scylla/pull/5763#discussion_r443214356 Pull found_bounds outside the visit call and apply the visitor recursively to conjunction children. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-05 19:34:18 +03:00
Dejan Mircevski	5e382aaaab	cql3: Rename bounded_ck to bounds_ck_symmetrically As suggested by #6818 feedback: https://github.com/scylladb/scylla/pull/6818#discussion_r460494026 Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-05 19:34:18 +03:00
Nadav Har'El	8f12ef3628	alternator test: faster stream tests by reducing number of vnodes The single-node test Scylla run by test/alternator/run uses, as the default, 256 vnodes. When we have 256 vnodes and two shards, our CDC implementation produces 512 separate "streams" (called "shards" in DynamoDB lingo). This causes each of the tests in test_streams.py which need to read data from the stream to need to do 1024 (!) API requests (512 calls to GetShardIterator and 512 calls to GetRecords) which takes significant time - about a second per test. In this patch, we reduce the number of vnodes to 16. We still have a non-negligible number of stream "shards" (32) so this part of the CDC code is still exercised. Moreover, to ensure we still routinely test the paging feature of DescribeStream (whose default page size is 100), the patch changes the request to use a Limit of 10, so paging will still be used to retrieve the list of 32 shards. The time to run the 27 tests in test_streams.py, on my laptop: Before this patch: 26 seconds After this patch: 6 seconds. Fixes #6979 Message-Id: <20200805093418.1490305-1-nyh@scylladb.com>	2020-08-05 19:34:18 +03:00
Nadav Har'El	59dff3226b	Alternator tests: more tests for Alternator Streams This patch adds additional tests for Alternator Streams, which helped uncover 9 new issues. The stream tests are noticeably slower than most other Alternator tests - test_streams.py now has 27 tests taking a total of 20 seconds. Much of this slowness is attributed to Alternator Stream's 512 "shards" per stream in the single-node test setup with 256 vnodes, meaning that we need over 1000 API requests per test using GetRecords. These tests could be made significantly faster (as little as 4 seconds) by setting a lower number of vnodes. Issue #6979 is about doing this in the future. The tests in this patch have comments explaining clearly (I hope) what they test, and also pointing to issues I opened about the problems discovered through these tests. In particular, the tests reproduce the following bugs: Refs #6918 Refs #6926 Refs #6930 Refs #6933 Refs #6935 Refs #6939 Refs #6942 The tests also work around the following issues (and can be changed to be more strict and reproduce these issues): Refs #6918 Refs #6931 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200804154755.1461309-1-nyh@scylladb.com>	2020-08-05 19:34:18 +03:00
Avi Kivity	e994275f80	Merge "auth: Avoid more global variable initializations" from Rafael " This patch series converts a few more global variables from sstring to constexpr std::string_view. Doing that makes it impossible for them to be part of any initialization order problems. " * 'espindola/more-constexpr-v2' of https://github.com/espindola/scylla: auth: Turn DEFAULT_USER_NAME into a std::string_view variable auth: Turn SALTED_HASH into a std::string_view variable auth: Turn meta::role_members_table::qualified_name into a std::string_view variable auth: Turn meta::roles_table::qualified_name into a std::string_view variable auth: Turn password_authenticator_name into a std::string_view variable auth: Inline default_authorizer_name into only use auth: Turn allow_all_authorizer_name into a std::string_view variable auth: Turn allow_all_authenticator_name into a std::string_view variable	2020-08-05 10:54:13 +03:00
Raphael S. Carvalho	f640d71b23	sstables/LCS: Dump # of overlapping SSTables too if reshape is triggered Refs #6938. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200804142620.54340-1-raphaelsc@scylladb.com>	2020-08-05 10:53:03 +03:00
Rafael Ávila de Espíndola	f98ea77ae8	cql: Mark functions::init noexcept If initialization of a TLS variable fails there is nothing better to do than call std::unexpected. This also adds a disable_failure_guard to avoid errors when using allocation error injection. With init() being noexcept, we can also mark clear_functions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200804180550.96150-1-espindola@scylladb.com>	2020-08-05 10:13:06 +03:00
Rafael Ávila de Espíndola	a4916ce553	auth: Turn DEFAULT_USER_NAME into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:40:00 -07:00
Rafael Ávila de Espíndola	61de1fe752	auth: Turn SALTED_HASH into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:40:00 -07:00
Rafael Ávila de Espíndola	f6006dbba8	auth: Turn meta::role_members_table::qualified_name into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:40:00 -07:00
Rafael Ávila de Espíndola	cb4c3e45d5	auth: Turn meta::roles_table::qualified_name into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:40:00 -07:00
Rafael Ávila de Espíndola	27c2b3de30	auth: Turn password_authenticator_name into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:40:00 -07:00
Rafael Ávila de Espíndola	e526ed369b	auth: Inline default_authorizer_name into only use Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:39:57 -07:00
Rafael Ávila de Espíndola	1a11e64f52	auth: Turn allow_all_authorizer_name into a std::string_view variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:38:55 -07:00
Rafael Ávila de Espíndola	0da6807f7e	auth: Turn allow_all_authenticator_name into a std::string_view variable There is no constexpr operator+ for std::string_view, so we have to concatenate the strings ourselves. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-08-04 16:38:27 -07:00
Nadav Har'El	db08ff4cbd	Additional entries in CODEOWNERS List a few more code areas, and add or correct paths for existing code areas. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200804175651.1473082-1-nyh@scylladb.com>	2020-08-04 21:03:23 +03:00
Nadav Har'El	936cf4cce0	merge: Increase row limits Merged pull request https://github.com/scylladb/scylla/pull/6910 by Wojciech Mitros: This patch enables selecting more than 2^32 rows from a table. The change becomes active after upgrading whole cluster - until then old limits are used. Tested reading 4.5*10^9 rows from a virtual table, manually upgrading a cluster with ccm and performing cql SELECT queries during the upgrade, ran unit tests in dev mode and cql and paging dtests. tests: add large paging state tests increase the maximum size of query results to 2^64	2020-08-04 19:52:30 +03:00
Wojciech Mitros	4863e8a11f	tests: add large paging state tests Add a unit test checking if the top 32 bits of the number of remaining rows in paging state is used correctly and a manual test checking if it's possible to select over 2^32 rows from a table and a virtual reader for this table.	2020-08-04 18:44:29 +02:00
Kamil Braun	b5f3aef900	cdc: add an abstraction for building log mutations This commit takes out some responsibilities of `cdc::transformer` (which is currently a big ball of mud) into a separate class. This class is a simple abstraction for creating entries in a CDC log mutation. Low-level calls to the mutation API (such as `set_cell`) inside `cdc::transformer` were replaced by higher-level calls to the builder abstraction, removing some duplication of logic.	2020-08-04 19:37:03 +03:00
Avi Kivity	c97924b8ad	Update seastar submodule util/loading_cache.hh includes adjusted. * seastar 02ad74fa7d...eb452a22a0 (17): > core: add missing include for std::allocator_traits > exceptions: move timed_out_error and factory into its own header file > future: parallel_for_each: add disable_failure_guard for parallel_for_each_state > Merge "Improve file API noexcept correctness" from Rafael > util: Add a with_allocation_failures helper > future: Fix indentation > future: Refactor duplicated try/catch > future: Make set_to_current_exception public > future: Add noexcept to continuation related functions > core: mark timer cancellation functions as noexcept > future: Simplify future::schedule > test: add a case for overwriting exact routes > http: throw on duplicated routes to prevent memory leaks > metrics: Remove the type label > fstream: turn file_data_source_impl's memory corruption bugs into aborts > doc: update tutorial splitting script > reactor_backend: let the reactor know again if any work was done by aio backend	2020-08-04 17:54:45 +03:00
Nadav Har'El	1adcd7aca7	merge: Alternator streams get_records - fix threshold/record Merged pull request https://github.com/scylladb/scylla/pull/6969 by Calle Wilund: Fixes #6942 Fixes #6926 Fixes #6933 We use clustering [lo:hi) range for iterator query. To avoid encoding inclusive/exclusive range (depending on init/last get_records call), instead just increment the timeuuid threshold. Also, dynamo result always contains a "records" entry. Include one for us as well. Also, if old (or new) image for a change set is empty, dynamo will not include this key at all. Alternator did return an empty object. This changes it to be excluded on empty. alternator::streams: Don't include empty new/old image alternator::streams: Always include "Records" array in get_records response alternator::streams: Incr shard iterator threshold in get_records	2020-08-04 11:11:07 +03:00
Rafael Ávila de Espíndola	d5e8b64f01	Simplify a few calls to seastar::make_shared There is no need to construct a value and then move it when using make_shared. It can be constructed in place. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200804001144.59641-1-espindola@scylladb.com>	2020-08-04 11:03:18 +03:00
Avi Kivity	311ba4e427	Merge "sstables: Simplify the relationship of monitors and writers" from Rafael " With these patches a monitor is destroyed before the writer, which simplifies the writer destructor. " * 'espindola/simplify-write-monitor-v2' of https://github.com/espindola/scylla: sstables: Delete write_failed sstables: Move monitor after writer in compaction_writer	2020-08-04 11:01:55 +03:00
Rafael Ávila de Espíndola	74ea522cd2	Use detect_stack_use_after_return=1 This works great with gcc 10.2, but unfortunately not any previous gcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200731161205.22369-1-espindola@scylladb.com>	2020-08-04 11:00:09 +03:00
Calle Wilund	bf63b8f9f4	alternator::streams: Don't include empty new/old image Fixes #6933 If old (or new) image for a change set is empty, dynamo will not include this key at all. Alternator did return an empty object. This changes it to be excluded on empty.	2020-08-04 07:39:09 +00:00
Calle Wilund	f80b465350	alternator::streams: Always include "Records" array in get_records response Fixes #6926 Even if empty...	2020-08-04 07:39:09 +00:00
Calle Wilund	a763bb223f	alternator::streams: Incr shard iterator threshold in get_records Fixes #6942 We use clustering [lo:hi) range for iterator query. To avoid encoding inclusive/exclusive range (depending on init/last get_records call), instead just increment the timeuuid threshold.	2020-08-04 07:39:02 +00:00
Rafael Ávila de Espíndola	ef0bed7253	Drop duplicated 'if' in comment Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200730170109.5789-1-espindola@scylladb.com>	2020-08-04 07:53:34 +03:00
Calle Wilund	a978e043c3	alternator::streams: Do not allow enabling streams when CDC is off Fixes #6866 If we try to create/alter an Alternator table to include streams, we must check that the cluster does in fact support CDC (experimental still). If not, throw a hopefully somewhat descriptive error. (Normal CQL table create goes through a similar check in cql_prop_defs) Note: no other operations are prohibited. The cluster could have had CDC enabled before, so streams could exist to list and even read. Any tables loaded from schema tables should be responsible for their own validation.	2020-08-03 21:01:31 +03:00
Calle Wilund	05851578d4	alternator::streams: Report streams as not ready until CDC stream id:s are available Refs #6864 When booting a clean scylla, CDC stream ID:s will not be available until a nring delay time period has passed. Before this, writing to a CDC enabled table will fail hard. For alternator (and its tests), we can report the stream(s) for tables as not yet available (ENABLING) until such time as id:s are computed. v2: Keep storage service ref in executor	2020-08-03 20:34:15 +03:00
Avi Kivity	1572b9e41c	Merge 'transport: Added listener with port-based load balancing' from Juliusz " This is inspired by #6781. The idea is to make Scylla listen for CQL connections on port 9042 (where both old shard-aware and shard-unaware clients can still connect the traditional way). On top of that I added a new port, where everything works the same way, only the port from client's socket is used to determine the shard No. to connect to. Desired shard No. is the result of `clientside_port % num_shards`. The new port is configurable from scylla.yaml and defaults to 19042 (unencrypted, unless user configures encryption options and omits `native_shard_aware_transport_port_ssl` in DB config). Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and "SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility, "SCYLLA_SHARDING_ALGORITHM" is still kept. Fixes #5239 " * jul-stas-shard-aware-listener: docs: Info about shard-aware listeners in protocol-extensions transport: Added listener with port-based load balancing	2020-08-03 19:23:28 +03:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified if the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply (using python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Juliusz Stasiewicz	201268ea19	docs: Info about shard-aware listeners in protocol-extensions	2020-08-03 16:45:42 +02:00
Takuya ASADA	c0b2933106	scylla_setup: skip RAID prompt when var-lib-scylla.mount already exists Since scylla_raid_setup always causes an error when var-lib-scylla.mount already exists, it's better to skip RAID prompt. See #6965	2020-08-03 17:44:02 +03:00
Takuya ASADA	cff3e60f98	scylla_raid_setup: check var-lib-scylla.mount existence before formatting RAID We should run var-lib-scylla.mount existence check before formatting RAID. Fixes #6965	2020-08-03 17:44:02 +03:00
Avi Kivity	4edfdfa78d	Merge 'Build id cleanups' from Benny " Refs #5525 - main: add --build-id option - build_id: mv sources to utils/ - build_id: throw on errors rather than assert - build_id: simplify callback pointer type casting " * bhalevy-build-id-cleanups: build_id: simplify callback pointer type casting build_id: mv sources to utils/ main: add --build-id option	2020-08-03 17:18:09 +03:00
Calle Wilund	30a700c5b0	system_keyspace: Remove support for legacy truncation records Fixes #6341 Since scylla no longer supports upgrading from a version without the "new" (dedicated) truncation record table, we can remove support for these and the migration thereof. Make sure the above holds wherever this is committed. Note that this does not remove the "truncated_at" field in system.local.	2020-08-03 17:16:26 +03:00
Botond Dénes	a9013030cf	multishard_mutation_reader: add a trace message for each shard reader created So we can see in the trace output, the shards that actually participated in the reads. There is a single message for each shard reader. Fixes: #6888 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803132338.95013-1-bdenes@scylladb.com>	2020-08-03 16:24:46 +03:00
Benny Halevy	9256d2f504	build_id: simplify callback pointer type casting Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-03 15:55:18 +03:00
Benny Halevy	bf6e8f66d9	build_id: mv sources to utils/ The root directory is already overcrowded. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-03 15:55:16 +03:00
Benny Halevy	46f7d01536	main: add --build-id option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-03 15:52:08 +03:00
Nadav Har'El	2dcb6294da	merge: cdc: New delta modes: `off`, `keys`, `full` Merged pull request https://github.com/scylladb/scylla/pull/6914 by Juliusz Stasiewicz: The goal is to have finer control over CDC "delta" rows, i.e.: disable them totally (mode off); record only base PK+CK columns (mode keys); make them behave as usual (mode full, default). The editing of log rows is performed at the stage of finishing CDC mutation. Fixes #6838 tests: Added CQL test for `delta mode` cdc: Implementations of `delta_mode::off/keys` cdc: Infrastructure for controlling `delta_mode`	2020-08-03 14:10:15 +03:00
Piotr Sarna	ed829fade0	sstables: make abort handlers noexcept Abort handlers are used in noexcept environment, so they should be noexcept themselves. Tested on a not-merged-yet Seastar patch with hardened noexcept checks for abort_source. Message-Id: <fbfd4950c0e8cc4f6005ad5b862d7bce01b90162.1596446857.git.sarna@scylladb.com>	2020-08-03 14:00:19 +03:00
Piotr Sarna	bd2d48e99c	streaming: make stream_plan::abort noexcept Aborting a stream plan is used in deinitialization code run in noexcept environment, so it should be noexcept itself. Tested on a not-merged-yet Seastar patch with hardened noexcept checks for abort_source. Message-Id: <6eada033bb394d725b83a7e0f92381cb792ef6a1.1596446857.git.sarna@scylladb.com>	2020-08-03 14:00:19 +03:00
Piotr Sarna	5cc5b64d82	github: remove THE REST rule from CODEOWNERS file The rule for THE REST results in each person listed in it to receive notifications about every single pull request, which can easily lead to inbox overload - the generic rule is therefore dropped and authors of pull requests are expected to manually add reviewers. GitHub offers semi-random suggestions for reviewers anyway. Message-Id: <3c0f7a2f13c098438a8abf998ec56b74db87c733.1596450426.git.sarna@scylladb.com>	2020-08-03 13:48:39 +03:00
Eliran Sinvani	779502ab11	Revert "schema: take into account features when converting a table creation to" This reverts commit `b97f466438`. It turns out that the schema mechanism has a lot of nuances. After this change, for an unknown reason, it was empirically shown that the amount of cross-shard activity on an upgraded node increased significantly under steady stress traffic. The increase was so significant that the node appeared unavailable to the coordinators, because all of the requests started to fail on the smp_service_group semaphore. This revert brings back a caveat in Scylla: creating a table in a mixed cluster might, under certain conditions, cause a schema mismatch on the newly created table, which makes the table essentially unusable until the whole cluster has a uniform version (rolling upgrade or rollback completion). Fixes #6893.	2020-08-03 12:51:16 +03:00
Botond Dénes	c81658c96e	configure.py: remove unused variable do_sanitize Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803082724.120916-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Botond Dénes	f4c8163d11	db/config_file.hh: named_value: remove unused members _name and _desc They seem to be just copypasta. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803080604.45595-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Benny Halevy	3fa0f289de	table: snapshot: do not capture name This captured sstring is unused. Test: database_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803072258.44681-1-bhalevy@scylladb.com>	2020-08-03 12:51:16 +03:00
Botond Dénes	e4d06a3bbf	scylla-gdb.py: collection_element: add circular_buffer support Also add a __getitem__() to circular_buffer and mask indexes so they are mapped to [`_impl.begin`, `_impl.end`). Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803053646.14689-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Benny Halevy	122136c617	tables: snapshot: do not create links from multiple shards We need only one of the shards owning each sstable to call create_links. This will allow us to simplify it and only handle crash/replay scenarios rather than rename/link/remove races. Fixes #1622 Test: unit(dev), database_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-3-bhalevy@scylladb.com>	2020-08-03 10:07:07 +03:00
Benny Halevy	ec6e136819	table: snapshot: reduce copies of snapshot dir sstring Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-2-bhalevy@scylladb.com>	2020-08-03 10:07:06 +03:00
Benny Halevy	72365445c6	table: snapshot: create destination dir only once No need to recursive_touch_directory for each sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-1-bhalevy@scylladb.com>	2020-08-03 10:07:05 +03:00
Pekka Enberg	4f0f97773e	configure.py: Use build directory variable The "outdir" variable in configure.py and "$builddir" in build.ninja file specifies the build directory. Let's use them to eliminate hard-coded "build" paths from configure.py. Message-Id: <20200731105113.388073-1-penberg@scylladb.com>	2020-08-03 09:51:51 +03:00
Nadav Har'El	ae25661d9c	alternator test: set streams time window to zero Alternator Streams have an "alternator_streams_time_window_s" parameter which is used to allow for correct ordering in the stream in the face of clock differences between Scylla nodes and possibly network delays. This parameter currently defaults to 10 seconds, and there is a discussion on issue #6929 on whether it is perhaps too high. But in any case, for tests running on a single node there is no reason not to set this parameter to zero. Setting this parameter to zero greatly speeds up the Alternator Streams tests which use ReadRecords to read from the stream. Previously each such test took at least 10 seconds, because the data was only readable after a 10 second delay. With alternator_streams_time_window_s=0, these tests can finish in less than a second. Unfortunately they are still relatively slow because our Streams implementation has 512 shards, and thus we need over a thousand (!) API calls to read from the stream. Running "test/alternator/run test_streams.py" with 25 tests took 114 seconds before this patch; after this patch, it is down to 18 seconds. Refs #6929 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Calle Wilund <calle@scylladb.com> Message-Id: <20200728184612.1253178-1-nyh@scylladb.com>	2020-08-03 09:19:57 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Avi Kivity	bb9ad9c90b	Merge 'Mount RAID volume correctly beyond reboot' from Takuya " To mount the RAID volume correctly (#6876), we need to wait for MDRAID initialization. To do so we need to add After=mdmonitor.service on var-lib-scylla.mount. Also, `lsblk -n -oPARTTYPE {dev}` does not work on CentOS7, since the older lsblk does not support the PARTTYPE column (#6954). We need to provide a relocatable lsblk and run it in the out() / run() functions instead of the distribution-provided version. " * syuu1228-scylla_raid_setup_mount_correctly_beyond_reboot: scylla_raid_setup: initialize MDRAID before mounting data volume create-relocatable-package.py: add lsblk for relocatable CLI tools scylla_util.py: always use relocatable CLI tools	2020-08-02 16:36:45 +03:00
Piotr Sarna	ccbffc3177	codeowners: add some @psarnas and @penbergs where applicable I shamelessly added myself to some modules I usually take part in reviewing. Also, I assume that the THE REST bucket should show current maintainers, so the list is extended appropriately. Message-Id: <0c172d0f20e367c3ce47fdf8d40755038ddee373.1596195689.git.sarna@scylladb.com>	2020-07-31 17:08:28 +03:00
Rafael Ávila de Espíndola	30722b8c8e	logalloc: Add disable_failure_guard during a few tls variable initialization The constructors of these global variables can allocate memory. Since the variables are thread_local, they are initialized at first use. There is nothing we can do if these allocations fail, so use disable_failure_guard. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729184901.205646-1-espindola@scylladb.com>	2020-07-31 15:49:21 +02:00
Pavel Emelyanov	14b279020b	scylla-gdb.py: Support b+tree-based row_cache::_partitions The row_cache::_partitions type is nowadays a double_decker which is B+tree of intrusive_arrays of cache_entrys, so scylla cache command will raise an error being unable to parse this new data type. The respective iterator for double decker starts on the tree and walks the list of leaf nodes, on each node it walks the plain array of data nodes, then on each data node it walks the intrusive array of cache_entrys yielding them to the caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200730145851.8819-1-xemul@scylladb.com>	2020-07-31 15:48:25 +02:00
Piotr Jastrzębski	b16b2c348f	Add CDC code owners	2020-07-31 14:22:08 +03:00
Piotr Jastrzębski	7eff7a39a0	Add hinted handoff code owners	2020-07-31 14:21:59 +03:00
Piotr Jastrzębski	443affa525	Update counters code owners	2020-07-31 14:21:48 +03:00
Juliusz Stasiewicz	1c11d8f4c4	transport: Added listener with port-based load balancing The new port is configurable from scylla.yaml and defaults to 19042 (unencrypted, unless client configures encryption options and omits `native_shard_aware_transport_port_ssl`). Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and "SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility, "SCYLLA_SHARDING_ALGORITHM" is still kept. Fixes #5239	2020-07-31 13:02:13 +02:00
Tomasz Grabiec	5263e0453a	CMakeLists.txt: Add abseil to include directories Fixes IDE integration. Message-Id: <1596190352-15467-1-git-send-email-tgrabiec@scylladb.com>	2020-07-31 12:15:23 +02:00
Avi Kivity	66c2b4c8bf	tools: toolchain: regenerate for gcc 10.2 Fixes #6813. As a side effect, this also brings in xxhash 0.7.4.	2020-07-31 08:32:16 +03:00
Takuya ASADA	9e5d548f75	scylla_raid_setup: initialize MDRAID before mounting data volume var-lib-scylla.mount should wait for MDRAID initialization, so we need to add 'After=mdmonitor.service'. However, mdmonitor.service currently fails to start because no mail address is specified, so we need to add the entry to mdadm.conf. Fixes #6876	2020-07-31 06:33:52 +09:00
Takuya ASADA	6ba2a6c42e	create-relocatable-package.py: add lsblk for relocatable CLI tools We need the latest version of lsblk, which supports the partition type UUID. Fixes #6954	2020-07-31 04:23:03 +09:00
Takuya ASADA	a19a62e6f6	scylla_util.py: always use relocatable CLI tools For some CLI tools, command options may differ between the latest version and older versions. To maximize compatibility of setup scripts, we should always use relocatable CLI tools instead of the distribution version of the tool. Related #6954	2020-07-31 04:17:01 +09:00
Piotr Sarna	b3ad5042c4	.gitignore: add .vscode to the list Since it looks like vscode is used as main IDE by some developers, including me, let's ignore its helper files. Message-Id: <63931cadc733c3d0345616be633a6479dc85ca19.1596115302.git.sarna@scylladb.com>	2020-07-30 16:35:06 +03:00
Piotr Sarna	8728c70628	.gitignore: allow symlinks when ignoring testlog The .gitignore entry for testlog/ directory is generalized from "testlog/*" to "testlog", in order to please everyone who potentially wants test logs to use ramfs by symlinking testlog to /tmp. Without the change, the symlink remains visible in `git status`. Message-Id: <e600f5954868aea7031beb02b1d8e12a2ff869e2.1596115302.git.sarna@scylladb.com>	2020-07-30 16:35:02 +03:00
Piotr Sarna	0788a77109	Merge 'Replace MAINTAINERS with CODEOWNERS' from Pekka Replace the MAINTAINERS file with a CODEOWNERS file, which Github is able to parse, and suggest reviewers for pull requests. * penberg-penberg/codeowners: Replace MAINTAINERS with CODEOWNERS Update MAINTAINERS	2020-07-30 15:12:59 +02:00
Nadav Har'El	8b9da9c92a	alternator test: tests for combination of query filter and projection The tests in this patch, which pass on DynamoDB but fail on Alternator, reproduce a bug described in issue #6951. This bug makes it impossible for a Query (or Scan) to filter on an attribute if that attribute is not requested to be included in the output. This patch includes two xfailing tests of this type: One testing a combination of FilterExpression and ProjectionExpression, and the second testing a combination of QueryFilter and AttributesToGet; These two pairs are, respectively, DynamoDB's newer and older syntaxes to achieve the same thing. Additionally, we add two xfailing tests that demonstrates that combining old and new style syntax (e.g., FilterExpression with AttributesToGet) should not have been allowed (DynamoDB doesn't allow such combinations), but Alternator currently accepts these combinations. Refs #6951 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200729210346.1308461-1-nyh@scylladb.com>	2020-07-30 09:34:23 +02:00
Rafael Ávila de Espíndola	a548e5f5d1	test: Mark tmpdir::remove noexcept Also disable the allocation failure injection in it. Refs #6831. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729200019.250908-2-espindola@scylladb.com>	2020-07-30 09:55:52 +03:00
Rafael Ávila de Espíndola	d8ba9678b4	test: Move tmpdir code to a .cc file This is not hot, so we can move it out of the header. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729200019.250908-1-espindola@scylladb.com>	2020-07-30 09:55:52 +03:00
Tomasz Grabiec	3486eba1ce	commitlog: Fix use-after-free on mutation object during replay The mutation object may be freed prematurely during commitlog replay in the schema upgrading path. We will hit the problem if the memtable is full and apply_in_memory() needs to defer. This will typically manifest as a segfault. Fixes #6953 Introduced in `79935df` Tests: - manual using scylla binary. Reproduced the problem then verified the fix makes it go away Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>	2020-07-29 20:58:15 +03:00
Juliusz Stasiewicz	7e42a42381	tests: Added CQL test for `delta mode` Tested scenario is just a single insert in every `delta_mode`. It is also checked that CDC cannot be enabled with all its subfeatures disabled.	2020-07-29 16:42:26 +02:00
Nadav Har'El	665b78253a	alternator test: reduce amount of Scylla logs saved The test/alternator/run script follows the pytest log with a full log of Scylla. This saved log can be useful in diagnosing problems, but most of it is filled with non-useful "INFO"-level messages. The two biggest offenders are compaction - which logs every single compaction happening, and the migration manager, which is just a second (and very long) message about schema change operations (e.g., table creations). Neither of these is interesting for Alternator's tests, which shouldn't care exactly when compaction of which sstable is happening. These two components alone are responsible for 80% of the log lines, and 90% of the log bytes! In this patch we increase the log level of just these two components - compaction and migration_manager - to WARN, which reduces the log by the same percentages (80% by lines, 90% by bytes). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200728191420.1254961-1-nyh@scylladb.com>	2020-07-29 14:17:12 +03:00
Takuya ASADA	3a25e7285b	scylla_post_install.sh: generate memory.conf for CentOS7 On CentOS7, systemd does not support percentage-based parameter. To apply memory parameter on CentOS7, we need to override the parameter in bytes, instead of percentage. Fixes #6783	2020-07-29 14:10:16 +03:00
Avi Kivity	fea5067dfa	Merge "Limit non-paged query memory consumption" from Botond " Non-paged queries completely ignore the query result size limiter mechanism. They consume all the memory they want. With sufficiently large datasets this can easily lead to a handful or even a single unpaged query producing an OOM. This series continues the work started by `134d5a5f7`, by introducing a configurable pair of soft/hard limits (defaulting to 1MB/100MB) that is applied to otherwise unlimited queries, like reverse and unpaged ones. When an unlimited query reaches the soft limit a warning is logged. This should give users some heads-up to adjust their application. When the hard limit is reached the query is aborted. The idea is to not greet users with failing queries after an upgrade while at the same time protecting the database from the really bad queries. The hard limit should be decreased from time to time, gradually approaching the desired goal of 1MB. We don't want to limit internal queries, we trust ourselves to either use another form of memory usage control, or read only small datasets. So the limit is selected according to the query class. User reads use the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration items, while internal reads are not limited. The limit is obtained by the coordinator, who passes it down to replicas using the existing `max_result_size` parameter (which is now a special type containing the two limits), which is now passed on every verb, instead of once per connection. This ensures that all replicas work with the same limits. For normal paged queries `max_result_size` is set to the usual `query::result_memory_limiter::maximum_result_size`. For queries that can consume an unlimited amount of memory -- unpaged and reverse queries -- this is set to the value of the aforementioned `max_memory_for_unlimited_query_{soft,hard}_limit` configuration item, but only for user reads, internal reads are not limited. 
This has the side-effect that reverse reads now send entire partitions in a single page, but this is not that bad. The data was already read, and its size was below the limit, the replica might as well send it all. Fixes: #5870 " * 'nonpaged-query-limit/v5' of https://github.com/denesb/scylla: (26 commits) test: database_test: add test for enforced max result limit mutation_partition: abort read when hard limit is exceeded for non-paged reads query-result.hh: move the definition of short_read to the top test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit test: set the allow_short_read slice option for paged queries partition_slice_builder: add with_option() result_memory_accounter: remove default constructor query_*(): use the coordinator specified memory limit for unlimited queries storage_proxy: use read_command::max_result_size to pass max result size around query: result_memory_limiter: use the new max_result_size type query: read_command: add max_result_size query: read_command: use tagged ints for limit ctor params query: read_command: add separate convenience constructor service: query_pager: set the allow_short_read flag result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit storage_proxy: add get_max_result_size() result_memory_limiter: add unlimited_result_size constant database: add get_statement_scheduling_group() database: query_mutations(): obtain the memory accounter inside query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field ...	2020-07-29 13:41:53 +03:00
Avi Kivity	22fe38732d	Update tools/jmx and tools/java submodules * tools/java a9480f3a87...aa7898d771 (4): > dist: debian: do not require root during package build > cassandra-stress: Add serial consistency options > dist: debian: fix detection of debuild > bin tools: Use non-default `cassandra.config` * tools/jmx c0d9d0f...626fd75 (1): > dist: debian: do not require root during package build Fixes #6655.	2020-07-29 12:55:18 +03:00
Botond Dénes	3804dfcc0c	test: database_test: add test for enforced max result limit Two tests are added: one that works on the low-level database API, and another one that works on the CQL API.	2020-07-29 08:32:34 +03:00
Botond Dénes	f7a4d19fb1	mutation_partition: abort read when hard limit is exceeded for non-paged reads If the read is not paged (short read is not allowed) abort the query if the hard memory limit is reached. On reaching the soft memory limit a warning is logged. This should allow users to adjust their application code while at the same time protecting the database from the really bad queries. The enforcement happens inside the memory accounter and doesn't require cooperation from the result builders. This ensures memory limit set for the query is respected for all kind of reads. Previously non-paged reads simply ignored the memory accounter requesting the read to stop and consumed all the memory they wanted.	2020-07-29 08:32:31 +03:00
Rafael Ávila de Espíndola	c4cb3817cf	build: Use -fdata-sections and -ffunction-sections This is a 4.2% reduction in the scylla text size, from 38975956 to 37404404 bytes. When benchmarking perf_simple_query without --shuffle-sections, there is no performance difference. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200724032504.3004-1-espindola@scylladb.com>	2020-07-28 19:39:26 +03:00
Botond Dénes	02a7492d62	query-result.hh: move the definition of short_read to the top It will be used by `result_memory_{limiter,accounter}` soon.	2020-07-28 18:00:29 +03:00
Botond Dénes	43c0da4b63	test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit To an unlimited value, in order to avoid aborting any unpaged queries executed by tests, that would exceed the default result limit of 1MB/100MB.	2020-07-28 18:00:29 +03:00
Botond Dénes	648ce473ab	test: set the allow_short_read slice option for paged queries Some tests use the lower level methods directly and meant to use paging but didn't and nobody noticed. This was revealed by the enforcement of max result size (introduced in a later patch), which caused these tests to fail due to exceeding the max result size. This patch fixes this by setting the `allow_short_reads` slice option.	2020-07-28 18:00:29 +03:00
Botond Dénes	d27f8321d7	partition_slice_builder: add with_option()	2020-07-28 18:00:29 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Botond Dénes	9eab5bca27	query_*(): use the coordinator specified memory limit for unlimited queries It is important that all replicas participating in a read use the same memory limits to avoid artificial differences due to different amount of results. The coordinator now passes down its own memory limit for reads, in the form of max_result_size (or max_size). For unpaged or reverse queries this has to be used now instead of the locally set max_memory_unlimited_query configuration item. To avoid the replicas accidentally using the local limit contained in the `query_class_config` returned from `database::make_query_class_config()`, we refactor the latter into `database::get_reader_concurrency_semaphore()`. Most of its callers were interested only in the semaphore anyway, and those that were interested in the limit as well should get it from the coordinator instead, so this refactoring is a win-win.	2020-07-28 18:00:29 +03:00
Botond Dénes	159d37053d	storage_proxy: use read_command::max_result_size to pass max result size around Use the recently added `max_result_size` field of `query::read_command` to pass the max result size around, including passing it to remote nodes. This means that the max result size will be sent along each read, instead of once per connection. As we want to select the appropriate `max_result_size` based on the type of the query as well as based on the query class (user or internal) the previous method won't do anymore. If the remote doesn't fill this field, the old per-connection value is used.	2020-07-28 18:00:29 +03:00
Botond Dénes	fbbbc3e05c	query: result_memory_limiter: use the new max_result_size type	2020-07-28 18:00:29 +03:00
Botond Dénes	92a7b16cba	query: read_command: add max_result_size This field will replace max size which is currently passed once per established rpc connection via the CLIENT_ID verb and stored as an auxiliary value on the client_info. For now it is unused, but we update all sites creating a read command to pass the correct value to it. In the next patch we will phase out the old max size and use this field to pass max size on each verb instead.	2020-07-28 18:00:29 +03:00
Botond Dénes	8992bcd1f8	query: read_command: use tagged ints for limit ctor params The convenience constructor of read_command now has two integer parameters next to each other. In the next patch we intend to add another one. This is a recipe for disaster, so to avoid mistakes this patch converts these parameters to tagged integers. This makes sure callers pass what they meant to pass. As a matter of fact, while fixing up call-sites, I already found several ones passing `query::max_partitions` to the `row_limit` parameter. No harm done yet, as `query::max_partitions` == `query::max_rows` but this shows just how easy it is to mix up parameters with the same type.	2020-07-28 18:00:29 +03:00
Botond Dénes	2ca118b2d5	query: read_command: add separate convenience constructor query::read_command currently has a single constructor, which serves both as an idl constructor (order of parameters is fixed) and a convenience one (most parameters have default values). This makes it very error prone to add new parameters that everyone should fill. The new parameter has to be added last, with a default value, as the previous ones have a default value as well. This means the compiler's help cannot be enlisted to make sure all usages are updated. This patch adds a separate convenience constructor to be used by normal code. The idl constructor loses all default parameters. New parameters can be added at any position in the convenience constructor (to force users to fill in a meaningful value), while removing the default parameters from the idl constructor means code cannot accidentally use it without noticing.	2020-07-28 18:00:29 +03:00
Botond Dénes	1615fe4c5e	service: query_pager: set the allow_short_read flag All callers should set this already before passing the slice to the pager, however not all actually do (e.g. `cql3::indexed_table_select_statement::read_posting_list()`). Instead of auditing each call site, just make sure this is set in the pager itself. If someone is creating a pager we can be sure they mean to use paging.	2020-07-28 18:00:29 +03:00
Botond Dénes	989142464c	result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit The use of the global `result_memory_limiter::maximum_result_size` is probably a leftover from before the `_maximum_result_size` member was introduced (`aa083d3d85`).	2020-07-28 18:00:29 +03:00
Botond Dénes	9eb6d704b2	storage_proxy: add get_max_result_size() Meant to be used by the coordinator node to obtain the max result size applicable to the query-class (determined based on the current scheduling group). For normal paged queries the previously used `query::result_memory_limiter::maximum_result_size` is used uniformly. For reverse and unpaged queries, a query class dependent value is used. For user reads, the value of the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration items is used, for other classes no limit is used (`query::result_memory_limiter::unlimited_result_size`).	2020-07-28 18:00:29 +03:00
Botond Dénes	c364c7c6a2	result_memory_limiter: add unlimited_result_size constant To be used as the max result size for internal queries.	2020-07-28 18:00:29 +03:00
Botond Dénes	a64d9b8883	database: add get_statement_scheduling_group()	2020-07-28 18:00:29 +03:00
Botond Dénes	d5cc932a0b	database: query_mutations(): obtain the memory accounter inside Instead of requesting callers to do it and pass it as a parameter. This is in line with data_query().	2020-07-28 18:00:29 +03:00
Botond Dénes	92ce39f014	query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field We want to switch from using a single limit to a dual soft/hard limit. As a first step we switch the limit field of `query_class_config` to use the recently introduced type for this. As this field has a single user at the moment -- reverse queries (and not a lot of propagation) -- we update it in this same patch to use the soft/hard limit: warn on reaching the soft limit and abort on the hard limit (the previous behaviour).	2020-07-28 18:00:29 +03:00
Botond Dénes	8aee7662a9	query: introduce max_result_size To be used to pass around the soft/hard limit configured via `max_memory_for_unlimited_query_{soft,hard}_limit` in the codebase.	2020-07-28 18:00:29 +03:00
Botond Dénes	517a941feb	query_class_config: move into the query namespace It belongs there, its name even starts with "query".	2020-07-28 18:00:29 +03:00
Botond Dénes	46d5b651eb	db/config: introduce max_memory_for_unlimited_query_soft_limit and max_memory_for_unlimited_query_hard_limit This pair of limits replace the old max_memory_for_unlimited_query one, which remains as an alias to the hard limit. The soft limit inherits the previous value of the limit (1MB), when this limit is reached a warning will be logged allowing the users to adjust their client codes without downtime. The hard limit starts out with a more permissive default of 100MB. When this is reached queries are aborted, the same behaviour as with the previous single limit. The idea is to allow clients a grace period for fixing their code, while at the same time protecting the database from the really bad queries.	2020-07-28 18:00:29 +03:00
Botond Dénes	9faaf46d4b	utils: config_src::add_command_line_options(): drop name and desc args Now that there are no ad-hoc aliases needing to overwrite the name and description parameter of this method, we can drop these and have each config item just use `name()` and `desc()` to access these.	2020-07-28 18:00:29 +03:00
Botond Dénes	dc23736d0c	db/config: replace ad-hoc aliases with alias mechanism We already use aliases for some configuration items, although these are created with an ad-hoc mechanism that only registers them on the command line. Replace this with the built-in alias mechanism introduced in the previous patch, which has the benefit of conflict resolution and also works with YAML.	2020-07-28 18:00:29 +03:00
Botond Dénes	003f5e9e54	utils: config: add alias support Allow configuration items to also have an alias, besides the name. This allows easy replacement of configuration items, with newer names, while still supporting the old name for backward compatibility. The alias mechanism takes care of registering both the name and the alias as command line arguments, as well as parsing them from YAML. The command line documentation of the alias will just refer to the name for documentation.	2020-07-28 17:59:51 +03:00
Raphael S. Carvalho	99b75d1f63	compaction: Improve compaction efficiency by killing the procedure that trims jobs This procedure consists of trimming SSTables off a compaction job until its weight[1] is smaller than one already taken by a running compaction. Min threshold is respected though, we only trim a job while its size is > min threshold. [1]: this value is a logarithimic function of the total size of the SSTables in a given job, and it's used to control the compaction parallelism. It's intended to improve the compaction efficiency by allowing more jobs to run in parallel, but it turns out that this can have an opposite effect because the write amplification can be significantly increased. Take STCS for example, the more similar-sized SSTables you compact together, the higher the compaction efficiency will be. With the trimming procedure, we're aiming at running smaller jobs, thinking that running more parallel compactions will provide us with better performance, but that's not true. Most of the efficiency comes from making informed decisions when selecting candidates for compaction. Similarly, this will also hurt TWCS, which does STCS in current window, and a sort of major compaction when the current window closes. If the TWCS jobs are trimmed, we'll likely need another compaction to get to the desired state, recompacting the same data again. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200728143648.31349-1-raphaelsc@scylladb.com>	2020-07-28 17:44:00 +03:00
Takuya ASADA	d7de9518fe	scylla_setup: skip boot partition On GCE, /dev/sda14 reported as unused disk but it's BIOS boot partition, should not use for scylla data partition, also cannot use for it since it's too small. It's better to exclude such partiotion from unsed disk list. Fixes #6636	2020-07-28 12:19:55 +03:00
Asias He	e6f640441a	repair: Fix race between create_writer and wait_for_writer_done We saw scylla hit use after free in repair with the following procedure during tests: - n1 and n2 in the cluster - n2 ran decommission - n2 sent data to n1 using repair - n2 was forcibly killed - n1 tried to remove repair_meta for n1 - n1 hit use after free on repair_meta object This was what happened on n1: 1) data was received -> do_apply_rows was called -> yield before create_writer() was called 2) repair_meta::stop() was called -> wait_for_writer_done() / do_wait_for_writer_done was called with _writer_done[node_idx] not engaged 3) step 1 resumed, create_writer() was called and _repair_writer object was referenced 4) repair_meta::stop() finished, repair_meta object and its member _repair_writer was destroyed 5) The fiber created by create_writer() at step 3 hit use after free on _repair_writer object To fix, we should call wait_for_writer_done() after any pending operations were done which were protected by repair_meta::_gate. This prevents wait_for_writer_done() from finishing while the writer is still in the process of being created. Fixes: #6853 Fixes: #6868 Backports: 4.0, 4.1, 4.2	2020-07-28 11:53:40 +03:00
Asias He	bdaf904864	storage_service: Improve log on removing pending replacing node The log "removing pending replacing node" is printed whenever a node jumps to normal status including a normal restart. For example, on node1, we saw the following when node2 restarts. [shard 0] storage_service - Node 127.0.0.2 state jump to normal [shard 0] storage_service - Remove node 127.0.0.2 from pending replacing endpoint This is confusing since no node is really being replaced. To fix, log only if a node is really removed from the pending replacing nodes. In addition, since do_remove_node will call del_replacing_endpoint, there is no need to call del_replacing_endpoint again in storage_service::handle_state_normal after do_remove_node. Fixes #6936	2020-07-28 11:51:22 +03:00
Piotr Sarna	ee35c4c3d6	db: handle errors when loading view build progress Currently, encountering an error when loading view build progress would result in view builder refusing to start - which also means that future views would not be built until the server restarts. A more user-friendly solution would be to log an error message, but continue to boot the view builder as if no views are currently in progress, which would at least allow future views to be built correctly. The test case is also amended, since now it expects the call to return that "no view builds are in progress" instead of an exception. Fixes #6934 Tests: unit(dev) Message-Id: <9f26de941d10e6654883a919fd43426066cee89c.1595922374.git.sarna@scylladb.com>	2020-07-28 11:32:09 +03:00
Piotr Sarna	0dbcaa1fd9	test: add a case for disengaged optional values in system tables Following the patch which fixes incorrect access to disengaged optionals, a test case which used to reproduce the problem is added. Message-Id: <99174d47c1c55ed8730b4998d5e5e464990d36e3.1595834092.git.sarna@scylladb.com>	2020-07-28 10:06:42 +03:00
Piotr Sarna	43a3719fe4	cql3: fix potential segfault on disengaged optional In untyped_result_set::get_view, there exists a silent assumption that the underlying data, which is an optional, is always engaged. In case the value happens to be disengaged it may lead to creating an incorrect bytes view from a disengaged optional. In order to make the code safer (since values parsed by this code often come from the network and can contain virtually anything) a segfault is replaced with an exception, by calling optional's value() function, which throws when called on disengaged optionals. Fixes #6915 Tests: unit(dev) Message-Id: <6e9e4ca67e0e17c17b718ab454c3130c867684e2.1595834092.git.sarna@scylladb.com>	2020-07-28 10:06:00 +03:00
Raphael S. Carvalho	0d70efa58e	sstable: index_reader: Make sure streams are all properly closed on failure Turns out the fix `f591c9c710` wasn't enough to make sure all input streams are properly closed on failure. It only closes the main input stream that belongs to context, but it misses all the input streams that can be opened in the consumer for promoted index reading. Consumer stores a list of indexes, where each of them has its own input stream. On failure, we need to make sure that every single one of them is properly closed before destroying the indexes as that could cause memory corruption due to read ahead. Fixes #6924. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200727182214.377140-1-raphaelsc@scylladb.com>	2020-07-28 10:01:44 +03:00
Rafael Ávila de Espíndola	34d60efbf9	sstables: Delete write_failed It is no longer used. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-27 11:23:48 -07:00
Rafael Ávila de Espíndola	030f96be1a	sstables: Move monitor after writer in compaction_writer With this the monitor is destroyed first. It makes intuitive sense to me to destroy a monitor_X before X. This is also the order we had before `55a8b6e3c9`. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-27 11:23:47 -07:00
Juliusz Stasiewicz	9e4247090f	cdc: Implementations of `delta_mode::off/keys` At the stage of `finish`ing CDC mutation, deltas are removed (mode `off`) or edited to keep only PK+CK of the base table (mode `keys`). Fixes #6838	2020-07-27 19:05:47 +02:00
Juliusz Stasiewicz	c05128d217	cdc: Infrastructure for controlling `delta_mode` The goal is to have finer control over CDC "delta" rows, i.e.: - disable them totally (mode `off`); - record only PK+CK (mode `keys`); - make them behave as usual (mode `full`, default). This commit adds the necessary infrastructure to `cdc_options`.	2020-07-27 19:00:06 +02:00
Nadav Har'El	a7df8486b1	alternator test: add test for tracing In commit `8d27e1b`, we added tracing (see docs/tracing.md) support to Alternator requests. However, we never had a functional test that verifies this feature actually works as expected, and we recently noticed that for the GetItem and BatchGetItem requests, the trace doesn't really work (it returns an empty list of events). So this patch adds a test, test/alternator/test_tracing.py, which verifies that the tracing feature works for the PutItem, GetItem, DeleteItem, UpdateItem, BatchGetItem, BatchWriteItem, Query and Scan operations. This test is very peculiar. It needs to use out-of-band REST API requests to enable and disable tracing (of course, the test is skipped when running against AWS - this is a Scylla-only feature). It also needs to read CQL-only system tables and does this using Alternator's ".scylla.alternator" interface for system tables - which came through for us here beautifully and demonstrated their usefulness. I paid a lot of attention for this test to remain reasonably fast - this entire test now runs in a little less than one second. Achieving this while testing eight different requests was a bit of a challenge, because traces take time until they are visible in the trace table. This is the main reason why in this patch the tests for all eight request types are done in one test, instead of eight separate tests. Fixes #6891 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200727115401.1199024-1-nyh@scylladb.com>	2020-07-27 14:31:45 +02:00
Takuya ASADA	97fa17b17b	scylla_setup: remove square bracket from disk prompt selected list The selected list on the disk prompt looks like a list of alternatives; it's better to use single quotes. Fixes #6760	2020-07-27 14:50:31 +03:00
Avi Kivity	3f84d41880	Merge "messaging: make verb handler registering independent of current scheduling group" from Botond " `0c6bbc8` refactored `get_rpc_client_idx()` to select different clients for statement verbs depending on the current scheduling group. The goal was to allow statement verbs to be sent on different connections depending on the current scheduling group. The new connections use per-connection isolation. For backward compatibility the already existing connections fall-back to per-handler isolation used previously. The old statement connection, called the default statement connection, also used this. `get_rpc_client_idx()` was changed to select the default statement connection when the current scheduling group is the statement group, and a non-default connection otherwise. This inadvertently broke `scheduling_group_for_verb()` which also used this method to get the scheduling group to be used to isolate a verb at handle register time. This method needs the default client idx for each verb, but if verb registering is run under the system group it instead got the non-default one, resulting in the per-handler isolation not being set-up for the default statement connection, resulting in default statement verb handlers running in whatever scheduling group the process loop of the rpc is running in, which is the system scheduling group. This caused all sorts of problems, even beyond user queries running in the system group. Also as of `0c6bbc8` queries on the replicas are classified based on the scheduling group they are running on, so user reads also ended up using the system concurrency semaphore. In particular this caused severe problems with range scans, which in some cases ended up using different semaphores per page resulting in a crash. 
This could happen because when the page was read locally the code would run in the statement scheduling group, but when the request arrived from a remote coordinator via rpc, it was read in a system scheduling group. This caused a mismatch between the semaphore the saved reader was created with and the one the new page was read with. The result was that in some cases when looking up a paused reader from the wrong semaphore, a reader belonging to another read was returned, creating a disconnect between the lifecycle of the readers and that of the slice and range they were referencing. This series fixes the underlying problem of the scheduling group influencing the verb handler registration, as well as adding some additional defenses if this semaphore mismatch ever happens in the future. Inactive read handles are now unique across all semaphores, meaning that it is not possible anymore that a handle succeeds in looking up a reader when used with the wrong semaphore. The range scan algorithm now also makes sure there is no semaphore mismatch between the one used for the current page and that of the saved reader from the previous page. I manually checked that each individual defense added is already preventing the crash from happening. Fixes: #6613 Fixes: #6907 Fixes: #6908 Tests: unit(dev), manual(run the crash reproducer, observe no crash) " * 'query-classification-regressions/v1' of https://github.com/denesb/scylla: multishard_mutation_query: use cached semaphore messaging: make verb handler registering independent of current scheduling group multishard_mutation_query: validate the semaphore of the looked-up reader reader_concurrency_semaphore: make inactive read handles unique across semaphores reader_concurrency_semaphore: add name() accessor reader_concurrency_semaphore: allow passing name to no-limit constructor	2020-07-27 13:56:52 +03:00
Nadav Har'El	9080709c56	docs: add paragraph to tracing.md Issue #6919 was caused by an incorrect assumption: I assumed that once we see the tracing session record, we can be sure that the event records for this session had already been written. In this patch we add a paragraph to the tracing documentation - docs/tracing.md, which explains that this assumption is in fact incorrect: 1. On a multi-node setup, replicas may continue to write tracing events after the coordinator "finished" (moved to background) the request and wrote the session record. 2. Even on a single-node setup, the writes of the session record and the individual events are asynchronous, and can happen in an unexpected order (which is what happened in issue #6919). Refs #6919. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200727102438.1194314-1-nyh@scylladb.com>	2020-07-27 13:38:57 +03:00
Takuya ASADA	0ffa0e8745	dist_util.py: use correct ID value to detect Amazon Linux 2 In `2d63acdd6a` we replaced 'ol' and 'amzn' with 'oracle' and 'amazon', but distro.id() actually returns 'amzn' for Amazon Linux 2, so we need to revert the change. Fixes #6882	2020-07-27 12:46:21 +03:00
Botond Dénes	eeeef0a0f1	multishard_mutation_query: use cached semaphore Instead of requesting the query class config from the database every time the semaphore is needed, use the cached one by calling `semaphore()`.	2020-07-27 12:17:22 +03:00
Nadav Har'El	65f75e3862	alternator test: enable test_get_records After issue #6864 was fixed, the test_streams.py::test_get_records test no longer fails, so its "xfail" marker can be removed. Refs #6864. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200722132518.1077882-1-nyh@scylladb.com>	2020-07-27 09:19:37 +02:00
Nadav Har'El	f488eaebaf	merge: db/view: view_update_generator: make staging reader evictable Merged patch set by Botond Dénes: The view update generation process creates two readers. One is used to read the staging sstables, the data which needs view updates to be generated for, and another reader for each processed mutation, which reads the current value (pre-image) of each row in said mutation. The staging reader is created first and is kept alive until all staging data is processed. The pre-image reader is created separately for each processed mutation. The staging reader is not restricted, meaning it does not wait for admission on the relevant reader concurrency semaphore, but it does register its resource usage on it. The pre-image reader however is restricted. This creates a situation, where the staging reader possibly consumes all resources from the semaphore, leaving none for the later created pre-image reader, which will not be able to start reading. This will block the view building process meaning that the staging reader will not be destroyed, causing a deadlock. This patch solves this by making the staging reader restricted and making it evictable. To prevent thrashing -- evicting the staging reader after reading only a really small partition -- we only make the staging reader evictable after we have read at least 1MB worth of data from it. 
test/boost: view_build_test: add test_view_update_generator_buffering test/boost: view_build_test: add test test_view_update_generator_deadlock reader_permit: reader_resources: add operator- and operator+ reader_concurrency_semaphore: add initial_resources() test: cql_test_env: allow overriding database_config mutation_reader: expose new_reader_base_cost db/view: view_updating_consumer: allow passing custom update pusher db/view: view_update_generator: make staging reader evictable db/view: view_updating_consumer: move implementation from table.cc to view.cc database: add make_restricted_range_sstable_reader() Signed-off-by: Botond Dénes <bdenes@scylladb.com> --- db/view/view_updating_consumer.hh \| 51 ++++++++++++++++++++++++++++--- db/view/view.cc \| 39 +++++++++++++++++------ db/view/view_update_generator.cc \| 19 +++++++++--- 3 files changed, 91 insertions(+), 18 deletions(-)	2020-07-27 09:19:37 +02:00
Botond Dénes	fe127a2155	sstables: clamp estimated_partitions to [1, +inf) in writers In some cases the estimated number of partitions can be 0, which, albeit a legitimate estimation result, breaks a lot of low-level sstable writer code, so some of these writers have assertions to ensure estimated partitions is > 0. To avoid hitting this assert all users of the sstable writers do the clamping, to ensure estimated partitions is at least 1. However, leaving this to the callers is error prone, as #6913 has shown. As this clamping is standard practice, it is better to do it in the writers themselves, avoiding this problem altogether. This is exactly what this patch does. It also adds two unit tests, one that reproduces the crash in #6913, and another one that ensures all sstable writers are fine with estimated partitions being 0 now. Call sites previously doing the clamping are changed to not do it, it is unnecessary now as the writer does it itself. Fixes #6913 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>	2020-07-27 09:19:37 +02:00
Avi Kivity	91619d77a1	Merge "Simplify the lifetime management of write monitors" from Raphael " This makes sure that monitors are always owned by the same struct that owns the monitored writer, simplifying the lifetime management. This hopefully fixes some of the crashes we have observed around this area. " * 'espindola/use-compaction_writer-v6' of https://github.com/espindola/scylla: sstables: Rename _writer to _compaction_writer sstables: Move compaction_write_monitor to compaction_writer sstables: Add couple of writer() getters to garbage_collected_sstable_writer sstables: Move compaction_write_monitor earlier in the file	2020-07-27 09:19:37 +02:00
Dejan Mircevski	c11b2de84c	cql3: Fix tombstone-range check for TRUE A DELETE statement checks that the deletion range is symmetrically bounded. This check was broken for expression TRUE. Test the fix by setting initial_key_restrictions::expression to TRUE, since CQL doesn't currently allow WHERE TRUE. That change has been proposed anyway in feedback to #5763: https://github.com/scylladb/scylla/pull/5763#discussion_r443213343 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-27 09:19:37 +02:00
Dejan Mircevski	ba74659f5a	cql/restrictions: Constrain to_sorted_vector As requested in #5763 feedback, enforce the function's assumptions with concept asserts. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-27 09:19:37 +02:00
Botond Dénes	0df4c2fd3b	messaging: make verb handler registering independent of current scheduling group `0c6bbc8` refactored `get_rpc_client_idx()` to select different clients for statement verbs depending on the current scheduling group. The goal was to allow statement verbs to be sent on different connections depending on the current scheduling group. The new connections use per-connection isolation. For backward compatibility the already existing connections fall-back to per-handler isolation used previously. The old statement connection, called the default statement connection, also used this. `get_rpc_client_idx()` was changed to select the default statement connection when the current scheduling group is the statement group, and a non-default connection otherwise. This inadvertently broke `scheduling_group_for_verb()` which also used this method to get the scheduling group to be used to isolate a verb at handle register time. This method needs the default client idx for each verb, but if verb registering is run under the system group it instead got the non-default one, resulting in the per-handler isolation not being set-up for the default statement connection, resulting in default statement verb handlers running in whatever scheduling group the process loop of the rpc is running in, which is the system scheduling group. This caused all sorts of problems, even beyond user queries running in the system group. Also as of `0c6bbc8` queries on the replicas are classified based on the scheduling group they are running on, so user reads also ended up using the system concurrency semaphore.	2020-07-27 10:11:21 +03:00
Asias He	cd7d64f588	gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb The new verb is used to replace the current gossip shadow round implementation. The current shadow round implementation reuses the gossip syn and ack async messages, which has plenty of drawbacks. It is hard to tell whether a specific peer node has responded to the syn messages. The delayed responses from the shadow round can apply to the normal gossip states even if the shadow round is done. The syn and ack message handlers are full of special cases due to the shadow round. All gossip application states, including the ones that are not relevant, are sent back. The gossip application states are applied and the gossip listeners are called as if in normal gossip operation. It is completely unnecessary to call the gossip listeners in the shadow round. This patch introduces a new verb to request the exact gossip application states the shadow round needs with a synchronous verb and applies the application states without calling the gossip listeners. This patch makes the shadow round easier to reason about, more robust and efficient. Refs: #6845 Tests: update_cluster_layout_tests.py	2020-07-27 09:15:11 +08:00
Asias He	bebd683177	gossip: Add do_apply_state_locally helper The code in do_apply_state_locally will be shared in the next patch. Refs: #6845 Tests: update_cluster_layout_tests.py	2020-07-27 09:00:47 +08:00
Piotr Sarna	d08e22c4eb	alternator: fix tracing BatchGetItem The BatchGetItem request did not pass its trace state to lower layers in a correct manner, which resulted in losing tracing information. Refs #6891 Message-Id: <078f58a0f76b9f182f671a8d16e147ded489138c.1595515815.git.sarna@scylladb.com>	2020-07-23 20:05:10 +03:00
Piotr Sarna	7256572e41	alternator: fix tracing GetItem The GetItem request did not pass the trace state properly, which resulted in having almost empty traces. Refs #6891 Tests: manual: Before: session_id \| event_id \| activity \| scylla_parent_id \| scylla_span_id \| source \| source_elapsed \| thread --------------------------------------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+-----------+----------------+--------- 57995da0-cce4-11ea-97ea-000000000000 \| 579971c4-cce4-11ea-97ea-000000000000 \| GetItem \| 0 \| 131309406144163 \| 127.0.0.1 \| 0 \| shard 0 After: session_id \| event_id \| activity \| scylla_parent_id \| scylla_span_id \| source \| source_elapsed \| thread --------------------------------------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+-----------+----------------+--------- 57995da0-cce4-11ea-97ea-000000000000 \| 579971c4-cce4-11ea-97ea-000000000000 \| GetItem \| 0 \| 131309406144163 \| 127.0.0.1 \| 0 \| shard 0 57995da0-cce4-11ea-97ea-000000000000 \| 57997327-cce4-11ea-97ea-000000000000 \| Creating read executor for token -7535857341981351089 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE \| 0 \| 131309406144163 \| 127.0.0.1 \| 35 \| shard 0 57995da0-cce4-11ea-97ea-000000000000 \| 5799733d-cce4-11ea-97ea-000000000000 \| read_data: querying locally \| 0 \| 131309406144163 \| 127.0.0.1 \| 38 \| shard 0 57995da0-cce4-11ea-97ea-000000000000 \| 57997358-cce4-11ea-97ea-000000000000 \| Start querying the token range that starts with -7535857341981351089 \| 0 \| 131309406144163 \| 127.0.0.1 \| 40 \| shard 0 57995da0-cce4-11ea-97ea-000000000000 \| 57997579-cce4-11ea-97ea-000000000000 \| Querying is done \| 0 \| 131309406144163 \| 
127.0.0.1 \| 95 \| shard 0 Message-Id: <d585ff7aaaeebf2050890643d40cdafb2efb8d98.1595509338.git.sarna@scylladb.com>	2020-07-23 20:05:06 +03:00
Avi Kivity	39db54a758	Merge "Use seastar::with_file_close_on_failure in commitlog" from Benny " `close_on_failure` was committed to seastar so use the library version. This requires making the lambda function passed to it nothrow move constructible, so this series also makes db::commitlog::descriptor move constructor noexcept and changes allocate_segment_ex and segment::segment to get a descriptor by value rather than by reference. Test: unit(dev), commitlog_test(debug) " * tag 'commit-log-use-with_file_close_on_failure-v1' of github.com:bhalevy/scylla: commitlog: use seastar::with_file_close_on_failure commitlog: descriptor: make nothrow move constructible commitlog: allocate_segment_ex, segment: pass descriptor by value commitlog: allocate_segment_ex: filename capture is unused	2020-07-23 19:23:23 +03:00
Rafael Ávila de Espíndola	bca4eb8b8c	Build: Garbage collect dead sections In another patch I noticed gcc producing dead functions. I am not sure why gcc is doing that. Some of those functions are already placed in independent sections, and so can be garbage collected by the linker. This is a 1% text section reduction in scylla, from 39363380 to 38974324 bytes. There is no difference in the tps reported by perf_simple_query. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200723152511.8214-1-espindola@scylladb.com>	2020-07-23 18:57:01 +03:00
Piotr Sarna	6cdc9f1a43	Merge 'alternator: refactor api_error class' from Nadav In the patch "Add exception overloads for Dynamo types", Alternator's single api_error exception type was replaced by a more complex hierarchy of types. The implementation was not only longer and more complex to understand - I believe it also negated an important observation: The "api_error" exception type is special. It is not an exception created by code for other code. It is not meant to be caught in Alternator code. Instead, it is supposed to contain an error message created for the user, containing one of the few supported exception "names" described in the DynamoDB documentation, and a user-readable text message. Throwing such an exception in Alternator code means the thrower wants the request to abort immediately, and this message to reach the user. These exceptions are not designed to be caught in Alternator code. Code should use other exceptions - or alternatives to exceptions (e.g., std::optional) for problems that should be handled before returning a different error to the user. Moreover, "api_error" isn't just thrown as an exception - it can also be returned-by-value in an executor::request_return_type - which is another reason why it should not be subclassed. For these reasons, I believe we should have a single api_error type, and it's wrong to subclass it. So in this patch I am reverting the subclasses and template added in the aforementioned patch. Still, one correct observation made in that patch was that it is inconvenient to type in DynamoDB exception names (no help from the editor in completing those strings) and also error-prone. In this patch we propose a different - simpler - solution to the same problem: We add trivial factory functions, e.g., api_error::validation(std::string) as a shortcut to api_error("ValidationException"). 
The new implementation is easy to understand, and also more self explanatory to readers: It is now clear that "api_error::validation()" is actually a user-visible "api_error", something which was obscured by the name validation_exception() used before this patch. Finally, this patch also improves the comment in error.hh explaining the purpose of api_error and the fact it can be returned or thrown. The fact it should not be subclassed is legislated with a "final". There is also no point in this class inheriting from std::exception or having virtual functions, or an empty constructor - so all these are dropped as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com> * 'api-error-refactor' of https://github.com/nyh/scylla: alternator: use api_error factory functions in auth.cc alternator: use api_error::validation() alternator: use api_error factory functions in executor.cc alternator: use api_error factory functions in server.cc alternator: refactor api_error class	2020-07-23 17:35:56 +02:00
Piotr Sarna	e7c18963e4	test: check sizes before dereferencing the vector It's better to assert a certain vector size first and only then dereference its elements - otherwise, if a bug causes the size to be different, the test can crash with a segfault on an invalid dereference instead of gracefully failing with a test assertion.	2020-07-23 16:49:35 +03:00
Piotr Sarna	6b04034566	cql3: fix multi column restriction bounds Generating bounds from multi-column restrictions used to create incorrect nonwrapping intervals, which only happened to work because they're implemented as wrapping intervals underneath. The following CQL restriction: WHERE (a, b) >= (1, 0) should translate to (a, b) >= (1, 0), no upper bound, while it incorrectly translates to (a, b) >= (1, 0) AND (a, b) < empty-prefix. Since empty prefix is smaller than any other clustering key, this range was in fact not correct, since the assumption was that starting bound was never greater than the ending bound. While the bug does not trigger any errors in tests right now, it starts to do so after the code is modified in order to correctly handle empty intervals (intervals with end > start).	2020-07-23 16:49:24 +03:00
Botond Dénes	b7cfa4ea97	multishard_mutation_query: validate the semaphore of the looked-up reader To make sure it belongs to the same semaphore that the database thinks is appropriate for the current query. Since a semaphore mismatch points to a serious bug, we use `on_internal_error()` to allow generating coredumps on-demand.	2020-07-23 16:43:37 +03:00
Botond Dénes	11105cbb78	reader_concurrency_semaphore: make inactive read handles unique across semaphores Currently inactive read handles are only unique within the same semaphore, allowing for an unregister against another semaphore to potentially succeed. This can lead to disasters ranging from crashes to data corruption. While a handle should never be used with another semaphore in the first place, we have recently seen a bug (#6613) causing exactly that, so in this patch we prevent such unregister operations from ever succeeding by making handles unique across all semaphores. This is achieved by adding a pointer to the semaphore to the handle.	2020-07-23 16:43:33 +03:00
Botond Dénes	d12540bfbf	reader_concurrency_semaphore: add name() accessor Allows identifying the semaphore in question in semaphore related error messages.	2020-07-23 16:42:54 +03:00
Botond Dénes	88129f500f	reader_concurrency_semaphore: allow passing name to no-limit constructor So tests can provide names for semaphores as well, making test output more clear.	2020-07-23 16:42:36 +03:00
Nadav Har'El	b661c1eae2	alternator: use api_error factory functions in auth.cc All the places in auth.cc where we constructed an api_error with inline strings now use api_error factory functions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Nadav Har'El	bca88521ba	alternator: use api_error::validation() All the places in conditions.cc, expressions.cc and serialization.cc where we constructed an api_error, we always used the ValidationException type string, which the code repeated dozens of times. This patch converts all these places to use the factory function api_error::validation(). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Nadav Har'El	06ba0c0232	alternator: use api_error factory functions in executor.cc All the places in executor.cc where we constructed an api_error with inline strings now use api_error factory functions. Most of them, but not all of them, were api_error::validation(). We also needed to add a couple more of these factory functions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Nadav Har'El	81589be00a	alternator: use api_error factory functions in server.cc All the places in server.cc where we constructed an api_error with inline strings now use api_error factory functions - we needed to add a few more. Interestingly, we had a wrong type string for "Internal Server Error", which we fix in this patch. We wrote the type string like that - with spaces - because this is how it was listed in the DynamoDB documentation at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html But this was in fact wrong, and it should be without spaces: "InternalServerError". The botocore library (for example) recognizes it this way, and this string can also be seen in other online DynamoDB examples. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Nadav Har'El	5a35632cd3	alternator: refactor api_error class In the patch "Add exception overloads for Dynamo types", Alternator's single api_error exception type was replaced by a more complex hierarchy of types. The implementation was not only longer and more complex to understand - I believe it also negated an important observation: The "api_error" exception type is special. It is not an exception created by code for other code. It is not meant to be caught in Alternator code. Instead, it is supposed to contain an error message created for the user, containing one of the few supported exception "names" described in the DynamoDB documentation, and a user-readable text message. Throwing such an exception in Alternator code means the thrower wants the request to abort immediately, and this message to reach the user. These exceptions are not designed to be caught in Alternator code. Code should use other exceptions - or alternatives to exceptions (e.g., std::optional) for problems that should be handled before returning a different error to the user. Moreover, "api_error" isn't just thrown as an exception - it can also be returned-by-value in an executor::request_return_type - which is another reason why it should not be subclassed. For these reasons, I believe we should have a single api_error type, and it's wrong to subclass it. So in this patch I am reverting the subclasses and template added in the aforementioned patch. Still, one correct observation made in that patch was that it is inconvenient to type in DynamoDB exception names (no help from the editor in completing those strings) and also error-prone. In this patch we propose a different - simpler - solution to the same problem: We add trivial factory functions, e.g., api_error::validation(std::string) as a shortcut to api_error("ValidationException").
The new implementation is easy to understand, and also more self-explanatory to readers: It is now clear that "api_error::validation()" is actually a user-visible "api_error", something which was obscured by the name validation_exception() used before this patch. Finally, this patch also improves the comment in error.hh explaining the purpose of api_error and the fact it can be returned or thrown. The fact it should not be subclassed is legislated with a "final". There is also no point in this class inheriting from std::exception or having virtual functions, or an empty constructor - so all these are dropped as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Avi Kivity	01b838e291	Merge "Unregister RPC verbs on stop" from Pavel E " There are 5 services, that register their RPC handlers in messaging service, but quite a few of them unregister them on stop. Unregistering is somewhat critical, not just because it makes the code look clean, but also because unregistration does wait for the message processing to complete, thus avoiding use-after-free's in the handlers. In particular, several handlers call service::get_schema_for_write() which, in turn, may end up in service::maybe_sync() calling for the local migration manager instance. All those handlers' processing must be waited for before stopping the migration manager. The set brings the RPC handlers unregistration in sync with the registration part. tests: unit (dev) dtest (dev: simple_boot_shutdown, repair) start-stop by hands (dev) fixes: #6904 " * 'br-rpc-unregister-verbs' of https://github.com/xemul/scylla: main: Add missing calls to unregister RPC handlers messaging: Add missing per-service unregistering methods messaging: Add missing handlers unregistration helpers streaming: Do not use db->invoke_on_all in vain storage_proxy: Detach rpc unregistration from stop main: Shorten call to storage_proxy::init_messaging_service	2020-07-23 12:03:49 +03:00
Pekka Enberg	f9092bc4fc	Replace MAINTAINERS with CODEOWNERS Replace the MAINTAINERS file with a CODEOWNERS file, which Github is able to parse, and suggest reviewers for pull requests.	2020-07-23 09:25:40 +03:00
Asias He	55271f714e	gossip: Do not talk to seed node explicitly Currently, we talk to a seed node in each gossip round with some probability, i.e., nr_of_seeds / (nr_of_live_nodes + nr_of_unreachable_nodes) For example, with 5 seeds in a 50-node cluster, the probability is 0.1. Now that we talk to all live nodes, including the seed nodes, in a bounded time period, it is not a must to talk to a seed node in each gossip round. In order to get rid of the seed concept, do not talk to a seed node explicitly in each gossip round. This patch is a preparatory patch to remove the seed concept in gossip. Refs: #6845 Tests: update_cluster_layout_tests.py	2020-07-23 14:24:06 +08:00
Asias He	8e219e10e7	gossip: Talk to live endpoints in a shuffled fashion Currently, we select 10 percent of random live nodes to talk with in each gossip round. There is no upper bound on how long it will take to talk to all live nodes. This patch changes the way we select live nodes to talk with as below: 1) Shuffle all the live endpoints randomly 2) Split the live endpoints into 10 groups 3) Talk to one of the groups in each gossip round 4) Go to step 1 to shuffle again after all groups have been talked with We keep both randomness of selecting nodes as before and determinacy to complete talking to all live nodes. In addition, the way to favor newly added node is simplified. When a new live node is added, it is always added to the front of the group, so it will be talked with in the next gossip round. This patch is a preparatory patch to remove the seed concept in gossip. Refs: #6845 Tests: update_cluster_layout_tests.py	2020-07-23 14:23:59 +08:00
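The four selection steps in the commit above can be illustrated with a minimal Python sketch. This is not Scylla's C++ implementation: the function name `gossip_rounds`, its parameters, and the striding used to form the groups are assumptions made for illustration, and the special handling that favors newly added nodes is not modeled.

```python
import random


def gossip_rounds(live_endpoints, groups=10, rng=None):
    """Yield the endpoints to talk to in each gossip round (hypothetical helper).

    Every pass shuffles all live endpoints, splits them into `groups`
    groups, and hands out one group per round, so each endpoint is
    contacted once per pass.
    """
    rng = rng or random.Random()
    while True:
        shuffled = list(live_endpoints)
        if not shuffled:
            return  # nothing to talk to
        rng.shuffle(shuffled)              # step 1: shuffle
        for i in range(groups):            # step 2: split into groups
            group = shuffled[i::groups]
            if group:
                yield group                # step 3: one group per round
        # step 4: loop back and reshuffle once all groups were used


nodes = [f"node{i}" for i in range(25)]
rounds = gossip_rounds(nodes, rng=random.Random(0))
first_pass = [next(rounds) for _ in range(10)]
```

With 25 endpoints and 10 groups, the first ten rounds together cover every endpoint exactly once, which is the bounded-time property the commit message describes.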
Pekka Enberg	39885dbdc8	Update MAINTAINERS	2020-07-23 09:03:39 +03:00
Avi Kivity	b4b9deadf3	build: install jmx and tools-java submodule dependencies Let each submodule be responsible for its own dependencies, and call the submodule's dependency installation script. Reviewed-by: Piotr Jastrzebski <piotr@scylladb.com> Reviewed-by: Takuya ASADA <syuu@scylladb.com>	2020-07-22 20:13:50 +03:00
Avi Kivity	7fbe50a4e4	build: remove pystache from install-dependencies As of `d6165bc1c3` we do not depend on pystache, so don't install it. Reviewed-by: Takuya ASADA <syuu@scylladb.com>	2020-07-22 20:12:31 +03:00
Avi Kivity	19da4a5b8f	build: don't package tools/java and tools/jmx in relocatable package tools/java and tools/jmx have their own relocatable packages (and rpm/deb), so they should not be part of the main relocatable package. Enforce this by enabling the filter parameter in reloc_add, and passing a filter that excludes tools/java and tools/jmx.	2020-07-22 20:03:18 +03:00
Avi Kivity	98a22e572a	dist: redhat: reduce log spam from unpacking sources when building rpm rpmbuild defaults to logging the name of every file it unpacks from the archive. Make it quiet with the %setup -q flag.	2020-07-22 20:02:04 +03:00
Rafael Ávila de Espíndola	87b261ab32	sstables: Rename _writer to _compaction_writer Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-22 08:15:55 -07:00
Rafael Ávila de Espíndola	97b7fee78e	sstables: Move compaction_write_monitor to compaction_writer There is one monitor per writer, so we now keep them together in the compaction_writer struct. This trivially guarantees that the monitor is always destroyed before the writer. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-22 08:15:53 -07:00
Rafael Ávila de Espíndola	f8cc582e4a	sstables: Add couple of writer() getters to garbage_collected_sstable_writer This just reduces the noise of an upcoming patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-22 07:46:05 -07:00
Rafael Ávila de Espíndola	c740c66840	sstables: Move compaction_write_monitor earlier in the file This will be used by followup patches. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-22 07:46:05 -07:00
Pavel Emelyanov	50d07696e4	main: Add missing calls to unregister RPC handlers The gossiper's and migration_manager's unregistration is done on the services' stop, for the rest we need to call the recently introduced methods. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:35:07 +03:00
Pavel Emelyanov	5060063cd6	messaging: Add missing per-service unregistering methods 5 services register handlers in messaging, but not all of them have clear unregistration methods. Summary: migration_manager: everything is in place, no changes gossiper: ditto proxy: some verbs unregistration is missing repair: no unregistration at all streaming: ditto This patch adds the needed unregistration methods. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:34:00 +03:00
Pavel Emelyanov	7a7b1b3108	messaging: Add missing handlers unregistration helpers Handlers for each verb have both -- register and unregister helpers, but unregistration ones for some verbs are missing, so here they are. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:31:57 +03:00
Pavel Emelyanov	08e36ca77c	streaming: Do not use db->invoke_on_all in vain The db instance is not needed to initialize messages, so use plain smp::invoke_on_all Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:31:57 +03:00
Pavel Emelyanov	f845a78d9a	storage_proxy: Detach rpc unregistration from stop The proxy's stop method is not called (and is unlikely to be soon), but stopping the message handlers is needed now, so prepare the existing method for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:31:57 +03:00
Pavel Emelyanov	cc070ceca0	main: Shorten call to storage_proxy::init_messaging_service Just for brevity Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:31:57 +03:00
Kamil Braun	12e2891c60	cdc: if ring_delay == 0, don't add delay to newly created generation If ring_delay == 0, something fishy is going on, e.g. single-node tests are being performed. In this case we want the CDC generation to start operating immediately. There is no need to wait until it propagates to the cluster. You should not use ring_delay == 0 in production. Fixes https://github.com/scylladb/scylla/issues/6864.	2020-07-22 16:06:09 +03:00
Avi Kivity	5e1fa13d08	Merge 'docker: Make I/O configuration setup configurable' from Pekka " This adds a '--io-setup N' command line option, which users can pass to specify whether they want to run the "scylla_io_setup" script or not. This is useful if users want to specify I/O settings themselves in environments such as Kubernetes, where running "iotune" is problematic. While at it, add the same option to "scylla_setup" to keep the interface between that script and Docker consistent. Fixes #6587 " * penberg-penberg/docker-no-io-setup: scylla_setup: Add '--io-setup ENABLE' command line option dist/docker: Add '--io-setup ENABLE' command line option	2020-07-22 14:17:53 +03:00
Rafael Ávila de Espíndola	e83e91e352	alternator: Fix use after return Avoid a copy of timeout so that we don't end up with a reference to a stack allocated variable. Fixes #6897 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200721184939.111665-1-espindola@scylladb.com>	2020-07-21 22:06:13 +03:00
Rafael Ávila de Espíndola	e15c8ee667	Everywhere: Explicitly instantiate make_lw_shared seastar::make_lw_shared has a constructor taking a T&&. There is no such constructor in std::make_shared: https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared This means that we have to move from make_lw_shared(T(...)) to make_lw_shared<T>(...) if we don't want to depend on the idiosyncrasies of seastar::make_lw_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	efeaded427	Everywhere: Add a make_shared_schema helper This replaces a lot of make_lw_shared(schema(...)) with make_shared_schema(...). This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	ad6d65dbbd	Everywhere: Explicitly instantiate make_shared seastar::make_shared has a constructor taking a T&&. There is no such constructor in std::make_shared: https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared This means that we have to move from make_shared(T(...)) to make_shared<T>(...) if we don't want to depend on the idiosyncrasies of seastar::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	abba521199	cql3: Add a create_multi_column_relation helper This moves a few calls to make_shared to a single location. This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola	8858873d85	main: Return a shared_ptr from defer_verbose_shutdown This moves a few calls to make_shared to a single location. This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:44 -07:00
Avi Kivity	098d24fd6d	Update seastar submodule * seastar 4a99d56453...02ad74fa7d (5): > TLS: Use "known" (precalculated) DH parameters if available > tutorial: fix advanced service_loop examples > tutorial: further fix service_loop example text > linux-aio: make the RWF_NOWAIT support work again > locking_test: Fix a use after return	2020-07-21 19:08:36 +03:00
Avi Kivity	5ead33d486	Update tools/jmx and tools/java submodules * tools/java 113c7d993b...a9480f3a87 (3): > reloc/build_deb.sh: Fix extra whitespace in "mv" command path > README.md: Document repository purpose for Scylla > reloc: Add "--builddir" option to build_{rpm,deb}.sh * tools/jmx aa94fe5...c0d9d0f (2): > add build/ to gitignore > reloc: Add "--builddir" option to build_{rpm,deb}.sh	2020-07-21 15:33:54 +03:00
Pekka Enberg	0b8c9668e3	scylla_setup: Add '--io-setup ENABLE' command line option To make the "scylla_setup" interface similar to Docker image, let's add a "--io-setup ENABLE" command line option. The old "--no-io-setup" option is retained for compatibility.	2020-07-21 14:48:01 +03:00
Pekka Enberg	fc1851cdc1	dist/docker: Add '--io-setup ENABLE' command line option This adds a '--io-setup N' command line option, which users can pass to specify whether they want to run the "scylla_io_setup" script or not. This is useful if users want to specify I/O settings themselves in environments such as Kubernetes, where running "iotune" is problematic. Fixes #6587	2020-07-21 14:42:46 +03:00
Rafael Ávila de Espíndola	bc20b71e6a	configure: Don't use pkg-config for xxhash The pkg-config for xxhash points to the wrong directory. I reported https://bugzilla.redhat.com/show_bug.cgi?id=1858407 But xxhash is such a simple library that it is trivial to avoid pkg-config. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200717204344.601729-1-espindola@scylladb.com>	2020-07-20 21:51:23 +03:00
Botond Dénes	929cdd3a15	test/boost: view_build_test: add test_view_update_generator_buffering To exercise the new buffering and pausing logic of the view updating consumer.	2020-07-20 14:32:45 +03:00
Botond Dénes	e316796b3f	test/boost: view_build_test: add test test_view_update_generator_deadlock A test case which reproduces the view update generator hang, where the staging reader consumes all resources and leaves none for the pre-image reader which blocks on the semaphore indefinitely.	2020-07-20 14:32:13 +03:00
Pekka Enberg	9d183aed2d	scripts: Fix submodule names in refresh-submodules.sh The submodules were moved under tools/jmx and tools/java. Message-Id: <20200720112447.754850-1-penberg@scylladb.com>	2020-07-20 14:28:39 +03:00
Asias He	28f8798464	repair: Do not use libfmt format specifiers if not needed We recently saw a weird log message: WARN 2020-07-19 10:22:46,678 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)}) It turned out we have: throw std::runtime_error(format("repair id {:d} on shard {:d} failed to repair {:d} sub ranges", id, shard, nr_failed_ranges)); in the code, but we changed the id from integer to repair_uniq_id class. We do not really need to specify the format specifiers for numbers. Fixes #6874	2020-07-20 12:52:36 +03:00
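The failure mode in the commit above (a numeric format specifier applied to a value that is no longer an integer) has a close analogue in Python's `str.format`. The sketch below uses Python rather than libfmt, so the exception type differs from `fmt::format_error`; the class `RepairUniqId` and the helper `describe_failure` are hypothetical stand-ins for illustration.

```python
class RepairUniqId:
    """Hypothetical stand-in for the repair_uniq_id class named in the commit."""

    def __init__(self, id, uuid):
        self.id = id
        self.uuid = uuid

    def __str__(self):
        return f"[id={self.id}, uuid={self.uuid}]"


def describe_failure(template, rid, shard, nr_failed_ranges):
    # Formats the error text; the template decides whether specifiers are used.
    return template.format(rid, shard, nr_failed_ranges)


rid = RepairUniqId(4, "0b1092a1-061f-4691-b0ac-547b281ef09d")

# With an integer specifier, formatting breaks once the id is not an integer.
typed = "repair id {:d} on shard {:d} failed to repair {:d} sub ranges"
try:
    describe_failure(typed, rid, 0, 2)
    typed_ok = True
except TypeError:
    typed_ok = False

# Without specifiers, each argument is rendered through its own formatter.
plain = "repair id {} on shard {} failed to repair {} sub ranges"
message = describe_failure(plain, rid, 0, 2)
```

Dropping the specifiers keeps the message valid if the argument type changes again later, which is the reasoning the commit gives.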
Botond Dénes	e5db1ce785	reader_permit: reader_resources: add operator- and operator+ In addition to the already available operator+= and operator-=.	2020-07-20 11:23:39 +03:00
Botond Dénes	aabbdc34ac	reader_concurrency_semaphore: add initial_resources() To allow tests to reliably calculate the amount of resources they need to consume in order to effectively reduce the resources of the semaphore to a desired amount. Using `available_resources()` is not reliable as it doesn't factor in resources that are consumed at the moment but will be returned later. This will also benefit debugging coredumps where we will now be able to tell how many resources the semaphore was created with and thus calculate the amount of memory and count currently used.	2020-07-20 11:23:39 +03:00
Botond Dénes	f264d2b00f	test: cql_test_env: allow overriding database_config	2020-07-20 11:23:39 +03:00
Botond Dénes	5de0afdab7	mutation_reader: expose new_reader_base_cost So that test code can use it.	2020-07-20 11:23:39 +03:00
Botond Dénes	566e31a5ac	db/view: view_updating_consumer: allow passing custom update pusher So that tests can test the `view_update_consumer` in isolation, without having to set up the whole database machinery. In addition to less infrastructure setup, this allows more direct checking of mutations pushed for view generation.	2020-07-20 11:23:39 +03:00
Botond Dénes	0166f97096	db/view: view_update_generator: make staging reader evictable The view update generation process creates two readers. One is used to read the staging sstables, the data which needs view updates to be generated for, and another reader for each processed mutation, which reads the current value (pre-image) of each row in said mutation. The staging reader is created first and is kept alive until all staging data is processed. The pre-image reader is created separately for each processed mutation. The staging reader is not restricted, meaning it does not wait for admission on the relevant reader concurrency semaphore, but it does register its resource usage on it. The pre-image reader however is restricted. This creates a situation, where the staging reader possibly consumes all resources from the semaphore, leaving none for the later created pre-image reader, which will not be able to start reading. This will block the view building process meaning that the staging reader will not be destroyed, causing a deadlock. This patch solves this by making the staging reader restricted and making it evictable. To prevent thrashing -- evicting the staging reader after reading only a really small partition -- we only make the staging reader evictable after we have read at least 1MB worth of data from it.	2020-07-20 11:23:39 +03:00
Botond Dénes	84357f0722	db/view: view_updating_consumer: move implementation from table.cc to view.cc table.cc is a very counter-intuitive place for view related stuff, especially if the declarations reside in `db/view/`.	2020-07-20 11:23:39 +03:00
Botond Dénes	cd849ed40d	database: add make_restricted_range_sstable_reader() A variant of `make_range_sstable_reader()` that wraps the reader in a restricting reader, hence making it wait for admission on the read concurrency semaphore, before starting to actually read.	2020-07-20 11:23:39 +03:00
Raphael S. Carvalho	b67066cae2	table: Fix Staging SSTables being incorrectly added or removed from the backlog tracker Staging SSTables can be incorrectly added or removed from the backlog tracker, after an ALTER TABLE or TRUNCATE, because the add and removal don't take into account if the SSTable requires view building, so a Staging SSTable can be added to the tracker after an ALTER TABLE, or removed after a TRUNCATE, even though not added previously, potentially causing the backlog to become negative. Fixes #6798. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200716180737.944269-1-raphaelsc@scylladb.com>	2020-07-20 10:57:38 +03:00
Nadav Har'El	e0693f19d0	alternator test: produce newer xunit format for test results test.py passes the "--junit-xml" option to test/alternator/run, which passes this option to pytest to get an xunit-format summary of the test results. However, unfortunately until very recent versions (which aren't yet in Linux distributions), pytest defaulted to a non-standard xunit format which tools like Jenkins couldn't properly parse. The more standard format can be chosen by passing the option "-o junit_family=xunit2", so this is what we do in this patch. Fixes #6767 (hopefully). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200719203414.985340-1-nyh@scylladb.com>	2020-07-20 09:24:50 +03:00
Avi Kivity	5371be71e9	Merge "Reduce fanout of some mutation-related headers" from Pavel E " The set's goal is to reduce the indirect fanout of 3 headers only, but likely affects more. The measured improvement rates are flat_mutation_reader.hh: -80% mutation.hh : -70% mutation_partition.hh : -20% tests: dev-build, 'checkheaders' for changed headers (the tree-wide fails on master) " * 'br-debloat-mutation-headers' of https://github.com/xemul/scylla: headers: Remove flat_mutation_reader.hh from several other headers migration_manager: Remove db/schema_tables.hh inclusion into header storage_proxy: Remove frozen_mutation.hh inclusion storage_proxy: Move paxos/*.hh inclusions from .hh to .cc storage_proxy: Move hint_wrapper from .hh to .cc headers: Remove mutation.hh from trace_state.hh	2020-07-19 19:47:59 +03:00
Rafael Ávila de Espíndola	9fd2682bfd	restrictions_test: Fix use after return The query_options constructor captures a reference to the cql_config. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200718013221.640926-1-espindola@scylladb.com>	2020-07-19 15:44:38 +03:00
Takuya ASADA	c99f31f770	scylla_setup: abort RAID disk prompt when no free disks available Fixes #6860	2020-07-19 14:48:59 +03:00
Eliran Sinvani	b97f466438	schema: take into account features when converting a table creation to schema_mutations When upgrading from a version that lacks some schema features, there is a transition period during which we have a mixed cluster. Schema digests are calculated without taking into account the features supported by the mixed cluster. Every node calculates the digest as if the whole cluster supports its supported features. Scylla already has a mechanism of redaction to the lowest common denominator, but it hasn't been used in this context. This commit uses the redaction mechanism when calculating the digest on the newly added table so it will match the supported features of the whole cluster. Tests: Manual upgrading - upgraded to a version with an additional feature and additional schema column and validated that the digest of the table's schema is identical on every node in the mixed cluster.	2020-07-19 10:30:51 +03:00
Avi Kivity	e4deaaced3	Update tools/java submodule * tools/java 3eca0e3511...113c7d993b (1): > dist: redhat: reduce log spam from unpacking sources when building rpm	2020-07-18 12:07:57 +03:00
Pavel Emelyanov	92f58f62f2	headers: Remove flat_mutation_reader.hh from several other headers They can all live with a forward declaration of the f._m._r. plus a seastar header in commitlog code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:54:47 +03:00
Pavel Emelyanov	8618a02815	migration_manager: Remove db/schema_tables.hh inclusion into header The schema_tables.hh -> migration_manager.hh couple seems to work as one of "single header for everything" creating big bloat for many seemingly unrelated .hh's. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:54:43 +03:00
Pavel Emelyanov	a80403e8f3	storage_proxy: Remove frozen_mutation.hh inclusion Nothing in it requires the needed classes any longer, forward declarations are enough. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:47:30 +03:00
Pavel Emelyanov	6174252282	storage_proxy: Move paxos/*.hh inclusions from .hh to .cc The storage_proxy.hh can live with forward declarations of paxos classes it refers to. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:44:02 +03:00
Pavel Emelyanov	3df4f3078f	storage_proxy: Move hint_wrapper from .hh to .cc It's only used there, but requires mutation_query.hh, which can thus be removed from storage_proxy.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:40:25 +03:00
Pavel Emelyanov	757a7145b9	headers: Remove mutation.hh from trace_state.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:40:23 +03:00
Nadav Har'El	3b5122fd04	alternator test: fix warning message in test_streams.py In test_streams.py, we had the line: assert desc['StreamDescription'].get('StreamLabel') In Alternator, the 'StreamLabel' attribute is missing, which the author of this test probably thought would cause this test to fail (which is expected, the test is marked with "xfail"). However, my version of pytest actually doesn't like that assert is given a value instead of a comparison, and we get the warning message: PytestAssertRewriteWarning: asserting the value None, please use "assert is None" I think that the nicest replacement for this line is assert 'StreamLabel' in desc['StreamDescription'] This is very readable, "pythonic", and checks the right thing - it checks that the JSON must include the 'StreamLabel' item, as the get() assertion was supposed to have been doing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200716124621.906473-1-nyh@scylladb.com>	2020-07-17 14:36:23 +03:00
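The difference between the two assertion styles discussed in the commit above can be shown with a short Python sketch. The dictionaries are hypothetical stand-ins for a DescribeStream response, not real Alternator output, and the helper name `has_stream_label` is made up for illustration.

```python
def has_stream_label(desc):
    # Membership test: true whenever the key is present, whatever its value.
    return 'StreamLabel' in desc['StreamDescription']


# Hypothetical responses, for illustration only.
missing = {'StreamDescription': {'StreamArn': 'arn:hypothetical'}}
empty = {'StreamDescription': {'StreamLabel': ''}}

# get() returns None for a missing key, so asserting on it asserts a bare
# value rather than a comparison, which is what pytest warned about.
label_via_get = missing['StreamDescription'].get('StreamLabel')

# get() is also falsy for a present-but-empty label, while `in` is not.
empty_via_get = empty['StreamDescription'].get('StreamLabel')
```

The membership test states the intent directly: the JSON must include the item, regardless of the value it carries.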
Rafael Ávila de Espíndola	9fe4dc91d7	sstables: Move noop_write_monitor to a .cc file There is no need to expose a type that is only used via a virtual interface. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200717021215.545525-1-espindola@scylladb.com>	2020-07-17 11:59:03 +03:00
Rafael Ávila de Espíndola	66d866427d	sstable_datafile_test: Use BOOST_REQUIRE_EQUAL This only works for types that can be printed, but produces a better error message if the check fails. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200716232700.521414-1-espindola@scylladb.com>	2020-07-17 11:58:58 +03:00
Rafael Ávila de Espíndola	c5405a5268	managed_bytes: Delete dead 'if' If external is true, _u.ptr is not null. An empty managed_bytes uses the internal representation. The current code looks scary, since it seems possible that backref would still point to the old location, which would invite corruption when the reclaimer runs. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200716233124.521796-1-espindola@scylladb.com>	2020-07-17 11:58:53 +03:00
Avi Kivity	0ae770da35	Update seastar submodule * seastar 0fe32ec59...4a99d5645 (3): > httpd: Don't warn on ECONNABORTED > httpd: Avoid calling future::then twice on the same future Fixes #6709. > futures: Add a test for a broken promise in repeat	2020-07-17 08:42:26 +03:00
Rafael Ávila de Espíndola	44cf4d74cd	build: Put test.py invocations in the console pool Ninja has a special pool called console that causes programs in that pool to output directly to the console instead of being logged. By putting test.py in it is now possible to run just $ ninja dev-test And see the test.py output while it is running. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200716204048.452082-1-espindola@scylladb.com>	2020-07-17 00:33:10 +03:00
Benny Halevy	3ab1d9fe1d	commitlog: use seastar::with_file_close_on_failure `close_on_failure` was committed to seastar so use the library version. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 20:32:32 +03:00
Benny Halevy	742298fa2a	commitlog: descriptor: make nothrow move constructible inherit from sstring nothrow move constructor. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 20:32:22 +03:00
Benny Halevy	54c5583b8d	commitlog: allocate_segment_ex, segment: pass descriptor by value Besides being more robust than passing const descriptor& to continuations, this helps simplify making allocate_segment_ex's continuations nothrow_move_constructible, which is needed for using seastar::with_file_close_on_failure(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 20:31:12 +03:00
Benny Halevy	22c384c2e9	commitlog: allocate_segment_ex: filename capture is unused Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 20:23:57 +03:00
Raphael S. Carvalho	09d3a35438	Update MAINTAINERS Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200716142642.918204-1-raphaelsc@scylladb.com>	2020-07-16 17:29:41 +03:00
Avi Kivity	7bf51b8c6c	Merge 'Distinguish single-column expressions in AST' from Dejan " Fix #6825 by explicitly distinguishing single- from multi-column expressions in AST. Tests: unit (dev), dtest secondary_indexes_test.py (dev) " * dekimir-single-multiple-ast: cql3/restrictions: Separate AST for single column cql3/restrictions: Single-column helper functions	2020-07-16 16:59:14 +03:00
Pavel Solodovnikov	5ff5df1afd	storage_proxy: un-hardcode force sync flag for `mutate_locally(mutation)` overload Corresponding overload of `storage_proxy::mutate_locally` was hardcoded to pass `db::commitlog::force_sync::no` to the `database::apply`. Unhardcode it and substitute `force_sync::no` to all existing call sites (as it were before). `force_sync::yes` will be used later for paxos learn writes when trying to apply mutations upgraded from an obsolete schema version (similar to the current case when applying locally a `frozen_mutation` stored in accepted proposal). Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com>	2020-07-16 16:38:48 +03:00
Avi Kivity	0c7c255f94	Merge "compaction uuid for log and compaction_history" from Benny " We'd like to use the same uuid both for printing compaction log messages and to update compaction_history. Generate one when starting compaction and keep it in compaction_info. Then use it by convention in all compaction log messages, along with compaction type, and keyspace.table information. Finally, use the same uuid to update compaction_history. Fixes #6840 " * tag 'compaction-uuid-v1' of github.com:bhalevy/scylla: compaction: print uuid in log messages compaction: report_(start|finish): just return description compaction: move compaction uuid generation to compaction_info	2020-07-16 16:38:48 +03:00
Dejan Mircevski	cc86d915ed	configure.py: $mode-test targets depend on scylla The targets {dev|debug|release}-test run all unit tests, including alternator/run. But this test requires the Scylla executable, which wasn't among the dependencies. Fix it by adding build/$mode/scylla to the dependency list. Fixes #6855. Tests: `ninja dev-test` after removing build/dev/scylla Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-16 16:38:48 +03:00
Piotr Dulikowski	e2462bce3b	cdc: fix a corner case inside get_base_table It is legal for a user to create a table with a name that has a _scylla_cdc_log suffix. In such a case, the table won't be treated as a cdc log table, and does not require a corresponding base table to exist. During refactoring done as a part of the initial implementation of Alternator streams (#6694), `is_log_for_some_table` started throwing when trying to check a name like `X_scylla_cdc_log` when there was no table with name `X`. Previously, it just returned false. The exception originates inside `get_base_table`, which tries to return the base table schema, not checking for its existence - which may throw. It makes more sense for this function to return nullptr in such a case (it already does when the provided log table name does not have the cdc log suffix), so this patch adds an explicit check and returns nullptr when necessary. A similar oversight happened before (see #5987), so this patch also adds a comment which explains why existence of `X_scylla_cdc_log` does not imply existence of `X`. Fixes: #6852 Refs: #5724, #5987	2020-07-16 16:38:48 +03:00
Benny Halevy	eb1d558d00	compaction: print uuid in log messages By convention, print the following information in all compaction log messages: [{compaction.type} {keyspace}.{table} {compaction.uuid}] Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 13:55:23 +03:00
Benny Halevy	dec751cfbe	compaction: report_(start\|finish): just return description Rather than logging the message in the virtual callee method just return a string description and make the logger call in the common caller. 1. There is no need to do the logger call in the callee, it is simpler to format the log message in the caller and just retrieve the per-compaction-type description. 2. Prepare to centrally print the compaction uuid. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 13:55:23 +03:00
Benny Halevy	e39fbe1849	compaction: move compaction uuid generation to compaction_info We'd like to use the same uuid both for printing compaction log messages and to update compaction_history. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-16 13:55:23 +03:00
Dejan Mircevski	0047e1e44d	cql3/restrictions: Separate AST for single column Existing AST assumes the single-column expression is a special case of multi-column expressions, so it cannot distinguish `c=(0)` from `(c)=(0)`. This leads to incorrect behaviour and dtest failures. Fix it by separating the two cases explicitly in the AST representation. Modify AST-creation code to create different AST for single- and multi-column expressions. Modify AST-consuming code to handle column_name separately from vector<column_name>. Drop code relying on cardinality testing to distinguish single-column cases. Add a new unit test for `c=(0)`. Fixes #6825. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-16 12:27:25 +02:00
Dejan Mircevski	a23e43090f	cql3/restrictions: Single-column helper functions This commit is separated out for ease of review. It introduces some functions that subsequent commits will use, thereby reducing diff complexity of those subsequent commits. Because the new functions aren't invoked anywhere, they are guarded by `#if 0` to avoid unused-function errors. The new functions perform the same work as their existing namesakes, but assuming single-column expressions. The old versions continue to try handling single-column as a special case of multi-column, remaining unable to distinguish between `c = (1)` and `(c) = (1)`. This will change in the next commit, which will drop attempts to handle single-column cases from existing functions. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-16 11:25:30 +02:00
Asias He	4d7faac350	repair: Add uuid to a repair job Currently, repair uses an integer to identify a repair job. The repair id starts from 1 after each node restart. As a result, different repair jobs can have the same id across restarts. To make the id unique across restarts, we can use a uuid in addition to the integer id. We cannot drop the integer id completely since the http api and nodetool use it. Fixes #6786	2020-07-16 11:03:19 +03:00
Pekka Enberg	8b5121ea0c	README.md: Add Slack and Twitter social banners Add social banners for Slack and Twitter in README that are easy to find for people new to the project. Fixes #6538 Message-Id: <20200716070449.630864-1-penberg@scylladb.com>	2020-07-16 10:55:15 +03:00
Pekka Enberg	ed71ebafe5	README.md: Improve contribution section The markdown syntax in the contribution section is incorrect, which is why the links appear on the same line. Improve the contribution section by making it more explicit what the links are about. Message-Id: <20200714070716.143768-1-penberg@scylladb.com>	2020-07-16 10:53:12 +03:00
Nadav Har'El	dcf9c888a2	alternator test: disable test_streams.py::test_get_records This test usually fails, with the following error. Marking it "xfail" until we can get to the bottom of this. dynamodb = dynamodb.ServiceResource() dynamodbstreams = <botocore.client.DynamoDBStreams object at 0x7fa91e72de80> def test_get_records(dynamodb, dynamodbstreams): # TODO: add tests for storage/transactionable variations and global/local index with create_stream_test_table(dynamodb, StreamViewType='NEW_AND_OLD_IMAGES') as table: arn = wait_for_active_stream(dynamodbstreams, table) p = 'piglet' c = 'ninja' val = 'lucifers' val2 = 'flowers' > table.put_item(Item={'p': p, 'c': c, 'a1': val, 'a2': val2}) test_streams.py:316: ... E botocore.exceptions.ClientError: An error occurred (Internal Server Error) when calling the PutItem operation (reached max retries: 3): Internal server error: std::runtime_error (cdc::metadata::get_stream: could not find any CDC stream (current time: 2020/07/15 17:26:36). Are we in the middle of a cluster upgrade?) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-16 08:24:25 +03:00
Nadav Har'El	61f52da9b1	merge: Alternator/CDC: Implement streams support Merged pull request https://github.com/scylladb/scylla/pull/6694 by Calle Wilund: Implementation of DynamoDB streams using Scylla CDC. Fixes #5065 Initial, naive implementation insofar that it uses 1:1 mapping CDC stream to DynamoDB shard. I.e. there are a lot of shards. Includes tests verified against both local DynamoDB server and actual AWS remote one. Note: Because of how data put is implemented in alternator, currently we do not get "proper" INSERT labels for first write of data, because to CDC it looks like an update. The test compensates for this, but actual users might not like it.	2020-07-16 08:18:25 +03:00
Nadav Har'El	c4497bf770	alternator test: enable experimental CDC In the script test/alternator/run, which runs Scylla for the Alternator tests, add the "--experimental-features=cdc" option, to allow us to test the streams API whose implementation requires the experimental CDC feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-16 08:18:09 +03:00
Takuya ASADA	e52ae78f79	reloc: support unified relocatable package This introduces the unified relocatable package, a single tarball to install all Scylla packages. Fixes #6626 See scylladb/scylla-pkg#1218	2020-07-15 20:29:31 +03:00
Avi Kivity	afd9b29627	Update tools/jmx and tools/java submodules * tools/java 50dbf77123...3eca0e3511 (1): > config: Avoid checking options and filtering scylla.yaml * tools/jmx 5820992...aa94fe5 (3): > dist: redhat: reduce log spam from unpacking sources when building rpm > Merge 'gitignore: fix typo and add scylla-apiclient/target/' from Benny > apiclient: Bump Jackson version to 2.10.4	2020-07-15 19:42:47 +03:00
Takuya ASADA	5da8784494	install.sh: support calling install.sh from other directory In the .deb package with the new relocatable package format, all files moved under the scylla/ directory. So we need to call ./scylla/install.sh from debian/rules, but that does not work correctly, since install.sh does not support being called from another directory. To support this, we need to change directory to the scylla top directory before copying files.	2020-07-15 18:55:12 +03:00
Nadav Har'El	09a71ccd84	merge: cql3/restrictions: exclude NULLs from comparison in filtering Merge pull request https://github.com/scylladb/scylla/pull/6834 by Juliusz Stasiewicz: NULLs used to give false positives in GT, LT, GEQ and LEQ ops performed upon ALLOW FILTERING. That was a consequence of not distinguishing NULL from an empty buffer. This patch excludes NULLs on high level, preventing them from entering LHS of any comparison, i.e. it assumes that any binary operation should return false whenever the LHS operand is NULL (note: at the moment filters with RHS NULL, such as ...WHERE x=NULL ALLOW FILTERING, return empty sets anyway). Fixes #6295 * '6295-do-not-compare-nulls-v2' of github.com:jul-stas/scylla: filtering_test: check that NULLs do not compare to normal values cql3/restrictions: exclude NULLs from comparison in filtering	2020-07-15 18:32:14 +03:00
Takuya ASADA	9b5f28a2e3	scylla_raid_setup: fix incorrect block device path To use UUID, we need a tag "UUID=<uuid>". reference: https://www.freedesktop.org/software/systemd/man/systemd.mount.html reference: https://man7.org/linux/man-pages/man8/mount.8.html	2020-07-15 18:22:46 +03:00
Tomasz Grabiec	b8531fb885	Merge "Switch partitions cache from BST to B+tree & array" from Pavel E. The data model is now bplus::tree<Key = int64_t, T = array<entry>> where entry can be cache_entry or memtable_entry. The whole thing is encapsulated into a collection called "double_decker" from patch #3. The array<T> is an array of T-s with 0-bytes overhead used to resolve hash conflicts (patch #2). branch: tests: unit(debug) tests before v7: unit(debug) for new collections, memtable and row_cache unit(dev) for the rest perf(dev) * https://github.com/xemul/scylla/commits/row-cache-over-bptree-9: test: Print more sizes in memory_footprint_test memtable: Switch onto B+ rails row_cache: Switch partition tree onto B+ rails memtable: Count partitions separately token: Introduce raw() helper and raw comparator row-cache: Use ring_position_comparator in some places dht: Detach ring_position_comparator_for_sstables double-decker: A combination of B+tree with array intrusive-array: Array with trusted bounds utils: B+ tree implementation test: Move perf measurement helpers into header	2020-07-15 14:54:29 +02:00
Calle Wilund	3b74b9585f	cql3::lists: Fix setter_by_uuid not handling null value Fixes #6828 When using the scylla list index from UUID extension, null values were not handled properly, causing throws from the underlying layer.	2020-07-15 13:52:09 +02:00
Raphael S. Carvalho	7a728803f7	cql3/functions: protect against uninitialized value impl_count_function doesn't explicitly initialize _count, so its correctness depends on default initialization. Let's explicitly initialize _count to make the code future proof. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200714162604.64402-1-raphaelsc@scylladb.com>	2020-07-15 12:38:39 +03:00
Calle Wilund	ac4d0bb144	docs/alternator: Change the streams section Small but updated blurb describing the state of streams in alternator/scylla.	2020-07-15 08:21:34 +00:00
Calle Wilund	76f6fe679a	alternator tests: Add streams test Small set of positive and negative tests of streams functionality. Verified against DynamoDB and Alternator.	2020-07-15 08:21:34 +00:00
Calle Wilund	cbb70f4af4	executor: "UpdateTable" support for streams Partial implementation of the "UpdateTable" command. Supports only enabling/disabling streams.	2020-07-15 08:21:34 +00:00
Calle Wilund	45ee73969d	executor: Allow streams specification in CreateTable schema	2020-07-15 08:21:34 +00:00
Calle Wilund	3376209718	cdc::schema: Make extensions explicitly settable from builder To make non-cql cdc schema options a reality.	2020-07-15 08:21:34 +00:00
Calle Wilund	bbc544748f	alternator: Implement GetRecords Simplistic variant, using 1:1 mapping of scylla stream id <-> shard	2020-07-15 08:21:34 +00:00
Calle Wilund	3756febbf5	alternator: expose describe_single_item and default_timeout To be able to describe single alternator items from other files. And query with the default timeout.	2020-07-15 08:10:23 +00:00
Calle Wilund	c45781de1e	alternator: Implement GetShardIterator	2020-07-15 08:10:23 +00:00
Calle Wilund	8084b5a9b7	alternator: Implement DescribeStream	2020-07-15 08:10:23 +00:00
Calle Wilund	8fb9b32bd3	alternator: Implement ListStreams command	2020-07-15 08:10:23 +00:00
Calle Wilund	811b531e2d	db::config: Add option to set streams confidence window Option to control the alternator streams CDC query/shard range time confidence interval, i.e. the period we enforce as timestamp threshold when reading. The default, 10s, should be sufficient on a normal cluster, but across DCs, or with client timestamps or whatever, one might need a larger window.	2020-07-15 08:10:23 +00:00
Calle Wilund	a9641d4f02	system_distributed_keyspace: Add cdc topology/stream ids reader To read the full topology (with expired and expirations etc) from within.	2020-07-15 08:10:23 +00:00
Calle Wilund	0158f6473b	cdc: Add stream ids structure with time and expiration For reading the topology tables from within scylla.	2020-07-15 08:10:23 +00:00
Calle Wilund	331aa7c501	cdc: Add "is_cdc_metacolumn_name" predicate To sift column names	2020-07-15 08:10:23 +00:00
Calle Wilund	8a728ce618	cdc: Add get_base_table helper	2020-07-15 08:10:23 +00:00
Calle Wilund	8f462e8606	CDC::log: Add `base_name` helper To extract base table name from CDC log table name.	2020-07-15 08:10:23 +00:00
Calle Wilund	0708a9971a	executor: Add system_distributed_keyspace as parameter/member Streams implementation will require querying system tables etc to do its work, thus will need access to this object.	2020-07-15 08:10:23 +00:00
Calle Wilund	e382d79bcd	executor: Make some helper and subroutines class-visible Subroutines needed by (in this case) streams implementation moved from being file-static to class-static (exported). To make putting handler routines in separate sources possible. Because executor.cc is large and slow to compile. Separation is nice. Unfortunately, not all methods can be kept class-private, since unrelated types also use them. A reviewer suggested instead placing them in a top-level header for export, i.e. not class-private at all. I am skipping that for now, mainly because I can't come up with a good file name. Can be part of a general refactor of helper routine organization in executor.	2020-07-15 08:10:23 +00:00
Calle Wilund	8a7b24dea1	alternator::error: Add exception overloads for Dynamo types Add typed exception overloads for ValidationException, ResourceNotFoundException, etc, to avoid writing explicit error type as string everywhere (with the potential for spelling errors ever present). Also allows intellisense etc to complete the exception when coded.	2020-07-15 08:10:23 +00:00
Calle Wilund	699c4d2c7e	rjson: Add templated get/set overloads and optional get<T> To allow immediate json value conversion for types we have TypeHelper<...>:s for. Typed opt-get to get both automatic type conversion, _and_ find functionality in one call.	2020-07-15 08:10:23 +00:00
Calle Wilund	72ec525045	rjson: Add exception overloads To avoid copying error message composing, as well as forcing said code into rjson.cc. Also helps caller to determine fault by catch type.	2020-07-15 08:10:23 +00:00
Piotr Sarna	f1c1043701	README: update the alternator paragraph Since alternator is no longer experimental, its paragraph in README.md is rephrased to better reflect its current state. Message-Id: <a89eb70c4350e021ad9d6f684e49f94e4c735c19.1594792604.git.sarna@scylladb.com>	2020-07-15 09:27:35 +03:00
Juliusz Stasiewicz	c25398e8cf	filtering_test: check that NULLs do not compare to normal values Tested operators are: `<` and `>`. Tests all types of NULLs except `duration` because durations are explicitly not comparable. Values to compare against were chosen arbitrarily.	2020-07-14 15:37:17 +02:00
Pavel Emelyanov	f8ffc31218	test: Print more sizes in memory_footprint_test The row cache memory footprint changed after switch to B+ because we no longer have a sole cache_entry allocation, but also the bplus::data and bplus::node. Knowing their sizes helps analyzing the footprint changes. Also print the size of memtable_entry that's now also stored in B+'s data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	4d2f5f93a4	memtable: Switch onto B+ rails The change is the same as with row-cache -- use B+ with int64_t token as key and array of memtable_entry-s inside it. The changes are: Similar to those for row_cache: - compare() goes away, new collection uses ring_position_comparator - insertion and removal happens with the help of double_decker, most of the places are about slightly changed semantics of it - flags are added to memtable_entry, this makes its size larger than it could be, but still smaller than it was before Memtable-specific: - when the new entry is inserted into tree iterators _might_ get invalidated by double-decker inner array. This is easy to check when it happens, so the invalidation is avoided when possible - the size_in_allocator_without_rows() is now not very precise. This is because after the patch memtable_entries are not allocated individually as they used to. They can be squashed together with those having token conflict and asking allocator for the occupied memory slot is not possible. As the closest (lower) estimate the size of enclosing B+ data node is used Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	174b101a49	row_cache: Switch partition tree onto B+ rails The row_cache::partitions_type is replaced from boost::intrusive::set to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>> Where token is used to quickly locate the partition by its token and the internal array -- to resolve hashing conflicts. Summary of changes in cache_entry: - compare goes away as the new collection needs a tri-compare one, which is provided by ring_position_comparator - when initialized the dummy entry is added with "after_all_keys" kind, not "before_all_keys" as it was by default. This is to make tree entries sorted by token - insertion and removal of cache_entries happens inside double_decker, most of the changes in row_cache.cc are about passing constructor args from current_allocator.construct into double_decker.emplace_before() - the _flags is extended to keep array head/tail bits. There's room for it, sizeof(cache_entry) remains unchanged The rest fits smoothly into the double_decker API. Also, as was told in the previous patch, insertion and removal _may_ invalidate iterators, but may leave them intact. However, currently this doesn't seem to be a problem as the cache_tracker ::insert() and ::on_partition_erase do invalidate iterators unconditionally. Later this can be optimized, as iterators are invalidated by double-decker only in case of hash conflict, otherwise it doesn't change arrays and B+ tree doesn't invalidate its own. tests: unit(dev), perf(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	dff5eb6f25	memtable: Count partitions separately The B+ will not have a constant-time .size() call, so do it by hand Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	ae28814b1c	token: Introduce raw() helper and raw comparator In next patches the entries having token on-board will be moved onto B+-tree rails. For this the int64_t value of the token will be used as B+ key, so prepare for this. One corner case -- the after_all_keys tokens must be resolved to int64::max value to appear at the "end" of the tree. This is not the same as "before_all_keys" case, which maps to the int64::min value which is not allowed for regular tokens. But for the sake of B+ switch this is OK, the conflicts of token raw values are explicitly resolved in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	7b2754cf5f	row-cache: Use ring_position_comparator in some places The row cache (and memtable) code uses own comparators built on top of the ring_position_comparator for collections of partitions. These collections will be switched from the key less-compare to the pair of token less-compare + key tri-compare. Prepare for the switch by generalizing the ring_partition_comparator and by patching all the non-collections usage of less-compare to use one. The memtable code doesn't use it outside of collections, but patch it anyway as a part of preparations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	1e15c06889	dht: Detach ring_position_comparator_for_sstables Next patches will generalize ring_position_comparator with templates to replace cache_entry's and memtable_entry's comparators. The overload of operator() for sstables has its own implementation, that differs from the "generic" one, for smoother generalization it's better to detach it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	cf1315cde5	double-decker: A combination of B+tree with array The collection is K:V store bplus::tree<Key = K, Value = array_trusted_bounds<V>> It will be used as partitions cache. The outer tree is used to quickly map token to cache_entry, the inner array -- to resolve (expected to be rare) hash collisions. It also must be equipped with two comparators -- less one for keys and full one for values. The latter is not kept on-board, but it is required on all calls. The core API consists of just 2 calls - Heterogeneous lower_bound(search_key) -> iterator : finds the element that's greater or equal to the provided search key Other than the iterator the call returns a "hint" object that helps the next call. - emplace_before(iterator, key, hint, ...) : the call constructs the element right before the given iterator. The key and hint are needed for more optimal algo, but strictly speaking not required. Adding an entry to the double_decker may result in growing the node's array. Here the B+ iterator's .reconstruct() method comes into play. The new array is created, old elements are moved onto it, then the fresh node replaces the old one. // TODO: Ideally this should be turned into the // template <typename OuterCollection, typename InnerCollection> // but for now the double_decker still has some intimate knowledge // about what outer and inner collections are. Insertion into this collection _may_ invalidate iterators, but may leave them intact. Invalidation only happens in case of hashing conflict, which can be clearly seen from the hint object, so there's a good room for improvement. The main usage by row_cache (the find_or_create_entry) looks like cache_entry find_or_create_entry() { bound_hint hint; it = lower_bound(decorated_key, &hint); if (!hint.found) { it = emplace_before(it, decorated_key.token(), hint, <constructor args>) } return *it; } Now the hint. 
It contains 3 booleans, that are - match: set to true when the "greater or equal" condition evaluated to "equal". This frees the caller from the need to manually check whether the entry returned matches the search key or the new one should be inserted. This is the "!found" check from the above snippet. To explain the next 2 bools, here's a small example. Consider the tree containing two elements {token, partition key}: { 3, "a" }, { 5, "z" } As the collection is sorted they go in the order shown. Next, this is what the lower_bound would return for some cases: { 3, "z" } -> { 5, "z" } { 4, "a" } -> { 5, "z" } { 5, "a" } -> { 5, "z" } Apparently, the lower bound for those 3 elements is the same, but the code-flows of emplacing them before one differ drastically. { 3, "z" } : need to get previous element from the tree and push the element to its vector's back { 4, "a" } : need to create new element in the tree and populate its empty vector with the single element { 5, "a" } : need to put the new element in the found tree element right before the found vector position To make one of the above decisions the .emplace_before would need to perform another set of comparisons of keys and elements. Fortunately, the needed information was already known inside the lower_bound call and can be reported via the hint. That said, - key_match: set to true if tree.lower_bound() found the element for the Key (which is token). For above examples this will be true for cases 3z and 5a. - key_tail: set to true if the tree element was found, but when comparing values from array the bounding element turned out to belong to the next tree element and the iterator was ++-ed. For above examples this would be true for case 3z only. And last, but not least -- the "erase self" feature, which, given only the cache_entry pointer at hand, removes it from the collection. To make this happen we need to make two steps: 1. get the array the entry sits in 2. 
get the b+ tree node the array sits in Both methods are provided by array_trusted_bounds and bplus::tree. So, when we need to get iterator from the given T pointer, the algo looks like - Walk back the T array until hitting the head element - Call array_trusted_bounds::from_element() getting the array - Construct b+ iterator from obtained array - Construct the double_decker iterator from b+ iterator and from the number of "steps back" from above - Call double_decker::iterator.erase() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:29:53 +03:00
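The three hint flags described in the commit above can be illustrated with a small model. This is only a sketch in Python, not the double_decker code: the outer B+ tree is replaced by a sorted list of (token, keys) pairs, and the names `BoundHint` and `lower_bound` as written here are illustrative assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class BoundHint:
    match: bool = False      # an entry equal to (token, key) was found
    key_match: bool = False  # the outer collection holds the searched token
    key_tail: bool = False   # token found, but the bound lies in the next token's array


def lower_bound(tree, token, key):
    """tree is a list of (token, sorted list of keys), sorted by token.

    Returns ((outer_index, inner_index), hint) for the first element that is
    greater than or equal to (token, key).
    """
    tokens = [t for t, _ in tree]
    i = bisect_left(tokens, token)
    hint = BoundHint()
    if i == len(tree) or tree[i][0] != token:
        # No array for this token: a new outer element would be needed.
        return (i, 0), hint
    hint.key_match = True
    keys = tree[i][1]
    j = bisect_left(keys, key)
    if j == len(keys):
        # All keys under this token are smaller: step to the next outer element.
        hint.key_tail = True
        return (i + 1, 0), hint
    hint.match = keys[j] == key
    return (i, j), hint


# The example from the commit message: { 3, "a" }, { 5, "z" }
tree = [(3, ["a"]), (5, ["z"])]
for probe in [(3, "z"), (4, "a"), (5, "a")]:
    print(probe, lower_bound(tree, *probe))
```

All three probes resolve to the same position, and only the hint tells them apart, which is the information emplace_before needs to pick the right insertion path without repeating the comparisons.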
Pavel Emelyanov	eb70644c1c	intrusive-array: Array with trusted bounds A plain array of elements that grows and shrinks by constructing the new instance from an existing one and moving the elements from it. Behaves similarly to vector's external array, but has 0-bytes overhead. The array bounds (0-th and N-th elements) are determined by checking the flags on the elements themselves. For this the type must support getters and setters for the flags. To remove an element from array there's also a nothrow option that drops the requested element from array, shifts the righter ones left and keeps the trailing unused memory (so called "train") until reconstruction or destruction. Also comes with lower_bound() helper that helps keeping the elements sorted and the from_element() one that returns back reference to the array in which the element sits. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:29:49 +03:00
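The idea of finding array bounds from flags on the elements themselves, with no separate size or header, can be sketched as follows. This is a Python model under stated assumptions, not the array_trusted_bounds code: `Entry`, `make_array` and `bounds_of` are hypothetical names, and a flat Python list stands in for raw memory holding several arrays back to back.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    head: bool = False  # set on the 0-th element of an array
    tail: bool = False  # set on the last element of an array


def make_array(values):
    """Build one array; the only bookkeeping is the head/tail flags."""
    entries = [Entry(v) for v in values]
    entries[0].head = True
    entries[-1].tail = True
    return entries


def bounds_of(memory, pos):
    """Given the position of any element, recover the bounds of its array."""
    start = pos
    while not memory[start].head:
        start -= 1  # walk back until the head element
    end = pos
    while not memory[end].tail:
        end += 1    # walk forward until the tail element
    return start, end


# Three arrays laid out contiguously, with no per-array header between them.
memory = make_array(["a", "b", "c"]) + make_array(["x"]) + make_array(["p", "q"])
print(bounds_of(memory, 1), bounds_of(memory, 3), bounds_of(memory, 5))
```

Walking back to the head element is also the first step of the "erase self" path described in the double-decker commit.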
Pavel Emelyanov	95f15ea383	utils: B+ tree implementation // The story is at // https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ This is the B+ version which satisfies several specific requirements to be suitable for row-cache usage. 1. Insert/Remove doesn't invalidate iterators 2. Elements should be LSA-compactable 3. Low overhead of data nodes (1 pointer) 4. External less-only comparator 5. As little actions on insert/delete as possible 6. Iterator walks the sorted keys The design, briefly is: There are 3 types of nodes: inner, leaf and data, inner and leaf keep built-in array of N keys and N(+1) nodes. Leaf nodes sit in a doubly linked list. Data nodes live separately from the leaf ones and keep pointers on them. Tree handler keeps pointers on root and left-most and right-most leaves. Nodes do _not_ keep pointers or references on the tree (except 3 of them, see below). changes in v9: - explicitly marked keys/kids indices with type aliases - marked the whole erase/clear stuff noexcept - disposers now accept object pointer instead of reference - clear tree in destructor - added more comments - style/readability review comments fixed Prior changes - Add noexcepts where possible - Restrict Less-comparator constraint -- it must be noexcept - Generalized node_id - Packed code for begin()/cbegin() - Unsigned indices everywhere - Cosmetic changes - Const iterators - C++20 concepts - The index_for() implementation is templatized the other way to make it possible for AVX key search specialization (further patching) - Insertion tries to push kids to siblings before split Before this change insertion into full node resulted in this node being split into two equal parts. This behaviour for random keys stress gives a tree with ~2/3 of nodes half-filled. With this change before splitting the full node try to push one element to each of the siblings (if they exist and not full). 
This slows the insertion a bit (but it's still way faster than the std::set), but gives 15% less total number of nodes. - Iterator method to reconstruct the data at the given position The helper creates a new data node, emplaces data into it and replaces the iterator's one with it. Needed to keep arrays of data in tree. - Milli-optimize erase() - Return back an iterator that will likely be not re-validated - Do not try to update ancestors separation key for leftmost kid This caused the clear()-like workload to perform poorly as compared to std::set. In particular the row_cache::invalidate() method does exactly this and this change improves its timing. - Perf test to measure drain speed - Helper call to collect tree counters - Fix corner case of iterator.emplace_before() - Clean heterogeneous lookup API - Handle exceptions from nodes allocations - Explicitly mark places where the key is copied (for future) - Extend the tree.lower_bound() API to report back whether the bound hit the key or not - Addressed style/cleanness review comments Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:29:43 +03:00
Pekka Enberg	7ef50d7c71	configure.py: Don't install dependencies when building submodules Let's pass the "--nodeps" option to "build_reloc.sh" script of the submodules to avoid the build system running "sudo"... Reported-by: Piotr Sarna <sarna@scylladb.com> Reported-by: Pavel Emelyanov <xemul@scylladb.com> Tested-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200714114340.440781-1-penberg@scylladb.com>	2020-07-14 14:50:59 +03:00
Tomasz Grabiec	f20c77d0f8	Merge "Make handle_state_left more robust when tokens are empty" from Asias 1. storage_service: Make handle_state_left more robust when tokens are empty In case the tokens for the node to be removed from the cluster are empty, log the application_state of the leaving node to help understand why the tokens are empty and try to get the tokens from token_metadata. 2. token_metadata: Do not throw if empty tokens are passed to remove_bootstrap_tokens Gossip on_change callback calls storage_service::excise which calls remove_bootstrap_tokens to remove the tokens of the leaving node from bootstrap tokens. If empty tokens, e.g., due to a gossip propagation issue as we saw in #6468, are passed to remove_bootstrap_tokens, it will throw. Since the on_change callback is marked as noexcept, such a throw will cause the node to terminate, which is overkill. To avoid such an error bringing the whole cluster down in the worst case, just log that empty tokens were passed to remove_bootstrap_tokens. Refs #6468	2020-07-14 13:19:45 +02:00
Asias He	116f6141d5	token_metadata: Fix incorrect log in update_normal_tokens Currently, when update_normal_tokens is called, a warning is logged. Token X changing ownership from A to B It is not correct to log so because we can call update_normal_tokens against a temporary token_metadata object during topology calculation. Refs: #6437	2020-07-14 14:13:37 +03:00
Juliusz Stasiewicz	c69075bbef	cql3/restrictions: exclude NULLs from comparison in filtering NULLs used to give false positives in GT, LT, GEQ and LEQ ops performed upon `ALLOW FILTERING`. That was a consequence of not distinguishing NULL from an empty buffer. This patch excludes NULLs on high level, preventing them from entering any comparison, i.e. it assumes that any binary operator should return `false` whenever one of the operands is NULL (note: ATM filters such as `...WHERE x=NULL ALLOW FILTERING` return empty sets anyway). `restriction_test/regular_col_slice` had to be updated accordingly. Fixes #6295	2020-07-14 12:59:01 +02:00
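The filtering rule described in the commit above, where any comparison involving NULL evaluates to false instead of NULL being treated like an empty buffer, can be sketched as follows. This is a Python illustration only: `None` stands in for NULL, `b""` for an empty buffer, and `filter_compare` is a hypothetical helper, not a function from the Scylla code base.

```python
import operator


def filter_compare(op, lhs, rhs):
    """Evaluate a binary filtering operator; NULL operands never match."""
    if lhs is None or rhs is None:
        return False
    return op(lhs, rhs)


# Treating NULL as an empty buffer produces a false positive for `<`,
# because an empty buffer sorts before any non-empty value.
print(operator.lt(b"", b"a"))                     # True: the old false positive
print(filter_compare(operator.lt, None, b"a"))    # False: NULL is excluded
print(filter_compare(operator.lt, b"", b"a"))     # True: a real empty value still compares
```

The last call shows why the distinction matters: a genuinely empty value remains comparable, while NULL is kept out of the comparison altogether.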
Pekka Enberg	f0ae550553	configure.py: Add 'build' target for building artifacts The default ninja build target now builds artifacts and packages. Let's add a 'build' target that only builds the artifacts. Message-Id: <20200714105042.416698-1-penberg@scylladb.com>	2020-07-14 13:55:32 +03:00
Pavel Emelyanov	9d38846ed2	test: Move perf measurement helpers into header To use the code in new perf tests in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 12:58:26 +03:00
Asias He	38d964352d	repair: Relax node selection in bootstrap when nodes are less than RF Consider a cluster with two nodes: - n1 (dc1) - n2 (dc2) A third node is bootstrapped: - n3 (dc2) The n3 fails to bootstrap as follows: [shard 0] init - Startup failed: std::runtime_error (bootstrap_with_repair: keyspace=system_distributed, range=(9183073555191895134, 9196226903124807343], no existing node in local dc) The system_distributed keyspace is using SimpleStrategy with RF 3. For the keyspace that does not use NetworkTopologyStrategy, we should not require the source node to be in the same DC. Fixes: #6744 Backports: 4.0 4.1, 4.2	2020-07-14 11:54:34 +02:00
Pekka Enberg	16baf98d67	README.md: Add project description This adds a short project description to README to make the git repository more discoverable. The text is an edited version of a Scylla blurb provided by Peter Corless. Message-Id: <20200714065726.143147-1-penberg@scylladb.com>	2020-07-14 11:28:43 +03:00
Asias He	271fac56a3	repair: Add synchronous API to query repair status This new API blocks until the repair job finishes, fails, or times out. E.g., - Without timeout curl -X GET http://127.0.0.1:10000/storage_service/repair_status/?id=123 - With timeout curl -X GET http://127.0.0.1:10000/storage_service/repair_status/?id=123&timeout=5 The timeout is in seconds. The current asynchronous API returns immediately even if the repair is in progress. E.g., curl -X GET http://127.0.0.1:10000/storage_service/repair_async/ks?id=123 Users can use the new synchronous API to avoid repeatedly sending the query to poll whether the repair job has finished. Fixes #6445	2020-07-14 11:20:15 +03:00
Amnon Heiman	186301aff8	per table metrics: change estimated_histogram to time_estimated_histogram This patch changes the per table latencies histograms: read, write, cas_prepare, cas_accept, and cas_learn. Beside changing the definition type and the insertion method, the API was changed to support the new metrics. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Amnon Heiman	ea8d52b11c	row_locking: change estimated histogram with time_estimated_histogram This patch changes the row locking latencies to use time_estimated_histogram. The change consist of changing the histogram definition and changing how values are inserted to the histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Amnon Heiman	edd3c97364	alternator: change estimated_histogram to time_estimated_histogram This patch moves the alternator latencies histograms to use the time_estimated_histogram. The changes requires changing the defined type and use the simpler insertion method. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-07-14 11:17:43 +03:00
Takuya ASADA	a233b0ab3b	redis: add strlen command Add strlen command that returns string length of the key. see: https://redis.io/commands/strlen	2020-07-14 10:56:23 +03:00
Asias He	a00ab8688f	repair: Relax size check of get_row_diff and set_diff In case of a row hash conflict, a hash in set_diff will get more than one row from get_row_diff. For example, Node1 (Repair master): row1 -> hash1 row2 -> hash2 row3 -> hash3 row3' -> hash3 Node2 (Repair follower): row1 -> hash1 row2 -> hash2 We will have set_diff = {hash3} between node1 and node2, while get_row_diff({hash3}) will return two rows: row3 and row3'. And the error below was observed: repair - Got error in row level repair: std::runtime_error (row_diff.size() != set_diff.size()) In this case, node1 should send both row3 and row3' to the peer node instead of failing the whole repair. Because node2 does not have row3 or row3', otherwise node1 won't send row with hash3 to node1 in the first place. Refs: #6252	2020-07-14 10:39:30 +03:00
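The hash-conflict scenario from this commit can be reproduced with a small Python sketch. The names `index_rows` and `get_row_diff` below are illustrative stand-ins that model rows as strings; they are not the actual repair code, and the final size comparison is only one plausible form of a relaxed check.

```python
from collections import defaultdict

def index_rows(rows):
    """Group rows by their hash; two rows may share one hash on a conflict."""
    by_hash = defaultdict(list)
    for row, row_hash in rows:
        by_hash[row_hash].append(row)
    return by_hash

def get_row_diff(rows_by_hash, set_diff):
    """Return every row whose hash is in set_diff (possibly several per hash)."""
    return [row for h in sorted(set_diff) for row in rows_by_hash[h]]

# Data taken from the example in the commit message.
master = index_rows([("row1", "hash1"), ("row2", "hash2"),
                     ("row3", "hash3"), ("row3'", "hash3")])
follower = index_rows([("row1", "hash1"), ("row2", "hash2")])

set_diff = set(master) - set(follower)     # hashes the follower lacks
row_diff = get_row_diff(master, set_diff)  # rows the master should send

# A strict equality check fails here; a relaxed check accepts extra rows.
sizes_match_exactly = len(row_diff) == len(set_diff)
relaxed_check_passes = len(row_diff) >= len(set_diff)
```

With a strict equality check the repair would abort on this input, whereas the relaxed comparison lets both `row3` and `row3'` be sent.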
Nadav Har'El	8e3be5e7d6	alternator test: configurable temporary directory The test/alternator/run script creates a temporary directory for the Scylla database in /tmp. The assumption was that this is the fastest disk (usually even a ramdisk) on the test machine, and we didn't need anything else from it. But it turns out that on some systems, /tmp is actually a slow disk, so this patch adds a way to configure the temporary directory - if the TMPDIR environment variable exists, it is used instead of /tmp. As before this patch, a temporary subdirectory is created in $TMPDIR, and this subdirectory is automatically deleted when the test ends. The test.py script already passes an appropriate TMPDIR (testlog/$mode), which after this patch the Alternator test will use instead of /tmp. Fixes #6750 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200713193023.788634-1-nyh@scylladb.com>	2020-07-14 08:52:22 +03:00
Konstantin Osipov	e628da863d	Export TMPDIR pointing at subdir of testlog/ Export TMPDIR environment variable pointing at a subdir of testlog. This variable is used by seastar/scylla tests to create a subdirectory with temporary test data. Normally a test cleans up the temporary directory, but if it crashes or is killed the directory remains. By resetting the default location from /tmp to testlog/{mode} we allow test.py to consolidate all test artefacts in a single place. Fixes #6062, "test.py uses tmpfs"	2020-07-13 22:22:43 +03:00
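The TMPDIR behaviour described in this commit (use `$TMPDIR` when it is set, fall back to `/tmp`, create a subdirectory and delete it when the test ends) might look roughly like the following Python sketch. The function names and the `scylla-test-` prefix are assumptions made for illustration; the actual test/alternator/run script may be organised quite differently.

```python
import os
import shutil
import tempfile

def make_test_dir():
    """Create a fresh subdirectory under $TMPDIR, or under /tmp if unset."""
    base = os.environ.get("TMPDIR", "/tmp")
    # The "scylla-test-" prefix is a made-up name for this sketch.
    return tempfile.mkdtemp(prefix="scylla-test-", dir=base)

def run_with_temp_dir(body):
    """Run body(path) and always remove the directory afterwards."""
    path = make_test_dir()
    try:
        return body(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

A caller such as test.py would export `TMPDIR=testlog/<mode>` before starting the test, so the data directory lands next to the other test artefacts.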
Avi Kivity	60c115add2	Update seastar submodule * seastar 5632cf2146...0fe32ec596 (11): > futures: Add a test for a broken promise in a parallel_for_each > future: Simplify finally_body implementation > futures_test: Extend nested_exception test > Merge "make gate methods noexcept" from Benny > tutorial: fix service_loop example > sharded: fix doxygen \example clause for sharded_parameter > Merge "future: Don't call need_preempt in 'then' and 'then_impl'" from Rafael > future: Refactor a bit of duplicated code > Merge "Add with_file helpers" from Benny > Merge "Fix doxygen warnings" from Benny > build: add doxygen to install-dependencies.sh	2020-07-13 20:19:42 +03:00
Juliusz Stasiewicz	d1dec3fcd7	cdc: Retry generation fetching after `read_failure_exception` While fetching CDC generations, various exceptions can occur. They are divided into "fatal" and "nonfatal", where "fatal" ones prevent retrying of the fetch operation. This patch makes `read_failure_exception` "non-fatal", because such error may appear during restart. In general this type of error can mean a few different things (e.g. an error code in a response from replica, but also a broken connection) so retrying seems reasonable. Fixes #6804	2020-07-13 18:17:45 +03:00
Pekka Enberg	d67f4dba1e	README.md: Consolidate Docker image build instructions Consolidate the Docker image build instructions into the "Building Scylla" section of the README instead of having it in a separate section in a different place of the file. Message-Id: <20200713132600.126360-1-penberg@scylladb.com>	2020-07-13 17:14:44 +03:00
Nadav Har'El	35f7048228	alternator: CreateTable with bad Tags shouldn't create a table Currently, if a user tries to CreateTable with a forbidden set of tags, e.g., the Tags list is too long or contains an invalid value for system:write_isolation, then the CreateTable request fails but the table is still created. Without the tag of course. This patch fixes this bug, and adds two test cases for it that fail before this patch, and succeed with it. One of the test cases is scylla_only because it checks the Scylla-specific system:write_isolation tag, but the second test case works on DynamoDB as well. What this patch does is to split the update_tags() function into two parts - the first part just parses the Tags, validates them, and builds a map. Only the second part actually writes the tags to the schema. CreateTable now does the first part early, before creating the table, so failure in parsing or validating the Tags will not leave a created table behind. Fixes #6809. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200713120611.767736-1-nyh@scylladb.com>	2020-07-13 17:14:44 +03:00
Pekka Enberg	c6116c36e0	configure.py: Remove obsolete "--with-osv" option The "--with-osv" option has been a no-op since commit `cc17c44640` ("Move seastar to a submodule"). Let's remove it as obsolete. Message-Id: <20200713131333.125634-1-penberg@scylladb.com>	2020-07-13 17:14:44 +03:00
Nadav Har'El	21ae457e8a	test.py: print test durations When tests are run in parallel, it is hard to tell how much time each test ran. The time difference between consecutive printouts (indicating a test's end) says nothing about the test's duration. This patch adds in "--verbose" mode, at the end of each test result, the duration in seconds (in wall-clock time) of the test. For example, $ ./test.py --mode dev --verbose alternator ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [1/2] boost/alternator_base64_test dev [ PASS ] 0.02s [2/2] alternator/run dev [ PASS ] 26.57s These durations are useful for recognizing tests which are especially slow, or runs where all the tests are unusually slow (which might indicate some sort of misconfiguration of the test machine). Fixes #6759 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200706142109.438905-1-nyh@scylladb.com>	2020-07-13 17:14:44 +03:00
Pekka Enberg	ace1b15ed6	configure.py: Make "dist" part of default target This adds a new "dist-<mode>" target, which builds the server package in selected build mode together with the other packages, and wires it to the "<mode>" target, which is built as part of default "ninja" invocation. This allows us to perform a full build, package, and test cycle across all build modes with: ./configure.py && ninja && ./test.py Message-Id: <20200713101918.117692-1-penberg@scylladb.com>	2020-07-13 17:14:44 +03:00
Takuya ASADA	e6e4359414	scylla_raid_setup: switch to systemd mount unit Since we already use systemd unit file for coredump bind mount and swapfile, we should move to systemd mount unit for data partition as well.	2020-07-13 17:14:44 +03:00
Pekka Enberg	c807c903ab	pull_github_pr.sh: Use "cherry-pick" for single-commit pull requests Improve the "pull_github_pr.sh" to detect the number of commits in a pull request, and use "git cherry-pick" to merge single-commit pull requests. Message-Id: <20200713093044.96764-1-penberg@scylladb.com>	2020-07-13 17:14:44 +03:00
Avi Kivity	d74582fbc5	move jmx/tools submodules to tools directory Move all package repositories to tools directory.	2020-07-13 17:14:14 +03:00
Avi Kivity	06341d2528	dist: fix debian generated files for non-default PRODUCT setting There are a bunch of renames that are done if PRODUCT is not the default, but the Python code for them is incorrect. Path.glob() is not a static method, and Path does not support .endswith(). Fix by constructing a Path object, and later casting to str.	2020-07-13 11:51:31 +03:00
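The two pathlib pitfalls named in this commit can be shown in a short Python sketch: `Path.glob()` must be called on a `Path` instance, and suffix checks with `.endswith()` require casting the path to `str`. The function `rename_for_product` and the file names are hypothetical and only loosely modelled on the renames the commit mentions.

```python
import tempfile
from pathlib import Path

def rename_for_product(directory, default_product, product):
    """Rename files starting with default_product so they start with product."""
    renamed = []
    # Path.glob() is an instance method, so construct a Path object first.
    for path in sorted(Path(directory).glob(f"{default_product}*")):
        # Path has no .endswith(); cast to str for the suffix check.
        if str(path).endswith(".bak"):
            continue
        target = path.with_name(product + path.name[len(default_product):])
        path.rename(target)
        renamed.append(target.name)
    return renamed

def demo():
    """Exercise the helper in a throwaway directory with made-up file names."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("scylla-server.install", "scylla-server.service",
                     "scylla-server.bak"):
            (Path(tmp) / name).touch()
        return rename_for_product(tmp, "scylla", "acme")
```

Materialising the glob with `sorted()` before renaming avoids modifying the directory while it is still being iterated.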
Pekka Enberg	f2b4c1a212	scylla_prepare: Improve error message on missing CPU features Let's report each missing CPU feature individually, and improve the error message a bit. For example, if the "clmul" instruction is missing, the report looks as follows: ERROR: You will not be able to run Scylla on this machine because its CPU lacks the following features: pclmulqdq If this is a virtual machine, please update its CPU feature configuration or upgrade to a newer hypervisor. Fixes #6528	2020-07-13 11:39:29 +03:00
Pekka Enberg	bc053b3cfa	README.md: Add links to mailing lists and Slack Add links to the users and developers mailing lists, and the Slack channel in README.md to make them more discoverable. Message-Id: <20200713074654.90204-1-penberg@scylladb.com>	2020-07-13 10:48:55 +03:00
Pekka Enberg	df6a0ec5e5	README.md: Update build and run instructions Simplify the build and run instructions by splitting the text in three sections (prerequisites, building, and running) and streamlining the steps a bit. Message-Id: <20200713065910.84582-1-penberg@scylladb.com>	2020-07-13 10:04:12 +03:00
Pekka Enberg	5476efabb3	configure.py: Make output less verbose by default The configure.py script outputs the Seastar build command it executes: ['./cooking.sh', '-i', 'dpdk', '-d', '../build/release/seastar', '--', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_C_COMPILER=gcc', '-DCMAKE_CXX_COMPILER=g++', '-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON', '-DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300', '-DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 ', '-DSeastar_CXX_DIALECT=gnu++20', '-DSeastar_API_LEVEL=4', '-DSeastar_UNUSED_RESULT_ERROR=ON', '-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm'] The output is mostly useful for debugging the build process itself, so hide it behind a "--verbose" flag, and make it more human-readable while at it: ./cooking.sh \ -i \ dpdk \ -d \ ../build/release/seastar \ -- \ -DCMAKE_BUILD_TYPE=RelWithDebInfo \ -DCMAKE_C_COMPILER=gcc \ -DCMAKE_CXX_COMPILER=g++ \ -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON \ -DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300 \ 
-DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 \ -DSeastar_CXX_DIALECT=gnu++20 \ -DSeastar_API_LEVEL=4 \ -DSeastar_UNUSED_RESULT_ERROR=ON \ -DSeastar_DPDK=ON \ -DSeastar_DPDK_MACHINE=wsm Message-Id: <20200713065509.83184-1-penberg@scylladb.com>	2020-07-13 09:57:38 +03:00
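The human-readable form shown in this commit, one argument per line joined by backslash continuations, can be produced with a one-line join. The sketch below is an assumption about how such formatting could be done, not the code from configure.py; `format_command` and `maybe_print_command` are illustrative names.

```python
def format_command(argv):
    """Render an argument list with one argument per line."""
    return " \\\n  ".join(argv)

def maybe_print_command(argv, verbose):
    """Print the command only when verbose output was requested."""
    if verbose:
        print(format_command(argv))

# Shortened argument list based on the example in the commit message.
example = format_command(["./cooking.sh", "-i", "dpdk", "-d",
                          "../build/release/seastar"])
```

If the output is meant to be pasted back into a shell, each argument could additionally be passed through `shlex.quote`, since values such as the `-DSeastar_CXX_FLAGS=` list contain semicolons.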
Botond Dénes	ef2c8f563b	scylla-gdb.py: scylla fiber: add suggestion for further investigation scylla fiber often fails to really unwind the entire fiber, stopping sooner than expected. This is expected as scylla fiber only recognizes the most standard continuations but can drop the ball as soon as there is an unusual transmission. This commit adds a message below the found tasks explaining that the list might not be exhaustive and prints a command which can be used to explain why the unwinding stopped at the last task. While at it also rephrase an out-of-date comment. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200710120813.100009-1-bdenes@scylladb.com>	2020-07-12 15:43:21 +03:00
Dejan Mircevski	29fccd76ea	cql/restrictions: Rename find_if to find_atom As requested in #5763 feedback, rename to avoid clashes with std::find_if and boost::find_if. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-12 14:12:30 +03:00
Dejan Mircevski	9dac9a25e5	cql/restrictions: Constrain find_if and count_if As requested in #5763 feedback, require that Fn be callable with binary_operator in the functions mentioned above. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-12 14:11:39 +03:00
Pavel Emelyanov	1331623465	test.py: Don't feed fail-on-abandoned-failed-futures to unit tests The problem is that this option is defined in seastar testing wrapper, while no unit tests use it, all just start themselves with app.run() and would complain on unknown option. "Would", because nowadays every single test in it declares its own options in suite.yaml, that override test.py's defaults. Once an option-less unit test is added (B+ tree ones) it will complain. The proposal is to remove this option from defaults, if any unit test will use the seastar testing wrappers and will need this option, it can add one to the suite.yaml. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200709084602.8386-1-xemul@scylladb.com>	2020-07-10 16:21:14 +02:00
Tomasz Grabiec	883ac4a78c	Merge "Some selective noexcept bombing" from Pavel E. The goal is to make the lambdas, that are fed into partition cache's clear_and_dispose() and erase_in_dispose(), to be noexcept. This is to satisfy B+, which strictly requires those to be noexcept (currently used collections don't care). The set covers not only the strictly required minimum, but also some other methods that happened to be nearby. * https://github.com/xemul/scylla/tree/br-noexcepts-over-the-row-cache: row_cache: Mark invalidation lambda as noexcept cache_tracker: Mark methods noexcept cache_entry: Mark methods noexcept region: Mark trivial noexcept methods as such allocation_strategy: Mark returning lambda as noexcept allocation_strategy: Mark trivial noexcept methods as such dht: Mark noexcept methods	2020-07-10 15:02:52 +02:00
Nadav Har'El	f549d147ea	alternator: fix Expected's "NULL" operator with missing AttributeValueList The "NULL" operator in Expected (old-style conditional operations) doesn't have any parameters, so we insisted that the AttributeValueList be empty. However, we forgot to allow it to also be missing - a possibility which DynamoDB allows. This patch adds a test to reproduce this case (the test passes on DynamoDB, fails on Alternator before this patch, and succeeds after this patch), and a fix. Fixes #6816. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200709161254.618755-1-nyh@scylladb.com>	2020-07-10 07:45:02 +02:00
Benny Halevy	3ce86a7160	test: restrictions_test: set_contains: uncomment check depending on #6797 Now that #6797 is fixed. Refs #5763 Cc: Dejan Mircevski <dejan@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: restrictions_test(debug) Message-Id: <20200709123703.955897-1-bhalevy@scylladb.com>	2020-07-09 17:56:09 +03:00
Benny Halevy	ec77777bda	bytes: compare_unsigned: do not pass nullptr to memcmp If any of the compared bytes_view's is empty, consider the empty prefix to be the same and proceed to compare the size of the suffix. A similar issue exists in legacy_compound_view::tri_comparator::operator(). It too must not pass nullptr to memcmp if any of the compared bytes_view's is empty. Fixes #6797 Refs #6814 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: unit(dev) Branches: all Message-Id: <20200709123453.955569-1-bhalevy@scylladb.com>	2020-07-09 17:54:46 +03:00
Nadav Har'El	9042161ba3	merge: cdc: better pre/postimages for complicated batches Merged pull request https://github.com/scylladb/scylla/pull/6741 by Piotr Dulikowski: This PR changes the algorithm used to generate preimages and postimages in CDC log. While its behavior is the same for non-batch operations (with one exception described later), it generates pre/postimages that are organized more nicely, and account for multiple updates to the same row in one CQL batch. Fixes #6597, #6598 Tests: - unit(dev), for each consecutive commit - unit(debug), for the last commit Previous method The previous method worked on a per delta row basis. First, the base table is queried for the current state of the rows being modified in the processed mutation (this is called the "preimage query"). Then, for each delta row (representing a modification of a row): If preimage is enabled and the row was already present in the table, a corresponding preimage row is inserted before the delta row. The preimage row contains data taken directly from the preimage query result. Only columns that are modified by the delta are included in the preimage. If postimage is enabled, then a postimage row is inserted after the delta row. The postimage row contains data which was a result of taking row data directly from the preimage query result and applying the change the corresponding delta row represented. All columns of the row are included in the postimage. The above works well for simple cases such as singular CQL INSERT, UPDATE, DELETE, or simple CQL BATCH-es. An example: cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v) VALUES (0, 1, 111); INSERT INTO tbl (pk, ck, v) VALUES (0, 2, 222); APPLY BATCH; cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v from ks.tbl_scylla_cdc_log ; cdc$batch_seq_no \| cdc$operation \| cdc$ttl \| pk \| ck \| v ------------------+---------------+---------+----+----+----- ...snip... 
0 \| 0 \| null \| 0 \| 1 \| 100 1 \| 2 \| null \| 0 \| 1 \| 111 2 \| 9 \| null \| 0 \| 1 \| 111 3 \| 0 \| null \| 0 \| 2 \| 200 4 \| 2 \| null \| 0 \| 2 \| 222 5 \| 9 \| null \| 0 \| 2 \| 222 Preimage rows are represented by cdc operation 0, and postimage by 9. Please note that all rows presented above share the same value of cdc$time column, which was not shown here for brevity. Problems with previous approach This simple algorithm has some conceptual and implementational problems which arise when processing more complicated CQL BATCH-es. Consider the following example: cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v1) VALUES (0, 0, 1) USING TTL 1000; INSERT INTO tbl (pk, ck, v2) VALUES (0, 0, 2) USING TTL 2000; APPLY BATCH; cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v1, v2 FROM tbl_scylla_cdc_log; cdc$batch_seq_no \| cdc$operation \| cdc$ttl \| pk \| ck \| v1 \| v2 ------------------+---------------+---------+----+----+------+------ ...snip... 0 \| 0 \| null \| 0 \| 0 \| null \| 0 1 \| 2 \| 2000 \| 0 \| 0 \| null \| 2 2 \| 9 \| null \| 0 \| 0 \| 0 \| 2 3 \| 0 \| null \| 0 \| 0 \| 0 \| null 4 \| 1 \| 1000 \| 0 \| 0 \| 1 \| null 5 \| 9 \| null \| 0 \| 0 \| 1 \| 0 A single cdc group (corresponding to rows sharing the same cdc$time) might have more than one delta that modify the same row. For example, this happens when modifying two columns of the same row with different TTLs - due to our choice of CDC log schema, we must represent such change with two delta rows. It does not make sense to present a postimage after the first delta and preimage before the second - both deltas are applied simultaneously by the same CQL BATCH, so the middle "image" is purely imaginary and does not appear at any point in the table. Moreover, in this example, the last postimage is wrong - v1 is updated, but v2 is not. None of the postimages presented above represent the final state of the row. 
New algorithm The new algorithm now works on a per cdc group basis, not per delta row. When starting processing a CQL BATCH: Load preimage query results into a data structure representing current state of the affected rows. For each cdc group: For each row modified within the group, a preimage is produced, regardless of whether the row was present in the table. The preimage is calculated based on the current state. Only include columns that are modified for this row within the group. For each delta, produce a delta row and update the current state accordingly. Produce postimages in the same way as preimages - but include all columns for each row in the postimage. The new algorithm produces the postimage correctly when multiple deltas affect one row, because the state of the row is updated on the fly. This algorithm moves preimage and postimage rows to the beginning and the end of the cdc group, respectively. This solves the problem of imaginary preimages and postimages appearing inside a cdc group. Unfortunately, it is possible for one CQL BATCH to contain changes that use multiple timestamps. This will result in one CQL BATCH creating multiple cdc groups, with different cdc$time. As it is impossible, with our choice of schema, to tell that those cdc groups were created from one CQL BATCH, we instead treat those groups as if they were separate CQL operations. By tracking the state of the affected rows, we make sure that preimages in later groups will reflect changes introduced in previous groups. One more thing - this algorithm should have the same results for singular CQL operations and simple CQL BATCH-es, with one exception. Previously, a preimage was not produced if a row was not present in the table. Now, the preimage row will appear unconditionally - it will have nulls in place of column values. 
* 'cdc-pre-postimage-persistence' of github.com:piodul/scylla: cdc: fix indentation cdc: don't update partition state when not needed cdc: implement pre/postimage persistence cdc: add interface for producing pre/postimages cdc: load preimage query result into partition state fields cdc: introduce fields for keeping partition state cdc: rename set_pk_columns -> allocate_new_log_row cdc: track batch_no inside transformer cdc: move cdc$time generation to transformer cdc: move find_timestamp to split.cc cdc: introduce change_processor interface cdc: remove redundant schema arguments from cdc functions cdc: move management of generated mutations inside transformer cdc: move preimage result set into a field of transformer cdc: keep ts and tuuid inside transformer cdc: track touched parts of mutations inside transformer cdc: always include preimage for affected rows	2020-07-09 16:55:55 +03:00
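The per-group algorithm described in this merge (preimages first, then deltas that update a tracked state, then postimages with all columns) can be sketched in Python. This is a simplified model under stated assumptions: rows are plain dicts, TTLs, deletions and static rows are ignored, and `process_group` is a hypothetical name rather than anything from the CDC code.

```python
def process_group(state, deltas, all_columns):
    """Produce log entries for one cdc group (deltas sharing one cdc$time).

    state:  dict mapping row key -> {column: value}, updated in place
    deltas: list of (row_key, {column: new_value}) in batch order
    """
    log = []
    # Collect, per row, the columns modified anywhere within this group.
    touched = {}
    for key, changes in deltas:
        touched.setdefault(key, set()).update(changes)
    # Preimages go first, restricted to modified columns; a row absent from
    # the table still gets a preimage, with nulls (None) for its values.
    for key, cols in touched.items():
        current = state.get(key, {})
        log.append(("preimage", key, {c: current.get(c) for c in sorted(cols)}))
    # Each delta is logged and applied to the tracked state on the fly.
    for key, changes in deltas:
        log.append(("delta", key, dict(changes)))
        state.setdefault(key, {}).update(changes)
    # Postimages go last and include all columns of each touched row.
    for key in touched:
        log.append(("postimage", key, {c: state[key].get(c) for c in all_columns}))
    return log

# The two-delta batch from the description: v1 and v2 of row (0, 0) change.
state = {(0, 0): {"v1": 0, "v2": 0}}
log = process_group(state, [((0, 0), {"v1": 1}), ((0, 0), {"v2": 2})],
                    ["v1", "v2"])
```

Because the state is updated while the deltas are applied, the single postimage reflects both updates, and no intermediate image appears between the two deltas. Calling `process_group` again with the same `state` for a later cdc group would base its preimages on the changes made by earlier groups.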
Pavel Emelyanov	bb32cff23d	row_cache: Mark invalidation lambda as noexcept It calls noexcept functions inside and handles the exception from throwing one itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:46:38 +03:00
Pavel Emelyanov	1346289151	cache_tracker: Mark methods noexcept All but few are trivially such. The clear_continuity() calls cache_entry::set_continuous() that had become noexcept a patch ago. The allocator() calls region.allocator() which had been marked noexcept few patches back. The on_partition_erase() calls allocator().invalidate_references(), both had been marked noexcept few patches back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:44:17 +03:00
Pavel Emelyanov	d4ef845136	cache_entry: Mark methods noexcept All but one are trivially such, the position() one calls is_dummy_entry() which has become noexcept right now. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:43 +03:00
Pavel Emelyanov	3237796e00	region: Mark trivial noexcept methods as such Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:37 +03:00
Pavel Emelyanov	2c4a94aeab	allocation_strategy: Mark returning lambda as noexcept It just calls current_allocator().destroy() which is noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:23 +03:00
Pavel Emelyanov	a497dfdd0b	allocation_strategy: Mark trivial noexcept methods as such Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:03 +03:00
Pavel Emelyanov	6d7ae4ead1	dht: Mark noexcept methods These are either trivially noexcept already, or call each-other, thus becoming noexcept too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-09 14:41:03 +03:00
Piotr Sarna	7ae3b25d8e	alternator: cleanup raw GetString() calls Instead of using raw GetString() from rapidjson, it's neater to use a helper for creating string views: rjson::to_string_view(). Message-Id: <3afda97403d4601c9600f6838f2028bfabd2f2f9.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:40 +03:00
Piotr Sarna	75dbaa0834	test: add alternator test for incorrect numeric values The test case is put inside test_manual_requests suite, because boto3 validates numeric inputs and does not allow passing arbitrary incorrect values. Tests: unit(dev), alternator(local, remote) Message-Id: <ac2baedc2ea61f0d857e7c01839f34cd15f7e02d.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:33 +03:00
Piotr Sarna	96426df72e	alternator: translate number errors to ValidationException In order to be consistent with returned error types, marshaling exceptions thrown from parsing big decimals are translated to ValidationException. Message-Id: <1446878cd63ad8291327a399cf700e4f402d108c.1594289250.git.sarna@scylladb.com>	2020-07-09 13:58:25 +03:00
Dejan Mircevski	d956233a80	cql_query_test: Drop get() on cquery_nofail result cquery_nofail returns the query result, not a future. Invoking .get() on its result is unnecessary. This just happened to compile because shared_ptr has a get() method with the same signature as future::get. Tests: cql_query_test unit test (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-09 13:52:52 +03:00
Nadav Har'El	8b3dac040a	alternator: add request headers to trace-level logging When "trace"-level logging is enabled for Alternator, we log every request, but currently only the request's body. For debugging, it is sometimes useful to also see the headers - which are important to debug authentication, for example. So let's print the headers as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200709103414.599883-1-nyh@scylladb.com>	2020-07-09 12:38:45 +02:00
Asias He	67f6da6466	repair: Switch to btree_set for repair_hash. In one of the longevity tests, we observed 1.3s reactor stall which came from repair_meta::get_full_row_hashes_source_op. It traced back to a call to std::unordered_set::insert() which triggered big memory allocation and reclaim. I measured std::unordered_set, absl::flat_hash_set, absl::node_hash_set and absl::btree_set. The absl::btree_set was the only one that seastar oversized allocation checker did not warn about in my tests where around 300K repair hashes were inserted into the container. - unordered_set: hash_sets=295634, time=333029199 ns - flat_hash_set: hash_sets=295634, time=312484711 ns - node_hash_set: hash_sets=295634, time=346195835 ns - btree_set: hash_sets=295634, time=341379801 ns The btree_set is a bit slower than unordered_set but it does not have huge memory allocation. I did not measure a real difference in the total time to finish repair of the same dataset with unordered_set and btree_set. To fix, switch to absl btree_set container. Fixes #6190	2020-07-09 11:35:18 +03:00
Nadav Har'El	9ff9cd37c3	alternator test: tests for the number type We had some tests for the number type in Alternator and how it can be stored, retrieved, calculated and sorted, but only had rudimentary tests for the allowed magnitude and precision of numbers. This patch creates a new test file, test_number.py, with tests aiming to check exactly the supported magnitudes and precision of numbers. These tests verify two things: 1. That Alternator's number type supports the full precision and magnitude that DynamoDB's number type supports. We don't want to see precision or magnitude lost when storing and retrieving numbers, or when doing calculations on them. 2. That Alternator's number type does not have better precision or magnitude than DynamoDB does. If it did, users may be tempted to rely on that implementation detail. The three tests of the first type pass; But all four tests of the second type xfail: Alternator currently stores numbers using big_decimal which has unlimited precision and almost-unlimited magnitude, and is not yet limited by the precision and magnitude allowed by DynamoDB. This is a known issue - Refs #6794 - and these four new xfailing tests can be used to reproduce that issue. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200707204824.504877-1-nyh@scylladb.com>	2020-07-09 07:38:36 +02:00
Piotr Sarna	91a616968c	Update seastar submodule * seastar cbf88f59...5632cf21 (1): > Merge "Handle or avoid a few std::bad_alloc" from Rafael	2020-07-08 21:22:31 +02:00
Hagit Segev	aec910278f	build-deb.sh: fix rm to erase only python While building unified-deb we first use scylla/reloc/build_deb.sh to create the scylla core package, and after that scylla/reloc/python3/build_deb.sh to create python3. On 058da69#diff-4a42abbd0ed654a1257c623716804c82 a new rm -rf command was added. It causes python3 process to erase Scylla-core process. Set python3 to erase its own dir scylla-python3-package only.	2020-07-08 17:58:38 +03:00
Piotr Dulikowski	ad811a48bf	cdc: fix indentation	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	20b236d27d	cdc: don't update partition state when not needed In some cases, tracking the state of processed rows inside `transformer` is not needed at all. We don't need to do it if either: - Preimage and postimage are disabled for the table, - Only preimage is enabled and we are processing the last timestamp. This commit disables updating the state in the cases listed above.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	246f8da6f6	cdc: implement pre/postimage persistence Moves responsibility for generating pre/postimage rows from the "process_change" method to "produce_preimage" and "produce_postimage". This commit actually affects the contents of generated CDC log mutations. Added a unit test that verifies more complicated cases with CQL BATCH.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	24b50ffbc8	cdc: add interface for producing pre/postimages Introduces new methods to the change_processor interface that will cause it to produce pre/postimage rows for requested clustering key, or for static row. Introduces logic in split.cc responsible for calling pre/postimage methods of the change_processor interface. This does not have any effect on generated CDC log mutations yet, because the transformer class has empty implementations in place of those methods.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	761c59d92a	cdc: load preimage query result into partition state fields Instead of looking up preimage data directly from the raw preimage query results, use the raw results to populate current partition state data, and read directly from the current partition state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	946354ee74	cdc: introduce fields for keeping partition state Introduces data structures that will be used for keeping the current state of processed rows: _clustering_row_states, and _static_row_state.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	bb587a93be	cdc: rename set_pk_columns -> allocate_new_log_row The new name better describes what this function does.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	82ddeb1992	cdc: track batch_no inside transformer Move tracking of batch_no inside the transformer.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7b47f84965	cdc: move cdc$time generation to transformer Generate the timeuuid on the transformer side, which allows to simplify the change_processor interface.	2020-07-08 15:36:41 +02:00
Piotr Dulikowski	7691568b0a	cdc: move find_timestamp to split.cc The function is no longer used in log.cc, so instead it is moved to split.cc. Removed declaration of the function from the log.hh header, because it is not used elsewhere - apart from testing code, but it already declared find_timestamp in the cdc_test.cc file.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	51d97be0b3	cdc: introduce change_processor interface This allows for a more refined use of the transformer by the for_each_change function (now named "process_changes_with_splitting"). The change_processor interface exposes two methods so far: begin_timestamp, and process_change (previously named "transform"). By separating those two and exposing them, process_changes_with_splitting can cause the transformer to generate fewer CDC log mutations - only one for each timestamp in the batch.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	f907cab156	cdc: remove redundant schema arguments from cdc functions A `mutation` object already has a reference to its schema. It does not make sense to call functions changed in this commit with a different schema.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	fa00ea996a	cdc: move management of generated mutations inside transformer CDC log mutations are now stored inside `transformer`, and only moved to the final set of mutations at the end of `transformer`'s lifetime.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	76a323a02d	cdc: move preimage result set into a field of transformer Instead of passing the preimage result set in each `transform` call, it is now assigned to a field, and `transform` uses that field.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	79eabc04a8	cdc: keep ts and tuuid inside transformer Adds a `begin_timestamp` method which tells the `transformer` to start using the following timestamp and timeuuid when generating new log row mutations.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	3c01b3c41d	cdc: track touched parts of mutations inside transformer Moves tracking of the "touched parts" statistics inside the transformer class. This commit is the first of multiple commits in this series which move parts of the state used in CDC log row generation inside the `transformer` class. There is a lot of state being passed to `transformer` each time its methods are called, which could be as well tracked by the `transformer` itself. This will result in a nicer interface and will allow us to generate less CDC log mutations which give the same result.	2020-07-08 15:36:40 +02:00
Piotr Dulikowski	027d20c654	cdc: always include preimage for affected rows This changes the current algorithm so that the preimage row will not be skipped if the corresponding row was not present in preimage query results.	2020-07-08 15:36:40 +02:00
Rafael Ávila de Espíndola	b10beead61	memtable_snapshot_source: Avoid a std::bad_alloc crash _should_compact is a condition_variable and condition_variable::wait() allocates memory. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200706223201.903072-1-espindola@scylladb.com>	2020-07-08 15:21:50 +02:00
Avi Kivity	7ea9ee27dd	Merge 'aggregates: Use type-specific comparators in min/max' from Juliusz " For collections and UDTs the `MIN()` and `MAX()` functions are generated on the fly. Until now they worked by comparing just the byte representations of their arguments. This patch employs specific per-type comparators to provide semantically sensible, dynamically created aggregates. Fixes #6768 " * jul-stas-6768-use-type-comparators-for-minmax: tests: Test min/max on set aggregate_fcts: Use per-type comparators for dynamic types	2020-07-08 15:07:57 +03:00
Juliusz Stasiewicz	f08e0e10be	tests: Test min/max on set Expected behavior is the lexicographical comparison of sets (element by element), so this test was failing when raw byte representations were compared.	2020-07-08 13:39:15 +02:00
Juliusz Stasiewicz	5b438e79be	aggregate_fcts: Use per-type comparators for dynamic types For collections and UDTs the `MIN()` and `MAX()` functions are generated on the fly. Until now they worked by comparing just the byte representations of arguments. This patch uses specific per-type comparators to provide semantically sensible, dynamically created aggregates. Fixes #6768	2020-07-08 13:39:10 +02:00
Benny Halevy	d4615f4293	sstables: sstable_version_types: implement operator<=> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200707061715.578604-1-bhalevy@scylladb.com>	2020-07-08 14:23:11 +03:00
Avi Kivity	5a59bb948c	Update seastar submodule * seastar dbecfff5a4...cbf88f59f2 (14): > future: mark detach_promise noexcept > net/tls: wait for flush() in shutdown > httpd: Use handle_exception instead of then_wrapped > httpd: Use std::unique_ptr instead of a raw pointer > with_lock: Handle mutable lambdas > Merge "Make the coroutine implementation a bit more like seastar::thread" from Rafael > tests: Fix perf fair queue stuck > util/backtrace: don't get hash from moved simple_backtrace in tasktrace ctor > scheduling: Allow scheduling_group_get_specific(key) to access elements with queue_is_initialized = false > prometheus: be compatible with protobuf < 3.4.0 > Merge "fix with_lock error handling" from Benny > Merge "Simplify the logic for detecting broken promises" from Rafael > Merge "make scheduling_group functions noexcept" from Benny > Merge "io_queue: Fixes and reworks for shared fair-queue" from Pavel E	2020-07-08 10:38:52 +03:00
Avi Kivity	b0698dfb38	Merge 'Rewrite CQL3 restriction representation' from dekimir " This is the first stage of replacing the existing restrictions code with a new representation. It adds a new class `expression` to replace the existing class `restriction`. Lots of the old code is deleted, though not all -- that will come in subsequent stages. Tests: unit (dev, debug restrictions_test), dtest (next-gating) " * dekimir-restrictions-rewrite: cql3/restrictions: Drop dead code cql3/restrictions: Use free functions instead of methods cql3/restrictions: Create expression objects cql3/restrictions: Add free functions over new classes cql3/restrictions: Add new representation	2020-07-08 10:22:17 +03:00
Avi Kivity	bced68f187	Update tools and jmx submodules * scylla-jmx b219573...5820992 (1): > dist/debian: apply generated package version for .orig.tar.gz file * scylla-tools 1639b12061...50dbf77123 (1): > dist/debian: apply generated package version for .orig.tar.gz file	2020-07-08 08:49:42 +03:00
Dejan Mircevski	61288ea7db	cql3/restrictions: Drop dead code Delete unused parts of the old restrictions representation: - drop all methods, members, and types from class restriction, but keep the class itself: it's the return type of relation::to_restriction, which we're keeping intact for now - drop all subclasses of single_column_restriction and token_restriction, but keep multi_column_restriction subclasses for their bounds_ranges method Keep the restrictions (plural) class, because statement_restrictions still keeps partition/clustering/other columns in separate collections. Move the restriction::merge_with method to primary_key_restrictions, where it's still being used. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-07 23:08:09 +02:00
Dejan Mircevski	37ebe521e3	cql3/restrictions: Use free functions instead of methods Instead of `restriction` class methods, use the new free functions. Specific replacement actions are listed below. Note that class `restrictions` (plural) remains intact -- both its methods and its type hierarchy remain intact for now. Ensure full test coverage of the replacement code with new file test/boost/restrictions_test.cc and some extra testcases in test/cql/*. Drop some existing tests because they codify buggy behaviour (reference #6369, #6382). Drop others because they forbid relation combinations that are now allowed (eg, mixing equality and inequality, comparing to NULL, etc.). Here are some specific categories of what was replaced: - restriction::is_foo predicates are replaced by using the free function find_if; sometimes it is used transitively (see, eg, has_slice) - restriction::is_multi_column is replaced by dynamic casts (recall that the `restrictions` class hierarchy still exists) - utility methods is_satisfied_by, is_supported_by, to_string, and uses_function are replaced by eponymous free functions; note that restrictions::uses_function still exists - restriction::apply_to is replaced by free function replace_column_def - when checking infinite_bound_range_deletions, the has_bound is replaced by local free function bounded_ck - restriction::bounds and restriction::value are replaced by the more general free function possible_lhs_values - using free functions allows us to simplify the multi_column_restriction and token_restriction hierarchies; their methods merge_with and uses_function became identical in all subclasses, so they were moved to the base class - single_column_primary_key_restrictions<clustering_key>::needs_filtering was changed to reuse num_prefix_columns_that_need_not_be_filtered, which uses free functions Fixes #5799. Fixes #6369. Fixes #6371. Fixes #6372. Fixes #6382. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-07 23:08:09 +02:00
Avi Kivity	4c221855a1	Merge 'hinted handoff: fix commitlog memory leak' from Piotr D " When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This PR prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, Fixes #6776 " * piodul-fix-commitlog-memory-leak-in-hinted-handoff: hinted handoff: disable warnings about segments left on disk hinted handoff: release memory on commitlog termination	2020-07-07 21:36:14 +03:00
Piotr Dulikowski	b955793088	hinted handoff: disable warnings about segments left on disk When a mutation is written to the commitlog, a rp_handle object is returned which keeps a reference to commitlog segment. A segment is "dirty" when its reference count is not zero, otherwise it is "clean". When commitlog object is being destroyed, a warning is being printed for every dirty segment. On the other hand, clean segments are deleted. In case of standard mutation writing path, the rp_handle moves responsibility for releasing the reference to the memtable to which the mutation is written. When the memtable is flushed to disk, all references accumulated in the memtable are released. In this context, it makes sense to warn about dirty segments, because such segments contain mutations that are not written to sstables, and need to be replayed. However, hinted handoff uses a different workflow - it recreates a commitlog object periodically. When a hint is written to commitlog, the rp_handle reference is not released, so that segments with hints are not deleted when destroying the commitlog. When commitlog is created again, we get a list of saved segments with hints that we can try to send at a later time. Although this is intended behavior, now that releasing the hints commitlog is done properly, it causes the mentioned warning to periodically appear in the logs. This patch adds a parameter for the commitlog that allows to disable this warning. It is only used when creating hinted handoff commitlogs.	2020-07-07 19:40:42 +02:00
Piotr Dulikowski	002e6c4056	hinted handoff: release memory on commitlog termination When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This commit prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, #6776	2020-07-07 19:40:32 +02:00
Nadav Har'El	0143aaa5a8	merge: Forbid internal schema changes for distributed tables Merged patch set from Piotr Sarna: This series addresses issue #6700 again (it was reopened), by forbidding all non-local schema changes to be performed from within the database via CQL interface. These changes are dangerous since they are not directly propagated to other nodes. Tests: unit(dev) Fixes #6700 Piotr Sarna (4): test: make schema changes in query_processor_test global cql3: refuse to change schema internally for distributed tables test: expand testing internal schema changes cql3: add explanatory comments to execute_internal cql3/query_processor.hh \| 13 ++++++++++++- cql3/statements/alter_table_statement.cc \| 6 ------ cql3/statements/schema_altering_statement.cc \| 15 +++++++++++++++ test/boost/cql_query_test.cc \| 8 ++++++-- test/boost/query_processor_test.cc \| 16 ++++++++-------- 5 files changed, 41 insertions(+), 17 deletions(-)	2020-07-07 18:27:16 +03:00
Takuya ASADA	967084b567	scylla_coredump_setup: support older version of coredumpctl message format "coredumpctl info" behavior changed in systemd-v232, so we need to support both versions. Before systemd-v232, it was simple: it printed the 'Coredump' field only when the coredump exists on the filesystem, and otherwise printed nothing. Since the change made in systemd-v232, it has become more complex: it always prints the 'Storage' field, even when the coredump does not exist. Beyond available/unavailable, it describes more: - Storage: none - Storage: journal - Storage: /path/to/file (inaccessible) - Storage: /path/to/file To support both of them, we need to detect the message version first, then try to detect the coredump path. Fixes: #6789 reference: `47f5064207`	2020-07-07 18:27:16 +03:00
Takuya ASADA	ef05ea8e91	node_exporter_install: stop service before force installing Stop node-exporter.service before re-install it, to avoid 'Text file busy' error. Fixes #6782	2020-07-07 18:27:16 +03:00
Takuya ASADA	f34001ff14	debian: use symlink copying files to build/debian/debian/ Instead of running shutil.copy() for each *.{service,default}, create symlink for these files. Python will copy original file when copying debian directory.	2020-07-07 18:27:16 +03:00
Asias He	0929a5e82b	repair: Fix inaccurate exception message in check_failed_ranges The reason for the failure can be other reasons than failure of checksum. Fixes #6785	2020-07-07 18:27:16 +03:00
Asias He	6e6e554944	repair: Use warn level for logs with recoverable failures Those logs are not fatal and recoverable. We should make them warn level instead of info level. Fixes #5612	2020-07-07 18:27:16 +03:00
Piotr Sarna	86f8b83ece	cql3: add explanatory comments to execute_internal Executing internal CQL queries needs to be done with caution, since they were designed to be used mainly for local tables and have very specific semantics wrt. propagating schema changes. A short comment is added in order to prevent future misuse of this interface.	2020-07-07 11:54:36 +02:00
Piotr Sarna	8ecae38d6b	test: expand testing internal schema changes ... in order to ensure that not only ALTER TABLE, but also other schema altering statements are not allowed for distributed tables/keyspaces.	2020-07-07 10:02:58 +02:00
Piotr Sarna	a544ca64e2	cql3: refuse to change schema internally for distributed tables Changing the schemas via internal calls to CQL is dangerous, since the changes are not propagated to other nodes. Thus, it should never be used for regular distributed tables. The guarding code was already added for ALTER TABLE statement and it's now expanded to cover all schema altering statements. Tests: unit(dev) Fixes #6700	2020-07-07 09:32:33 +02:00
Piotr Sarna	9bdf17a804	test: make schema changes in query_processor_test global Now that schema changes are going to be forbidden for non-local tables, query_processor_test is updated accordingly.	2020-07-07 09:09:40 +02:00
Botond Dénes	5ebe2c28d1	db/view: view_update_generator: re-balance wait/signal on the register semaphore The view update generator has a semaphore to limit concurrency. This semaphore is waited on in `register_staging_sstable()` and later the unit is returned after the sstable is processed in the loop inside `start()`. This was broken by `4e64002`, which changed the loop inside `start()` to process sstables in per table batches, however didn't change the `signal()` call to return the amount of units according to the number of sstables processed. This can cause the semaphore units to dry up, as the loop can process multiple sstables per table but return just a single unit. This can also block callers of `register_staging_sstable()` indefinitely as some waiters will never be released as under the right circumstances the units on the semaphore can permanently go below 0. In addition to this, `4e64002` introduced another bug: table entries from the `_sstables_with_tables` are never removed, so they are processed every turn. If the sstable list is empty, there won't be any update generated but due to the unconditional `signal()` described above, this can cause the units on the semaphore to grow to infinity, allowing future staging sstables producers to register a huge amount of sstables, causing memory problems due to the amount of sstable readers that have to be opened (#6603, #6707). Both outcomes are equally bad. This patch fixes both issues and modifies the `test_view_update_generator` unit test to reproduce them and hence to verify that this doesn't happen in the future. Fixes: #6774 Refs: #6707 Refs: #6603 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200706135108.116134-1-bdenes@scylladb.com>	2020-07-07 08:53:00 +02:00
Wojciech Mitros	76038b8d8e	view: differentiate identical error messages and change them to warnings Modified log message in view_builder::calculate_shard_build_step to make it distinct from the one in view_builder::execute, changed their logging level to warning, since we're continuing even if we handle an exception. Fixes #4600	2020-07-06 20:50:34 +03:00
Dejan Mircevski	921dbd0978	cql/restrictions: Handle `WHERE a>0 AND a<0` WHERE clauses with start point above the end point were handled incorrectly. When the slice bounds are transformed to interval bounds, the resulting interval is interpreted as wrap-around (because start > end), so it contains all values above 0 and all values below 0. This is clearly incorrect, as the user's intent was to filter out all possible values of a. Fix it by explicitly short-circuiting to false when start > end. Add a test case. Fixes #5799. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-07-06 19:11:20 +03:00
Piotr Sarna	e4b74356bb	Merge 'view_update_generator: use partitioned sstable set' from Botond. Recently it was observed (#6603) that since 4e6400293ea, the staging reader is reading from a lot of sstables (200+). This consumes a lot of memory, and after this reaches a certain threshold -- the entire memory amount of the streaming reader concurrency semaphore -- it can cause a deadlock within the view update generation. To reduce this memory usage, we exploit the fact that the staging sstables are usually disjoint, and use the partitioned sstable set to create the staging reader. This should ensure that only the minimum number of sstable readers will be opened at any time. Refs: #6603 Fixes: #6707 Tests: unit(dev) * 'view-update-generator-use-partitioned-set/v1' of https://github.com/denesb/scylla: db/view: view_update_generator: use partitioned sstable set sstables: make_partitioned_sstable_set(): return an sstable_set	2020-07-06 14:36:08 +02:00
Botond Dénes	62c6859b69	db/view: view_update_generator: use partitioned sstable set And pass it to `make_range_sstable_reader()` when creating the reader, thus allowing the incremental selector created therein to exploit the fact that staging sstables are disjoint (in the case of repair and streaming at least). This should reduce the memory consumption of the staging reader considerably when reading from a lot of sstables.	2020-07-06 13:38:23 +03:00
Botond Dénes	84b5d6d6d0	sstables: make_partitioned_sstable_set(): return an sstable_set Instead of an `std::unique_ptr<sstable_set_impl>`. The latter doesn't have a publicly available destructor, so it can only be called from within `sstables/compaction_strategy.cc` where its definition resides. Thus it is not really usable as a public function in its current form, which shows as it has no users either. This patch makes it usable by returning an `sstable_set`. That is what potential callers would want anyway. In fact this patch prepares the ground for the next one, which wishes to use this function for just that but can't in its current form.	2020-07-06 13:38:23 +03:00
Takuya ASADA	2d63acdd6a	scylla_util.py: use correct ID value for distro.id() It seems distro.id() is NOT always same output as ID in /etc/os-release. We need to replace "ol" to "oracle", "amzn" to "amazon". Fixes #6761	2020-07-06 11:40:00 +03:00
Asias He	a19917eb91	gossiper: Drop replacement_quarantine It is not used any more after "gossiper: Drop unused replaced_endpoint". Refs #5482	2020-07-06 11:27:55 +03:00
Asias He	2bc73ad290	gossiper: Drop unused replaced_endpoint It is not used any more after `75cf1d18b5` (storage_service: Unify handling of replaced node removal from gossip) in the "Make replacing node take writes" series. Refs #5482	2020-07-06 11:27:55 +03:00
Piotr Sarna	446b89f408	test: move json tests from manual/ to boost/ Manual tests are, as the name suggests, not run automatically, which makes them more prone to regressions. JSON tests are fast and correct, so there's no reason for them to be marked as manual. Message-Id: <dea75b0a0d1c238d12382a28840978884ac6ec2c.1594023481.git.sarna@scylladb.com>	2020-07-06 11:24:12 +03:00
Asias He	7926ff787b	storage_service: Make handle_state_left more robust when tokens are empty In case the tokens for the node to be removed from the cluster are empty, log the application_state of the leaving node to help understand why the tokens are empty and try to get the tokens from token_metadata. Refs #6468	2020-07-06 15:51:19 +08:00
Avi Kivity	058b30b891	Merge "scylla-gdb.py: scylla_fiber: protect against reference loops" from Botond " This mini-series adds protection against reference loops between tasks, preventing infinite recursion in this case. It also contains some other improvements, like updating the task whitelist as well as the task identification mechanism w.r.t. recent changes in seastar. It also improves verbose logging, which was found to not work well while investigating the other issues fixed herein. " * 'scylla-gdb.py-scylla-fiber-update/v1' of https://github.com/denesb/scylla: scylla-gdb.py: scylla_fiber: add protection against reference loops scylla-gdb.py: scylla_fiber: relax requirement w.r.t. what object qualifies as task scylla-gdb.py: scylla_fiber: update whitelist scylla-gdb.py: scylla_fiber: improve verbose log output	2020-07-06 10:34:13 +03:00
Piotr Sarna	83ab41c76d	test: add json test for parsing from map Our JSON legacy helper functions for parsing documents to/from string maps are indirectly tested by several unit tests, e.g. caching_options_test.cc. They however lacked one corner case detected only by dtest - parsing an empty map from a null JSON document. This case is hereby added in order to prevent future regressions. Message-Id: <df8243bd083b2ba198df665aeb944c8710834736.1594020411.git.sarna@scylladb.com>	2020-07-06 10:28:55 +03:00
Avi Kivity	cc7a906149	Merge "random_access_reader: futurize seek" from Benny " Rather than relying on a gate to serialize seek's background work with close(), change seek() to return a future<> and wait on it. Also, now random_access_reader read_exactly(), seek(), and close() are made noexcept. This will be followed up by making sstable parse methods noexcept. Test: unit(dev) " * tag 'random_access_reader-v4' of github.com:bhalevy/scylla: sstables: random_access_reader: make methods noexcept sstables: random_access_reader: futurize seek sstables: random_access_reader: unify input stream close code sstables: random_access_reader: let file_random_access_reader set the input stream sstables: random_access_reader: move functions out of line	2020-07-06 10:16:18 +03:00
Asias He	027fa022e2	token_metadata: Do not throw if empty tokens are passed to remove_bootstrap_tokens Gossip on_change callback calls storage_service::excise which calls remove_bootstrap_tokens to remove the tokens of the leaving node from bootstrap tokens. If empty tokens, e.g., due to a gossip propagation issue as we saw in https://github.com/scylladb/scylla/issues/6468, are passed to remove_bootstrap_tokens, it will throw. Since the on_change callback is marked as noexcept, such a throw will cause the node to terminate, which is overkill. To avoid such an error bringing down the whole cluster in the worst case, just log that empty tokens were passed to remove_bootstrap_tokens. Refs #6468	2020-07-06 14:28:23 +08:00
Botond Dénes	54bb9ddaae	docs/debugging.md: drop --privileged from dbuild start instructions Instead, label the mapped volume by passing `:z` options to `-v` argument, like we do for other mapped volumes in the `dbuild` script. Passing the `--privileged` flag doesn't work after the most recent Fedora update and anyway, using `:z` is the proper way to make sure the mounted volume is accessible. Historically it was needed to be able to open cores as well, but since `5b08e91bd` this is not necessary as the container is created with SYS_PTRACE capability. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200703072703.10355-1-bdenes@scylladb.com>	2020-07-06 08:09:58 +02:00
Benny Halevy	fc89018146	sstables: random_access_reader: make methods noexcept handle all exceptions in read_exactly, seek, and close and specify them as noexcept. Also, specify eof() as noexcept as it trivially is. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-05 19:40:48 +03:00
Benny Halevy	94460f3199	sstables: random_access_reader: futurize seek And adjust its callers to wait on the returned future. With this, there is no need for a gate to serialize close() with the background work seek() used to leave behind. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-05 19:40:26 +03:00
Benny Halevy	765c5752c2	sstables: random_access_reader: unify input stream close code Define a close_if_needed() helper function, to be called from seek() and close(). A future patch will call it with a possibly disengaged `_in` so it will close it only if it was engaged. close_if_needed() captures the input stream unique ptr so it will remain valid throughout close. This was missing from close(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-05 19:37:39 +03:00
Benny Halevy	e7fdadd748	sstables: random_access_reader: let file_random_access_reader set the input stream Allow file_random_access_reader constructor to set the input stream to prepare for futurizing seek() by adding a protected set() method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-05 19:37:36 +03:00
Benny Halevy	0bb1c0f37d	sstables: random_access_reader: move functions out of line These are not good candidates for inlining. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-07-05 18:47:04 +03:00
Avi Kivity	36b6ee7b11	Merge 'python3: simplified .rpm/.deb build process' from Takuya " Follow scylla-server package changes, simplified .rpm/.deb build process which merge build scripts into single script. " * syuu1228-python3_simplified_pkg_scripts: python3: simplified .deb build process python3: simplified .rpm build process	2020-07-05 18:09:17 +03:00
Avi Kivity	cc891a5de8	Merge "Convert a few uses of sstring to std::string_view" from Rafael " This series converts an API to use std::string_view and then converts a few sstring variables to be constexpr std::string_view. This has the advantage that constexpr variables cannot be part of any initialization order problem. " * 'espindola/convert-to-constexpr' of https://github.com/espindola/scylla: auth: Convert sstring variables in common.hh to constexpr std::string_view auth: Convert sstring variables in default_authorizer to constexpr std::string_view cql_test_env: Make ks_name a constexpr std::string_view class_registry: Use std::string_view in (un)?qualified_name	2020-07-05 17:08:54 +03:00
Dmitry Kropachev	de82b3efae	dist/common/scripts/scylla-housekeeping: wrap urllib.request with try ... except We could hit "cannot serialize '_io.BufferedReader' object" when the request gets a 404 error from the server. Now you will get a legitimate error message in that case. Fixes #6690	2020-07-05 16:33:11 +03:00
Takuya ASADA	d94fe346ee	scylla_coredump_setup: detect missing coredump file Print error message and exit with non-zero status by following condition: - coredumpctl says the coredump file is inaccessible - failed to detect coredump file path from 'coredumpctl info <pid>' - deleting coredump file failed because the file is missing Fixes #6654	2020-07-05 14:24:51 +03:00
Takuya ASADA	d65b15f3b2	dist/debian/python3: apply version number fixup on scylla-python3 Sync version number fixup from main package, contains #6546 and #6752 fixes. Note that scylla-python3 likely does not affect this versioning issue, since it uses python3 version, which normally does not contain 'rcX'.	2020-07-05 14:21:18 +03:00
Takuya ASADA	8750c5ccf3	python3: simplified .deb build process We don't really need to have two build_deb.sh, merge it to reloc.	2020-07-04 23:41:33 +09:00
Takuya ASADA	fc320ac49d	python3: simplified .rpm build process We don't really need to have two build_rpm.sh, merge it to reloc.	2020-07-04 23:41:22 +09:00
Rafael Ávila de Espíndola	400212e81f	auth: Convert sstring variables in common.hh to constexpr std::string_view This converts the following variables: DEFAULT_SUPERUSER_NAME AUTH_KS USERS_CF AUTH_PACKAGE_NAME Since they are now constexpr they will not be part of any initialization order problems. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-03 12:35:58 -07:00
Rafael Ávila de Espíndola	53ed39e64a	auth: Convert sstring variables in default_authorizer to constexpr std::string_view This converts the following variables: ROLE_NAME RESOURCE_NAME PERMISSIONS_NAME PERMISSIONS_CF Since they are now constexpr they will not be part of any initialization order problems. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-03 12:33:33 -07:00
Rafael Ávila de Espíndola	33af0c293f	cql_test_env: Make ks_name a constexpr std::string_view Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-03 12:28:20 -07:00
Rafael Ávila de Espíndola	a2110e413f	class_registry: Use std::string_view in (un)?qualified_name This gives more flexibility for constructing a qualified_name or unqualified_name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-03 12:28:14 -07:00
Nadav Har'El	8e3ecc30a9	merge: Migrate from libjsoncpp to rjson Merged patch series by Piotr Sarna: The alternator project was in need of a more optimized JSON library, which resulted in creating "rjson" helper functions. Scylla generally used libjsoncpp for its JSON handling, but in order to reduce the dependency hell, the usage is now migrated to rjson, which is faster and offers the same functionality. The original plan was to be able to drop the dependency on libjsoncpp-lib altogether and remove it from install-dependencies.sh, but one last usage of it remains in our test suite, namely cql_repl. The tool compares its output JSON textually, so it depends on how a library presents JSON - what are the delimiters, indentation, etc. It's possible to provide a layer of translation to force rjson to print in an identical format, but the other issue is that libjsoncpp keeps subobjects sorted by their name, while rjson uses an unordered structure. There are two possible solutions for the last remaining usage of libjsoncpp: 1. change our test suite to compare JSON documents with a JSON parser, so that we don't rely on internal library details 2. provide a layer of translation which forces rjson to print its objects in a format identical to libjsoncpp. (1.) would be preferred, since now we're also vulnerable to changes inside libjsoncpp itself - if they change anything in their output format, tests would start failing. The issue is not critical however, so it's left for later. 
Tests: unit(dev), manual(json_test), dtest(partitioner_tests.TestPartitioner.murmur3_partitioner_test) Piotr Sarna (8): alternator,utils: move rjson.hh to utils/ alternator: remove ambiguous string overloads in rjson rjson: add parse_to_map helper function rjson: add from_string_map function rjson: add non-throwing parsing rjson: move quote_json_string to rjson treewide: replace libjsoncpp usage with rjson configure: drop json.cc and json.hh helpers alternator/base64.hh \| 2 +- alternator/conditions.cc \| 2 +- alternator/executor.hh \| 2 +- alternator/expressions.hh \| 2 +- alternator/expressions_types.hh \| 2 +- alternator/rmw_operation.hh \| 2 +- alternator/serialization.cc \| 2 +- alternator/serialization.hh \| 2 +- alternator/server.cc \| 2 +- caching_options.hh \| 9 +- cdc/log.cc \| 4 +- column_computation.hh \| 5 +- configure.py \| 3 +- cql3/functions/functions.cc \| 4 +- cql3/statements/update_statement.cc \| 24 ++-- cql3/type_json.cc \| 212 ++++++++++++++++++---------- cql3/type_json.hh \| 7 +- db/legacy_schema_migrator.cc \| 12 +- db/schema_tables.cc \| 1 - flat_mutation_reader.cc \| 1 + index/secondary_index.cc \| 80 +++++------ json.cc \| 80 ----------- json.hh \| 113 --------------- schema.cc \| 25 ++-- test/boost/cql_query_test.cc \| 9 +- test/manual/json_test.cc \| 4 +- test/tools/cql_repl.cc \| 1 + {alternator => utils}/rjson.cc \| 75 +++++++++- {alternator => utils}/rjson.hh \| 40 +++++- 29 files changed, 344 insertions(+), 383 deletions(-) delete mode 100644 json.cc delete mode 100644 json.hh rename {alternator => utils}/rjson.cc (86%) rename {alternator => utils}/rjson.hh (81%)	2020-07-03 18:23:56 +02:00
Piotr Sarna	449e72826f	configure: drop json.cc and json.hh helpers Now that only rjson is used in the code, the old helper is not used anywhere in the code, so it can be dropped.	2020-07-03 10:27:23 +02:00
Piotr Sarna	4cb79f04b0	treewide: replace libjsoncpp usage with rjson In order to eventually switch to a single JSON library, most of the libjsoncpp usage is dropped in favor of rjson. Unfortunately, one usage still remains: test/utils/test_repl utility heavily depends on the exact textual format of its output JSON files, so replacing a library results in all tests failing because of differences in formatting. It is possible to force rjson to print its documents in the exact matching format, but that's left for later, since the issue is not critical. It would be nice though if our test suite compared JSON documents with a real JSON parser, since there are more differences - e.g. libjsoncpp keeps children of the object sorted, while rapidjson uses an unordered data structure. This change should cause no change in semantics, it strives just to replace all usage of libjsoncpp with rjson.	2020-07-03 10:27:23 +02:00
Piotr Sarna	1b37517aab	rjson: move quote_json_string to rjson This utility function is used for type serialization, but it also has a dedicated unit test, so it needs to be globally reachable.	2020-07-03 10:27:23 +02:00
Piotr Sarna	f568fe869f	rjson: add non-throwing parsing Returning a disengaged optional instead of throwing an error can be useful when the input string is expected not to be a valid JSON in certain cases.	2020-07-03 10:27:23 +02:00
Piotr Sarna	3fda9908f2	rjson: add from_string_map function This legacy function is needed because the existing implementation relies on being able to parse flat JSON documents to and from maps of strings.	2020-07-03 10:27:23 +02:00
Piotr Sarna	39b5408a84	rjson: add parse_to_map helper function Existing infrastructure relies on being able to parse a JSON string straight into a map of strings. In order to make rjson a drop-in replacement(tm) for libjsoncpp, a similar helper function is provided.	2020-07-03 10:27:23 +02:00
Piotr Sarna	1df6d98b1a	alternator: remove ambiguous string overloads in rjson It's redundant to provide function overloads for both string_view and const string&, since both of them can be implicitly created from const char*. Thus, only string_view overloads are kept. Example code which was ambiguous before the patch, but compiles fine after it: rjson::from_string("hello"); Without the patch, one had to explicitly state the type, e.g.: rjson::from_string(std::string_view("hello")); which is excessive.	2020-07-03 08:30:01 +02:00
Piotr Sarna	4de23d256e	alternator,utils: move rjson.hh to utils/ rjson is going to replace libjsoncpp, so it's moved from alternator to the common utils/ directory.	2020-07-03 08:30:01 +02:00
Takuya ASADA	a107f086bc	dist/debian: apply generated package version for .orig.tar.gz file We are currently unable to apply the version number fixup to the .orig.tar.gz file, even though we applied the correct fixup to debian/changelog, because it just reads SCYLLA-VERSION-FILE. We should parse debian/{changelog,control} instead. Fixes #6736	2020-07-03 08:24:41 +02:00
Takuya ASADA	4769f30a11	python3: fix incorrect variable name builddir should be BUILDDIR.	2020-07-03 08:24:41 +02:00
Avi Kivity	a3dd1ba76f	build: thrift: avoid rebuild if cassandra.thrift is touched but not modified Thrift 0.12 includes a change [1] that avoids writing the generated output if it has not changed. As a result, if you touch cassandra.thrift (but do not change it), the generated files will not update, and as a result ninja will try to rebuild them every time. The compilation of thrift files will be fast due to ccache, but still we will re-link everything. This touching of cassandra.thrift can happen naturally when switching to a different git branch and then switching back. The net result is that cassandra.thrift's contents have not changed, but its timestamp has. Fix by adding the "restat" option to the thrift rule. This instructs ninja to check whether the output has changed as expected or not, and to avoid unneeded rebuilds if it has not. [1] https://issues.apache.org/jira/browse/THRIFT-4532	2020-07-03 08:24:41 +02:00
Rafael Ávila de Espíndola	6fe7706fce	mutation_reader_test: Wait for a future Nothing was waiting for this future. Found while testing another patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200630183929.1704908-1-espindola@scylladb.com>	2020-07-03 08:24:41 +02:00
Rafael Ávila de Espíndola	b7f5e2e0dd	big_decimal: Add more tests It looks like an older version of my patch series was merged. The only difference is that the new one had more tests. This patch adds the missing ones. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200630141150.1286893-1-espindola@scylladb.com>	2020-07-03 08:24:41 +02:00
Botond Dénes	b91cb8cc60	scylla-gdb.py: scylla_fiber: add protection against reference loops Remember all previously visited tasks and stop if one of them is seen again. The walk algorithm is converted from recursive to iterative to facilitate this.	2020-07-01 16:37:47 +03:00
Botond Dénes	427dae61f8	scylla-gdb.py: scylla_fiber: relax requirement w.r.t. what object qualifies as task Don't require that the object is located at the start of the allocation block. Some tasks, like `seastar::internal::when_all_state_component` might not.	2020-07-01 16:34:36 +03:00
Botond Dénes	bb5b0ccbd9	scylla-gdb.py: scylla_fiber: update whitelist We have some new task derivatives.	2020-07-01 16:33:47 +03:00
Botond Dénes	6814f8c762	scylla-gdb.py: scylla_fiber: improve verbose log output Gdb doesn't seem to handle multiple calls to `gdb.write()` writing to the same line well, the content of some of these calls just disappears. So make sure each log message is a separate line and use indentation instead to illustrate references between messages.	2020-07-01 16:31:48 +03:00
Asias He	7f3eb8b4e8	repair: Handle dropped table in repair_range In commit `12d929a5ae` (repair: Add table_id to row_level_repair), a call to find_column_family() was added in repair_range. If the table is dropped, the call fails repair_range, which in turn fails the bootstrap operation. Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_test Fixes: #5942	2020-07-01 12:13:14 +03:00
Takuya ASADA	03ce19d53a	scylla_setup: follow hugepages package name change on Ubuntu 20.04LTS hugepages package now renamed to libhugetlbfs-bin, we need to follow the change. Fixes #6673	2020-07-01 11:41:07 +03:00
Takuya ASADA	01f9be1ced	scylla_setup: improve help message	2020-07-01 11:39:44 +03:00
Avi Kivity	c84217adaa	Merge 'Compaction fix stall in perform cleanup' from Asias " compaction_manager: Avoid stall in perform_cleanup The following stall was seen during a cleanup operation: scylla: Reactor stalled for 16262 ms on shard 4. \| std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158 \| (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602 \| locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:? \| (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56 \| locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158 \| locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158 \| service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158 \| service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158 \| (inlined by) operator() at ./sstables/compaction_manager.cc:691 \| (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286 \| std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158 \| (inlined by) compaction_manager::rewrite_sstables(table, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, 
std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>) at ./sstables/compaction_manager.cc:604 \| compaction_manager::perform_cleanup(table) at /usr/include/fmt/format.h:1158 To fix, we futurize the function to get sstables. If get_local_ranges() is called inside a thread, get_local_ranges will yield automatically. Fixes #6662 " * asias-compaction_fix_stall_in_perform_cleanup: compaction_manager: Avoid stall in perform_cleanup compaction_manager: Return exception future in perform_cleanup abstract_replication_strategy: Add get_ranges_in_thread	2020-07-01 11:30:37 +03:00
Avi Kivity	7e9a3b08ac	Merge "mutation_reader: shard_reader fix fast-forwarding with read-ahead" from Botond " Currently, the fast forwarding implementation of the shard reader is broken in some read-ahead related corner cases, namely: * If the reader was not created yet, but there is an ongoing read-ahead (which is going to create it), the function bails out. This will result in this shard reader not being fast-forwarded to the new range at all. * If the reader was already created and there is an ongoing read-ahead, the function will wait for this to complete, then fast-forward the reader, as it should. However, the buffer is cleared before the read-ahead is waited for. So if the read-ahead brings in new data, this will land in the buffer. This data will be outside of the fast-forwarded-to range and worse, as we just cleared the buffer, it might violate mutation fragment stream monotonicity requirements. This series fixes these two bugs and adds a unit test which reproduces both of them. There are no known field issues related to these bugs. Only row-level repair ever fast-forwards the multishard reader, but it only uses it in heterogenous clusters. Even so, in theory none of these bugs affect repair as it doesn't ever fast-forward the multishard reader before all shards arrive at EOS. The bugs were found while auditing the code, looking for the possible cause of #6613. Fixes: #6715 Tests: unit(dev) " * 'multishard-combining-reader-fast-forward-to-fixes/v1.1' of https://github.com/denesb/scylla: test/boost/mutation_reader_test: multishard reader: add tests for fast-forwarding with read-ahead test/boost/mutation_reader_test: extract multishard read-ahead test setup test/boost/mutation_reader_test: puppet_reader: fast-forward-support mutation_reader_test: puppet_reader: make interface more predictable dht::sharder: add virtual destructor mutation_reader: shard_reader: fix fast-forwarding with read-ahead	2020-07-01 11:22:41 +03:00
Botond Dénes	cb69406f6c	test/boost/mutation_reader_test: multishard reader: add tests for fast-forwarding with read-ahead	2020-07-01 10:15:49 +03:00
Botond Dénes	d6e2033d8a	test/boost/mutation_reader_test: extract multishard read-ahead test setup Testing the multishard reader's various read-ahead related corner cases requires a non-trivial setup. Currently there is just one such test, but we plan to add more so in this patch we extract this setup code to a free function to allow reuse across multiple tests.	2020-07-01 10:15:49 +03:00
Botond Dénes	851ae8c650	test/boost/mutation_reader_test: puppet_reader: fast-forward-support A fast-forwarded puppet reader goes immediately to EOS. A counter is added to the remote control to allow tests to check which readers were actually fast forwarded.	2020-07-01 10:15:49 +03:00
Botond Dénes	741f0c276d	mutation_reader_test: puppet_reader: make interface more predictable Currently the puppet reader will do an automatic (half) buffer-fill in the constructor. This makes it very hard to reason about when and how the action that was passed to it will be executed. Refactor it to take a list of actions and only execute those, no hidden buffer-fill anymore. No better proof is needed for this than the fact that the test which is supposed to test the multishard reader being destroyed with a pending read-ahead was silently broken (not testing what it should). This patch fixes this test too. Also fixed in this patch is the `pending` and `destroyed` fields of the remote control, tests can now rely on these to be correct and add additional checkpoints to ensure the test is indeed doing what it was intended to do.	2020-07-01 10:15:49 +03:00
Botond Dénes	6ae8e0bc7d	dht::sharder: add virtual destructor This is a class with virtual methods, it should have a virtual destructor too.	2020-07-01 10:15:49 +03:00
Asias He	07e253542d	compaction_manager: Avoid stall in perform_cleanup The following stall was seen during a cleanup operation: scylla: Reactor stalled for 16262 ms on shard 4. \| std::_MakeUniq<locator::tokens_iterator_impl>::__single_object std::make_unique<locator::tokens_iterator_impl, locator::tokens_iterator_impl&>(locator::tokens_iterator_impl&) at /usr/include/fmt/format.h:1158 \| (inlined by) locator::token_metadata::tokens_iterator::tokens_iterator(locator::token_metadata::tokens_iterator const&) at ./locator/token_metadata.cc:1602 \| locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at simple_strategy.cc:? \| (inlined by) locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at ./locator/simple_strategy.cc:56 \| locator::abstract_replication_strategy::get_ranges(gms::inet_address, locator::token_metadata&) const at /usr/include/fmt/format.h:1158 \| locator::abstract_replication_strategy::get_ranges(gms::inet_address) const at /usr/include/fmt/format.h:1158 \| service::storage_service::get_ranges_for_endpoint(seastar::basic_sstring<char, unsigned int, 15u, true> const&, gms::inet_address const&) const at /usr/include/fmt/format.h:1158 \| service::storage_service::get_local_ranges(seastar::basic_sstring<char, unsigned int, 15u, true> const&) const at /usr/include/fmt/format.h:1158 \| (inlined by) operator() at ./sstables/compaction_manager.cc:691 \| (inlined by) _M_invoke at /usr/include/c++/9/bits/std_function.h:286 \| std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table const&)>::operator()(table const&) const at /usr/include/fmt/format.h:1158 \| (inlined by) compaction_manager::rewrite_sstables(table, sstables::compaction_options, std::function<std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > (table 
const&)>) at ./sstables/compaction_manager.cc:604 \| compaction_manager::perform_cleanup(table) at /usr/include/fmt/format.h:1158 To fix, we futurize the function to get local ranges and sstables. In addition, this patch removes the dependency on the global storage_service object. Fixes #6662	2020-07-01 15:03:50 +08:00
Asias He	868e2da1c4	compaction_manager: Return exception future in perform_cleanup We should return the exception future instead of throwing a plain exception. Refs #6662	2020-07-01 15:00:01 +08:00
Asias He	94995acedb	abstract_replication_strategy: Add get_ranges_in_thread Add a version that runs inside a seastar thread. The benefit is that get_ranges can yield to avoid stalls. Refs #6662	2020-07-01 15:00:01 +08:00
Botond Dénes	627054c3d7	mutation_reader: shard_reader: fix fast-forwarding with read-ahead The current `fast_forward_to(const dht::partition_range&)` implementation has two problems: * If the reader was not created yet, but there is an ongoing read-ahead (which is going to create it), the function bails out. This will result in this shard reader not being fast-forwarded to the new range at all. * If the reader was already created and there is an ongoing read-ahead, the function will wait for this to complete, then fast-forward the reader, as it should. However, the buffer is cleared before the read-ahead is waited for. So if the read-ahead brings in new data, this will land in the buffer. This data will be outside of the fast-forwarded-to range and worse, as we just cleared the buffer, it might violate mutation fragment stream monotonicity requirements. This patch fixes both of these bugs. Targeted reproducer unit tests are coming in the next patches.	2020-07-01 09:51:02 +03:00
Takuya ASADA	bbd3ed9d47	scylla_util.py: switch to subprocess.run() When we started porting the bash scripts to Python, we were not able to use subprocess.run() since EPEL only provided Python 3.4, but now we have relocatable Python, so we can switch to it.	2020-06-30 20:13:30 +03:00
Takuya ASADA	a9de438b1f	scylla_swap_setup: handle <1GB environment Show better error message and exit with non-zero status when memory size <1GB. Fixes #6659	2020-06-30 20:12:32 +03:00
Avi Kivity	5bcef44935	Update seastar submodule * seastar 5db34ea8d3...dbecfff5a4 (3): > sharded: Do not hang on never set freed promise Fixes #6606. > foreign_ptr: make constructors and methods conditionally noexcept > foreign_ptr: specify methods as noexcept	2020-06-30 19:27:21 +03:00
Botond Dénes	27a0772d71	docs/debugging.md: extend section on relocatable binaries Currently the section on "Debugging coredumps" only briefly mentions relocatable binaries, then starts with an extensive subsection on how to open cores generated by non-relocatable binaries. There is a subsection about relocatable binaries, but it just contains some out-of-date workaround without any context. In this patch we completely replace this outdated and not very useful subsection on relocatable binaries, with a much more extensive one, documenting step-by-step a procedure that is known to work. Also, this subsection is moved above the non-relocatable one. All our current releases except for 2019.1 use relocatable binaries, so the subsection about these should be the more prominent one. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200630145655.159926-1-bdenes@scylladb.com>	2020-06-30 18:35:36 +03:00
Avi Kivity	40f722f4c7	Update seastar submodule * seastar 11e86172ba...5db34ea8d3 (7): > scollectd: Avoid a deprecated warning > prometheus: Avoid protobuf deprecated warning > Merge "Avoid abandoned futures in tests" from Rafael > Merge "make circular_buffer methods noexcept" from Benny > futures_test: Wait for future in test > iotune: Report disk IOPS instead of kernel IOPS > io_tester: Ability to add dsync option to open_file_dma	2020-06-30 17:23:11 +03:00
Takuya ASADA	5e207696d9	scylla_ntp_setup: switch to distro package Use distro API to simplify distribution detection.	2020-06-30 13:57:08 +03:00
Raphael S. Carvalho	cf352e7c14	sstables: optimize procedure that checks if a sstable needs cleanup needs_cleanup() returns true if a sstable needs cleanup. Turns out it's very slow because it iterates through all the local ranges for all sstables in the set, making its complexity: O(num_sstables * local_ranges) We can optimize it by taking into account that abstract_replication_strategy documents that get_ranges() will return a list of ranges that is sorted and non-overlapping. Compaction for cleanup already takes advantage of that when checking if a given partition can be actually purged. So needs_cleanup() can be optimized into O(num_sstables * log(local_ranges)). With num_sstables=1000, RF=3, then local_ranges=256(num_tokens)*3, it means the max # of checks performed will go from 768000 to ~9584. Fixes #6730. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629171355.45118-2-raphaelsc@scylladb.com>	2020-06-30 12:58:43 +03:00
Raphael S. Carvalho	a9eebdc778	sstables: export needs_cleanup() May be needed elsewhere, like in a unit test. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629171355.45118-1-raphaelsc@scylladb.com>	2020-06-30 12:58:43 +03:00
Avi Kivity	08d41ee841	Merge "Fix API snapshot details getter" from Pavel E " A recent branch introduced an issue with listing snapshots via the API that the regular snapshot tests did not catch. This set fixes it (patch #1), cleans the indentation after the fix (#2) and polishes the way stuff is captured nearby (#3) Tests: dtest(nodetool_additional_tests) " * 'br-fix-snap-get-details' of https://github.com/xemul/scylla: api: Remove excessive capture api: Fix indentation after previous patch api: Fix wrongly captured map of snapshots	2020-06-30 12:56:00 +03:00
Botond Dénes	effa632743	scylla-gdb.py: scylla_find: use ptr to start of object to lookup vptr And not the pointer to the offset where the searched-for value was found in the object. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200625081242.486929-1-bdenes@scylladb.com>	2020-06-30 12:54:18 +03:00
Raphael S. Carvalho	68e12bd17e	sstables: sstable_directory: place debug message in logger this message, intended for debugging purposes, is not going through the logger. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200629184642.53348-1-raphaelsc@scylladb.com>	2020-06-30 12:47:17 +03:00
Benny Halevy	7dc3ce4994	init: init_ms_fd_gossiper: use logger for error message Currently fmt::print is used to print an error message if (broadcast_address != listen && seeds.count(listen)) and the logger should be used instead. While at it, the information printed in this message is valuable also in the error-free case, so this change logs it at `info` level and then logs an error without repeating the said info. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: bootstrap_test.py:TestBootstrap.start_stop_test_node(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200630083826.153326-1-bhalevy@scylladb.com>	2020-06-30 12:46:44 +03:00
Avi Kivity	5125e9b51d	Merge "Avoid variadic futures" from Rafael " These patches remove the last few uses of variadic futures in scylla. " * 'espindola/no-variadic-future' of https://github.com/espindola/scylla: row level repair: Don't return a variadic future from get_sink_source row level repair: Don't return a variadic future from read_rows_from_disk messaging_service: Don't return variadic futures from make_sink_and_source_for_* cql3: Don't use variadic futures in select_statement	2020-06-30 12:45:37 +03:00
Avi Kivity	293d4117c1	Merge "Initial cleanup work post off-strategy" from Raphael " Offstrategy work, on boot and refresh, guarantees that a shared SSTable will not reach the table whatsoever. We have lots of extra code in table to make it able to live with those shared SSTables. Now we can fortunately get rid of all that code. tests: mode(dev). also manually tested it by triggering resharding both on boot/refresh. " * 'cleanup_post_offstrategy_v2' of https://github.com/raphaelsc/scylla: distributed_loader: kill unused invoke_shards_with_ptr() sstables:: kill unused sstables::sstable_open_info sstables: kill unused sstable::load_shared_components() distributed_loader: remove declaration of inexistent do_populate_column_family() table: simplify table::discard_sstables() table: simplify add_sstable() table: simplify update_stats_for_new_sstable() table: remove unused open_sstable function distributed_loader: remove unused code table: no longer keep track of sstables that need resharding table: Remove unused functions no longer used by resharding table: remove sstable::shared() condition from backlog tracker add/remove functions table: No longer accept a shared SSTable	2020-06-30 12:42:34 +03:00
Tomasz Grabiec	8bd7359d93	Merge "lwt: introduce LWT flag in prepared statement metadata" from Pavel This patch set adds a few new features in order to fix issue The list of changes is briefly as follows: - Add a new `LWT` flag to `cql3::prepared_metadata`, which allows clients to clearly distinguish between lwt and non-lwt statements without needing to execute some custom parsing logic (e.g. parsing the prepared query with regular expressions), which is obviously quite fragile. - Introduce the negotiation procedure for cql protocol extensions. This is done via `cql_protocol_extension` enum and is expected to have an appropriate mirroring implementation on the client driver side in order to work properly. - Implement a `LWT_ADD_METADATA_MARK` cql feature on top of the aforementioned algorithm to make the feature negotiable and use it conditionally (iff both server and client agree with each other on the set of cql extensions). The feature is meant to be further utilized by client drivers to use primary replicas consistently when dealing with conditional statements. * git@github.com:ManManson/scylla feature/lwt_prepared_meta_flag_2: lwt: introduce "LWT" flag in prepared statement metadata transport: introduce `cql_protocol_extension` enum and cql protocol extensions negotiation	2020-06-30 12:40:19 +03:00
Takuya ASADA	eb405f0908	scylla_util.py: stop using /etc/os-release, use distro Currently we mistakenly have two different ways to detect the distribution: directly reading /etc/os-release and using the distro package. The distro package provides well-abstracted APIs and still has full access to the os-release information, so we should switch to it. Fixes #6691	2020-06-30 12:40:19 +03:00
Asias He	9abaf9bc2e	boot_strapper: Ignore node to be replaced explicitly as stream source After commit `7d86a3b208` (storage_service: Make replacing node take writes), during replace operation, tokens in _token_metadata for node being replaced are updated only after the replace operation is finished. As a result, in range_streamer::add_ranges, the node being replaced will be considered as a source to stream data from. Before commit `7d86a3b208`, the node being replaced was not considered as a source node because it was already replaced by the replacing node before the replace operation finished. This is why it worked in the past. To fix, filter out the node being replaced as a source node explicitly. Tests: replace_first_boot_test and replace_stopped_node_test Backports: 4.1 Fixes: #6728	2020-06-30 12:40:19 +03:00
Rafael Ávila de Espíndola	3964b1a551	row level repair: Don't return a variadic future from get_sink_source	2020-06-29 16:51:41 -07:00
Rafael Ávila de Espíndola	eeee63a9a3	row level repair: Don't return a variadic future from read_rows_from_disk	2020-06-29 16:51:10 -07:00
Rafael Ávila de Espíndola	af44684418	messaging_service: Don't return variadic futures from make_sink_and_source_for_*	2020-06-29 16:50:45 -07:00
Rafael Ávila de Espíndola	abb36cc7d1	cql3: Don't use variadic futures in select_statement	2020-06-29 16:49:41 -07:00
Raphael S. Carvalho	18880af9ad	distributed_loader: kill unused invoke_shards_with_ptr() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:50 -03:00
Raphael S. Carvalho	593c1e00c8	sstables:: kill unused sstables::sstable_open_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:48 -03:00
Raphael S. Carvalho	c7ba495691	sstables: kill unused sstable::load_shared_components() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:45 -03:00
Raphael S. Carvalho	4683cb06c2	distributed_loader: remove declaration of inexistent do_populate_column_family() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:42 -03:00
Raphael S. Carvalho	1e9c5b5295	table: simplify table::discard_sstables() no longer need to have any special code for shared SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:40 -03:00
Raphael S. Carvalho	ce210a4420	table: simplify add_sstable() get_shards_for_this_sstable() can be called inside table::add_sstable() because the shards for an sstable are precomputed, so the call is completely exception safe. We want a central point for checking that table will no longer add shared SSTables to its sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:23:32 -03:00
Raphael S. Carvalho	68b527f100	table: simplify update_stats_for_new_sstable() no longer need to conditionally track the SSTable metadata, as table will no longer accept shared SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:04 -03:00
Raphael S. Carvalho	607c74dc95	table: remove unused open_sstable function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:22:00 -03:00
Raphael S. Carvalho	6dfeb107ae	distributed_loader: remove unused code Remove code no longer used by population procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:40 -03:00
Raphael S. Carvalho	60467a7e36	table: no longer keep track of sstables that need resharding Now that table will no longer accept shared SSTables, it no longer needs to keep track of them. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:38 -03:00
Raphael S. Carvalho	cd548c6304	table: Remove unused functions no longer used by resharding Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:36 -03:00
Raphael S. Carvalho	68a4739a42	table: remove sstable::shared() condition from backlog tracker add/remove functions Now that table no longer accepts shared SSTables, those two functions can be simplified by removing the shared condition. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:34 -03:00
Raphael S. Carvalho	343efe797d	table: No longer accept a shared SSTable With off-strategy work on reshard on boot and refresh, table no longer needs to work with Shared SSTables. That will unlock a host of cleanups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-06-29 14:21:04 -03:00
Pavel Emelyanov	d0d2da6ccb	api: Remove excessive capture The "result" in this lambda is already not used and can be removed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-29 19:08:59 +03:00
Pavel Emelyanov	4f5ffa980d	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-29 19:08:59 +03:00
Pavel Emelyanov	d99969e0e0	api: Fix wrongly captured map of snapshots The result of get_snapshot_details() is saved in do_with, then is captured on the json callback by reference, then the do_with's future returns, so by the time the callback is called the map is already freed and empty. Fix by capturing the result directly on the callback. Fixes recently merged `b6086526`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-29 19:08:21 +03:00
Pavel Solodovnikov	6c6f3dbe42	lwt: introduce "LWT" flag in prepared statement metadata This patch adds a new `LWT` flag to `cql3::prepared_metadata`. That allows clients to clearly distinguish between lwt and non-lwt statements without need to execute some custom parsing logic (e.g. parsing the prepared query with regular expressions), which is obviously quite fragile. The feature is meant to be further utilized by client drivers to use primary replicas consistently when dealing with conditional statements. Whether to use lwt optimization flag or not is handled by negotiation procedure between scylla server and client library via SUPPORTED/STARTUP messages (`LWT_ADD_METADATA_MARK` extension). Tests: unit(dev, debug), manual testing with modified scylla/gocql driver Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-06-29 12:30:37 +03:00
Nadav Har'El	23ce6864a3	alternator test: ProjectionExpression test for BatchGetItem The tests in test_projection_expression.py test that ProjectionExpression works - including attribute paths - for the GetItem, Query and Scan operations. There is a fourth read operation - BatchGetItem, and it supports ProjectionExpression too. We tested BatchGetItem + ProjectionExpression in test_batch.py, but this only tests the basic feature, with top-level attributes, and we were missing a test for nested document paths. This patch adds such a test. It is still xfailing on Alternator (and passing on DynamoDB), because attribute paths are still not supported (this is issue #5024). Refs #5024. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200629063244.287571-1-nyh@scylladb.com>	2020-06-29 08:51:05 +02:00
Nadav Har'El	b6fdd956bd	alternator test: ProjectionExpression tests for document paths This patch adds three more tests for the ProjectionExpression parameter of GetItem. They are tests for nested document paths like a.b[2].c. We don't support nested paths in Alternator yet (this is issue #5024), so the new tests all xfail (and pass on DynamoDB). We already had similar tests for UpdateExpression, which also needs to support document paths, but the tests were missing for ProjectionExpression. I am planning to start the implementation of document paths with ProjectionExpression (which is the simplest use of document paths), so I want the tests for this expression to be as complete as possible. Refs #5024. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200628213208.275050-1-nyh@scylladb.com>	2020-06-29 08:50:55 +02:00
Avi Kivity	509442b128	Merge "Move snapshot code from storage_service into independent component" from Pavel E " The snapshotting code is already well isolated from the rest of the storage_service, so it's relatively easy to move it into independent component, thus de-bloating the storage_service. As a side effect this allows painless removal of calls to global get_storage_service() from schema::describe code. Test: unit(debug), dtest.snapshot_test(dev), manual start-stop " * 'br-snapshot-controller-4' of https://github.com/xemul/scylla: snap: Get rid of storage_service reference in schema.cc main: Stop http server snapshot: Make check_snapshot_not_exist a method snapshots: Move ops gate from storage_service snapshot: Move lock from storage_service snapshot: Move all code into db::snapshot_ctl class storage_service: Move all snapshot code into snapshot-ctl.cc snapshots: Initial skeleton snapshots: Properly shutdown API endpoints api: Rewrap set_server_snapshot lambda	2020-06-28 13:17:32 +03:00
Takuya ASADA	a77882f075	scylla_setup: don't print prompt message multiple times when disk list passed When a comma-separated disk list is passed to the RAID prompt, the prompt message shows up multiple times. It should always show up just once. Fixes #6724	2020-06-28 12:19:22 +03:00
Takuya ASADA	835e76fdfc	scylla_setup: don't add same disk device twice We shouldn't accept adding the same disk twice at the RAID prompt. Fixes #6711	2020-06-28 12:19:22 +03:00
Botond Dénes	e31f7316c0	mutation_reader: evictable_reader: add assert against pause handle leak We are currently investigating a segmentation fault, which is suspected to be caused by a leaked pause handle. Although according to the latest theory the handle leak is not the root cause of the issue, just a symptom, it's better to catch any bugs that would cause a handle leak in the act, and not later when some side-effect causes a segfault. Refs: #6613 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200625153729.522811-1-bdenes@scylladb.com>	2020-06-28 12:08:25 +03:00
Avi Kivity	3e2eeec83a	Merge "Fix handling of decimals with negative scales" from Rafael " Before this series scylla would effectively infinite loop when, for example, casting a decimal with a negative scale to float. Fixes #6720 " * 'espindola/fix-decimal-issue' of https://github.com/espindola/scylla: big_decimal: Add a test for a corner case big_decimal: Correctly handle negative scales big_decimal: Add a as_rational member function big_decimal: Move constructors out of line	2020-06-28 12:06:35 +03:00
Dejan Mircevski	2d91e5f6a0	configure.py: Drop unused var cassandra_interface The variable doesn't appear to be used anywhere. Tests: manually run configure.py Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-27 21:20:05 +03:00
Dejan Mircevski	65030f1406	configure.py: Update gcc version check As HACKING.md suggests, we now require gcc version >= 10. Set the minimum at 10.1.1, as that is the first official 10 release: https://gcc.gnu.org/releases.html Tests: manually run configure.py and ensure it passes/fails appropriately. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-27 21:19:00 +03:00
Dejan Mircevski	a12bbef980	README.md: dedupe "offers offers" The word "offers" was inadvertently repeated in a sentence. Signed-off-by: Dejan Mircevski <github@mircevski.com>	2020-06-27 21:17:33 +03:00
Pavel Emelyanov	f045cec586	snap: Get rid of storage_service reference in schema.cc Now that the snapshot stopping is correctly handled, we may pull the database reference all the way down to the schema::describe(). One tricky place is in table::snapshot() -- the local db reference is pulled through an smp::submit_to call, but thanks to the shard checks in the place where it is needed the db is still "local" Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:28:25 +03:00
Pavel Emelyanov	8d2e05778c	main: Stop http server Currently it's not stopped at all, so issuing a REST request at shutdown time may crash things at random places. Fixes: #5702 But it's not the end of the story. Since the server stays up while we are shutting things down, each subsystem should carefully handle the cases when it's half-down, but a request comes. A better solution is to unregister rest verbs eventually, but httpd's rules cannot do it now. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:27:28 +03:00
Pavel Emelyanov	9211df2cdf	snapshot: Make check_snapshot_not_exist a method Sanitation. It now can access the this->_db pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:26:15 +03:00
Pavel Emelyanov	ba47ef0397	snapshots: Move ops gate from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:17:21 +03:00
Pavel Emelyanov	e439873319	snapshot: Move lock from storage_service For this de-static run_snapshot_*_operation (because we no longer have the static global to get the lock from) and make the snapshot_ctl be peering_sharded_service to call invoke_on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 20:17:19 +03:00
Pavel Emelyanov	d674baacef	snapshot: Move all code into db::snapshot_ctl class This includes - rename namespace in snapshot-ctl.[cc\|hh] - move methods from storage_service to snapshot_ctl - move snapshot_details struct - temporarily make storage_service._snapshot_lock and ._snapshot_ops public - replace two get_local_storage_service() occurrences with this._db The latter is not 100% clear as the code that does this references "this" from another shard, but the _db in question is the distributed object, so they are all the same on all instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:59:53 +03:00
Pavel Emelyanov	8d36607044	storage_service: Move all snapshot code into snapshot-ctl.cc This is plain move, no other modifications are made, even the "service" namespace is kept, only few broken indentation fixes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:54:15 +03:00
Pavel Emelyanov	d989d9c1c7	snapshots: Initial skeleton A placeholder for snapshotting code that will be moved into it from the storage_service. Also -- pass it through the API for future use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 19:54:14 +03:00
Pavel Emelyanov	9a8a1635b7	snapshots: Properly shutdown API endpoints Now with the seastar httpd routes unset() at hands we can shut down individual API endpoints. Do this for snapshot calls, this will make snapshot controller stop safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 17:27:45 +03:00
Pavel Emelyanov	b608652622	api: Rewrap set_server_snapshot lambda The lambda calls the core snapshot method deep inside the json marshalling callback. This will bring problems with stopping the snapshot controller in the next patches. To prepare for this -- call the .get_snapshot_details() first, then keep the result in do_with() context. This change doesn't affect the issue the lambda in question is about to solve as the whole result set is anyway kept in memory while being streamed outside. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-26 17:27:45 +03:00
Dejan Mircevski	0688f5c3f9	cql3/restrictions: Create expression objects Add expression as a member of restriction. Create or update expression everywhere restrictions are created or updated. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-26 09:19:36 -04:00
Dejan Mircevski	d33053b841	cql3/restrictions: Add free functions over new classes These functions will replace class methods from the existing restriction classes. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-26 09:19:36 -04:00
Dejan Mircevski	1d66b33325	cql3/restrictions: Add new representation These new classes will replace the existing restrictions hierarchy. Instead of having member functions, they expose their data publicly, for future free functions to operate on. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-26 09:19:36 -04:00
Rafael Ávila de Espíndola	85bb7ff743	big_decimal: Add a test for a corner case This behavior is different from cassandra, but without arithmetic operations it doesn't seem possible to notice the difference from CQL. Using avg produces the same results, since we use an initial value of 0 (scale = 0). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-25 15:37:23 -07:00
Rafael Ávila de Espíndola	684f32c862	big_decimal: Correctly handle negative scales A negative scale was being passed as a positive value to boost::multiprecision::pow, which would never finish. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-25 15:34:10 -07:00
Rafael Ávila de Espíndola	bac0f3a9ee	big_decimal: Add a as_rational member function This just refactors some duplicated code so that it can be fixed in one place. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-25 15:33:31 -07:00
Rafael Ávila de Espíndola	77725ce1a4	big_decimal: Move constructors out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-25 15:33:01 -07:00
Benny Halevy	a843945115	compaction: restore % in compaction completion message The % sign fell off in `c4841fa735` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200625151352.736561-1-bhalevy@scylladb.com>	2020-06-25 18:11:59 +02:00
Avi Kivity	e5be3352cf	database, streaming, messaging: drop streaming memtables Before Scylla 3.0, we used to send streaming mutations using individual RPC requests and flush them together using dedicated streaming memtables. This mechanism is no longer in use and all versions that use it have long reached end-of-life. Remove this code.	2020-06-25 15:25:54 +02:00
Raphael S. Carvalho	b17d20b5f4	reshape: LCS: avoid unnecessary work on level 0 No need to sort level 0 as we only check if levels > 0 are disjoint. Also taking the opportunity to avoid copies when sorting. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200624151921.20160-1-raphaelsc@scylladb.com>	2020-06-24 18:27:22 +03:00
Rafael Ávila de Espíndola	67c22c8697	commitlog::read_log_file: Don't discard a future This makes the code a bit easier to read as there are no discarded futures and no references to having to keep a subscription alive, which we don't with current seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200527013120.179763-1-espindola@scylladb.com>	2020-06-24 17:22:29 +03:00
Botond Dénes	5ff6ac52b2	scylla-gdb.py: collection element func: accept references and pointers to collections Add support to references (both lvalue and rvalue) and pointers to collections as well, in addition to plain values. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200624101305.428925-1-bdenes@scylladb.com>	2020-06-24 13:31:18 +03:00
Avi Kivity	a9c7a1a86c	Merge "repair: row_level: prevent deadlocks when repairing homogenous nodes" from Botond " Row level repair, when using a local reader, is prone to deadlocking on the streaming reader concurrency semaphore. This has been observed to happen with at least two participating nodes, running more concurrent repairs than the maximum allowed amount of reads by the concurrency semaphore. In this situation, it is possible that two repair instances, competing for the last available permits on both nodes, get a permit on one of the nodes and get queued on the other one respectively. As neither will let go of the permit it already acquired, nor give up waiting on the failed-to-acquire permit, a deadlock happens. To prevent this, we make the local repair reader evictable. For this we reuse the already existing evictable reader mechanism of the multishard combining reader. This patchset refactors this evictable reader mechanism into a standalone flat mutation reader, then exposes it to the outside world. The repair reader is paused after the repair buffer is filled, which is currently 32MB, so the cost of a possible reader recreation is amortized over 32MB read. The repair reader is said to be local, when it can use the shard-local partitioner. This is the case if the participating nodes are homogenous (their shard configuration is identical), that is the repair instance has to read just from one shard. A non-local reader uses the multishard reader, which already makes its shard readers evictable and hence is not prone to the deadlock described here. Fixes: #6272 Tests: unit(dev, release, debug) " * 'repair-row-level-evictable-local-reader/v3' of https://github.com/denesb/scylla: repair: row_level: destroy reader on EOS or error repair: row_level: use evictable_reader for local reads mutation_reader: expose evictable_reader mutation_reader: evictable_reader: add auto_pause flag mutation_reader: make evictable_reader a flat_mutation_reader mutation_reader: s/inactive_shard_read/inactive_evictable_reader/ mutation_reader: move inactive_shard_reader code up mutation_reader: fix indentation mutation_reader: shard_reader: extract remote_reader as evictable_reader mutation_reader: reader_lifecycle_policy: make semaphore() available early	2020-06-24 12:55:34 +03:00
Piotr Sarna	c2939c67b2	test: add a case for local altering of distributed tables Local altering, which does not propagate the change to other nodes, should not be allowed for a non-local table. Refs #6700 Message-Id: <34a2b191c0e827f296e6d720dc31bf8bda0fd160.1592990796.git.sarna@scylladb.com>	2020-06-24 12:51:41 +03:00
Piotr Sarna	835734c99d	cql3: disallow altering non-local tables with local queries The database has a mechanism of performing internal CQL queries, mainly to edit its own local tables. Unfortunately, it's easy to use the interface incorrectly - e.g. issuing an `ALTER TABLE` statement on a non-local table will result in not propagating the schema change to other nodes, which in turn leads to inconsistencies. In order to avoid such mistakes (one of them was a root cause of #6513), when an attempt to alter a distributed table via a local interface is performed, it results in an error. Tests: unit(dev) Fixes #6700 Message-Id: <61be3defb57be79f486e6067ceff4f4c965e34cb.1592990796.git.sarna@scylladb.com>	2020-06-24 12:51:40 +03:00
Raphael S. Carvalho	864eb20002	reshape: Fix reshaping procedure for LCS The function that determines if a level L, where L > 0, is disjoint, is returning false if level is disjoint. That's because it incorrectly accounts an overlapping SSTable in the level as a disjoint SSTable. So we need to inverse the logic. The side effect is that boot will always try to reshape levels greater than 0 because reshape procedure incorrectly thinks that levels are overlapping when they're actually disjoint. Fixes #6695. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200623180221.229695-1-raphaelsc@scylladb.com>	2020-06-24 12:50:19 +03:00
Avi Kivity	1398628e8a	Update seastar submodule cql3/functions/error_injection_fcts.cc adjusted for smp::invoke_on_all() now requiring nothrow move constructible functions. * seastar 7664f991b9...11e86172ba (4): > Merge "smp: make submit_to noexcept" from Benny > memory: Fix clang build > Fix a debug build with SEASTAR_TASK_BACKTRACE > manual_clock: Add missing includes	2020-06-24 12:49:50 +03:00
Botond Dénes	be452b1f91	service: storage_proxy: log exception returned from replica with more context Currently the message only mentions the endpoint and the error message returned from the replica. Add the keyspace and table to this message to provide more context. This should help investigations of such errors greatly, as in the case of tests where there is usually a single table, we can already guess what exactly is timing out based on this. We should add even more context, like the kind of the query (single partition or range scan) but this information is not readily available in the surrounding scope so this patch defers it. Refs: #6548 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200624054647.413256-1-bdenes@scylladb.com>	2020-06-24 11:30:37 +03:00
Piotr Sarna	df91e9a4c7	alternator: clean up string view conversions Manual translation from JSON to string_view is replaced with rjson::to_string_view helper function. In one place, a redundant string_view intermediary is removed in favor of creating the string straight from JSON. Message-Id: <2aa9d9fedd73f14b7640870d14db4f2f0bd7bd8a.1592936139.git.sarna@scylladb.com>	2020-06-23 21:45:27 +03:00
Piotr Sarna	4558401aee	alternator: drop using global migration manager As part of "war on globals", the unneeded usage of global migration manager instance is dropped. Message-Id: <c9b2fab57e62185daa2441458f9a3a5e7e0a3908.1592936139.git.sarna@scylladb.com>	2020-06-23 21:43:57 +03:00
Piotr Sarna	f4e8cfe03b	alternator: fix propagating tags Updating tags was erroneously done locally, which means that the schema change was not propagated to other nodes. The new code announces new schema globally. Fixes #6513 Branches: 4.0,4.1 Tests: unit(dev) dtest(alternator_tests.AlternatorTest.test_update_condition_expression_and_write_isolation) Message-Id: <3a816c4ecc33c03af4f36e51b11f195c231e7ce1.1592935039.git.sarna@scylladb.com>	2020-06-23 21:27:55 +03:00
Botond Dénes	fbbc86e18c	repair: row_level: destroy reader on EOS or error To avoid having to make it an optional with all the additional checks, we just replace it with an empty reader instead. This also achieves the desired effect of releasing the read permit and all the associated resources early.	2020-06-23 21:08:21 +03:00
Botond Dénes	080f00b99a	repair: row_level: use evictable_reader for local reads Row level repair, when using a local reader, is prone to deadlocking on the streaming reader concurrency semaphore. This has been observed to happen with at least two participating nodes, running more concurrent repairs than the maximum allowed amount of reads by the concurrency semaphore. In this situation, it is possible that two repair instances, competing for the last available permits on both nodes, get a permit on one of the nodes and get queued on the other one respectively. As neither will let go of the permit it already acquired, nor give up waiting on the failed-to-acquire permit, a deadlock happens. To prevent this, we make the local repair reader evictable. For this we reuse the newly exposed evictable reader. The repair reader is paused after the repair buffer is filled, which is currently 32MB, so the cost of a possible reader recreation is amortized over 32MB read. The repair reader is said to be local, when it can use the shard-local partitioner. This is the case if the participating nodes are homogenous (their shard configuration is identical), that is the repair instance has to read just from one shard. A non-local reader uses the multishard reader, which already makes its shard readers evictable and hence is not prone to the deadlock described here.	2020-06-23 21:08:21 +03:00
Botond Dénes	542d9c3711	mutation_reader: expose evictable_reader Expose functions for the outside world to create evictable readers. We expose two functions, which create an evictable reader with `auto_pause::yes` and `auto_pause::no` respectively. The function creating the latter also returns a handle in addition to the reader, which can be used to pause the reader.	2020-06-23 21:08:21 +03:00
Botond Dénes	1cc31deff9	mutation_reader: evictable_reader: add auto_pause flag Currently the evictable reader unconditionally pauses the underlying reader after each use (`fill_buffer()` or `fast_forward_to()` call). This is fine for current users (the multishard reader), but the future user we are doing all this refactoring for -- repair -- will want to control when the underlying reader is paused "manually". Both these behaviours can easily be supported in a single implementation, so we add an `auto_pause` flag to allow the creator of the evictable reader to control this.	2020-06-23 21:08:21 +03:00
Botond Dénes	af9e1c23e1	mutation_reader: make evictable_reader a flat_mutation_reader The `evictable_reader` class is almost a proper flat mutation reader already, it roughly offers the same interface. This patch makes this formal: changing the class to inherit from `flat_mutation_reader::impl`, and implement all virtual methods. This also entails a departure from using the lifecycle policy to pause/resume and create readers, instead using more general building blocks like the reader concurrency semaphore and a mutation source.	2020-06-23 21:08:21 +03:00
Rafael Ávila de Espíndola	64c8164e6c	everywhere: Update to seastar api v4 (when_all_succeed returning a tuple) We now just need to replace a few calls to then with then_unpack. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200618172100.111147-1-espindola@scylladb.com>	2020-06-23 19:40:18 +03:00
Raphael S. Carvalho	47f63d021a	sstables/sstable_directory: improve log message in reshape() We were blind about the table which needed reshape and its compaction strategy, so let's improve log message. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200622192502.187532-4-raphaelsc@scylladb.com>	2020-06-23 19:40:18 +03:00
Raphael S. Carvalho	39f96a5572	distributed_loader: Don't mutate levels to zero when populating column family Unlike refresh on upload dir, column family population shouldn't mutate level of SSTables to level 0. Otherwise, LCS will have to regenerate all levels by rewriting the data multiple times, hurting a lot the write amplification and consequently the node performance. That's also affecting the time for a node to boot because reshape may be triggered as a result of this. Refs #6695. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200622192502.187532-2-raphaelsc@scylladb.com>	2020-06-23 19:40:18 +03:00
Benny Halevy	2d7c39de88	storage_service: set_tables_autocompaction: fix not-initialized-yet logic Typo introduced in `bb07678346`, set_tables_autocompaction should reject too-early requests if !_initialized rather than if _initialized. Fixes a bunch of compaction dtests. For example: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/530/testReport/compaction_test/TestCompaction_with_DateTieredCompactionStrategy/disable_autocompaction_twice_test/ ``` True is not false : Expected to have autocompaction disabled but got it is enabled ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Tests: - unit(dev), - compaction_test:TestCompaction_with_DateTieredCompactionStrategy.disable_autocompaction_twice_test(dev) Message-Id: <20200623151418.439534-1-bhalevy@scylladb.com>	2020-06-23 19:40:18 +03:00
Avi Kivity	c72365d862	thrift: switch csharp backend to netstd The thrift compiler (since 0.13 at least) complains that the csharp target is deprecated and recommends replacing it with netstd. Since we don't use either, humor it. I suspect that this warning caused some spurious rebuilds, but have not proven it.	2020-06-23 19:40:18 +03:00
Piotr Sarna	6d224ae131	cql3: add missing filtering stats bump In a single case of indexed queries, the filtered_rows_read_total metrics was not updated, which could result in inconsistencies between filtered_rows_read_total and filtered_rows_matched_total later. Message-Id: <9a5a741da4c6cf030329610ba8b8e340be85c8e6.1592902295.git.sarna@scylladb.com>	2020-06-23 19:40:18 +03:00
Piotr Sarna	7480015721	cql3, service: decouple cql_stats from query pagers Pager belongs to a different layer than CQL and thus should not be coupled with CQL stats - if any different frontends want to use paging, they shouldn't be forced to instantiate CQL stats at all. Same goes with CQL restrictions, but that will require much bigger refactoring, so is left for later. Message-Id: <5585eb470949e3457334ffd6dba80742abf3a631.1592902295.git.sarna@scylladb.com>	2020-06-23 19:40:18 +03:00
Nadav Har'El	428e8b5c96	docker readme: remove outdated warning In the section explaining how to build a docker image for a self-built Scylla executable, we have a warning that even if you already built Scylla, build_reloc.sh will re-run configure.py and rebuild the executable with slightly different options. The re-run of configure.py and ninja still happens (see issue #6547) but we no longer pass different options to configure.py, so the rebuild usually doesn't do anything and finishes in seconds, and the paragraph warning about the rebuild is no longer relevant. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200621093049.975044-1-nyh@scylladb.com>	2020-06-23 19:40:18 +03:00
Avi Kivity	8d67537178	Update seastar submodule * seastar a6c8105443...7664f991b9 (13): > gate: add try_enter and try_with_gate > Merge "Manage reference counts in the file API" from Rafael > cmake: Refactor a bit of duplicated code > stream: Delete _sub > future: Add a rethrow_exception to future_state_base > future: Use a new seastar::nested_exception in finally > cmake: only apply C++ compile options to C++ language > testing: Enable fail-on-abandoned-failed-futures by default > future: Correct a few hypercorrect uses of std::forward > futures_test: Test using future::then with functions > Merge "io-queue: A set of cleanups collected so far" from Pavel E > tmp_file: Replace futurize_apply with futurize_invoke > future: Replace promise::set_coroutine with forward_state_and_schedule Contains update to tests from Rafael: tests: Update for fail-on-abandoned-failed-futures's new default This depends on the corresponding change in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-23 19:39:54 +03:00
Botond Dénes	4485864ada	mutation_reader: s/inactive_shard_read/inactive_evictable_reader/ Rename `inactive_shard_read` to `inactive_evictable_reader` to reflect the fact that the evictable reader is going to be of general use, not specific to the multishard reader.	2020-06-23 10:01:38 +03:00
Botond Dénes	b6ed054c08	mutation_reader: move inactive_shard_reader code up It will be used by the `evictable_reader` code too in the next patches.	2020-06-23 10:01:38 +03:00
Botond Dénes	e3ea1c9080	mutation_reader: fix indentation Deferred from the previous patch.	2020-06-23 10:01:38 +03:00
Botond Dénes	f9d1916499	mutation_reader: shard_reader: extract remote_reader as evictable_reader We want to make the evictable reader mechanism used in the multishard reader pipeline available for general (re)use, as a standalone flat mutation reader implementation. The first step is extracting `shard_reader::remote_reader`, the class implementing this logic, into a top-level class, also renamed to `evictable_reader`.	2020-06-23 10:01:38 +03:00
Botond Dénes	63309f925c	mutation_reader: reader_lifecycle_policy: make semaphore() available early Currently all reader lifecycle policy implementations assume that `semaphore()` will only be called after at least one call to `make_reader()`. This assumption will soon not hold, so make sure `semaphore()` can be called at any time, including before any calls are made to `make_reader()`.	2020-06-23 10:01:38 +03:00
Raphael S. Carvalho	9033fa82d7	compaction: Reduce boilerplate to create new compaction type Run id and compaction type can now be figured out from the base class. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200622160645.177707-1-raphaelsc@scylladb.com>	2020-06-22 20:27:57 +02:00
Takuya ASADA	2d25697873	scylla_swap_setup: fix systemd-escape path On Ubuntu 18.04 and earlier and Debian 10 and earlier, the /usr merge is not done, so /usr/bin/systemd-escape and /bin/systemd-escape are different places; we call /usr/bin, but Debian variants install the command in /bin. Drop the full path, call the command by name, and let the default PATH resolve it. Fixes: #6650	2020-06-22 17:42:06 +03:00
Raphael S. Carvalho	2a171ee470	reshape: LCS: fix the target level of reshaping job LCS reshape job may pick a wrong level because we iterate through levels from index 1 and stop the iteration as soon as the current level is NOT disjoint, so it happens that we never reach the upper levels, meaning the level of the first NOT disjoint level is used, and not the actual maximum filled level. That's fixed by doing the iteration in the inverse order. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200618154112.8335-1-raphaelsc@scylladb.com>	2020-06-22 16:40:57 +03:00
Avi Kivity	de38091827	priority_manager: merge streaming_read and streaming_write classes into one class Streaming is handled by just one group for CPU scheduling, so separating it into read and write classes for I/O is artificial, and inflates the resources we allow for streaming if both reads and writes happen at the same time. Merge both classes into one class ("streaming") and adjust callers. The merged class has 200 shares, so it reduces streaming bandwidth if both directions are active at the same time (which is rare; I think it only happens in view building).	2020-06-22 15:09:04 +03:00
Takuya ASADA	9e51acec1f	reloc: simplified .deb build process We don't really need to have two build_deb.sh, merge it to reloc.	2020-06-22 14:03:13 +03:00
Takuya ASADA	67c0439c7d	reloc: simplified .rpm build process We don't really need to have two build_rpm.sh, merge it to reloc.	2020-06-22 14:03:13 +03:00
Takuya ASADA	90e28c5fcf	scylla_raid_setup: daemon-reload after mounts.conf installed systemd requires daemon-reload after adding drop-in file, so we need to do that after writing mounts.conf. Fixes #6674	2020-06-22 14:03:13 +03:00
Takuya ASADA	d6165bc1c3	dist/debian/python3: drop dependency on pystache Same as `287d6e5`, we need to drop pystache from package build script since Fedora 32 dropped it.	2020-06-22 14:03:13 +03:00
Juliusz Stasiewicz	a35b71c247	cdc: Handling of timeout/unavailable exceptions in streams fetching Retrying the operation of fetching generations does not always make sense. In this patch only the lightest exceptions (timeout and unavailable) trigger retrying, while the heavy, unrecoverable ones abort the operation and get logged at ERROR level. Fixes #6557	2020-06-22 14:03:13 +03:00
Raphael S. Carvalho	52180f91d4	compaction: Fix the 2x disk space requirement in SSTable upgrade SSTable upgrade requires 2x the space of the input SSTables because we aren't releasing references to the SSTables that were already upgraded. So if we're upgrading 1TB, up to 2TB may be required for the upgrade operation to succeed. That can be fixed by moving all input SSTables when rewrite_sstables() asks for the set of SSTables to be compacted, allowing their space to be released as soon as there is no longer any ref to them. Spotted while auditing code. Fixes #6682. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200619205701.92891-1-raphaelsc@scylladb.com>	2020-06-22 14:03:13 +03:00
Rafael Ávila de Espíndola	a67f5b2de1	sstable_3_x_test: Call await_background_jobs on every test Now every test starts by deferring a call to await_background_jobs. That can be verified with: $ git grep -B 1 await_background test/boost/sstable_3_x_test.cc \| grep THREAD \| wc -l 90 $ git grep -A 1 SEASTAR_THREAD_TEST_CASE test/boost/sstable_3_x_test.cc \| grep await_background \| wc -l 90 Thanks to Raphael Carvalho for noticing it. Refs #6624 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200619220048.1091630-1-espindola@scylladb.com>	2020-06-22 14:03:13 +03:00
Raphael S. Carvalho	a82afa68aa	test/lib/cql_test_env: reenable auto compaction After `e40aa042a7`, auto compaction is explicitly disabled on all tables being populated and only enabled later on in the boot process. We forgot to update cql_test_env to also reenable auto compaction, so unit tests based on cql_test_env were not compacting at all. database_test, for example, was running out of file descriptors because their number kept growing without bound due to lack of compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200618225621.15937-1-raphaelsc@scylladb.com>	2020-06-22 14:03:13 +03:00
Benny Halevy	a3918bdc96	distributed_loader: reenable verify_owner_and_mode when loading new sstables The call to `verify_owner_and_mode` from `flush_upload_dir` fell between the cracks in `b34c0c2ff6` (distributed_loader: rework uploading of SSTables). It causes https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/528/testReport/nodetool_additional_test/TestNodetool/nodetool_refresh_with_wrong_upload_modes_test/ to fail like this: ``` /Directory cannot be accessed .* write/ not found in 'Nodetool command '/jenkins/workspace/scylla-master/dtest-release/scylla/.ccm/scylla-repository/7351db7cab7bbf907172940d0bbf8b90afde90ba/scylla-tools-java/bin/nodetool -h 127.0.87.1 -p 7187 refresh -- keyspace1 standard1' failed; exit status: 1; stdout: nodetool: Scylla API server HTTP POST to URL '/storage_service/sstables/keyspace1' failed: Failed to load new sstables: std::filesystem::__cxx11::filesystem_error (error system:13, filesystem error: remove failed: Permission denied [/jenkins/workspace/scylla-master/dtest-release/scylla/.dtest/dtest-rqzo7km7/test/node1/data/keyspace1/standard1-8a57a660b29611eabf0c000000000000/upload/mc-3-big-TOC.txt]) ``` Reenabling it in this patch makes the dtest pass again. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200621140439.85843-1-bhalevy@scylladb.com>	2020-06-22 14:03:13 +03:00
Benny Halevy	aa4b4311e2	configure: do not define SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION in debug mode Seastar uses the default allocator in debug mode so it can't inject allocation failures in this mode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Test: mutation_test(debug) Message-Id: <20200621131819.72108-1-bhalevy@scylladb.com>	2020-06-22 14:03:13 +03:00
Nadav Har'El	e4eca5211a	docker: add option to start Alternator with HTTPS We already have a docker image option to enable alternator on an unencrypted port, "--alternator-port", but we forgot to also allow the similar option for enabling alternator on an encrypted (HTTPS) port: "--alternator-https-port" so this patch adds the missing option, and documents how to use it. Note that using this option is not enough. When this option is used, Alternator also requires two files, /etc/scylla/scylla.crt and /etc/scylla/scylla.key, to be inserted into the image. These files should contain the SSL certificate, and key, respectively. If these files are missing, you will get an error in the log about the missing file. Fixes #6583. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200621125219.12274-1-nyh@scylladb.com>	2020-06-22 14:03:13 +03:00
Avi Kivity	7351db7cab	Merge "Reshape upload files and reshard+reshape at boot" from Glauber " This patchset adds a reshape operation to each compaction strategy; that is a strategy-specific way of detecting if SSTables are in-strategy or off-strategy, and in case they are off-strategy moving them to in-strategy. Often the number of SSTables in a particular slice of the sstable set matters for that decision (number of SSTables in the same time window for TWCS, number of SSTables per tier for STCS, number of L0 SSTables for LCS). We want to be more lenient for operations that keep the node offline, like reshape at boot, but stricter for operations like upload, which run in maintenance mode. To accommodate that, the threshold for considering a slice of the SSTable set off-strategy is passed as a parameter. Once this patchset is applied, the upload directory will reshape the SSTables before moving them to the main directory (if needed). One side effect of it is that it is no longer necessary to take locks for the refresh operation nor disable writes in the table. With the infrastructure that we have built in the upload directory, we can apply the same set of steps to populate_column_family. Using the sstable_directory to scan the files we can reshard and reshape (usually if we resharded a reshape will be necessary) with the node still offline. This has the benefit of never adding shared SSTables to the table. Applying this patchset will unlock a host of cleanups: - we can get rid of all testing for shared sstables, sstable_need_rewrite, etc. - we can remove the resharding backlog tracker. and many others. Most cleanups are deferred for a later patchset, though. 
" * 'reshard-reshape-v4' of github.com:glommer/scylla: distributed_loader: reshard before the node is made online distributed_loader: rework uploading of SSTables sstable_directory: add helper to reshape existing unshared sstables compaction_strategy: add method to reshape SSTables compaction: add a new compaction type, Reshape compaction: add a size and throught pretty printer. compaction: add default implementation for some pure functions tests: fix fragile database tests distributed_loader.cc: add a helper function to extract the highest SSTable version found distributed_loader.cc : extract highest_generation_seen code compaction_manager: rename run_resharding_job distributed_loader: assume populate_column_families is run in shard 0 api: do not allow user to meddle with auto compaction too early upload: use custom error handler for upload directory sstable_directory: fix debug message	2020-06-18 17:04:53 +03:00
Glauber Costa	e40aa042a7	distributed_loader: reshard before the node is made online This patch moves the resharding process to use the new directory_with_sstables_handler infrastructure. There is no longer a clear reshard step, and that just becomes a natural part of populate_column_family. In main.cc, a couple of changes are necessary to make that happen. The first one obviously is to stop calling reshard. We also need to make sure that: - The compaction manager is started much earlier, so we can register resharding jobs with it. - auto compactions are disabled in the populate method, so resharding doesn't have to fight for bandwidth with auto compactions. Now that we are resharding through the sstable_directory, the old resharding code can be deleted. There is also no need to deal with the resharding backlog either, because the SSTables are not yet added to the sstable set at this point. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	b34c0c2ff6	distributed_loader: rework uploading of SSTables Uploading of SSTables is problematic: for historical reasons it takes a lock that may have to wait for ongoing compactions to finish, then it disables writes in the table, and then it goes loading SSTables as if it knew nothing about them. With the sstable_directory infrastructure we can do much better: * we can reshard and reshape the SSTables in place, keeping the number of SSTables in check. Because this is a background process we can be fairly aggressive and set the reshape mode to strict. * we can then move the SSTables directly into the main directory. Because we know they are few in number we can call the more elegant add_sstable_and_invalidate_cache instead of the open coding currently done by load_new_sstables * we know they are not shared (if they were, we resharded them), simplifying the load process even further. The major change after this patch is applied is that all compactions (resharding and reshape) needed to make the SSTables in-strategy are done in the streaming class, which reduces the impact of this operation on the node. When the SSTables are loaded, subsequent reads will not suffer as we will not be adding shared SSTables in potentially high numbers, nor will we reshard in the compaction class. There is also no more need for a lock in the upload process so in the fast path where users are uploading a set of SSTables from a backup this should essentially be instantaneous. The lock, as well as the code to disable and enable table writes, is removed. A future improvement is to bypass the staging directory too, in which case the reshaping compaction would already generate the view updates. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	4d6aacb265	sstable_directory: add helper to reshape existing unshared sstables Before moving SSTables to the main directory, we may need to reshape them into in-strategy. This patch provides helper code that reshapes the SSTables that are known to be unshared local in the sstable directory, and updates the sstable directory with the result. Reshaping can be made more or less aggressive by passing a reshape mode (relaxed or strict), which will influence the number of SSTables reshape can tolerate before considering a particular slice of the SSTable set off-strategy. Because the compaction expects an std::vector everywhere, we changed our chunked vector for the unshared sstables to a std::vector so we can more easily pass it around without conversions. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	3c254dd49d	compaction_strategy: add method to reshape SSTables Some SSTable sets are considered to be off-strategy: they are in a shape that is at best not optimal and at worst adversarial to the current compaction strategy. This patch introduces the compaction strategy-specific method get_reshaping_job(). Given an SSTable set, it returns one compaction that can be done to bring the table closer to being in-strategy. The caller can then call this repeatedly until the table is fully in-strategy. As an example of how this is supposed to work, consider TWCS: some SSTables will belong to a single window -> in which case they are already in-strategy and don't need to be compacted, and others span multiple windows in which case they are considered off-strategy and have to be compacted. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	0467bd0a94	compaction: add a new compaction type, Reshape From the point of view of selecting SSTables and its expected output, Reshaping really is just a normal compaction. However, there are some key differences that we would like to uphold: - Reshaping is done separately from the main SSTable set. It can be done with the node offline, or it can be done in a separate priority class. Either way, we don't want those SSTables to count towards backlog. For reads, because the SSTables are not yet registered in the backlog tracker (if offline or coming from upload), if we were to deduct compaction charges from it we would go negative. For writes, we don't want to deal with backlog management here because we will add the SSTable at once when reshaping is finished. - We don't need to do early replacements. - We would like to clearly mark the Reshaping compactions as such in the logs For the reasons above, it is nicer to add a new Reshape compaction type, a subclass of compaction, that upholds such properties. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Glauber Costa	c4841fa735	compaction: add a size and throughput pretty printer. This is so we don't always use MB. Sometimes it is best to report GB, TB, and their equivalent throughput metrics. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:37:18 -04:00
Takuya ASADA	76af112c7d	scylla_swap_setup: don't create sparse file fallocate creates sparse file, XFS rejects such file for swapfile: https://bugzilla.redhat.com/show_bug.cgi?id=1129205 Use dd instead. Fixes #6650	2020-06-18 16:26:17 +03:00
Avi Kivity	7129662edb	Update seastar submodule * seastar b515d63735...a6c8105443 (15): > Merge "Move thread_wake_task out of line" from Rafael > future: Fix result_of_apply instantiation > future: Move the function in then/then_wrapped only once > io-queue: Dont leak desc > fair-queue: Keep request queues self-consistent > app: Do not coredump on missing options > future: promise: mark set_value as noexcept > future: future_state: mark set as noexcept > fair_queue_perf: Remove unused captures > file_io_test: Add missing override > Merge "tmp_dir: handle remove failure in do_with_thread" from Benny > api-level: Add missing api_v4 namespace > future: Fix CanApplyTuple > http: use logger instead of stderr for erro reporting > sstring: Generalize make_sstring a bit	2020-06-18 16:16:05 +03:00
Glauber Costa	ef85a2cec5	compaction: add default implementation for some pure functions There are some functions that are today pure that have an obvious implementation (for example on_new_partition, do nothing). We'll add default implementations to the compaction class, which reduces the boilerplate needed to add a new compaction type. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:28 -04:00
Glauber Costa	96abf80c5e	tests: fix fragile database tests This test wants to make sure that an SSTable with generation number 4, which is incomplete, gets deleted. While that works today, the way the test verifies that is fragile because new SSTables can and will be created, especially in the local directory that sees a lot of activity on startup. It works if generations don't go that far, but with SMP, even a single SSTable in the right shard can end up having generation 4. In practice this isn't an issue today because the code calls cf.update_sstables_known_generation() as soon as it sees a file, before deciding whether or not the file has to be deleted. However this behavior is not guaranteed and is changing. The best way to fix this would be to check if the file is the same, including its inode. But given that this is just a unit test (which is almost always if not always single node), I am just moving to use the peers table instead. Again, we could have created a user table, but it's just not worth the hassle. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:28 -04:00
Glauber Costa	072d0d3073	distributed_loader.cc: add a helper function to extract the highest SSTable version found Using a map reduce in a shared sstable directory, finds the highest version seen across all shards. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:28 -04:00
Glauber Costa	baa82b3a26	distributed_loader.cc : extract highest_generation_seen code We'll use it in one more location, so extract it to common code. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:27 -04:00
Glauber Costa	9902af894a	compaction_manager: rename run_resharding_job It will be used to run any custom job where the caller provides a function. One such example is indeed resharding, but reshaping SSTables can also fall here. The semaphore is also renamed, and we'll allow only one custom job at a time (across all possible types). We also remove the assumption of the scheduling group. The caller has to have already placed the code in the correct CPU scheduling group. The I/O priority class comes from the descriptor. To make sure that we don't regress, we wrap the entire reshard-at-boot code in the compaction class. Currently the setup would be done in the main group, and the actual resharding in the compaction group. Note that this is temporary, as this code is about to change. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:27 -04:00
Glauber Costa	45f3bc679e	distributed_loader: assume populate_column_families is run in shard 0 This is already the case, since main.cc calls it from shard 0 and relies on it to spread the information to the other shards. We will turn this branch - which is always taken - into an assert for the sake of future-proofing and soon add even more code that relies on this being executed in shard 0. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:27 -04:00
Glauber Costa	bb07678346	api: do not allow user to meddle with auto compaction too early We are about to use the auto compaction property during the populate/reshard process. If the user toggles it, the database can be left in a bad state. There should be no reason why a user would want to set that up this early. So we'll disallow it. To do that properly, it is better if the check of whether or not the storage service is ready to accommodate this request is local to the storage service itself. We then move the logic of set_tables_autocompaction from api to the storage service. The API layer now merely translates the table names and passes them along. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:25 -04:00
Pekka Enberg	3d128f5b51	scripts: Rename sync-submodules.sh to refresh-submodules.sh Rename the script as per Nadav's suggestion and update documentation within the script. Message-Id: <20200618123446.32496-1-penberg@scylladb.com>	2020-06-18 15:39:23 +03:00
Rafael Ávila de Espíndola	f6e407ecd2	everywhere: Prepare for seastar api v4 (when_all_succeed return value) The seastar api v4 changes the return type of when_all_succeed. This patch adds discard_result where that is the best solution to handle the change. This doesn't do the actual update to v4 since there are still a few issues left to fix in seastar. A patch doing just the update will follow. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200617233150.918110-1-espindola@scylladb.com>	2020-06-18 15:13:56 +03:00
Amnon Heiman	bc854342e7	approx_exponential_histogram: Makes the implementation clearer This patch aims to make the implementation and usage of the approx_exponential_histogram clearer. The approx_exponential_histogram uses a combination of Min, Max, Precision and number of buckets, of which the user needs to pick 3. Most of the changes in the patch are about documenting the class and methods, but following the review there are two functional changes: 1. The user picks Min, Max and Precision, and the number of buckets is calculated from these values. 2. The template restrictions are now stated in a requires clause, so violations are caught at compile time.	2020-06-18 14:18:21 +03:00
Tomasz Grabiec	17ee1a2eed	utils: cached_file: Fix compilation error Fix field initialization order problem. In file included from ./sstables/mc/bsearch_clustered_cursor.hh:28, from sstables/index_reader.hh:32, from sstables/sstables.cc:49: ./utils/cached_file.hh: In constructor 'cached_file::stream::stream(cached_file&, const seastar::io_priority_class&, tracing::trace_state_ptr, cached_file::page_idx_type, cached_file::offset_type)': ./utils/cached_file.hh:119:34: error: 'cached_file::stream::_trace_state' will be initialized after [-Werror=reorder] 119 \| tracing::trace_state_ptr _trace_state; \| ^~~~~~~~~~~~ ./utils/cached_file.hh:117:23: error: 'cached_file::page_idx_type cached_file::stream::_page_idx' [-Werror=reorder] 117 \| page_idx_type _page_idx; \| ^~~~~~~~~ ./utils/cached_file.hh:127:9: error: when initialized here [-Werror=reorder] 127 \| stream(cached_file& cf, const io_priority_class& pc, tracing::trace_state_ptr trace_state, \| ^~~~~~ Message-Id: <1592478082-22505-1-git-send-email-tgrabiec@scylladb.com>	2020-06-18 14:08:29 +03:00
Raphael S. Carvalho	03db448a92	sstables/backlog_tracker: Fix incorrect calculation of Compaction backlog When debugging this for the first time in c412a7a, I thought the problem, which causes backlog to be negative, was a bug in the implementation of the formula, but it turns out that the bug is actually in the formula itself. Not limiting the scope of this bug to STCS because its tracker is inherited by the trackers of other strategies, meaning they're also affected by this. The backlog for a SSTable is known to be Bi = Ei * log(T / Si) Where T = total Size minus compacted bytes for a table, Ci = Compacted Bytes for a SSTable, Si = Size of a SSTable, Ei = Si - Ci The problem was that we were assuming T > Si, but it can happen that T is lower than Si if the table in question is decreasing in size. If we rewrite SSTable backlog as Bi = Ei * log (T) - Ei * log(Si) It becomes even clearer why T cannot be lower than Si whatsoever, or the backlog calculation can go wrong because the first term becomes lower than the second. Fixing the formula consists of changing it to Bi = Ei * log (T / Ei) Bi = Ei * log (T) - Ei * log (Si - Ci) After this change, the backlog still behaves in a very similar way as before, which can be confirmed via this graph: https://user-images.githubusercontent.com/1409139/79627762-71afdf80-8111-11ea-9ebc-0831c4e3d9c6.png Fixes #6021. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200616174712.16505-1-raphaelsc@scylladb.com>	2020-06-18 13:56:47 +03:00
Avi Kivity	5d99d667ec	Merge "Build system improvements for packaging" from Pekka " This patch series attempts to decouple package build and release infrastructure, which is internal to Scylla (the company). The goal of this series is to make it easy for humans and machines to build the full Scylla distribution package artifacts, and make it easy to quickly verify them. The improvements to build system are done in the following steps. 1. Make scylla.git a super-module, which has git submodules for scylla-jmx and scylla-tools. A clone of scylla.git is now all that is needed to access all source code of all the different components that make up a Scylla distribution, which is a preparational step to adding "dist" ninja build target. A scripts/sync-submodules.sh helper script is included, which allows easy updating of the submodules to the latest head of the respective git repositories. 2. Make builds reproducible by moving the remaining relocatable package specific build options from reloc/build_reloc.sh to the build system. After this step, you can build the exact same binaries from the git repository by using the dbuild version from scylla.git. 3. Add a "dist" target to ninja build, which builds all .rpm and .deb packages with one command. To build a release, run: $ ./tools/toolchain/dbuild ./configure.py --mode release $ ./tools/toolchain/dbuild ninja-build dist and you will now have .rpm and .deb packages to all the components of a Scylla distribution. 4. Add a "dist-check" target to ninja build for verification of .rpm and .deb packages in one command. To verify all the built packages, run: $ ninja-build dist-check Please note that you must run this step on the host, because the target uses Docker under the hood to verify packages by installing them on different Linux distributions. Currently only CentOS 7 verification is supported. All these improvements are done so that backward compatibility is retained. 
That is, any existing release infrastructure or other build scripts are completely unaffected. Future improvements to consider: - Package repository generation: add a "ninja repo" command to generate .rpm and .deb repositories, which can be uploaded to a web site. This makes it possible to build a downloadable Scylla distribution from scylla.git. The target requires some configuration, which the user has to provide. For example, download URL locations and package signing keys. - Amazon Machine Image (AMI) support: add a "ninja ami" command to simplify the steps needed to generate a Scylla distribution AMI. - Docker image support: add a "ninja docker" command to simplify the steps needed to generate a Scylla distribution Docker image. - Simplify and unify package build: simplify and unify the various shell scripts needed to build packages in different git repositories. This step will break backward compatibility and can be done only after relevant build scripts and release infrastructure are updated. " * 'penberg/packaging/v5' of github.com:penberg/scylla: docs: Update packaging documentation build: Add "dist-check" target scripts/testing: Add "dist-check" for package verification build: Add "dist" target reloc: Add '--builddir' option to build_deb.sh build: Add "-ffile-prefix-map" to cxxflags docs: Document sync-submodules.sh script in maintainer.md sync-submodules.sh: Add script for syncing submodules Add scylla-tools submodule Add scylla-jmx submodule	2020-06-18 12:59:52 +03:00
Dejan Mircevski	aec1acd1d5	range_test: Add cases for singular intersection Intersection was previously not tested for singular ranges. This ensures it will always work for singular ranges, too. Tests: unit(dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-18 12:38:31 +03:00
Yaron Kaikov	e9d5852b0c	dbuild: Add an option to run dbuild using podman Following https://github.com/scylladb/scylla/pull/5333, we want to be able to run dbuild using podman or docker by setting an environment variable named DBUILD_TOOL. DBUILD_TOOL will use docker by default unless we explicitly set the tool to podman. Fixes: https://github.com/scylladb/scylla/pull/6644	2020-06-18 12:13:39 +03:00
Avi Kivity	9322c07c71	Merge "Use binary search in sstable promoted index" from Tomasz " The "promoted index" is how the sstable format calls the clustering key index within a given partition. Large partitions with many rows have it. It's embedded in the partition index entry. Currently, lookups in the promoted index are done by scanning the index linearly so the lookup is O(N). For large partitions that's inefficient. It consumes both a lot of CPU and I/O. We could do better and use binary search in the index. This patch series switches the mc-format index reader to do that. Other formats use the old way. The "mc" format promoted index has an extra structure at the end of the index called "offset map". It's a vector of offsets of consecutive promoted index entries. This allows us to access random entries in the index without reading the whole index. The location of the offset entry for a given promoted index entry can be derived by knowing where the offset vector ends in the index file, so the offset map also doesn't have to be read completely into the memory. The most tricky part is caching. We need to cache blocks read from the index file to amortize the cost of binary search: - if the promoted index fits in the 32 KiB which was read from the index when looking for the partition entry, we don't want to issue any additional I/O to search the promoted index. - with large promoted indexes, the last few bisections will fall into the same I/O block and we want to reuse that block. - we don't want the cache to grow too big, we don't want to cache the whole promoted index as the read progresses over the index. Scanning reads may skip multiple times. This series implements a rather simple approach which meets all the above requirements and is not worse than the current state of affairs: - Each index cursor has its own cache of the index file area which corresponds to promoted index This is managed by the cached_file class. 
- Each index cursor has its own cache of parsed blocks. This allows the upper bound estimation to reuse information obtained during lower bound lookup. This estimation is used to limit read-aheads in the data file. - Each cursor drops entries that it walked past so that memory footprint stays O(log N) - Cached buffers are accounted to read's reader_permit. Later, we could have a single cache shared by many readers. For that, we need to come up with eviction policy. Fixes #4007. TESTING RESULTS * Point reads, large promoted index: Config: rows: 10000000, value size: 2000 Partition size: 20 GB Index size: 7 MB Notes: - Slicing read into the middle of partition (offset=5000000, read=1) is a clear win for the binary search: time: 1.9ms vs 22.9ms CPU utilization: 8.9% vs 92.3% I/O: 21 reqs / 172 KiB vs 29 reqs / 3'520 KiB It's 12x faster, CPU utilization is 10x times smaller, disk utilization is 20x smaller. - Slicing at the front (offset=0) is a mixed bag. time is similar: 1.8ms CPU utilization is 6.7x smaller for bsearch: 8.5% vs 57.7% disk bandwidth utilization is smaller for bsearch but uses more IOs: 4 reqs / 320 KiB (scan) vs 17 reqs / 188 KiB (bsearch) bsearch uses less bandwidth because the series reduces buffer size used for index file I/O. scan is issuing: 2 * 128 KB (index page) 2 * 32 KB (data file) bsearch is issuing: 1 * 64 KB (index page) 15 * 4 KB (promoted index) 1 * 64 KB (data file) The 1 * 64 KB is chosen dynamically by seastar. Sometimes it chooses 2 * 32 KB (with read-ahead). 32 KB is the minimum I/O currently. Disk utilization could be further improved by changing the way seastar's dynamic I/O adjustments work so that it uses 1 * 4 KB when it suffices. This is left for the follow-up. 
Command: perf_fast_forward --datasets=large-part-ds1 \ --run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1 Before: offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 0 1 0.001836 172 1 545 9 563 175 4.0 4 320 2 2 0 1 1 0 0 0 57.7% 0 0 32 0.001858 502 32 17220 126 17776 11526 3.2 3 324 2 1 0 1 1 0 0 0 56.4% 0 0 256 0.002833 339 256 90374 427 91757 85931 7.0 7 776 3 1 0 1 1 0 0 0 41.1% 0 0 4096 0.017211 58 4096 237984 2011 241802 233870 66.1 66 8376 59 2 0 1 1 0 0 0 21.4% 0 5000000 1 0.022952 42 1 44 1 45 41 29.2 29 3520 22 2 0 1 1 0 0 0 92.3% 0 5000000 32 0.023052 43 32 1388 14 1414 1331 31.1 32 3588 26 2 0 1 1 0 0 0 91.7% 0 5000000 256 0.024795 41 256 10325 129 10721 9993 43.1 39 4544 29 2 0 1 1 0 0 0 86.4% 0 5000000 4096 0.038856 27 4096 105414 398 106918 103162 95.2 95 12160 78 5 0 1 1 0 0 0 61.4% 0 After (v2): offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 0 1 0.001831 248 1 546 21 581 252 17.6 17 188 2 0 0 1 1 0 0 0 8.5% 0 0 32 0.001910 535 32 16751 626 17770 13896 17.9 19 160 3 0 0 1 1 0 0 0 8.8% 0 0 256 0.003545 266 256 72207 2333 89076 62852 26.9 24 764 7 0 0 1 1 0 0 0 9.7% 0 0 4096 0.016800 56 4096 243812 524 245430 239736 83.6 83 8700 64 0 0 1 1 0 0 0 16.6% 0 5000000 1 0.001968 351 1 508 19 538 380 21.3 21 172 2 0 0 1 1 0 0 0 8.9% 0 5000000 32 0.002273 431 32 14077 436 15503 11551 22.7 22 268 3 0 0 1 1 0 0 0 8.9% 0 5000000 256 0.003889 257 256 65824 2197 81833 57813 34.0 37 652 18 0 0 1 1 0 0 0 11.2% 0 5000000 4096 0.017115 54 4096 239324 834 241310 231993 88.3 88 8844 65 0 0 1 1 0 0 0 16.8% 0 After (v1): offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 0 1 0.001886 259 1 530 4 545 261 18.0 18 376 2 2 0 1 1 0 0 0 9.1% 
0 0 32 0.001954 513 32 16381 93 16844 15618 19.0 19 408 3 2 0 1 1 0 0 0 9.3% 0 0 256 0.003266 318 256 78393 1820 81567 61663 30.8 26 1272 7 2 0 1 1 0 0 0 10.4% 0 0 4096 0.017991 57 4096 227666 855 231915 225781 83.1 83 8888 55 5 0 1 1 0 0 0 15.5% 0 5000000 1 0.002353 232 1 425 2 432 232 23.0 23 396 2 2 0 1 1 0 0 0 8.7% 0 5000000 32 0.002573 384 32 12437 47 12571 429 25.0 25 460 4 2 0 1 1 0 0 0 8.5% 0 5000000 256 0.003994 259 256 64101 2904 67924 51427 37.0 35 1484 11 2 0 1 1 0 0 0 10.6% 0 5000000 4096 0.018567 56 4096 220609 448 227395 219029 89.8 89 9036 59 5 0 1 1 0 0 0 15.1% 0 * Point reads, small promoted index (two blocks): Config: rows: 400, value size: 200 Partition size: 84 KiB Index size: 65 B Notes: - No significant difference in time - the same disk utilization - similar CPU utilization Command: perf_fast_forward --datasets=large-part-ds1 \ --run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1 Before: offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 0 1 0.000279 470 1 3587 31 3829 478 3.0 3 68 2 1 0 1 1 0 0 0 21.1% 0 0 32 0.000276 3498 32 116038 811 122756 104033 3.0 3 68 2 1 0 1 1 0 0 0 24.0% 0 0 256 0.000412 2554 256 621044 1778 732150 559221 2.0 2 72 2 0 0 1 1 0 0 0 32.6% 0 0 4096 0.000510 1901 400 783883 4078 819058 665616 2.0 2 88 2 0 0 1 1 0 0 0 36.4% 0 200 1 0.000339 2712 1 2951 8 3001 2569 2.0 2 72 2 0 0 1 1 0 0 0 17.8% 0 200 32 0.000352 2586 32 91019 266 92427 83411 2.0 2 72 2 0 0 1 1 0 0 0 20.8% 0 200 256 0.000458 2073 200 436503 1618 453945 385501 2.0 2 88 2 0 0 1 1 0 0 0 29.4% 0 200 4096 0.000458 2097 200 436475 1676 458349 381558 2.0 2 88 2 0 0 1 1 0 0 0 29.0% 0 After (v1): Testing slicing of large partition using clustering keys: offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 0 1 0.000278 492 1 3598 30 
3831 500 3.0 3 68 2 1 0 1 1 0 0 0 19.4% 0 0 32 0.000275 3433 32 116153 753 122915 92559 3.0 3 68 2 1 0 1 1 0 0 0 22.5% 0 0 256 0.000458 2576 256 559437 2978 728075 504375 2.1 2 88 2 0 0 1 1 0 0 0 29.0% 0 0 4096 0.000506 1888 400 790064 3306 822360 623109 2.0 2 88 2 0 0 1 1 0 0 0 36.6% 0 200 1 0.000382 2493 1 2619 10 2675 2268 2.0 2 88 2 0 0 1 1 0 0 0 16.3% 0 200 32 0.000398 2393 32 80422 333 84759 22281 2.0 2 88 2 0 0 1 1 0 0 0 19.0% 0 200 256 0.000459 2096 200 435943 1608 453989 380749 2.0 2 88 2 0 0 1 1 0 0 0 30.5% 0 200 4096 0.000458 2097 200 436410 1651 455779 382485 2.0 2 88 2 0 0 1 1 0 0 0 29.2% 0 * Scan with skips, large index: Config: rows: 10000000, value size: 2000 Partition size: 20 GB Index size: 7 MB Notes: - Similar time, slightly worse for binary search: 36.1 s (scan) vs 36.4 (bsearch) - Slightly more I/O for bsearch: 153'932 reqs / 19'703'260 KiB (scan) vs 155'651 reqs / 19'704'088 KiB (bsearch) Binary search reads more by 828 KB and by 1719 IOs. It does more I/O to read the promoted index offset map. - similar (low) memory footprint. The danger here is that by caching index blocks which we touch as we scan we would end up caching the whole index. But this is protected against by eviction as demonstrated by the last "mem" column. 
Command: perf_fast_forward --datasets=large-part-ds1 \ --run-tests=large-partition-skips -c1 --test-case-duration=1 Before: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 1 1 36.103451 4 5000000 138491 38 138601 138453 153932.0 153932 19703260 153561 1 0 1 1 0 0 0 31.5% 502690 After (v2): read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 1 1 37.000145 4 5000000 135135 6 135146 135128 155651.0 155651 19704088 138968 0 0 1 1 0 0 0 34.2% 0 After (v1): read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem 1 1 36.965520 4 5000000 135261 30 135311 135231 155628.0 155628 19704216 139133 1 0 1 1 0 0 0 33.9% 248738 Also in: git@github.com:tgrabiec/scylla.git sstable-use-index-offset-map-v2 Tests: - unit (all modes) - manual using perf_fast_forward " * tag 'sstable-use-index-offset-map-v2' of github.com:tgrabiec/scylla: sstables: Add promoted index cache metrics position_in_partition: Introduce external_memory_usage() cached_file, sstables: Add tracing to index binary search and page cache sstables: Dynamically adjust I/O size for index reads sstables, tests: Allow disabling binary search in promoted index from perf tests sstables: mc: Use binary search over the promoted index utils: Introduce cached_file sstables: clustered_index: Relax scope of validity of entry_info sstables: index_entry: Introduce owning promoted_index_block_position compound_compat: Allow constructing composite from a view sstables: index_entry: Rename promoted_index_block_position to promoted_index_block_position_view sstables: mc: Extract parser for promoted index block sstables: mc: Extract parser for clustering out of the promoted index block parser sstables: consumer: Extract primitive_consumer 
sstables: Abstract the clustering index cursor behavior sstables: index_reader: Rearrange to reduce branching and optionals	2020-06-18 12:09:39 +03:00
Pekka Enberg	4d48f22827	docs: Update packaging documentation	2020-06-18 10:20:08 +03:00
Pekka Enberg	9e279ec2a9	build: Add "dist-check" target This adds a "dist-check" target to ninja build. The target needs to be run on the host because package verification is done with Docker.	2020-06-18 10:20:08 +03:00
Pekka Enberg	584c7130a1	scripts/testing: Add "dist-check" for package verification This adds a "dist-check.sh" script in tools/testing, which performs distribution package verification by installing packages under Docker.	2020-06-18 10:16:46 +03:00
Pekka Enberg	8e1a561fba	build: Add "dist" target	2020-06-18 10:16:46 +03:00
Pekka Enberg	7b7c91a34b	reloc: Add '--builddir' option to build_deb.sh The build system will call this script. It needs control over where the packages are built to allow building packages for the different build modes.	2020-06-18 09:54:37 +03:00
Pekka Enberg	013f87f388	build: Add "-ffile-prefix-map" to cxxflags This patch adds "-ffile-prefix-map" to cxxflags for all build modes. This has two benefits: 1. Relocatable packages no longer have any special build flags, which makes deeper integration with the build system possible (e.g. targets for packages). 2. Builds are now reproducible, which makes debugging easier in case you only have a backtrace, but no artifacts. Rafael explains: "BTW, I think I found another argument for why we should always build with -ffile-prefix-map=. There was user after free test failure on next promotion. I am unable to reproduce it locally, so it would be super nice to be able to decode the backtrace. I was able to do it, but I had to create a /jenkins/workspace/scylla-master/next/ directory and build from there to get the same results as the bot." Acked-by: Botond Dénes <bdenes@scylladb.com> Acked-by: Nadav Har'El <nyh@scylladb.com> Acked-by: Rafael Avila de Espindola <espindola@scylladb.com>	2020-06-18 09:54:37 +03:00
Pekka Enberg	71da4e6e79	docs: Document sync-submodules.sh script in maintainer.md	2020-06-18 09:54:37 +03:00
Pekka Enberg	e3376472e8	sync-submodules.sh: Add script for syncing submodules	2020-06-18 09:54:37 +03:00
Pekka Enberg	d759d7567b	Add scylla-tools submodule	2020-06-18 09:54:37 +03:00
Pekka Enberg	9edf858d30	Add scylla-jmx submodule	2020-06-18 09:54:37 +03:00
Benny Halevy	5926cfc298	CMakeLists.txt: Update to C++20 Following `427398641a` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200618052956.570260-1-bhalevy@scylladb.com>	2020-06-18 09:51:23 +03:00
Pekka Enberg	02b733c22b	Revert "dbuild: Add an option to run with 'docker' or 'podman'" This reverts commit `ac7237f991`. The logic is wrong and always picks "podman" if it's installed on the system even if user asks for "docker" with the DBUILD_TOOL environment variable. This wreaks havoc on machines that have both docker and podman packages installed, but podman is not configured correctly.	2020-06-18 09:22:33 +03:00
Juliusz Stasiewicz	8628ede009	cdc: Fix segfault when stream ID key is too short When a token is calculated for stream_id, we check that the key is exactly 16 bytes long. If it's not - `minimum_token` is returned and client receives empty result. This used to be the expected behavior for empty keys; now it's extended to keys of any incorrect length. Fixes #6570	2020-06-17 18:19:37 +03:00
Nadav Har'El	095ddf0d41	alternator test: use ConsistentRead=True where missing All tests that write some data and then read it back need to use ConsistentRead=True, otherwise the test may sporadically fail on a multi-node cluster. In the previous patch we fixed the full_query()/full_scan() convenience functions. In this patch, I audited the calls to the boto3 read methods - get_item(), batch_get_item(), query(), scan(), and although most of them did use ConsistentRead=True as needed, I found some missing and this patch fixes them. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200616080334.825893-1-nyh@scylladb.com>	2020-06-17 14:57:45 +02:00
Nadav Har'El	c298088375	alternator test: use ConsistentRead=True for full_query/scan Many of the Alternator tests use the convenience functions full_query()/full_scan() to read from the table. Almost all these tests need to be able to read their own writes, i.e., want ConsistentRead=True, but none of them explicitly specified this parameter. Such tests may sporadically fail when running on a cluster with multiple nodes. So this patch follows a TODO in the code, and makes ConsistentRead=True the default for the full_() functions. The caller can still override it with ConsistentRead=False - and this is necessary in the GSI tests, because ConsistentRead=True is not allowed in GSIs. Note that while ConsistentRead=True is now the default for the full_() convenience functions, it is still not the default for the lower level boto3 functions scan(), query() and get_item() - so usages of those should be evaluated as well and missing ConsistentRead=True, if any, should be added. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200616073821.824784-1-nyh@scylladb.com>	2020-06-17 14:57:45 +02:00
Raphael S. Carvalho	2f680b3458	size_tiered_backlog_tracker: Rename total_bytes Reader can assume total_bytes and _total_bytes have the same meaning, but they don't, so let's give the former a more descriptive name. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200616175055.16771-1-raphaelsc@scylladb.com>	2020-06-17 13:39:30 +03:00
Avi Kivity	d2ab6a24a1	Update seastar submodule * seastar 8f0858cfd7...b515d63735 (2): > do_with: replace seastar::apply() calls with std::apply() > Merge "Resolve various http fixmes" from Piotr	2020-06-17 12:59:16 +03:00
Glauber Costa	1c70a7c54e	upload: use custom error handler for upload directory SSTables created for the upload directory should be using its custom error handler. There is one user of the custom error handler in tree, which is the current upload directory function. As we will use a free function instead of a lambda in our implementation we also use the opportunity to fix it for consistency. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-16 19:42:19 -04:00
Glauber Costa	c188aef088	sstable_directory: fix debug message I just noticed while working on the reshape patches that there is an extra format bracket in two of the debug messages. As they are debug I've seen them less often than the others and that slipped. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-16 19:42:19 -04:00
Nadav Har'El	ba59034402	merge: Use std::string_view in a few more apis Merged patch series by Rafael Ávila de Espíndola: The main advantage is that callers now don't have to construct sstrings. It is also a 0.09% win in text size (from 41804308 to 41766484 bytes) and the tps reported by perf_simple_query --duration 16 --smp 1 -m4G >> log 2>err in 500 randomized runs goes up by 0.16% (from 162259 to 162517). Rafael Ávila de Espíndola (3): service: Pass a std::string_view to client_state::set_keyspace cql3: Use a flat_hash_map in untyped_result_set_row cql3: Pass std::string_view to various untyped_result_set member functions cql3/untyped_result_set.hh \| 30 ++++++++++++++++-------------- service/client_state.hh \| 2 +- cql3/untyped_result_set.cc \| 6 +++--- service/client_state.cc \| 4 ++-- 4 files changed, 22 insertions(+), 20 deletions(-)	2020-06-16 20:31:36 +03:00
Avi Kivity	b608af870b	dist: debian: do not require root during package build Debian package builds provide a root environment for the installation scripts, since that's what typical installation scripts expect. To avoid providing actual root, a "fakeroot" system is used where syscalls are intercepted and any effect that requires root (like chown) is emulated. However, fakeroot sporadically fails for us, aborting the package build. Since our install scripts don't really require root (when operating in the --packaging mode), we can just tell dpkg-buildpackage that we don't need fakeroot. This ought to fix the sporadic failures. As a side effect, package builds are faster. Fixes #6655.	2020-06-16 20:27:04 +03:00
Tomasz Grabiec	266e3f33d1	sstables: Add promoted index cache metrics	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	9885d0e806	position_in_partition: Introduce external_memory_usage()	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	58532cdf11	cached_file, sstables: Add tracing to index binary search and page cache	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	ecb6abe717	sstables: Dynamically adjust I/O size for index reads Currently, index reader uses 128 KiB I/O size with read-ahead. That is a waste of bandwidth if index entries contain large promoted index and binary search will be used within the promoted index, which may not need to access as much. The read-ahead is wasted both when using binary search and when using the scanning cursor. On the other hand, large I/O is optimal if there is no promoted index and we're going to parse the whole page. There is no way to predict which case it is up front before reading the index. Attaching dynamic adjustments (per-sstable) lets the system auto adjust to the workload from past history. The large promoted index workload will settle on reading 32 KiB (with read-ahead). This is still not optimal, we should lower the buffer size even more. But that requires a seastar change, so is deferred.	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	19501d9ef2	sstables, tests: Allow disabling binary search in promoted index from perf tests	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	c0ee997614	sstables: mc: Use binary search over the promoted index Currently, lookups in the promoted index are done by scanning the index linearly so the lookup is O(N). For large partitions that's inefficient. It consumes both a lot of CPU and I/O. We could do better and use binary search in the index. This patch series switches the mc-format index reader to do that. Other formats use the old way. The "mc" format promoted index has an extra structure at the end of the index called "offset map". It's a vector of offsets of consecutive promoted index entries. This allows us to access random entries in the index without reading the whole index. The location of the offset entry for a given promoted index entry can be derived by knowing where the offset vector ends in the index file, so the offset map also doesn't have to be read completely into the memory. The most tricky part is caching. We need to cache blocks read from the index file to amortize the cost of binary search: - if the promoted index fits in the 32 KiB which was read from the index when looking for the partition entry, we don't want to issue any additional I/O to search the promoted index. - with large promoted indexes, the last few bisections will fall into the same I/O block and we want to reuse that block. - we don't want the cache to grow too big, we don't want to cache the whole promoted index as the read progresses over the index. Scanning reads may skip multiple times. This patch implements a rather simple approach which meets all the above requirements and is not worse than the current state of affairs: - Each index cursor has its own cache of the index file area which corresponds to promoted index This is managed by the cached_file class. - Each index cursor has its own cache of parsed blocks. This allows the upper bound estimation to reuse information obtained during lower bound lookup. This estimation is used to limit read-aheads in the data file. 
- Each cursor drops entries that it walked past so that memory footprint stays O(log N) - Cached buffers are accounted to read's reader_permit.	2020-06-16 16:15:23 +02:00
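The offset-map lookup described in this commit can be illustrated with a small sketch. It is not the Scylla implementation: the byte layout (2-byte length-prefixed start keys followed by a vector of 4-byte offsets) and the names `build_index`, `read_start_key` and `find_block` are assumptions made up for the illustration. It only shows how a trailing offset vector lets a binary search read individual entries without scanning the whole index.

```python
import struct

def build_index(start_keys):
    """Serialize blocks, then append an offset map (hypothetical layout)."""
    body = bytearray()
    offsets = []
    for key in start_keys:
        offsets.append(len(body))
        encoded = key.encode()
        body += struct.pack("<H", len(encoded)) + encoded
    offset_map = b"".join(struct.pack("<I", off) for off in offsets)
    return bytes(body) + offset_map, len(start_keys)

def read_start_key(index, block_count, i):
    """Read only block i: its offset is found from where the offset map ends."""
    map_start = len(index) - 4 * block_count
    (off,) = struct.unpack_from("<I", index, map_start + 4 * i)
    (klen,) = struct.unpack_from("<H", index, off)
    return index[off + 2 : off + 2 + klen].decode()

def find_block(index, block_count, key):
    """Return (i, probes): i is the last block whose start key is <= key, or -1."""
    lo, hi = 0, block_count
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if read_start_key(index, block_count, mid) <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1, probes
```

With 1000 blocks, a lookup touches at most 10 entries instead of up to 1000, which is the O(log N) behavior the commit message refers to.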
Tomasz Grabiec	c95dd67d11	utils: Introduce cached_file It is a read-through cache of a file. Will be used to cache contents of the promoted index area from the index file. Currently, cached pages are evicted manually using the invalidate_*() method family, or when the object is destroyed. The cached_file represents a subset of the file. The reason for this is to satisfy two requirements. One is that we have a page-aligned caching, where pages are aligned relative to the start of the underlying file. This matches requirements of the seastar I/O engine on I/O requests. Another requirement is to have an effective way to populate the cache using an unaligned buffer which starts in the middle of the file when we know that we won't need to access bytes located before the buffer's position. See populate_front(). If we couldn't assume that, we wouldn't be able to insert an unaligned buffer into the cache.	2020-06-16 16:15:23 +02:00
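A read-through page cache of the kind this commit describes might look roughly like the sketch below. It is an illustration only: the `CachedFile` class, the 4096-byte page size and the `invalidate_before` method are hypothetical names chosen here, and the real `cached_file` works on seastar files and additionally supports populating from an unaligned buffer, which this sketch omits.

```python
import io

PAGE_SIZE = 4096  # assumed page size for this sketch

class CachedFile:
    """Read-through cache over a file object, with page-aligned entries."""

    def __init__(self, f, page_size=PAGE_SIZE):
        self._f = f
        self._page_size = page_size
        self._pages = {}   # page index -> bytes
        self.misses = 0    # number of reads that went to the file

    def _page(self, idx):
        page = self._pages.get(idx)
        if page is None:
            self.misses += 1
            self._f.seek(idx * self._page_size)
            page = self._f.read(self._page_size)
            self._pages[idx] = page
        return page

    def read(self, pos, size):
        """Return up to size bytes starting at pos, served from cached pages."""
        out = bytearray()
        end = pos + size
        while pos < end:
            idx, off = divmod(pos, self._page_size)
            chunk = self._page(idx)[off : off + (end - pos)]
            if not chunk:
                break  # reached end of file
            out += chunk
            pos += len(chunk)
        return bytes(out)

    def invalidate_before(self, pos):
        """Manually evict pages that lie entirely before pos."""
        first_kept = pos // self._page_size
        for idx in [i for i in self._pages if i < first_kept]:
            del self._pages[idx]

    def cached_pages(self):
        return len(self._pages)
```

Keeping pages aligned to the start of the underlying file means every miss turns into one aligned read, which is the property the commit message ties to the I/O engine's requirements.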
Tomasz Grabiec	ab274b8203	sstables: clustered_index: Relax scope of validity of entry_info entry_info holds views, which may get invalidated when the containing index blocks are removed. Current implementations of next_entry() keep the blocks in memory as long as the cursor is alive but that will change in new implementations of the cursor. Adjust the assumption of tests accordingly.	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	ea2fbcc2cd	sstables: index_entry: Introduce owning promoted_index_block_position	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	714da3c644	compound_compat: Allow constructing composite from a view	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	f2e52c433f	sstables: index_entry: Rename promoted_index_block_position to promoted_index_block_position_view	2020-06-16 16:15:23 +02:00
Tomasz Grabiec	101fd613c5	sstables: mc: Extract parser for promoted index block It will be reused in binary search over the index.	2020-06-16 16:15:14 +02:00
Tomasz Grabiec	a557c374fd	sstables: mc: Extract parser for clustering out of the promoted index block parser This parser will be used stand-alone when doing a binary search over promoted index blocks. We will only parse the start key not the whole block.	2020-06-16 16:14:31 +02:00
Tomasz Grabiec	95df7126a7	sstables: consumer: Extract primitive_consumer This change extracts the parser for primitive types out of continuous_data_consumer so that it can be used stand-alone or embedded in other parsers.	2020-06-16 16:14:30 +02:00
Tomasz Grabiec	d5bf540079	sstables: Abstract the clustering index cursor behavior In preparation for supporting more than one algorithm for lookups in the promoted index, extract relevant logic out of the index_reader (which is a partition index cursor). The clustered index cursor implementation is now hidden behind abstract interface called clustered_index_cursor. The current implementation is put into the scanning_clustered_index_cursor. It's mostly code movement with minor adjustments. In order to encapsulate iteration over promoted index entries, clustered_index_cursor::next_entry() was introduced. No change in behavior intended in this patch.	2020-06-16 16:14:17 +02:00
Tomasz Grabiec	a858f87b11	sstables: index_reader: Rearrange to reduce branching and optionals No change in logic. Will make it easier to make further refactoring.	2020-06-16 16:13:39 +02:00
Yaron Kaikov	ac7237f991	dbuild: Add an option to run with 'docker' or 'podman' This adds support for configuring whether to run dbuild with 'docker' or 'podman' via a new environment variable, DBUILD_TOOL. While at it, check if 'podman' exists, and prefer that by default as the tool for dbuild.	2020-06-16 15:18:46 +03:00
Gleb Natapov	7ca937778d	cql transport: do not log broken pipe error when a client closes its side of a connection abruptly Fixes #5661 Message-Id: <20200615075958.GL335449@scylladb.com>	2020-06-16 13:59:12 +02:00
Nadav Har'El	41a049d906	README: better explanation of dependencies and build In this patch I rewrote the explanations in both README.md and HACKING.md about Scylla's dependencies, and about dbuild. README.md used to mention only dbuild. It now explains better (I think) why dbuild is needed in the first place, and that the alternative is explained in HACKING.md. HACKING.md used to explain only install-dependencies.sh - and now explains why it is needed, what install-dependencies.sh does and that it ONLY works on very recent distributions (e.g., Fedora releases older than 32 are not supported), and now also mentions the alternative - dbuild. Mentions of incorrect requirements (like "gcc > 8.1") were fixed or dropped. Mention of the archaic 'scripts/scylla_current_repo' script, which we used to need to install additional packages on non-Fedora systems, was dropped. The script itself is also removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200616100253.830139-1-nyh@scylladb.com>	2020-06-16 13:26:04 +02:00
Avi Kivity	bd794629f9	range: rename range template family to interval nonwrapping_range<T> and related templates represent mathematical intervals, and are different from C++ ranges. This causes confusion, especially when C++ ranges and the range templates are used together. As the first step to disentangle this, introduce a new interval.hh header with the contents of the old range.hh header, renaming as follows: range_bound -> interval_bound nonwrapping_range -> nonwrapping_interval wrapping_range -> wrapping_interval Range -> Interval (concepts) The range alias, which previously aliased wrapping_range, did not get renamed - instead the interval alias now aliases nonwrapping_interval, which is the natural interval type. I plan to follow up making interval the template, and nonwrapping_interval the alias (or perhaps even remove it). To avoid churn, a new range.hh header is provided with the old names as aliases (range, nonwrapping_range, wrapping_range, range_bound, and Range) with the same meaning as their former selves. Tests: unit (dev)	2020-06-16 13:36:20 +03:00
Piotr Sarna	3bcc2e8f09	Merge 'hinted handoff: improve segment replay logic' from PiotrD This series contains two improvements to hint file replay logic in hints manager: - During replay of a hint file, keeping track of the first hint that fails to be sent is now done via a simple std::optional variable instead of an unordered_set. This slightly reduces complexity of next replay position calculation. - A corner case is handled: if reading commitlog fails, but there won't be an error related to sending hints, starting position wouldn't be updated. This could cause us to replay more hints than necessary. Tests: - unit(dev) - dtest(hintedhandoff_additional_test, dev) * piodul-hints-manager-handle-commitlog-failure-in-replay-position-calculation: hinted handoff: use bool instead of send_state_set hinted handoff: update replay position on commitlog failure hinted handoff: remove rps_set, use first_failed_rp instead	2020-06-16 12:24:55 +02:00
Avi Kivity	6ba7b8f3f5	Update seastar submodule * seastar 81242ccc3f...8f0858cfd7 (18): > Merge 'future, future-utils: stop returning a variadic future from when_all_succeed' > file: introduce layered_file_impl, a helper for layered files > net: packet: mark move assignment operator as noexcept > core: weak_ptr, weakly_referencable: implement empty default constructor > circular_buffer: Fix build with gcc 11 (avoid template parameters in d'tor declaration) > test: weak_ptr_test: fix static asserts about nothrow constructibility > coroutines: Fix clang build > cmake: Delete SEASTAR_COROUTINES_TS > Merge "future-util: Mark a few more functions as noexcept" from Rafael > tests: add a perf test to measure the fair_queue performance > Merge "iostream: make iostream stack nothrow move constructible" from Benny > future: Move most of rethrow_with_nested out of line. > future_test: Add test for nested exceptions in finally > core: Add noexcept to unaligned members functions > Merge "core: make weak_ptr and checked_ptr default and move nothrow constructible" from Benny > core: file: Fix typo in a comment > byteorder: Mark functions as noexcept > future: replace CanInvoke concepts with std::invocable	2020-06-16 13:19:36 +03:00
Piotr Sarna	e59d41dad6	alternator: use plain function pointer instead of std::function Since all function handlers are plain functions without any state, there's no need for wrapping them with a 32-byte std::function when a plain function pointer would suffice. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <913c1de7d02c252b40dc0c545989ec83fe74e5a9.1592291413.git.sarna@scylladb.com>	2020-06-16 12:08:21 +03:00
Raphael S. Carvalho	238ba899c0	compaction_manager: use double for backlog everywhere Avi says: "The backlog is a large number that changes slowly, so float might not have enough resolution to track small changes. For example, if the backlog is 800GB and changes less than 100kB, then we won't see a change (float resolution is 2^23 ~ 1:8,000,000). This is outside the normal range of values (usually the backlog changes a lot more than 100kB per 15-second period), so it will work, but better to be more careful." Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200615150621.17543-1-raphaelsc@scylladb.com>	2020-06-16 12:05:05 +03:00
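The resolution argument quoted in this commit can be checked with a short sketch. The 800 GB backlog comes from the commit message; the 10 kB change is an assumed value picked here because it falls well below the spacing between adjacent single-precision values at that magnitude (65536 bytes). The helper `to_float32` is a made-up name that rounds a Python float (a double) to single precision via `struct`.

```python
import struct

def to_float32(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

backlog = 800e9    # 800 GB, as in the commit message
delta = 10_000.0   # assumed small change, 10 kB

# As doubles the two values differ; as floats they collapse to the same number.
changed_as_double = (backlog + delta) != backlog
changed_as_float = to_float32(backlog + delta) != to_float32(backlog)
```

Here `changed_as_double` is true while `changed_as_float` is false, which is the kind of lost update the switch to double avoids.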
Pavel Solodovnikov	6028588148	transport: introduce `cql_protocol_extension` enum and cql protocol extensions negotiation The patch introduces two new features to aid with negotiating protocol extensions for the CQL protocol: - `cql_protocol_extensions` enum, which holds all supported extensions for the CQL protocol (currently contains only `LWT_ADD_METADATA_MARK` extension, which will be mentioned below). - An additional mechanism of negotiating cql protocol extensions to be used in a client connection between a scylla server and a client driver. These extensions are propagated in SUPPORTED message sent from the server side with "SCYLLA_" prefix and received back as a response from the client driver in order to determine intersection between the cql extensions that are both supported by the server and acknowledged by a client driver. This intersection of features is later determined to be a working set of cql protocol extensions in use for the current `client_state`, which is associated with a particular client connection. This way we can easily settle on the used extensions set on both sides of the connection. Currently there is only one value: `LWT_ADD_METADATA_MARK`, which regulates whether to set a designated bit in prepared statement metadata indicating if the statement at hand is an lwt statement or not (actual implementation for the feature will be in a later patch). Each extension can also propagate some custom parameters to the corresponding key. CQL protocol specification allows sending a list of values with each key in the SUPPORTED message, we use that to pass parameters to extensions as `PARAM=VALUE` strings. In case of `LWT_ADD_METADATA_MARK` it's `SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK` which designates the bitmask for LWT flag in prepared statement metadata in order to be used for lookup in a client library. The associated bits of code in `cql3::prepared_metadata` are adjusted to accommodate the feature. 
The value for the flag is chosen on purpose to be the last bit in the flags bitset since we don't want to possibly clash with C* implementation in case they add more possible flag values to prepared metadata (though there is an issue regarding that: https://issues.apache.org/jira/browse/CASSANDRA-15746). If it's fixed in upstream Cassandra, then we could synchronize the value for the flag with them. Also extend the underlying type of `flag` enum in `cql3::prepared_metadata` to be `uint32_t` instead of `uint8_t` because in either case flags mask is serialized as 32-bit integer. In theory, shard-awareness extension support also should be reworked in terms of provided minimal infrastructure, but for the sake of simplicity, this is left to be done in a follow-up some time later. This solution eliminates the need to assume that all the client drivers follow the CQL spec carefully because scylla-specific features and protocol extensions could be enabled only in case both server and client driver negotiate the supported feature set. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-06-16 11:35:52 +03:00
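The negotiation flow described in this commit reduces to a set intersection, sketched below. The function names `server_supported` and `negotiate` are hypothetical, the dictionaries stand in for the SUPPORTED message and the client's reply, and the mask value `1 << 31` is an assumption based on the commit's remark that the flag is the last bit of a 32-bit flags field.

```python
PREFIX = "SCYLLA_"

def server_supported(extensions):
    """Build SUPPORTED entries: prefixed extension name -> PARAM=VALUE strings."""
    return {
        PREFIX + name: [f"{param}={value}" for param, value in params.items()]
        for name, params in extensions.items()
    }

def negotiate(server_exts, client_reply):
    """Extensions in use are those the server supports AND the client acknowledged."""
    acked = {key[len(PREFIX):] for key in client_reply if key.startswith(PREFIX)}
    return set(server_exts) & acked

# Server side: one extension with one parameter (mask value is an assumption).
server_exts = {
    "LWT_ADD_METADATA_MARK": {"SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK": 1 << 31},
}
supported = server_supported(server_exts)

# A driver that understands the extension echoes the key back; an older one does not.
new_driver_reply = {"CQL_VERSION": ["3.0.0"], **supported}
old_driver_reply = {"CQL_VERSION": ["3.0.0"]}
```

Because an extension is only enabled when both sides name it, a driver that ignores unknown SUPPORTED keys simply ends up with an empty working set.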
Rafael Ávila de Espíndola	3e1307a6d1	cql3: Pass std::string_view to various untyped_result_set member functions Taking a std::string_view is a bit more flexible. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-15 15:47:15 -07:00
Rafael Ávila de Espíndola	3a9b4e7d26	cql3: Use a flat_hash_map in untyped_result_set_row No functionality changed. This just makes it possible to use heterogeneous lookups, which the next patch will add. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-15 15:46:25 -07:00
Rafael Ávila de Espíndola	65d56095d0	service: Pass a std::string_view to client_state::set_keyspace No change in the implementation since it was already copying the string. Taking a std::string_view is just a bit more flexible. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-15 15:46:25 -07:00
Piotr Sarna	45bf039357	alternator: use has_function instead of try-catch With the new interface available, the try-catch idiom can be removed, thus resolving a TODO. Tests: unit(dev) Message-Id: <788a29f8f9d7bcf952b28a6148670dbadb97a619.1592233511.git.sarna@scylladb.com>	2020-06-15 23:55:20 +03:00
Piotr Sarna	911dee5417	schema: add has_column utility function With this simple helper function, a code snippet in alternator can be transformed from try-catch to a simple condition. Message-Id: <553debf4e91c0511566e53e2c8a5e8e6ee6552e2.1592233511.git.sarna@scylladb.com>	2020-06-15 23:55:06 +03:00
Piotr Sarna	b1684cf2e1	alternator: move function handlers to a lookup map Instead of a long chain of `if` statements, handlers are now created in a static map. Fixes a TODO in the code. Tests: unit(dev) Message-Id: <0ea577a44dd56859da170fe82c16c8f810f9d695.1592232448.git.sarna@scylladb.com>	2020-06-15 23:44:45 +03:00
Piotr Sarna	e76fba6f86	alternator: remove outdated TODO for adding timeouts The TODO is already fixed, not to mention that it had an incorrect ordinal number (: Message-Id: <006dc3061e0f30641c2e63ff471686f4c2e82829.1592230155.git.sarna@scylladb.com>	2020-06-15 23:04:42 +03:00
Tomasz Grabiec	1c5db178dd	Merge "logalloc: Get rid of segments migration" from Pavel But not compaction. When reclaiming segments to seastar, non-empty segments are copied as-is to some other place. Instead of doing this, the reclaimer can copy only allocated objects and leave the freed holes behind, i.e. do the regular compaction. This would be the same or better from the timing perspective, and will help to avoid yet another compaction pass over the same set of objects in the future. Current migration code checks for the free segments reserve to be above minimum to proceed with migration, so does the code after this patch, thus the segment compaction is called with non-empty free segments set and thus it's guaranteed not to fail the new segment allocation (if it will be required at all). Plus some bikeshedding patches for the run-up. tests: unit(dev) * https://github.com/xemul/scylla/tree/br-logalloc-compact-on-reclaim-2: logalloc: Compact segments on reclaim instead of migration logallog: Introduce RAII allocation lock logalloc: Shuffle code around region::impl::compact logalloc: Do not lock reclaimer twice logalloc: Do not calculate object size twice logalloc: Do not convert obj_desc to migrator back and forth	2020-06-15 16:28:16 +02:00
Glauber Costa	093328741d	compaction: test that sstable set is not null in update_pending_ranges SSTable_set is now an optional, and if we don't want to expire data it will be empty. We need to check that it is not empty before dereferencing it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200610170647.142817-1-glauber@scylladb.com>	2020-06-15 15:43:08 +02:00
Tomasz Grabiec	e81fc1f095	row_cache: Fix undefined behavior on key linearization This is relevant only when using partition or clustering keys which have a representation in memory which is larger than 12.8 KB (10% of LSA segment size). There are several places in code (cache, background garbage collection) which may need to linearize keys because of performing key comparison, but it's not done safely: 1) the code does not run with the LSA region locked, so pointers may get invalidated on linearization if it needs to reclaim memory. This is fixed by running the code inside an allocating section. 2) LSA region is locked, but the scope of with_linearized_managed_bytes() encloses the allocating section. If allocating section needs to reclaim, linearization context will contain invalidated pointers. The fix is to reorder the scopes so that linearization context lives within an allocating section. Example of 1 can be found in range_populating_reader::handle_end_of_stream() where it performs a lookup: auto prev = std::prev(it); if (prev->key().equal(_cache._schema, _last_key->_key)) { it->set_continuous(true); but handle_end_of_stream() is not invoked under allocating section. Example of 2 can be found in mutation_cleaner_impl::merge_some() where it does: return with_linearized_managed_bytes([&] { ... return _worker_state->alloc_section(region, [&] { Fixes #6637. Refs #6108. Tests: - unit (all) Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>	2020-06-15 16:03:33 +03:00
Nadav Har'El	86a4dfcd29	merge: api: Command to check and repair cdc streams Merged pull request https://github.com/scylladb/scylla/pull/6551 from Juliusz Stasiewicz: The command regenerates streams when: generations corresponding to a gossiped timestamp cannot be fetched from system_distributed table, or when generation token ranges do not align with token metadata. In such case the streams are regenerated and new timestamp is gossiped around. The returned JSON is always empty, regardless of whether streams needed regeneration or not. Fixes #6498 Accompanied by: scylladb/scylla-jmx#109, scylladb/scylla-tools-java#172	2020-06-15 14:17:35 +03:00
Takuya ASADA	ecc83e83e5	scylla_cpuscaling_setup: move the unit file to /etc/systemd Since scylla-cpupower.service isn't installed by the .rpm package, but created in the setup script, it's better to use /etc rather than the /usr/lib directory. We already do the same for scylla-server.service.d/.conf, .mount, and *.swap created by setup scripts.	2020-06-15 11:36:20 +03:00
Asias He	61e4387811	repair: Relax node selection in decommission for non network topology strategy In decommission operation, current code requires a node in local dc to sync data with. This requirement is too strong for a non network topology strategy. For example, consider: n1 dc1 n2 dc1 n3 dc2 n2 runs decommission operation. For a keyspace with simple strategy and RF = 2, it is possible n3 is the new owner but n3 is not in the same dc as n2. To fix, perform the dc check only for the network topology strategy. Fixes #6564	2020-06-15 11:26:02 +03:00
Avi Kivity	d17b05e911	Merge 'Adding Optimized pseudo floating point estimated histogram' from Amnon " This series adds a pseudo-floating-point histogram implementation. The histogram is used for time_estimated_histogram, a histogram for latency tracking, which is then used in storage_proxy as a more efficient histogram with a higher resolution. A follow-up series would use the new histogram in other places in the system and will add an implementation that supports lower values. Fixes #5815 Fixes #4746 " * amnonh-quicker_estimated_histogram: storage_proxy: use time_estimated_histogram for latencies test/boost/estimated_histogram_test utils/histogram_metrics_helper Adding histogram converter utils/estimated_histogram: Adding approx_exponential_histogram	2020-06-15 10:19:36 +03:00
Avi Kivity	493d16e800	build: fix --enable-dpdk/--disable-dpdk configure switch `5ceb20c439` switched --enable-dpdk to a tristate switch, but forgot that add_tristate() prepends --enable and --disable itself; so now the switch looks like --enable-enable-dpdk and --disable-enable-dpdk. Fix by removing the "enable-" prefix.	2020-06-15 09:37:45 +03:00
Amnon Heiman	6e1f042b93	storage_proxy: use time_estimated_histogram for latencies This patch change storage_proxy to use time_estimated_histogram. Besides the type, it changes how values are inserted and how the histogram is used by the API. An example how a metric looks like after the change: scylla_storage_proxy_coordinator_write_latency_bucket{le="640.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="896.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1024.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1280.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1792.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2048.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2560.000000",scheduling_group_name="statement",shard="0",type="histogram"} 3 scylla_storage_proxy_coordinator_write_latency_bucket{le="3072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="3584.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="4096.000000",scheduling_group_name="statement",shard="0",type="histogram"} 7 
scylla_storage_proxy_coordinator_write_latency_bucket{le="5120.000000",scheduling_group_name="statement",shard="0",type="histogram"} 8 scylla_storage_proxy_coordinator_write_latency_bucket{le="6144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9 scylla_storage_proxy_coordinator_write_latency_bucket{le="7168.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="8192.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="10240.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19 scylla_storage_proxy_coordinator_write_latency_bucket{le="12288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 49 scylla_storage_proxy_coordinator_write_latency_bucket{le="14336.000000",scheduling_group_name="statement",shard="0",type="histogram"} 132 scylla_storage_proxy_coordinator_write_latency_bucket{le="16384.000000",scheduling_group_name="statement",shard="0",type="histogram"} 294 scylla_storage_proxy_coordinator_write_latency_bucket{le="20480.000000",scheduling_group_name="statement",shard="0",type="histogram"} 1035 scylla_storage_proxy_coordinator_write_latency_bucket{le="24576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2790 scylla_storage_proxy_coordinator_write_latency_bucket{le="28672.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5788 scylla_storage_proxy_coordinator_write_latency_bucket{le="32768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9815 scylla_storage_proxy_coordinator_write_latency_bucket{le="40960.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19821 scylla_storage_proxy_coordinator_write_latency_bucket{le="49152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 30063 
scylla_storage_proxy_coordinator_write_latency_bucket{le="57344.000000",scheduling_group_name="statement",shard="0",type="histogram"} 38642 scylla_storage_proxy_coordinator_write_latency_bucket{le="65536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 44987 scylla_storage_proxy_coordinator_write_latency_bucket{le="81920.000000",scheduling_group_name="statement",shard="0",type="histogram"} 51821 scylla_storage_proxy_coordinator_write_latency_bucket{le="98304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 54197 scylla_storage_proxy_coordinator_write_latency_bucket{le="114688.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55054 scylla_storage_proxy_coordinator_write_latency_bucket{le="131072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55363 scylla_storage_proxy_coordinator_write_latency_bucket{le="163840.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55520 scylla_storage_proxy_coordinator_write_latency_bucket{le="196608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55545 scylla_storage_proxy_coordinator_write_latency_bucket{le="229376.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="262144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="327680.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="393216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="458752.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="524288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 
scylla_storage_proxy_coordinator_write_latency_bucket{le="655360.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="786432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="917504.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1048576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1310720.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1572864.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1835008.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2097152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2621440.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3145728.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3670016.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="4194304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="5242880.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="6291456.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 
scylla_storage_proxy_coordinator_write_latency_bucket{le="7340032.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="8388608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="10485760.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="12582912.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="14680064.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="16777216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="20971520.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="25165824.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="29360128.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="33554432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="+Inf",scheduling_group_name="statement",shard="0",type="histogram"} 55549 Fixes #4746 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:23:02 +03:00
Amnon Heiman	1cbc2e3d3e	test/boost/estimated_histogram_test This patch adds basic testing for the approx_exponential_histogram implementations. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:57 +03:00
Amnon Heiman	f30f926703	utils/histogram_metrics_helper Adding histogram converter This patch adds a helper converter function to convert from an approx_exponential_histogram to a seastar::metrics::histogram Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:49 +03:00
Amnon Heiman	3319756f36	utils/estimated_histogram: Adding approx_exponential_histogram This patch adds an efficient histogram implementation. The implementation chooses efficiency over flexibility. That is why templates are used. How the approx_exponential_histogram pseudo floating point histogram works: It splits the range [MIN, MAX] into log2(MAX/MIN) ranges, then splits each of those ranges linearly according to a given resolution. For example, using a resolution of 4 would be similar to using an exponentially growing histogram with a coefficient of 1.2. All values are uint64. To prevent handling of corner cases, it is not allowed to set the MIN to be lower than the resolution. The approx_exponential_histogram will probably not be used directly; its first use is by time_estimated_histogram, a histogram for durations. It should be compared to the estimated_histogram. Performance comparison: Comparison was done by inserting 2^20 values into time_estimated_histogram and estimated_histogram. In debug mode on a local machine, an insert operation took an average of 26.0 nanoseconds vs 342.2 nanoseconds. In release mode, an insert operation took an average of 1.90 vs 8.28 nanoseconds. Fixes #5815 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:43 +03:00
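The bucket layout described in the commit above can be sketched as follows. This is a minimal illustration, not the code from utils/estimated_histogram: the names `bucket_layout`, `index_of` and `lower_bound` are hypothetical, and the sketch assumes that MIN, MAX and the resolution are all powers of two. The exponent of a value selects one of the log2(MAX/MIN) ranges, and the next few bits below the leading one select the linear sub-bucket inside that range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Floor of log2 for a non-zero value.
constexpr unsigned log2floor(std::uint64_t v) {
    unsigned r = 0;
    while (v >>= 1) {
        ++r;
    }
    return r;
}

// Hypothetical helper (not Scylla's class): maps a value to a bucket index
// in a pseudo-floating-point layout. Assumes power-of-two parameters.
template <std::uint64_t Min, std::uint64_t Max, unsigned Precision>
struct bucket_layout {
    static_assert((Min & (Min - 1)) == 0, "Min must be a power of two");
    static_assert((Max & (Max - 1)) == 0, "Max must be a power of two");
    static_assert((Precision & (Precision - 1)) == 0, "Precision must be a power of two");
    static_assert(Min >= Precision, "Min must not be lower than the resolution");
    static_assert(Max > Min, "Max must be greater than Min");

    static constexpr unsigned precision_bits = log2floor(Precision);
    static constexpr unsigned base_shift = log2floor(Min);
    static constexpr unsigned num_exp_ranges = log2floor(Max / Min);
    // One extra bucket collects everything at or above Max.
    static constexpr std::size_t num_buckets = std::size_t(num_exp_ranges) * Precision + 1;

    // Index of the bucket whose lower bound is the largest one <= val.
    static constexpr std::size_t index_of(std::uint64_t val) {
        if (val >= Max) {
            return num_buckets - 1;
        }
        if (val <= Min) {
            return 0;
        }
        const unsigned range = log2floor(val);  // which power-of-two range
        // The precision_bits bits right below the leading bit pick the sub-bucket.
        const std::uint64_t sub = (val >> (range - precision_bits)) & (Precision - 1);
        return (std::size_t(range - base_shift) << precision_bits) + std::size_t(sub);
    }

    // Lower bound of bucket idx (the inverse of index_of on bucket boundaries).
    static constexpr std::uint64_t lower_bound(std::size_t idx) {
        const unsigned range = base_shift + unsigned(idx >> precision_bits);
        const std::uint64_t base = std::uint64_t(1) << range;
        return base + std::uint64_t(idx & (Precision - 1)) * (base >> precision_bits);
    }
};
```

With `bucket_layout<512, 33554432, 4>` the first lower bounds are 512, 640, 768, 896, 1024 and 1280, so adjacent bounds differ by a factor between roughly 1.14 and 1.25, which is where the "coefficient of 1.2" comparison comes from. Finding a bucket needs only a log2 and a shift, with no search over bucket offsets.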
Piotr Sarna	23c63ec19d	Merge 'alternator: implement FilterExpression' from Nadav The main goal of this series is to implement FilterExpression - the newer syntax for filtering results of Query and Scan requests. This feature itself is just one simple patch - it just needs to have the already-existing filtering code call the already-existing expression evaluation code. However, before we can do this, we need a patch to refactor the expression-evaluation interface (this patch also fixes pre-existing bugs). Then we need three additional patches to fix pre-existing bugs in the various corner cases of expressions (these bugs already existed in ConditionExpression but now became visible in tests for FilterExpression). Finally, at the end of the series, we also do a bit of code cleanup. After this series, the FilterExpression feature is complete, and all tests for this feature pass. Tests: unit(dev) * 'alternator-filterexpression' of git://github.com/nyh/scylla: alternator: avoid unnecessary conversion to string alternator: move some code out of executor.cc alternator: implement FilterExpression alternator: improve error path of attribute_type() function alternator: fix begins_with() error path alternator: fix corner case of contains() function in conditions alternator: refactor resolving of references in expressions	2020-06-14 19:42:46 +02:00
Avi Kivity	4220ed849b	Merge "Use abseil's hash map in a couple places" from Rafael " This is part of the work for replacing global sstring variables with constexpr std::string_view ones. To have std::string_view values we have to convert a few APIs to take std::string_view instead of sstring references. The API conversions are complicated by the fact that std::unordered_map doesn't support heterogeneous lookup, so we need another hash map. The one provided by abseil seems like a natural choice since it has an API that looks like what is being proposed for c++ (http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1690r0.html) but is also much faster. A nice side effect is that this series is a 0.46% win in perf_simple_query --duration 16 --smp 1 -m4G Over 500 runs with randomized section layout and environment on each run. " * 'espindola/absl-v10' of https://github.com/espindola/scylla: database: Use a flat_hash_map for _ks_cf_to_uuid database: Use flat_hash_map for _keyspaces Add absl wrapper headers build: Link with abseil cofigure: Don't overwrite seastar_cflags Add abseil as a submodule	2020-06-14 18:26:59 +03:00
Rafael Ávila de Espíndola	336d541f58	database: Use a flat_hash_map for _ks_cf_to_uuid Given that the key is a std::pair, we have to explicitly mark the hash and eq types as transparent for heterogeneous lookup to work. With that, pass std::string_view to a few functions that just check if a value is in the map. This increases the .text section by 11 KiB (0.03%). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
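The idea of marking a functor as transparent so that a pair-keyed container can be probed with `std::string_view` values can be sketched as below. This is only an illustration under stated assumptions: it swaps in an ordered `std::map` with a transparent comparator instead of the `absl::flat_hash_map` with transparent hash and eq types that the commit uses, and the names `ks_cf_key`, `ks_cf_view`, `ks_cf_less`, `table_map` and `has_table` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Owning key stored in the map, and a non-owning view used only for lookups.
using ks_cf_key = std::pair<std::string, std::string>;
using ks_cf_view = std::pair<std::string_view, std::string_view>;

// Comparator marked transparent: the is_transparent member type enables the
// templated find() overloads, so lookups can pass a ks_cf_view directly.
struct ks_cf_less {
    using is_transparent = void;

    template <typename P>
    static ks_cf_view as_view(const P& p) {
        return ks_cf_view(std::string_view(p.first), std::string_view(p.second));
    }

    template <typename L, typename R>
    bool operator()(const L& l, const R& r) const {
        return as_view(l) < as_view(r);
    }
};

// Stand-in for a (keyspace, table) -> id map; int replaces the real id type.
using table_map = std::map<ks_cf_key, int, ks_cf_less>;

// Membership check that never allocates temporary std::string keys.
inline bool has_table(const table_map& m, std::string_view ks, std::string_view cf) {
    return m.find(ks_cf_view(ks, cf)) != m.end();
}
```

Without the `is_transparent` marker, `find()` would accept only a `ks_cf_key`, forcing every caller that holds string views to build two temporary strings per lookup.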
Rafael Ávila de Espíndola	6da9eef25f	database: Use flat_hash_map for _keyspaces This changes the hash map used for _keyspaces. Using a flat_hash_map allows using std::string_view in has_keyspace thanks to the heterogeneous lookup support. This adds 200 KiB to .text, since this is the first use of absl and brings in files from the .a. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	dd0d4ae217	Add absl wrapper headers Using these instead of using the absl headers directly adds support for heterogeneous lookup with sstring as key. There is no gain from having the hash function inline, so this implements it in a .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	7d1f6725dd	build: Link with abseil It is a pity we have to list so many libraries, but abseil doesn't provide a .pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	2ad09aefb6	cofigure: Don't overwrite seastar_cflags The variable seastar_cflags was being used for flags passed to seastar and for flags extracted from the seastar.pc file. This introduces a new variable for the flags extracted from the seastar.pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	383a9c6da9	Add abseil as a submodule This adds the https://abseil.io library as a submodule. The patch series that follows needs a hash table that supports heterogeneous lookup, and abseil has a really good hash table that supports that (https://abseil.io/blog/20180927-swisstables). The library is still not available in Fedora, but it is fairly easy to use it directly from a submodule. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:37 -07:00
Avi Kivity	08313106ce	Merge 'Repair use table id instead of table name' from Asias " Use table_id instead of table_name in row level repair to find a table. It guarantees we repair the same table even if a table is dropped and a new table is created with the same name. Refs: #5942 " * asias-repair_use_table_id_instead_of_table_name: repair: Do not pass table names to repair_info repair: Add table_id to row_level_repair repair: Use table id to find a table in get_sharder_for_tables repair: Add table_ids to repair_info repair: Make func in tracker::run run inside a thread	2020-06-14 14:58:46 +03:00
Raphael S. Carvalho	9983fa8766	compaction_manager: Export backlog metric This backlog metric holds the sum of backlog for all the tables in the system. This is very useful for understanding the behavior of the backlog trackers. That's how we managed to fix most of backlog bugs like #6054, #6021, etc. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200612194908.39909-1-raphaelsc@scylladb.com>	2020-06-14 14:07:53 +03:00
Avi Kivity	76d082c2b2	Merge "Decouple client services from storage_service" from Pavel E " The cql_server and thrift are "owned" by storage_service for the sake of managing those, i.e. starting and stopping. Since other services (still) need the storage_service this creates dependency loops. This set makes the client services independent from the storage service. As a consequence of it the auth service is also removed from storage_service and put standalone. This, in turn, sets some tests free from the need to start and stop auth and makes one step towards NOT join_cluster()-ing in unit tests. Also the set fixes a few weird races on scylla start and stop that can trigger local_is_initialized() asserts, and one case of unclear aborted shutdown when client services remain running till the scylla process exit. Yet another benefit is localization of "isolating" functionality that sits deeper in storage_service than it should. One thing that's not completely clean after it is the need for cql server to continue referencing the service_memory_limiter semaphore from the storage_service, but this will go away with one of the next sets. 
tests: unit(debug), manual start-stop, nodetool check of cql/thrift start/stop " * 'br-split-transport-1' of https://github.com/xemul/scylla: storage_service: Isolate isolator auth: Move away from storage_service auth: Move start-stop code into main main: Don't forget to stop cql/thrift when start is aborted thrift_controller: Switch on standalone thrift_controller: Pass one through management API thrift_controller: Move the code into thrift/ thrift_controller: Introduce own lock for management thrift: Wrap start/stop/is_running code into a class cql_controller: Switch on standalone cql_controller: Pass one through management API cql_controller: Move the code into transport/ cql_controller: Introduce own lock for management cql: Wrap start/stop/is_running code into a class api: Tune reg/unreg of client services control endpoints	2020-06-14 13:49:23 +03:00
Takuya ASADA	863293576c	scylla_setup: add swapfile setup Adding swapfile setup on scylla_setup. Fixes #6539	2020-06-14 13:18:51 +03:00
Amnon Heiman	06510a4752	service/storage_service.cc: Make effective_ownership preemptable A lot is going on when calculating effective ownership. For each node in the cluster, we need to go over all the ranges belonging to that node and see if that node is the owner or not. This patch uses futurized loops with do_for_each so it would preempt if needed. The patch replaces the current for-loops with do_for_each and do_with but keeps the logic. Fixes #6380	2020-06-14 12:56:07 +03:00
Nadav Har'El	493d7e6716	alternator: avoid unnecessary conversion to string In a couple of places, where we already have a std::string_view, there is no need to convert to a std::string (which requires allocation). One cool observation (by Piotr Sarna) is that map over std::string_view is fine, when the strings in the map are always string constants. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	8c026b9f10	alternator: move some code out of executor.cc The source file alternator/executor.cc has grown too much, reaching almost 4,000 lines. In this patch I move about 400 lines out of executor.cc: 1. Some functions related to serialization of sets and lists were moved to serialization.cc, 2. Functions related to evaluating parsed expressions were moved to expressions.cc. The header file expressions_eval.hh was also removed - the calculate_value() functions now live in expressions.cc, so we can just define them in expressions.hh, no need for a separate header file. This patch just moves code around. It doesn't make any functional changes. Refs #5783. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	0b9f25ab50	alternator: implement FilterExpression This patch provides a complete implementation for the FilterExpression parameter - the newer syntax for filtering the results of the Query or Scan operations. The implementation is pretty straightforward - we already added earlier a result-filtering framework to Alternator, and used it for the older filtering syntax - QueryFilter and ScanFilter. All we had to do now was to run the FilterExpression (which has the same syntax as a ConditionExpression) on each individual item. The previous cleanup patches were important to reduce the friction of running these expressions on the items. After the previous patches fixing small esoteric bugs in a few expression functions, with this patch all the tests in test_filter_expression.py now pass, and so do the two FilterExpression tests in test_query.py and test_scan.py. As far as I know (and of course minus any bugs we'll discover later), this marks the FilterExpression feature complete. Fixes #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	f87259a762	alternator: improve error path of attribute_type() function The attribute_type() function, which can be used in expressions like ConditionExpression and FilterExpression, is supposed to generate an error if its second parameter is not one of the known types. What we did until now was to just report a failed check in this case. We already had a reproducing test with FilterExpression, but in this patch we also add a test with ConditionExpression - which fails before this patch and passes afterwards (and of course, passes with DynamoDB). Fixes #6641. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:20 +03:00
Nadav Har'El	11d86dfb06	alternator: fix begins_with() error path The begins_with() function should report an error if a constant is passed to it which isn't one of the supported types - string or bytes (e.g., a number). The code we had to check this had wrong logic, though. If the item attribute was also a number, we silently returned false, and didn't go on to detect that the second parameter - a constant - was a number too and should generate an error - not be silent. Fixed and added a reproducing test case and another test to validate my understanding of the type of parameters that begins_with() accepts. Fixes #6640. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:13:23 +03:00
Nadav Har'El	f79a4e0e78	alternator: fix corner case of contains() function in conditions It turns out that the contains() function in the new syntax of conditions (ConditionExpression, FilterExpression) is not identical to the CONTAINS operator in the old-syntax conditions (Expected). In the new syntax, one can check whether any constant object is contained in a list. In the old syntax, the constant object must be of specific types. So we need to move the testing out of the check_CONTAINS() functions that both implementations used, and into just the implementation of the old syntax (in conditions.cc). This bug broke one of the FilterExpression tests, but this patch also adds new tests for the different behaviour of ConditionExpression and Expected - tests which also reproduce this issue and verify its fix. Fixes #6639. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:02:14 +03:00
Nadav Har'El	13ef31f38b	alternator: refactor resolving of references in expressions In the DynamoDB API, expressions (e.g., ConditionExpression and many more) may contain references to column names ("#name") or to values (":val") given in a separate part of the request - ExpressionAttributeNames and ExpressionAttributeValues respectively. Before this patch, we resolved these references as part of the expression's evaluation. This approach had two downsides: 1. It often misdiagnosed (both false negatives and false positives) cases of unused names and values in expressions. We already had two xfailing tests with examples - which pass after this patch. This patch also adds two additional tests, which failed before this patch and pass with it. 2. In one of the following patches we will add support for FilterExpression, where the same expression is used repeatedly on many items. It is a waste (as well as makes the code uglier) to resolve the same references again and again each time the expression is evaluated. We should be able to do it just once. So this patch introduces an intermediate step between parsing and evaluating an expression - "resolving" the expression. The new resolve_() functions modify the already parsed expression, replacing references to attribute names and constant values by the actual names and values taken from the request. The resolve_() functions also keep track of which references were used, making it very easy to check (as DynamoDB does) if there are any unused names or values, before starting the evaluation. The interface of the evaluate() functions becomes much simpler - they no longer need to know the original request (which was previously needed for ExpressionAttributeNames/Values) or the table's schema (which was previously needed only for some error checking), nor keep track of which references were used. 
This simplification is helpful for using the expressions in contexts where these things (request and schema) are no longer conveniently available, namely in FilterExpression. A small side-benefit of this patch is that it moves a bit of code, which handled resolving of references in expressions, from executor.cc to expressions.cc. This is just the first step in a bigger effort to reduce the size of executor.cc by moving code to smaller source files. There is no attempt in this patch to move as much code as we can. We will move more code in a separate patch in this series. Fixes #6572. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 11:57:13 +03:00
Glauber Costa	b0a0c207c3	twcs: move implementations to its own file LCS and SCTS already have their own files, reducing the clutter in compaction_strategy.cc. Do the same for TWCS. I am doing this in preparation to add more functions. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200611230906.409023-6-glauber@scylladb.com>	2020-06-14 11:50:08 +03:00
Pavel Emelyanov	514a1580da	storage_service: Isolate isolator There is code that isolates a node on disk error. After all the previous changes this code can be collected in one place (it would be better to move it out of storage_service altogether, but still). This simplifies stop_transport(): now it can avoid rescheduling itself on shard 0 for the 2nd time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	60e283b23e	auth: Move away from storage_service Now after the auth start/stop is standalone, we can remove reference from storage service to it. This frees some tests from the need to carry the auth service around for nothing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	6a46721fb7	auth: Move start-stop code into main The auth service management is currently sitting in storage service, but it was needed there just for cql/thrift start code. After the latter was moved away, there is no other reason for auth to be integrated with the storage service, so move it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	3eaf6b3ec7	main: Don't forget to stop cql/thrift when start is aborted The defer action for stopping the storage_service is registered very late, after cql and thrift have started. If an error happens in between, these client-shutdown hooks will not be called. This is a problem with the hooks, but fixing it in the hooks themselves is a big rework, so for now add fuses for cql and thrift individually -- the stop code of both is re-entrant. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	a1df24621c	thrift_controller: Switch on standalone Remove the on-storage_service instance and make everybody use the standalone one. Stopping the thrift is done by registering the controller in client service shutdown hooks. This automatically wires the stopping into drain, decommission and isolation codes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	c26943e7b5	thrift_controller: Pass one through management API The goal is to make the relevant endpoints work on standalone thrift controller instead of the storage_service's one, so prepare this controller (dummy for now) and pass it all the way down the API code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00
Pavel Emelyanov	3786bc40ec	thrift_controller: Move the code into thrift/ Pure moving, no functional changes. Also fix the indentation left unclean two patches back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:18 +03:00
Pavel Emelyanov	98ccf9bccb	thrift_controller: Introduce own lock for management Currently start/stop of thrift is protected with storage_service's run_with_api_lock, but this protection is purely needed to guard start and stop against each other, not from anything else. For the sake of thrift management isolation it's worth having its own start-stop lock. This also decouples thrift code from storage_service's "isolated" thing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:18 +03:00
Pavel Emelyanov	1dfcd63d34	thrift: Wrap start/stop/is_running code into a class The plan is to decouple thrift management code from storage_service and move into thrift/ directory, so prepare for that by introducing a controller class. This leaves some unclean indentation in start/stop helpers to reduce the churn, it will be fixed two patches ahead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:09 +03:00
Pavel Emelyanov	1d5cdfe3c6	cql_controller: Switch on standalone Remove the on-storage_service instance and make everybody use the standalone one. Stopping the server is done by registering the controller in client service shutdown hooks. This automatically wires the stopping into drain, decommission and isolation codes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:09 +03:00
Pavel Emelyanov	7ebe44f33d	cql_controller: Pass one through management API The goal is to make the relevant endpoints work on standalone cql controller instead of the storage_service's one, so prepare this controller (dummy for now) and pass it all the way down the API code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:09 +03:00
Pavel Emelyanov	f048f3434f	cql_controller: Move the code into transport/ Pure moving, no functional changes. Also fix the indentation left unclean two patches back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:13:41 +03:00
Pavel Emelyanov	2282a27f26	cql_controller: Introduce own lock for management Currently start/stop of cql is protected with storage_service's run_with_api_lock, but this protection is purely needed to guard start and stop against each other, not from anything else. For the sake of cql server isolation it's worth having its own start-stop lock. This also decouples cql code from storage_service's "isolated" thing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:13:41 +03:00
Pavel Emelyanov	7de23f44d2	cql: Wrap start/stop/is_running code into a class The plan is to decouple cql server management code from storage_service and move into transport/ directory, so prepare for that by introducing a controller class. This leaves some unclean indentation in start/stop helpers to reduce the churn, it will be fixed two patches ahead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:12:19 +03:00
Pavel Emelyanov	6a89c987e4	api: Tune reg/unreg of client services control endpoints Currently the API endpoints to start and stop cql_server and thrift are registered right after the storage service is started, but much earlier than those services are. In between these two points a lot of other stuff gets initialized. This opens a small window during which cql_server and thrift can be started by hand too early. The most obvious problem is -- the storage_service::join_cluster() may not yet be called, the auth service is thus not started, but starting cql/thrift needs auth. Another problem is that those endpoints are not unregistered on stop, thus creating another way to start cql/thrift at the wrong time. Also the endpoints registration change helps further patching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 18:47:24 +03:00
Piotr Dulikowski	e5b2218ad4	hinted handoff: use bool instead of send_state_set After restart_segment was removed from send_state enum, send_state_set now has only one possible element: segment_replay_failed. This patch removes send_state_set and uses bool in its place instead.	2020-06-12 16:10:20 +02:00
Piotr Dulikowski	6b34bb1a43	hinted handoff: update replay position on commitlog failure Hints manager uses commitlog framework to store and replay hints. The commitlog::read_log_file function is used for replaying hints. It reads commitlog entries and passes them to a callback. In case of hints manager, the callback calls manager::send_one_hint function. In case something goes wrong during this process, sending of that file is attempted again later. If the error was caused by hints that failed to be sent (e.g. due to network error), then we also advance _last_not_complete_rp field to the position of the first hint that failed. In the next retry, we will start reading from the commitlog from that position. However, current logic does not account for the case when an error occurs in the commitlog::read_log_file function itself. If, coincidentally, all hints sent by send_one_hint succeed, then we won't advance the _last_not_complete_rp field and we may unnecessarily repeat sending some of the hints that succeeded. This patch adds the send_one_file_ctx::last_sent_rp field, which keeps track of the last commitlog position for which a hint was attempted to be sent. In case read_log_file throws an error but all send_one_hint calls succeed, then it will be used to update _last_not_complete_rp. This will reduce the amount of hints that are resent in this case to only one. Tests: - unit(dev) - dtest(hintedhandoff_additional_test, dev)	2020-06-12 16:10:20 +02:00
Piotr Dulikowski	d369b538f0	hinted handoff: remove rps_set, use first_failed_rp instead When sending hints from one file, rps_set is used to keep track of positions of hints that are currently sent. If sending of a hint fails, its position is not removed from rps_set. If some hints fail to be sent while handling a hints file, the lowest position from rps_set is used to calculate the position from where to start when sending of the file is retried. Keeping track of commitlog positions this way isn't necessary to calculate this position. This patch removes rps_set and replaces it with first_failed_rp - which is just a single std::optional<db::replay_position>. This value is updated when a hint send failure is detected. This simplifies calculation of starting position for the next retry, and allowed to remove some error handling logic related to an edge case when inserting to rps_set fails. - unit(dev) - dtest(hintedhandoff_additional_test, dev)	2020-06-12 16:10:19 +02:00
Botond Dénes	218b7d5b85	docs/debugging.md: expand section about troubleshooting thread debugging Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200612065604.215204-1-bdenes@scylladb.com>	2020-06-12 09:54:02 +02:00
Avi Kivity	4e79296090	tracked_file_impl: inherit disk and memory alignment from underlying file tracked_file_impl is a wrapper around another file, that tracks memory allocated for buffers in order to control memory consumption. However, it neglects to inherit the disk and memory alignment settings from the wrapped file, which can cause unnecessarily-large buffers to be read from disk, reducing throughput. Fix by copying the alignment parameters. Fixes #6290.	2020-06-11 17:43:50 +03:00
Avi Kivity	5ceb20c439	build: default enable dpdk in release mode To reduce special cases for the build bots, default dpdk to enabled in release mode, keeping it disabled for debug and dev. To allow release modes without dpdk to be built, the --enable-dpdk switch is converted to a tri-state. When disabled, dpdk is disabled across all modes. Similarly when enabled the effect is global. When unspecified, dpdk is enabled for release mode only. After this change, reloc/build_reloc.sh no longer needs to specify --enable-dpdk, so remove it.	2020-06-11 17:24:16 +03:00
Avi Kivity	0dc78d38f1	build: remove zstd submodule Now that Fedora provides the zstd static library, we can remove the submodule. The frozen toolchain is regenerated to include the new package.	2020-06-11 17:12:49 +03:00
Eliran Sinvani	14520e843a	messaging service: fix execution order in messaging_service constructor The messaging service constructor's body does two main things in this order: 1. it registers the CLIENT_ID verb with rpc. 2. it initializes the scheduling mechanism in charge of locating the right scheduling group for each verb. The registration function uses the scheduling mechanism to determine the scheduling group for the verb. This commit simply reverses the order of execution. Fixes #6628	2020-06-11 12:14:10 +03:00
Raphael S. Carvalho	72ae76fb09	compaction: Fix a potential source of stalls for run-based strategies When compaction A completes, a request is issued so that all parallel compactions will replace compaction A's input sstables by the respective output sstables, in the SSTable set snapshot used for expiration purposes. That's done to allow the space of input SSTables to be released as soon as possible, which helps incremental compaction a lot, but also the non-incremental approach. Recently I realized that we're copying the SSTable set when doing the replacement, to make the code exception safe, but it turns out that if an exception is triggered, the compaction will fail anyway. So this copy is useless and a potential source of reactor stalls if a strategy like LCS is used. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200608192614.9354-1-raphaelsc@scylladb.com>	2020-06-10 18:44:44 +03:00
Avi Kivity	9afd599d7c	Merge 'range_streamer: Handle table of RF 1 in get_range_fetch_map' from Asias " After "Make replacing node take writes" series, with repair based node operations disabled, we saw the replace operation fail like: ``` [shard 0] init - Startup failed: std::runtime_error (unable to find sufficient sources for streaming range (9203926935651910749, +inf) in keyspace system_auth) ``` The reason is the system_auth keyspace has default RF of 1. It is impossible to find a source node to stream from for the ranges owned by the replaced node. In the past, the replace operation with keyspace of RF 1 passes, because the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) before streaming. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9021954492552185543, -9016289150131785593] exists on {127.0.0.6} ``` Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in range_streamer::get_range_fetch_map will pass if the source is the node itself. However, it will not stream from the node itself. As a result, the system_auth keyspace will not get any data. After the "Make replacing node take writes" series, the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) after the streaming finishes. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9049647518073030406, -9048297455405660225] exists on {127.0.0.5} ``` Since 127.0.0.5 was dead, the source node check failed, and so did the bootstrap operation. To fix, we ignore the table of RF 1 when it is unable to find a source node to stream. Fixes #6351 " * asias-fix_bootstrap_with_rf_one_in_range_streamer: range_streamer: Handle table of RF 1 in get_range_fetch_map streaming: Use separate streaming reason for replace operation	2020-06-10 16:03:13 +03:00
Rafael Ávila de Espíndola	555d8fe520	build: Be consistent about system versus regular headers We were not consistent about using '#include "foo.hh"' instead of '#include <foo.hh>' for scylla's own headers. This patch fixes that inconsistency and, to enforce it, changes the build to use -iquote instead of -I to find those headers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200608214208.110216-1-espindola@scylladb.com>	2020-06-10 15:49:51 +03:00
Gleb Natapov	d5b0cf975a	cql transport: get rid of unneeded shared_ptr There is no point to hold prepared_metadata in result_message::prepared as a shared_ptr since their lifetime match. Message-Id: <20200610113217.GF335449@scylladb.com>	2020-06-10 15:48:40 +03:00
Nadav Har'El	65d3e3992f	alternator test: small fixes for test_key_condition_expression_multi The test test_key_condition_expression_multi() had a small typo, which was hidden by the fact that the request was expected to fail for other reasons, but nevertheless should be fixed. Moreover, it appears that the Amazon DynamoDB changed their error message for this case, so running the test with "--aws" failed. So this patch makes it work again by being more forgiving on the exact error message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200609205628.562351-1-nyh@scylladb.com>	2020-06-10 07:34:20 +02:00
Nadav Har'El	0c460927bf	alternator: cleanup - don't use unique_ptr when not needed In the existing Alternator code, we used std::unique_ptr<rjson::value> for passing the optional old value of an item read for a RMW operation. The benefit of this type over the simpler "const rjson::value" is that it gives the callee ownership of the item, and thus the ability to move parts of it into the response without copying them. We only used this ability in a handful of obscure cases involving ReturnedValues, but I am not going to break this dubious feature in this patch. Nevertheless, a lot of internal code, like condition checks, just needs read-only access to that previous item, so we passed a reference to the unique_ptr, i.e., "const std::unique_ptr<rjson::value>&". This is ugly, and also forces new code that wants to use the same condition checks (i.e., filtering code), to artificially allocate a unique_ptr just because that is what these functions expect. So in this patch, we change the utility functions such as verify_condition_expression() and everything they use, to pass around a "const rjson::value" instead of a "const std::unique_ptr<rjson::value>&". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200604131352.436506-1-nyh@scylladb.com>	2020-06-10 07:33:31 +02:00
Takuya ASADA	5bdd09d08a	supervisor: drop unused Upstart code, always use libsystemd Since we don't support Ubuntu 14.04 anymore, we can drop Upstart related code from supervisor.[cc\|hh]. Also, "#ifdef HAVE_LIBSYSTEMD" was for compiling Scylla on older distributions which do not provide libsystemd; we no longer need this since we always build Scylla on the latest Fedora. Dropping HAVE_LIBSYSTEMD also means removing libsystemd from optional_packages in configure.py, making it a required library. Note that we may still run Scylla without systemd, such as in our Docker image, but sd_notify() does nothing when systemd is not detected, so we can ignore that case. Reference: https://www.freedesktop.org/software/systemd/man/sd_notify.html Reference: https://github.com/systemd/systemd/blob/master/src/libsystemd/sd-daemon/sd-daemon.c	2020-06-10 08:17:35 +03:00
Takuya ASADA	06bcbfc4c3	scylla_cpuscaling_setup: support Amazon Linux 2 Amazon Linux 2 has /usr/bin/cpupower, but does not have cpupower.service unlike CentOS7. We need to provide the .service file when distribution is Amazon Linux 2. Fixes #5977	2020-06-10 08:12:53 +03:00
Dejan Mircevski	9027b6636f	Use sstring_view in execute_cql and assertions This lets the functions operate on a wider variety of arguments and may also be faster. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-06-10 08:10:43 +03:00
Takuya ASADA	4c9369e449	scylla_bootparam_setup: support Amazon Linux 2 CentOS7 uses GRUB_CMDLINE_LINUX on /etc/default/grub, but Amazon Linux 2 only has GRUB_CMDLINE_LINUX_DEFAULT, we need to support both.	2020-06-10 08:05:12 +03:00
Raphael S. Carvalho	8663824589	sstable_directory: fix off-by-one when calculating number of jobs Number of jobs can be off-by-one if it's divisible by max threshold (max_sstables_per_job), which results in one extra unneeded resharding job. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200609163430.14155-1-raphaelsc@scylladb.com>	2020-06-09 19:36:40 +03:00
Asias He	a521c429e1	streaming: Do not send end of stream in case of error Current sender sends stream_mutation_fragments_cmd::end_of_stream to receiver when an error is received from a peer node. To be safe, send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to prevent end_of_stream from being written into the sstable when a partition is not closed yet. In addition, use mutation_fragment_stream_validator to validate the mutation fragments emitted from the reader, e.g., check if partition_start and partition_end are paired when the reader is done. If not, fail the stream session and send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to isolate the problematic sstables on the sender node. Refs: #6478	2020-06-09 18:46:12 +03:00
Glauber Costa	4025b22d13	distributed_loader: remove self-move assignment By mistake I ended up spilling the lambda capture idiom of x = std::move(x) into the function parameter list, which is invalid. Fix it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200609141608.103665-1-glauber@scylladb.com>	2020-06-09 17:22:57 +03:00
Avi Kivity	94634d9945	Merge "Reshard SSTables before moving them from upload directory" from Glauber " This series allows for resharding SSTables (if needed) before SSTables are moved from the upload directory, instead of after. The infrastructure is supposed to be used soon to also load SSTables at boot. That, however, will take a bit longer as we need to reshape resharded SSTables for maximum benefit. That should benefit the upload directory as well, however the current series already presents high incremental value for upload directory and could be merged sooner (so I can focus on reshaping). For now, this series still keeps the actual moving from upload directory to the main directory untouched. Once reshaping is ready, it will take care of this too. A new file with tests is introduced that tests the process of reading SSTables from an existing directory. dtests executed: migration_test.py (--smp 4), which previously failed " * 'upload-reshard-v8.1' of github.com:glommer/scylla: load_new_sstables: reshard before scanning the upload directory distributed_load: initial handling of off-strategy SSTables remove manifest_file filter from table. sstables: move open-related structures to their own file. sstables: store data size in foreign_sstable_open_info compaction: split compaction.hh header	2020-06-09 17:06:22 +03:00
Glauber Costa	8021d12371	load_new_sstables: reshard before scanning the upload directory In a later patch we will be able to move files directly from upload into the main directory. However for now, for the benefit of doing this incrementally, we will first reshard in place with our new reshard infrastructure. load_new_sstables can then move the SSTables directly, without having to worry about resharding. This has the immediate benefit that the resharding happens: - in the streaming group, without affecting compaction work - without waiting for the current locks (which are held by compactions) in load_new_sstables to release. We could, at this point, just move the SSTables to the main directory right away. I am not doing this in this patch, and opting to keep the rest of upload process unchanged. This will be fixed later when we enable offstrategy compactions: we'll then compact the SSTables generated into the main directory. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-09 09:02:35 -04:00
Pekka Enberg	a41258b116	dist/ami: Remove obsolete AMI files The Scylla AMI has moved to the scylla-machine-image.git repository so let's remove the obsolete files from scylla.git. Suggested-by: Konstantin Osipov <kostja@scylladb.com> Acked-by: Bentsi Magidovich <bentsi@scylladb.com> Message-Id: <20200609105424.30237-1-penberg@scylladb.com>	2020-06-09 13:55:41 +03:00
Calle Wilund	5105e9f5e1	cdc::log: Missing "preimage" check in row deletion pre-image Fixes #6561 Pre-image generation in row deletion case only checked if we had a pre-image result set row. But that can be from post-image. Also check actual existence of the pre-image CK. Message-Id: <20200608132804.23541-1-calle@scylladb.com>	2020-06-09 10:56:41 +03:00
Glauber Costa	aebd965f0e	distributed_load: initial handling of off-strategy SSTables Off-strategy SSTables are SSTables that do not conform to the invariants that the compaction strategies define. Examples of offstrategy SSTables are SSTables acquired over bootstrap, resharding when the cpu count changes or imported from other databases through our upload directory. This patch introduces a new class, sstable_directory, that will handle SSTables that are present in a directory that is not one of the directories where the table expects its SSTables. There is much to be done to support off-strategy compactions fully. To make sure we make incremental progress, this patch implements enough code to handle resharding of SSTables in the upload directory. SSTables are resharded in place, before we start accessing the files. Later, we will take other steps before we finally move the SSTables into the main directory. But for now, starting with resharding will not only allow us to start small, but it will also allow us to start unleashing much needed cleanups in many places. For instance, once we start resharding on boot before making the SSTables available, we will be able to expunge all places in Scylla where, during normal operations, we have extra handler code for the fact that SSTables could be shared. Tests: a new test is added and it passes in debug mode. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Glauber Costa	e48ad3dc23	remove manifest_file filter from table. When we are scanning an sstable directory, we want to filter out the manifest file in most situations. The table class has a filter for that, but it is a static filter that doesn't depend on table for anything. We are better off removing it and putting in another independent location. While it seems wasteful to use a new header just for that, this header will soon be populated with the sstable_directory class. Tests: unit (dev) Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Glauber Costa	fd89e9f740	sstables: move open-related structures to their own file. sstables/sstables.hh is one of our heaviest headers and it's better that we don't include it if possible. For some users, like distributed_loader, we are mostly interested in knowing the shape of structures used to open an SSTable. They are: - the entry_descriptor, representing an SSTable that we are scanning on-disk - the sstable_open_info, representing information about a local, opened SSTable - the foreign_sstable_open_info, representing information about an opened SSTable that can cross shard boundaries. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Glauber Costa	8698221dd2	sstables: store data size in foreign_sstable_open_info In the new version of resharding we'll want to spread SSTables around the many shards based on their total size. This means we also need to know the size of each SSTable individually. We could wrap the foreign_sstable_info around another structure that keeps track of that, but because this structure exists mostly for resharding purposes anyway we will just add the data_size to it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Glauber Costa	3972628fc0	compaction: split compaction.hh header compaction.hh is one of our heavy headers, but some users just want to use information on it about how to describe a compaction, not how to perform one. For that reason this patch splits the compaction_descriptor into a new header. The compaction_descriptor has, as a member type, compaction_options. That is moved too, and brings with it the compaction_type. Both of those structures would make sense in a separate header anyway. The compaction_descriptor also wants the creator_fn and replacer_fn functions. We also take this opportunity to rename them into something more descriptive Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-08 16:06:00 -04:00
Piotr Sarna	2746a3597f	Update seastar submodule * seastar 42e77050...81242ccc (7): > demos: coroutine_demo: fix for SEASTAR_API_LEVEL >= 3 > core: Avoid warning on disable_backtrace_temporarily::_old being unused > future: Add a couple of friend declarations > Merge "net: make socket stack nothrow move constructible" from Benny > reactor: Avoid declaring _Unwind_RaiseException > future-util: Delete SEASTAR__WAIT_ALL__AVOID_ALLOCATION_WHEN_ALL_READY > file: io_priority_class: specify constructor as noexcept	2020-06-08 19:38:28 +02:00
Takuya ASADA	1e2509ffec	dist/offline_installer/debian: fix umask error Same as on redhat, the makeself script changes the current umask, so scylla_setup fails with the "scylla does not work with current umask setting (0077)" error. To fix that we need to use the latest version of makeself and specify the --keep-umask option. See #6243	2020-06-08 20:06:21 +03:00
Takuya ASADA	4eae7f66eb	dist/offline_installer/debian: support cross build Unlike the redhat version, the debian version already supported cross build since it uses debootstrap, but the shell script refused to continue the build on non-debian distributions, so drop these lines to build on Fedora. [avi: regenerate toolchain]	2020-06-08 19:54:09 +03:00
Takuya ASADA	058da69a3b	dist/debian/python3: cleanup build/debian, rename build directory This is scylla-python3 version of #6611, but we also need to rename .deb build directory for scylla-python3, since we may lose .deb when building both scylla and scylla-python3 .deb package, since we currently sharing build directory. So renamed it to build/python3/debian.	2020-06-08 15:49:22 +03:00
Takuya ASADA	260d264d3c	dist/debian: cleanup build/debian before building .deb In `287d6e5`, we stopped running rm -rf debian/ in build_deb.sh, since we now have a prebuilt debian/ directory. However, this might cause a .deb build error when the debian package source is modified, since the directory is never cleaned up. To prevent the build error, we need to clean up build/debian in reloc/build_deb.sh before extracting the contents of the relocatable package.	2020-06-08 15:18:42 +03:00
Pavel Emelyanov	d908646b28	logalloc: Compact segments on reclaim instead of migration When reclaiming segments to seastar, the code tries to free the segments sequentially. For this it walks the segments from left to right and frees them, but every time a non-empty segment is met it gets migrated to another segment, that's allocated from the right end of the list. This is sometimes a waste of cycles. The destination segment inherits the holes from the source one, and thus it will be compacted some time in the future. Why not compact it right at the reclamation time? It will take the same time or less, but will result in better compaction. To achieve this, the segment to be reclaimed is compacted with the existing compact_segment_locked() code with some special care around it. 1. The allocation of new segments from seastar is locked 2. The reclaiming of segments with evict-and-compact is locked as well 3. The emergency pool is opened (the compaction is called with non-empty reserve to avoid bad_alloc exception throw in the middle of compaction) 4. The segment is forcibly removed from the histogram and the closed_occupancy is updated just like it is with general compaction The segments-migration auxiliary code can be removed after this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:07:35 +03:00
Pavel Emelyanov	4db6ef7b6d	logalloc: Introduce RAII allocation lock The lock prevents the segment_pool from requesting more segments from the underlying allocator. To be used in next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:07:30 +03:00
Pavel Emelyanov	2005aca444	logalloc: Shuffle code around region::impl::compact This includes 3 small changes to facilitate next patching: - rename region::impl::compact into compact_segment_locked - merging former compact with compact_single_segment_locked - moving log print and stats update into compact_segment_locked Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:06:45 +03:00
Kamil Braun	013330199d	cdc/storage_proxy: keep cdc_service alive in storage_proxy operations storage_proxy is never deinitialized, so it may have still used cdc_service after its destructor was called. This fixes the problem by cdc_service inheriting from async_sharded_service and storage_proxy calling shared_from_this on the service whenever it uses it. cdc_service inherits from async_sharded_service and not simply from enable_shared_from_this, because there might be other services that cdc_service depends on. Assuming that these services are deinitialized after cdc_service (as they should), i.e. after stop() is called on cdc_service, making cdc_service async_sharded_service will keep their deinitialization code from being called until all references to cdc_service disappear (async_sharded_service keeps stop() from returning until this happens). Some more improvements should be possible through some refactoring: 1. Make augment_mutation_call a free function, not a member of cdc_service: it doesn't need any state that cdc_service has. db_context can be passed down from storage_proxy when it calls the function. 2. Remove the storage_proxy -> cdc_service reference. storage_proxy only needs augment_mutation_call, which would not be a part of the service. This would also get rid of the proxy -> cdc -> proxy reference cycle that we have now, and would allow storage_proxy to be safely deinitialized after cdc_service. 3. Maybe we could even remove the cdc_service -> storage_proxy reference. Is it really needed?	2020-06-08 13:25:51 +03:00
Pavel Emelyanov	8c81c6b7aa	logalloc: Do not lock reclaimer twice The tracker::impl::reclaim is already in reclaim-locked section, no need for yet another nested lock. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Pavel Emelyanov	0392c5ca77	logalloc: Do not calculate object size twice When walking objects on compaction the migrator->size() virtual fn is called twice. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Pavel Emelyanov	81c9c4c7b2	logalloc: Do not convert obj_desc to migrator back and forth When calling alloc_small the migrator is passed just to get the object descriptor, but during compaction the descriptor is already at hands, so no need to re-get it again. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Takuya ASADA	969c4258cf	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6540	2020-06-08 12:48:36 +03:00
Takuya ASADA	bebaaa038f	dist/debian: fix node-exporter.service file name Since `287d6e5`, we have mistakenly packaged node-exporter.service under the wrong name in the .deb; it needs to be renamed to the correct name. Fixes #6604	2020-06-08 12:39:18 +03:00
Asias He	dddde33512	gossip: Do not send shutdown message when a node is in unknown status When a replacing node is in early boot up and is not in HIBERNATE state yet, if the node is killed by a user, the node will wrongly send a shutdown message to other nodes. This is because UNKNOWN is not in SILENT_SHUTDOWN_STATES, so in gossiper::do_stop_gossiping, the node will send a shutdown message. Other nodes in the cluster will call storage_service::handle_state_normal for this node, since NORMAL and SHUTDOWN status share the same status handler. As a result, other nodes will incorrectly think the node is part of the cluster and the replace operation is finished. Such a problem was seen in replace_node_no_hibernate_state_test dtest: n1, n2 are in the cluster; n2 is dead; n3 is started to replace n2, but n3 is killed in the middle; n3 announces SHUTDOWN status wrongly; n1 runs storage_service::handle_state_normal for n3; n1 gets tokens for n3 which are empty, because n3 hasn't gossiped tokens yet; n1 skips updating normal tokens for n3, but thinks n3 has replaced n2; n4 starts to replace n2; n4 checks the tokens for n2 in storage_service::join_token_ring (Cannot replace token {} which does not exist!) or storage_service::prepare_replacement_info (Cannot replace_address {} because it doesn't exist in gossip). To fix, we add UNKNOWN into SILENT_SHUTDOWN_STATES and avoid sending the shutdown message. Tests: replace_address_test.py:TestReplaceAddress.replace_node_no_hibernate_state_test Fixes: #6436	2020-06-08 11:32:23 +02:00
Pavel Solodovnikov	6f6e6762ba	cql: remove unused functions It seems that the following functions are never used, delete them: * `function::has_reference_to` * `functions::get_overload_count` * `to_identifiers` in column_identifier.hh * `single_column_relation::get_map_key` Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200606115149.1770453-1-pa.solodovnikov@scylladb.com>	2020-06-08 11:28:57 +03:00
Piotr Sarna	3458bd2e32	db,view: fix outdated comments Some comments still referred to variable names which are no longer up-to-date. Follow-up for #6560. Message-Id: <2b857ccc900dd64f0d9379f5d6c87fd3aaa5d902.1591594042.git.sarna@scylladb.com>	2020-06-08 09:02:10 +03:00
Nadav Har'El	d6626c217a	merge: add error injection to mv Merged pull request https://github.com/scylladb/scylla/pull/6516 from Piotr Sarna: This series adds error injection points to materialized view paths: view update generation from staging sstables; view building; generating view updates from user writes. This series comes with a corresponding dtest pull request which adds some test cases based on error injection. Fixes #6488	2020-06-07 19:23:23 +03:00
Avi Kivity	53a19fc1f2	Merge 'Debian version number fix' from Takuya " Now that we generate dist/changelog at relocatable package generation time, we cannot run the '.rc' fixup at .deb package building time; we need to do it in debian_files_gen.py. Also, we use '_' in the version number for some test version packages, which is not supported by the .deb packaging system and needs to be replaced with '-'. " * syuu1228-debian_version_number_fix: dist/debian: support version number containing '_' dist/debian: move version number fixup to debian_files_gen.py	2020-06-07 19:14:24 +03:00
Piotr Sarna	b3a6a33487	db,view: ensure that local updates are applied locally In current mutate_MV() code it's possible for a local endpoint to become a target for a network operation. That's the source of occasional `broken promise` benign error messages appearing, since the mutation is actually applied locally, so there's no point in creating a write response handler - the node will not send a response to itself via network. While at it, the code is deduplicated a little bit - with the paths simplified, it's easier to ensure that a local endpoint is never listed as a target for remote network operations. Fixes #5459 Tests: unit(dev), dtest(materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test)	2020-06-07 19:10:03 +03:00
Kamil Braun	a1e235b1a4	CDC: Don't split collection tombstone away from base update Overwriting a collection cell using timestamp T is a process with following steps: 1. inserting a row marker (if applicable) with timestamp T; 2. writing a collection tombstone with timestamp T-1; 3. writing the new collection value with timestamp T. Since CDC does clustering of the operations by timestamp, this would result in 3 separate calls to `transform` (in case of INSERT, or 2 - in the case of UPDATE), which seems excessive, especially when pre-/postimage is enabled. This patch makes collection tombstones being treated as if they had the same TS as the base write and thus they are processed in one call to `transform` (as long as TTLs are not used). Also, `cdc_test` had to be updated in places that relied on former splitting strategy. Fixes #6084	2020-06-07 17:09:05 +03:00
Tomasz Grabiec	c1df00859e	sstables: Make deletion_time printable Message-Id: <1591387901-7974-12-git-send-email-tgrabiec@scylladb.com>	2020-06-07 13:55:34 +03:00
Raphael S. Carvalho	8e47f61df7	compaction: Enable tombstone expiration based on the presence of the sstable set For tombstone expiration to proceed correctly without the risk of resurrecting data, the sstable set must be present. Regular compaction and derivatives provide the sstable set, so they're able to expire tombstones with no resurrection risk. Resharding, on the other hand, can run on any shard, not necessarily on the same shard that one of the input sstables belongs to, so it currently cannot provide a sstable set for tombstone expiration to proceed safely. That being said, let's only do expiration based on the presence of the set. This makes room for the sstable set to be fed to compaction via descriptor, allowing even resharding to do expiration. Currently, compaction thinks that sstable set can only come from the table, and that also needs to be changed for further flexibility. It's theoretically possible that a given resharding job will resurrect data if a fully expired SSTable is resharded at a shard which it doesn't belong to. Resharding will have no way to tell that expiring all that data will lead to resurrection because the relevant SSTables are at different shards. This is fixed by checking for fully expired sstables only on presence of the sstable set. Fixes #6600. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200605200954.24696-1-raphaelsc@scylladb.com>	2020-06-07 11:46:48 +03:00
Pavel Solodovnikov	5b1b6b1395	cql: pass `cql3::operation::raw_deletion` by unique_ptr Another small step towards shared_ptr usage reduction in cql3 code. Also make `raw_deletion` dtor virtual to make address sanitizer happy in debug builds. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200606104528.1732241-1-pa.solodovnikov@scylladb.com>	2020-06-06 21:04:06 +03:00
Juliusz Stasiewicz	0ad50013ff	storage_service: Implementation of API call to repair CDC streams The command regenerates streams when: - generations corresponding to a gossiped timestamp cannot be fetched from `system_distributed` table, - or when generation token ranges do not align with token metadata. In such case the streams are regenerated and new timestamp is gossiped around. The returned JSON is always empty, regardless of whether streams needed regeneration or not.	2020-06-06 16:52:21 +02:00
Takuya ASADA	9de65f26de	dist/debian: support version number containing '_' The .deb packaging system does not support version numbers containing '_'; it should be replaced with '-'	2020-06-05 21:35:02 +09:00
Takuya ASADA	509ad875aa	dist/debian: move version number fixup to debian_files_gen.py Now that we generate dist/changelog at relocatable package generation time, we cannot run the '.rc' fixup at .deb package building time; we need to do it in debian_files_gen.py.	2020-06-05 21:34:55 +09:00
Kamil Braun	1b7f1806ac	test: improve comments on test_schema_digest_does_not_change This test tends to cause a lot of discussion resulting from not understanding what is actually being tested. Closes https://github.com/scylladb/scylla/issues/6582.	2020-06-05 14:30:02 +02:00
Kamil Braun	d89b7a0548	cdc: rename CDC description tables Commit `968177da04` has changed the schema of cdc_topology_description and cdc_description tables in the system_distributed keyspace. Unfortunately this was a backwards-incompatible change: these tables would always be created, irrespective of whether or not "experimental" was enabled. They just wouldn't be populated with experimental=off. If the user now tries to upgrade Scylla from a version before this change to a version after this change, it will work as long as CDC is protected by the experimental flag and the flag is off. However, if we drop the flag, or if the user turns experimental on, weird things will happen, such as nodes refusing to start because they try to populate cdc_topology_description while assuming a different schema for this table. The simplest fix for this problem is to rename the tables. This fix must get merged in before CDC goes out of experimental. If the user upgrades his cluster from a pre-rename version, he will simply have two garbage tables that he is free to delete after upgrading. sstables and digests need to be regenerated for schema_digest_test since this commit effectively adds new tables to the system_distributed keyspace. This doesn't result in schema disagreement because the table is announced to all nodes through the migration manager.	2020-06-05 09:59:16 +02:00
Piotr Sarna	64b8b77ac2	table: add error injection points to the materialized view path ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Piotr Sarna	76e89efc1a	db,view: add error injection points to view building ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Piotr Sarna	9d524a7a7e	db,view: add error injection points to view update generator ... in order to be able to test scenarios with failures.	2020-06-05 09:39:58 +02:00
Piotr Sarna	9a4394327a	Merge 'CDC: Disallowed CDC for tables with counter column(s)' from Juliusz. CDC for counters is unimplemented as of now, therefore any attempt to enable CDC log on counter table needs to be clearly disallowed. This patch does exactly this. The check whether schema has counter columns is performed in `cdc_service::impl` in: - `on_before_create_column_family`, - `on_before_update_column_family` and, if so, results in `invalid_request_exception` thrown. Fixes #6553 * jul-stas-6553-disallow-cdc-for-counters: test/cql: Check that CDC for counters is disallowed CDC: Disallowed CDC for tables with counter column(s)	2020-06-05 07:46:53 +02:00
Nadav Har'El	ace1697aa9	alternator test: reproducer for unjustly refused condition expression This patch adds a test reproducing issue #6572, where the perfectly good condition expression: #name1 = :val1 OR #name2 = :val2 Gets refused because of the following combination in our implementation: 1. Short-circuit evaluation, i.e., after we discover #name1 = :val1 we don't evaluate the second half of the expression. 2. The list of "used" references is collected at evaluation time, instead of at parsing time. Because evaluation never reaches #name2 (or :val2) our implementation complains that they are not used, and refuses the request - which should have been allowed. This test xfails on Alternator. It passes on DynamoDB. Refs #6572 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200604171954.444291-1-nyh@scylladb.com>	2020-06-05 07:43:50 +02:00
Piotr Sarna	0ba23d2b40	test: add manual test for tagging return value While not very interesting by itself, the test case shows that in case of TagResource and UntagResource it's actually correct to return empty HTTP body instead of an empty JSON object, which was the case for PutItem. Message-Id: <6331963179c5174a695f0e9eeed17de6c9f9a3be.1591269516.git.sarna@scylladb.com>	2020-06-04 16:17:24 +03:00
Nadav Har'El	db45ff2733	alternator: clean up usage of describe_item() The DynamoDB GetItem request returns the requested item in a specific way, wrapped in a map with a "Item" member. For historic reasons, we used the same function that returns this (describe_item()) also in other code which reads items - e.g. for checking conditional operations. The result is wasteful - after adding this "Item" member we had other code to extract it, all for no good reason. It is also ugly and confusing. Importantly, this situation also makes it harder for me to add support for FilterExpression. The issue is that the expression evaluator got the item with the wrapper (from the existing ConditionExpression code) but the filtering code had it without this wrapper, as it didn't use describe_item(). So this patch uses describe_single_item(), which doesn't add the wrapper map, instead of describe_item(). The latter function is used just once - to implement GetItem. The unnecessary code to unwrap the item in multiple places was then dropped. All the tests still pass. I also tested test_expected.py in unsafe_rmw write isolation mode, because code only for this mode had to be modified as well. Refs #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200604092050.422092-1-nyh@scylladb.com>	2020-06-04 12:33:48 +02:00
Nadav Har'El	3d26bde4c1	alternator doc: correct state of filtering support Correct the compatibility section in docs/alternator/alternator.md: Filtering of Scan/Query results using the older syntax (ScanFilter, QueryFilter) is, after commit `bea9629031`, now fully supported. The newer syntax (FilterExpression) is not yet. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200604073207.416860-1-nyh@scylladb.com>	2020-06-04 12:33:10 +02:00
Avi Kivity	5b92a6d9e4	build: drop __pycache__ directories from python3 relocatable package Recently ./reloc/build_deb.sh started failing with dpkg-source: info: using source format '1.0' dpkg-source: info: building scylla-python3 using existing scylla-python3_3.8.3-0.20200604.77dfa4f15.orig.tar.gz dpkg-source: info: building scylla-python3 in scylla-python3_3.8.3-0.20200604.77dfa4f15-1.diff.gz dpkg-source: error: cannot represent change to scylla-python3/lib64/python3.8/site-packages/urllib3/packages/backports/__pycache__/__init__.cpython-38.pyc: dpkg-source: error: new version is plain file dpkg-source: error: old version is symlink to /usr/lib/python3.8/site-packages/__pycache__/six.cpython-38.pyc dpkg-source: error: unrepresentable changes to source dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 1 debuild: fatal error at line 1182: Those files are not in fact symlinks, so it's clear that dpkg is confused about something. Rather than debug dpkg, however, it's easier to just drop __pycache__ directories. These hold the result of bytecode compilation and are therefore optional, as Python will compile the sources if the cache is not populated. Fixes #6584.	2020-06-04 13:04:34 +03:00
Israel Fruchter	a2bb48f44b	fix "scylla_coredump_setup: Remove the coredump create by the check" In 28c3d4 `out()` was used without `shell=True`, so the splitting of arguments failed because of the complex commands in the cmd (pipe and such) Fixes #6159	2020-06-04 12:55:10 +03:00
Raphael S. Carvalho	77dfa4f151	sstables: kill unused resharding code output_sstables is no longer needed after we made resharding use a special interposer. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200603165324.176665-1-raphaelsc@scylladb.com>	2020-06-03 23:20:15 +03:00
Avi Kivity	0c34e114e2	Merge "Upgrade to seastar api version 3" (make_file_output_stream returns future) from Rafael " The new seastar api changes make_file_output_stream and make_file_data_sink to return futures. This series includes a few refactoring patches and the actual transition. " * 'espindola/api-v3-v3' of https://github.com/espindola/scylla: table: Fix indentation everywhere: Move to seastar api level 3 sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream sstables: Pass a data_sink to checksummed_file_writer's constructor sstables: Convert a file_writer constructor to a static make sstables: Move file_writer constructor out of line	2020-06-03 23:09:49 +03:00
Rafael Ávila de Espíndola	686f9220c1	table: Fix indentation It was broken by the previous commit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola	e5876f6696	everywhere: Move to seastar api level 3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola	13282b3d4c	sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream This is a bit simpler as we don't have to pass in the options and moves the calls to make_file_output_stream to places where we can handle futures. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola	f6ec7364a7	sstables: Pass a data_sink to checksummed_file_writer's constructor checksummed_file_writer cannot be moved, so we can't have a checksummed_file_writer::make that returns a future. So instead we pass in a data_sink and let the callers call make_file_data_sink. This is in preparation for make_file_data_sink returning a future in the seastar api v3. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola	c1f37db72b	sstables: Convert a file_writer constructor to a static make For now it always returns a ready future. This is in preparation for using seastar v3 api where make_file_output_stream returns a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:32:45 -07:00
Rafael Ávila de Espíndola	0bc4f3683a	sstables: Move file_writer constructor out of line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-03 10:21:29 -07:00
Juliusz Stasiewicz	bf4050ed15	test/cql: Check that CDC for counters is disallowed This test must be removed once we have implementation of CDC for tables with counter columns.	2020-06-03 18:31:44 +02:00
Juliusz Stasiewicz	3a079cf21b	CDC: Disallowed CDC for tables with counter column(s) Until we get implementation of CDC for counters, we explicitly disallow it. The check is performed in `cdc_service::impl` in: - `on_before_create_column_family`, - `on_before_update_column_family` and results in `invalid_request_exception` thrown.	2020-06-03 18:29:36 +02:00
Avi Kivity	86d7f2f91b	Update seastar submodule * seastar 9066edd512...42e770508c (15): > Revert "sharded: constrain sharded::map_reduce0" > tls: Fix race/unhandled case in reloadable_certificates > fair_queue: rename operator< to strictly_less > future: Add a current_exception_future_marker > Merge "Avoid passing non nothrow move constructible lambdas to future::then" from Rafael > tls_echo_server_demo: main: capture server post stop() > tests: fstream: remove obsolete comments about running in background > everywhere: Reopen inline namespaces as inline > Merge "Merge the two do_with implementations" from Rafael > sharded: constrain sharded::map_reduce0 > Merge "Backtracing across tasks" from Tomasz > posix-stack: fix strict aliasing violations on CMSG_DATA(cmsghdr) > sharded: unify invoke_on_*() variants > sharded_parameter_demo: Delete unused member variable > futures_test: Fix delete of copy constructor	2020-06-03 19:18:27 +03:00
Botond Dénes	72b8a2d147	querier: move common stuff into querier_base The querier cache expects all querier objects it stores to have certain methods. To avoid accessing these via `std::visit()` (the querier object is stored in an `std::variant`), we move all the stuff that is common to all querier types into a base class. The querier cache now accesses the members via a reference to this common base. Additionally the variant is eliminated completely and the cache entry stores an `std::unique_ptr<querier_base>` instead. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200603152544.83704-1-bdenes@scylladb.com>	2020-06-03 18:45:33 +03:00
Raphael S. Carvalho	077b4ee97d	table: Don't remove a SSTable from the backlog tracker if not previously added After `7f1a215`, a sstable is only added to backlog tracker if sstable::shared() returns true. sstable::shared() can return true for a sstable that is actually owned by more than one shard, but it can also incorrectly return true for a sstable which wasn't made explicitly unshared through set_unshared(). A recent work of mine is getting rid of set_unshared() because a sstable has the knowledge to determine whether or not it's shared. The problem starts with streaming sstable which hasn't set_unshared() called for it, so it won't be added to backlog tracker, but it can be eventually removed from the tracker when that sstable is compacted. Also, it could happen that a shared sstable, which was resharded, will be removed from the tracker even though it wasn't previously added. When those problems happen, backlog tracker will have an incorrect account of total bytes, which leads it to producing incorrect backlogs that can potentially go negative. These problems are fixed by making every add / removal go through functions which take into account sstable::shared(). Fixes #6227. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-2-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
Raphael S. Carvalho	fb6976f1b9	Make sure SSTables created by streaming are added to backlog tracker New SSTables are only added to backlog tracker if set_unshared() was called on their behalf. SSTables created for streaming are not being added to the tracker because make_streaming_sstable_for_write() doesn't call set_unshared(), nor does its caller. As a result, the backlog does not account for their existence, which means the backlog will be much lower than expected. This problem could be fixed by adding a set_unshared() call but it turns out we don't even need set_unshared() anymore. It was introduced when Scylla metadata didn't exist; now a SSTable has built-in knowledge of whether or not it's shared. Relying on every SSTable creator calling set_unshared() is bug prone. Let's get rid of it and let the SSTable itself say whether or not it's shared. If an imported SSTable has no Scylla metadata, Scylla will still be able to compute shards using token range metadata. Refs #6021. Refs #6227. Fixes #6441. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512220226.134481-1-raphaelsc@scylladb.com>	2020-06-03 17:35:22 +03:00
Tomasz Grabiec	087fa42c1d	Merge "utils: inject errors around paxos stages" from Alejo Add Paxos error injections before/after save promise, proposal, decision, paxos_response_handler, delete decision. Adds a method to inject an error providing a lambda while avoiding to add a continuation when the error injection is disabled. For this provide error exception and enter() to allow flow control (i.e. return) on simple error injections without lambdas. Also includes Pavel's patch for CQL API for error injections, updated to current error injection API and added one_shot support. Also added some basic CQL API boost tests. For CQL API there's a limitation of the current grammar not supporting f(<terminal>) so values have to be inserted in a table until this is resolved. See #5411 * https://github.com/alecco/scylla/tree/error_injection_v11: paxos: fix indentation paxos: add error injections utils: add timeout error injection with lambda utils: error injection add enter() for control flow utils: error injections provide error exceptions failure_injector: implement CQL API for failure injector class lwt: fix disabled error injection templates	2020-06-03 15:42:10 +02:00
Piotr Sarna	8fc3ca855e	alternator: fix the return type of PutItem Even if there are no attributes to return from PutItem requests, we should return a valid JSON object, not an empty string. Fixes #6568 Tests: unit(dev)	2020-06-03 16:03:13 +03:00
Piotr Sarna	3aff52f56e	alternator: fix returning UnprocessedKeys unconditionally Client libraries (e.g. PynamoDB) expect the UnprocessedKeys and UnprocessedItems attributes to appear in the response unconditionally - it's hereby added, along with a simple test case. Fixes #6569 Tests: unit(dev)	2020-06-03 15:48:16 +03:00
Alejo Sanchez	59d60ae672	paxos: fix indentation Fix indentation Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:47:18 +02:00
Alejo Sanchez	019c96cfda	paxos: add error injections Adds error injections on critical points for: prepare accept learn release_semaphore_for_key Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:44:53 +02:00
Alejo Sanchez	a8b14b0227	utils: add timeout error injection with lambda Even though calling then() on a ready future does not allocate a continuation, calling then() on the result of it will allocate. This error injection only adds a continuation in the dependency chain if error injections are enabled at compile time and this particular error injection is enabled. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:44:00 +02:00
Alejo Sanchez	0321172677	utils: error injection add enter() for control flow For control flow (i.e. return) and simplicity add enter() method. For disabled injections, this method is const returning false, therefore it has no overhead. Add boost test. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:42:48 +02:00
Nadav Har'El	bea9629031	alternator: implement remaining QueryFilter / ScanFilter functionality This patch implements the missing QueryFilter (and ScanFilter) functionality: 1. All operators. Previously, only the "EQ" operator was implemented. 2. Either "OR" or "AND" of conditions (previously only "AND"). 3. Correctly returning Count and ScannedCount for post-filter and pre-filter item counts, respectively. All of the previously-xfailing tests in test_query_filter.py are now passing. The implementation in this patch abandons our previous attempts to translate the DynamoDB API filters into Scylla's CQL filters. Doing this correctly for all operators would have been exceedingly difficult (for reasons explained in #5028), and simply not worth the effort: CQL's filters receive a page of results and then filter them, and we can do exactly the same without CQL's filters: The new code just retrieves an unfiltered page of items, and then for each of these items checks whether it passes the filters. The great thing is that we already had code for this checking - the QueryFilter syntax is identical to the "Expected" syntax (for conditional operations) that we already supported, so we already had code for checking these conditions, including all the different operators. This patch prepares for the future need to support also the newer FilterExpression syntax (see issue #5038), and the "filter" class supports either type of filter - the implementation for the second syntax is just missing and can be added (fairly easily) later. Fixes #5028. Refs #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200603110118.399325-1-nyh@scylladb.com>	2020-06-03 13:16:45 +02:00
Piotr Dulikowski	97cb2892b2	cdc: include information about all PKs in trace This fixes a bug in CDC mutation augmentation logic. A lambda that is called for each partition key in a batch captures a trace state pointer, but moves it out after being called for the first time. This caused CDC tracing information to be included only for one of the partition keys of the batch. Fixes #6575	2020-06-03 11:07:57 +02:00
Nadav Har'El	f6b1f45d69	alternator: fix order conditions on binary attributes We implemented the order operators (LT, GT, LE, GE, BETWEEN) incorrectly for binary attributes: DynamoDB requires that the bytes be treated as unsigned for the purpose of order (so byte 128 is higher than 127), but our implementation uses Scylla's "bytes" type which has signed bytes. The solution is simple - we can continue to use the "bytes" type, but we need to use its compare_unsigned() function, not its "<" operator. This bug affected conditional operations ("Expected" and "ConditionExpression") and also filters ("QueryFilter", "ScanFilter", "FilterExpression"). The bug did not affect Query's key conditions ("KeyConditions", "KeyConditionExpression") because those already used Scylla's key comparison functions - which correctly compare binary blobs as unsigned bytes (in fact, this is why we have the compare_unsigned() function). The patch also adds tests that reproduce the bugs in conditional operations, and show that the bug did not exist in key conditions. Fixes #6573 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200603084257.394136-1-nyh@scylladb.com>	2020-06-03 10:55:50 +02:00
Takuya ASADA	536ab4ebe4	reloc-pkg: move all files under project name directory To make unified relocatable package easily, we may want to merge tarballs to single tarball like this: zcat .tar.gz \| gzip -c > scylla-unified.tar.xz But it's not possible with current relocatable package format, since there are multiple files conflicts, install.sh, SCYLLA--FILE, dist/, README.md, etc.. To support this, we need to archive everything in the directory when building relocatable package. This is modifying relocatable package format, we need to provide a way to detect the format version. To do this, we added a new file ".relocatable_package_version" on the top of the archive, and set version number "2" to the file. Fixes #6315	2020-06-03 09:52:44 +03:00
Israel Fruchter	28c3d4f8e8	scylla_coredump_setup: Remove the coredump create by the check We generate a coredump as part of "scylla_coredump_setup" to verify that coredumps are working. However, we need to remove that test coredump to avoid people and test infrastructure reporting those coredumps. Fixes #6159	2020-06-03 09:30:45 +03:00
Pekka Enberg	bdd0fcd0b7	Revert "scylla_current_repo: support diffrent $PRODUCT" This reverts commit `e5da79c211` because the URLs are incorrect: both open source and enterprise repository URLs are in http://downloads.scylladb.com/rpm/centos/ or http://downloads.scylladb.com/deb/{debian,ubuntu}	2020-06-02 18:33:02 +03:00
Nadav Har'El	0d337a716b	alternator test: confirm understanding of query paging with filtering This test (which passes successfully on both Alternator and DynamoDB) was written to confirm our understanding of how the paging feature works. Our understanding, based on DynamoDB documentation, has been that the "Limit" parameter determines the number of pre-filtering items, not the actual number of items returned after having passed the filter. So the number of items actually returned may be lower than Limit - in some cases even zero. This test tries an extreme case: We scan a collection of 20 items with a filter matching only 10 (or so) of them, with Limit=1, and count the number of pages that we needed to request until collecting all these 10 (or so) matches. We note that the result is 21 - i.e., DynamoDB and Alternator really went through the 20 pre-filtering items one by one, and for the items which didn't match the filter returned an empty page. The last page (the 21st) is always empty: DynamoDB or Alternator doesn't know whether or not there is a 21st item, and it takes a 21st request to discover there isn't. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200602145015.361694-1-nyh@scylladb.com>	2020-06-02 16:57:49 +02:00
Nadav Har'El	43138c0e5e	alternator test: test Count/ScannedCount return of Query This test reproduces a bug in the current implementation of QueryFilter, which returns for ScannedCount the count of post-filter items, whereas it should return the pre-filter count. The test tests both ScannedCount and Count, when QueryFilter is used and when it isn't used. The test currently xfails on Alternator, passes on DynamoDB. Refs #5028 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200602125924.358636-1-nyh@scylladb.com>	2020-06-02 16:57:49 +02:00
Pekka Enberg	0b30df8f23	Merge 'scylla_coredump_setup: fix coredump directory mount' from Amos "Currently in coredump setup, we enabled a systemd mount to mount default coredump directory to /var/lib/scylla/coredump, but we didn't start it. So the coredump will still be saved to default coredump directory before a system reboot, it might touch enospc problem. One patch started the systemd mount during coredump setup, and make the mount effect. Another patch improved the error message of systemd unit, it's confused when the unit config is invalid." Fixes #6566 * 'coredump_conf' of git://github.com/amoskong/scylla: scylla_util/systemd_unit: improve the error message active the coredump directory mount during coredump setup	2020-06-02 17:56:19 +03:00
Juliusz Stasiewicz	e04fd9f774	counters: Read the state under timeout Counter update is a RMW operation. Until now the "Read" part was not guarded by a timeout, which is changed in this patch. Fixes #5069	2020-06-02 15:10:43 +03:00
Amos Kong	b2f59c9516	scylla_util/systemd_unit: improve the error message We always raise the exception 'Unit xxx not found' when an exception is raised while executing 'systemctl cat xxx'. Sometimes this error is misleading. On OEL7, 'systemctl cat var-lib-systemd-coredump.mount' also verifies the config content; scylla_coredump_setup failed because the config file was invalid, but the reported error was 'unit var-lib-systemd-coredump.mount not found'. This patch improves the error message. Related issue: https://github.com/scylladb/scylla/issues/6432	2020-06-02 18:03:15 +08:00
Amos Kong	abf246f6e5	active the coredump directory mount during coredump setup Currently we use a systemd mount (var-lib-systemd-coredump.mount) to mount the default coredump directory (/var/lib/systemd/coredump) to (/var/lib/scylla/coredump). /var/lib/scylla is mounted on large storage, so we have enough space for coredumps after the mount. Currently, coredump_setup only enables var-lib-systemd-coredump.mount but does not start it. The directory is not mounted after coredump_setup, so coredumps are still saved to the default coredump directory. The mount only takes effect after a reboot. Fixes #6566	2020-06-02 18:03:15 +08:00
Pekka Enberg	9d9d54c804	Revert "scylla_coredump_setup: Fix incorrect coredump directory mount" This reverts commit `e77dad3adf` because it's incorrect. Amos explains: "Quote from https://www.freedesktop.org/software/systemd/man/systemd.mount.html What= Takes an absolute path of a device node, file or other resource to mount. See mount(8) for details. If this refers to a device node, a dependency on the respective device unit is automatically created. Where= Takes an absolute path of a file or directory for the mount point; in particular, the destination cannot be a symbolic link. If the mount point does not exist at the time of mounting, it is created as directory. So the mount point is '/var/lib/systemd/coredump' and '/var/lib/scylla/coredump' is the file to mount, because /var/lib/scylla had mounted a second big storage, which has enough space for Huge coredumps. Bentsi or other touched problem with old scylla-master AMI, a coredump occurred but not successfully saved to disk for enospc. The directory /var/lib/systemd/coredump wasn't mounted to /var/lib/scylla/coredump. They WRONGLY thought the wrong mount was caused by the config problem, so he posted a fix. Actually scylla-ami-setup / coredump wasn't executed on that AMI, err: unit scylla-ami-setup.service not found Because 'scylla-ami-setup.service' config file doesn't exist or is invalid. Details of my testing: https://github.com/scylladb/scylla/issues/6300#issuecomment-637324507 So we need to revert Bentsi's patch, it changed the right config to wrong."	2020-06-02 11:41:31 +03:00
Avi Kivity	6f394e8e90	tombstone: use comparison operator instead of ad-hoc compare() function and with_relational_operators The comparison operator (<=>) default implementation happens to exactly match tombstone::compare(), so use the compiler-generated defaults. Also default operator== and operator!= (these are not brought in by operator<=>). These become slightly faster as they perform just an equality comparison, not three-way compare. shadowable_tombstone and row_tombstone depend on tombstone::compare(), so convert them too in a similar way. with_relational_operations.hh becomes unused, so delete it. Tests: unit (dev) Message-Id: <20200602055626.2874801-1-avi@scylladb.com>	2020-06-02 09:28:52 +03:00
Piotr Sarna	160e2b06f9	test: move random string helpers to .cc ... since there's no reason for them to reside in a header, and .cc is our default destination. Message-Id: <2509410f0f71df036a7829f1f799503c1a671404.1591078777.git.sarna@scylladb.com>	2020-06-02 09:27:59 +03:00
Avi Kivity	a4c44cab88	treewide: update concepts language from the Concepts TS to C++20 Seastar recently lost support for the experimental Concepts Technical Specification (TS) and gained support for C++20 concepts. Re-enable concepts in Scylla by updating our use of concepts to the C++20 standard. This change: - peels off uses of the GCC6_CONCEPT macro - removes inclusions of <seastar/gcc6-concepts.hh> - replaces function-style concepts (no longer supported) with equation-style concepts - semicolons added and removed as needed - deprecated std::is_pod replaced by recommended replacement - updates return type constraints to use concepts instead of type names (either std::same_as or std::convertible_to, with std::same_as chosen when possible) No attempt is made to improve the concepts; this is a specification update only. Message-Id: <20200531110254.2555854-1-avi@scylladb.com>	2020-06-02 09:12:21 +03:00
Nadav Har'El	c77bc5bf51	merge: big_decimal: migrate to open-coded implementation Merged patch series by Piotr Sarna: This series migrates the regex-based implementation of big decimal parsing to a more efficient one, based on string views. The series originated as a single patch, but was later extended by more tests and a microbenchmark. Perf results, comparing the old implementation, the new one, and the experimental one from v2 of this series are here: test iterations median mad min max Regex: 88895 11.228us 25.891ns 11.202us 11.510us String view: 232334 4.303us 21.660ns 4.282us 4.736us State machine (experimental, ditched): 148318 6.723us 51.896ns 6.672us 6.877us Tests: unit(dev) Piotr Sarna (4): big_decimal: migrate to string views test: add test cases to big_decimal_test test/lib: add generating random numeric string test: add big_decimal perf test configure.py \| 1 + test/boost/big_decimal_test.cc \| 29 +++++++++++++++++++ test/lib/make_random_string.hh \| 11 +++++++ test/perf/perf_big_decimal.cc \| 52 ++++++++++++++++++++++++++++++++++ utils/big_decimal.cc \| 51 ++++++++++++++++++++++----------- 5 files changed, 127 insertions(+), 17 deletions(-)	2020-06-02 09:12:21 +03:00
Takuya ASADA	6b19479ce5	dist/offline_installer/debian: support latest distributions Added Ubuntu 18.04 and Debian 9/10.	2020-06-02 09:12:21 +03:00
Piotr Sarna	d1f5d42a25	test: add big_decimal perf test In order to be able to measure the impact of rewriting the parsing mechanism from std::regex to a hand-written state machine.	2020-06-01 16:11:49 +02:00
Piotr Sarna	91e02ed3ad	test/lib: add generating random numeric string Useful for testing random numeric inputs, e.g. big decimals.	2020-06-01 16:11:49 +02:00
Piotr Sarna	ecc4a87a24	test: add test cases to big_decimal_test Test cases for big decimals were quite complete, but since the implementation was recently changed, some corner cases are added: - incorrect strings - numbers not fitting into uint64_t - numbers less than uint64_t::max themselves, but with the unscaled value exceeding the maximum	2020-06-01 16:11:49 +02:00
Piotr Sarna	7b5db478ed	big_decimal: migrate to string views Big decimals are, among other use cases, used as a main number type for alternator, and as such can appear on the fast path. Parsing big decimals was performed via std::regex, which is not precisely famous for its speeds, and also enforces unnecessary string copying. Therefore, the implementation is replaced with an open-coded version based on string_views. One previous iteration of this series also included a hand-coded state machine implementation, but it proved to be slower than the slightly naive string_view one. Overall, execution time is reduced by 61.6% according to microbenchmarks, which sounds like a promising improvement. Perf results: test iterations median mad min max Regex (original): big_decimal_test.from_string 88895 11.228us 25.891ns 11.202us 11.510us String view (new): big_decimal_test.from_string 232334 4.303us 21.660ns 4.282us 4.736us State machine (experimental, ditched): big_decimal_test.from_string 148318 6.723us 51.896ns 6.672us 6.877us Tests: unit(dev + release(big_decimal_test))	2020-06-01 16:11:49 +02:00
Gleb Natapov	9848328844	lwt: do not go over the replica list in case a quorum is already reached Also add a comment that clarifies why doing prune before learning on all replicas is safe. Message-Id: <20200531143523.GN337013@scylladb.com>	2020-06-01 12:57:37 +02:00
Asias He	6c89cedf0a	repair: Do not pass table names to repair_info Get the table names from the table ids instead, which prevents the user of the repair_info class from providing inconsistent table names and table ids. Refs: #5942	2020-06-01 17:44:05 +08:00
Asias He	12d929a5ae	repair: Add table_id to row_level_repair Now that repair_info has table ids for the tables we want to repair, use table_id instead of table_name in row level repair to find a table. It guarantees we repair the same table even if a table is dropped and a new table is created with the same name. Refs: #5942	2020-06-01 17:34:25 +08:00
Asias He	7ea8bf648d	repair: Use table id to find a table in get_sharder_for_tables We are moving to use the table id instead of table name to get a table in repair. It guarantees the same table is repaired. Refs: #5942	2020-06-01 17:34:25 +08:00
Asias He	378e31b409	repair: Add table_ids to repair_info A helper get_table_ids is added to convert the table names to table ids. We convert it once and use the same table ids for the whole repair operations. This guarantees we repair the same table during the same repair request. Refs: #5942	2020-06-01 17:34:25 +08:00
Asias He	ad878a56eb	repair: Make func in tracker::run run inside a thread It simplifies the code in func and makes it easier to write loops that do not stall. Refs: #5942	2020-06-01 17:34:16 +08:00
Avi Kivity	cb17baea77	Merge "Remove storage service from various places" from Pavel E " This is a combined set of tiny cleanups that have been collected over the past few months. Mostly about removing storage_service.hh inclusions here and there. tests: unit(dev), headers compilation " * 'br-storage-service-cleanups-a' of https://github.com/xemul/scylla: storage_service: Remove some inclusions of its header storage_service: Move get_generation_number to util/ streaming: Get local db with own helper streaming: Fix indentation after previous patch streaming: Do not explicitly switch sched group	2020-06-01 10:44:12 +03:00
Israel Fruchter	cd96202dcb	fix(scylla_prepare): missing platform import As part of `eabcb31503`, `import platform` was removed from scylla_utils.py; it seems we missed its usage in the scylla_prepare script	2020-06-01 10:33:18 +03:00
Pavel Emelyanov	67d5fad65f	storage_service: Remove some inclusions of its header GC pass over .cc files. Some really do not need it, some need for features/gossiper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Emelyanov	ee31191e21	storage_service: Move get_generation_number to util/ This is purely utility helper routine. As a nice side effect the inclusion of storage_service.hh is removed from several unrelated places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Emelyanov	07add9767b	streaming: Get local db with own helper There's a static global instance of needed services and helpers for it in streaming code. This is not great to use them, but at least this change unifies different pieces of streaming code and removes the storage_service.hh from streaming_session.cc (the streaming_sessio.hh doesn't include it either). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Emelyanov	428ef9c9ac	streaming: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Emelyanov	5db04fcf30	streaming: Do not explicitly switch sched group This is continuation of `ac998e95` -- the sched group is switched by messaging service for a verb, no need to do it by hands. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Pavel Solodovnikov	022d5f6498	cql3: use unique_ptr's for `cql3::operation::raw_update` These are not shared anywhere and so can be easily changed to be stored in std::unique_ptr instead of shared_ptr's. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200531201051.946432-1-pa.solodovnikov@scylladb.com>	2020-05-31 23:45:19 +03:00
Botond Dénes	7c56e79355	test/multishard_mutation_query_test: eliminate another unsafely used boost test macro Boost test macros are not thread safe, using them from multiple threads results in garbled XML test report output. `3f1823a4f0` replaced most of the thread-unsafe boost test macros in multishard_mutation_query_test, but one still managed to slip through the cracks. This patch removes that as well. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529130706.149603-3-bdenes@scylladb.com>	2020-05-31 16:08:02 +03:00
Botond Dénes	c5b0e8a45a	test: move thread-safe test macro alternatives to lib/test_utils.hh Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529130706.149603-2-bdenes@scylladb.com>	2020-05-31 16:08:02 +03:00
Israel Fruchter	eabcb31503	scylla_util.py: replace platform.dist() with distro package since dbuild was updated to fedora-32, hence to python3.8 `platform.dist()` is deprecated, and need to be replaced Fixes: #6501 [avi: folded patch with install-dependencies.sh change] [avi: regenerated toolchain]	2020-05-31 13:42:34 +03:00
Avi Kivity	e63fd76a04	Update seastar submodule * seastar c97b05b238...9066edd512 (2): > Merge "Delete c++14 support code" from Rafael > coroutines: add support for forwarding returns	2020-05-31 13:12:16 +03:00
Botond Dénes	7ea64b1838	test: mutation_reader_test: use <ranges> Replace all the ranges stuff we use from boost with the std equivalents. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529141407.158960-3-bdenes@scylladb.com>	2020-05-31 12:58:59 +03:00
Botond Dénes	a9e6fe4071	utils: introduce ranges::to() Sadly, std::ranges is missing an equivalent of boost::copy_range(), so we introduce a replacement: ranges::to(). There is an existing proposal to introduce something similar to the standard library: std::ranges::to() (https://github.com/cplusplus/papers/issues/145). We name our own version similarly, so if said proposal makes it in we can just prepend std:: and be good. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529141407.158960-2-bdenes@scylladb.com>	2020-05-31 12:58:59 +03:00
Pavel Solodovnikov	c4bbeb80db	cql3: pass `column_specification` by ref to `cql3::assignment_testable` functions This patch changes the signatures of `test_assignment` and `test_all` functions to accept `cql3::column_specification` by const reference instead of shared pointer. Mostly a cosmetic change reducing overall shared_ptr bloat in cql3 code. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200529195249.767346-1-pa.solodovnikov@scylladb.com>	2020-05-30 09:49:29 +03:00
Raphael S. Carvalho	d6b4a9a237	compaction: increase the frequency at which we check for abortion requests Compaction is checking for abortion whenever it's consuming a new partition. The problem with this approach is that the abortion can take too long if compaction is working with really large partitions. If the current partition takes minutes to be compacted, it means that abortion may be delayed by a factor of minutes as well. Truncate, for example, relies on this abortion mechanism, so it could happen that the operation would take much longer than expected due to this inefficiency, probably resulting in timeouts on the user side. To fix this, it's clear that we need to increase the frequency at which we check for abortion requests. More precisely, we need to do it not only on partition granularity, but also on row granularity. Fixes #6309. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200529172847.44444-1-raphaelsc@scylladb.com>	2020-05-29 21:23:49 +02:00
Pavel Emelyanov	878f8d856a	logalloc: Report reclamation timing with rate The timer.stop() call, that reports not only the time-taken, but also the reclamation rate, was unintentionally dropped while expanding its scope (`c70ebc7c`). Bring it back (and mark the compact_and_evict_locked as private while at it). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200528185331.10537-1-xemul@scylladb.com>	2020-05-29 14:50:43 +02:00
Botond Dénes	94e00186b6	test.py: centralize the determining whether stdout is a tty Currently test.py has three different places it checks whether stdout is a tty. This patch centralizes these into a single global variable. This ensures consistency and makes it easier to override it later with a command-line switch, should we want to. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529101124.123925-1-bdenes@scylladb.com>	2020-05-29 14:50:43 +02:00
Pavel Emelyanov	7696ed1343	shard_tracker: Configure it in one go Instead of doing 3 smp::invoke_on_all-s and duplicating tracker::impl API for the tracker itself, introduce the tracker::configure, simplify the tracker configuration and narrow down the public tracker API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200528185442.10682-1-xemul@scylladb.com>	2020-05-29 14:50:43 +02:00
Kamil Braun	a304f774f8	redis: don't include storage_proxy.hh unnecessarily Use a forward declaration instead.	2020-05-29 13:34:42 +02:00
Juliusz Stasiewicz	f2cedbc228	cdc: Remove assert that bootstrap_tokens is nonempty	2020-05-29 12:23:08 +02:00
Juliusz Stasiewicz	aadd2ffa6a	api: Added command `/storage_service/cdc_streams_check_and_repair` This commit introduces a placeholder for HTTP POST request at `/storage_service/cdc_streams_check_and_repair`.	2020-05-29 12:23:08 +02:00
Avi Kivity	0c6bbc84cd	Merge "Classify queries based on their initiator, rather than their target" from Botond " Currently we classify queries as "system" or "user" based on the table they target. The class of a query determines how the query is treated, currently: timeout, limits for reverse queries and the concurrency semaphore. The catch is that users are also allowed to query system tables and when doing so they will bypass the limits intended for user queries. This has caused performance problems in the past, yet the reason we decided to finally address this is that we want to introduce a memory limit for unpaged queries. Internal (system) queries are all unpaged and we don't want to impose the same limit on them. This series uses scheduling groups to distinguish user and system workloads, based on the assumption that user workloads will run in the statement scheduling group, while system workloads will run in the main (or default) scheduling group, or perhaps something else, but in any case not in the statement one. Currently the scheduling group of reads and writes is lost when going through the messaging service, so to be able to use scheduling groups to distinguish user and system reads this series refactors the messaging service to retain this distinction across verb calls. Furthermore, we execute some system reads/writes as part of user reads/writes, such as auth and schema sync. These processes are tagged to run in the main group. This series also centralises query classification on the replica and moves it to a higher level. More specifically, queries are now classified -- the scheduling group they run in is translated to the appropriate query class specific configuration -- on the database level and the configuration is propagated down to the lower layers. Currently this query class specific configuration consists of the reader concurrency semaphore and the max memory limit for otherwise unlimited queries. 
A corollary of the semaphore being selected on the database level is that the read permit is now created before the read starts. A valid permit is now available during all stages of the read, enabling tracking the memory consumption of e.g. the memtable and cache readers. This change aligns nicely with the needs of more accurate reader memory tracking, which also wants a valid permit that is available in every layer. The series can be divided roughly into the following distinct patch groups: * 01-02: Give system read concurrency a boost during startup. * 03-06: Introduce user/system statement isolation to messaging service. * 07-13: Various infrastructure changes to prepare for using read permits in all stages of reads. * 14-19: Propagate the semaphore and the permit from database to the various table methods that currently create the permit. * 20-23: Migrate away from using the reader concurrency semaphore for waiting for admission, use the permit instead. * 24: Introduce `database::make_query_config()` and switch the database methods needing such a config to use it. * 25-31: Get rid of all uses of `no_reader_permit()`. * 32-33: Ban empty permits for good. * 34: querier_cache: use the queriers' permits to obtain the semaphore. Fixes: #5919 Tests: unit(dev, release, debug), dtest(bootstrap_test.py:TestBootstrap.start_stop_test_node), manual testing with a 2 node mixed cluster with extra logging. 
" * 'query-class/v6' of https://github.com/denesb/scylla: (34 commits) querier_cache: get semaphore from querier reader_permit: forbid empty permits reader_permit: fix reader_resources::operator bool treewide: remove all uses of no_reader_permit() database: make_multishard_streaming_reader: pass valid permit to multi range reader sstables: pass valid permits to all internal reads compaction: pass a valid permit to sstable reads database: add compaction read concurrency semaphore view: use valid permits for reads from the base table database: use valid permit for counter read-before-write database: introduce make_query_class_config() reader_concurrency_semaphore: remove wait_admission and consume_resources() test: move away from reader_concurrency_semaphore::wait_admission() reader_permit: resource_units: introduce add() mutation_reader: restricted_reader: work in terms of reader_permit row_cache: pass a valid permit to underlying read memtable: pass a valid permit to the delegate reader table: require a valid permit to be passed to most read methods multishard_mutation_query: pass a valid permit to shard mutation sources querier: add reader_permit parameter and forward it to the mutation_source ...	2020-05-29 10:11:44 +03:00
Raphael S. Carvalho	097a5e9e07	compaction: Disable garbage collected writer if interposer consumer is used GC writer, used for incremental compaction, cannot be currently used if interposer consumer is used. That's because compaction assumes that GC writer will be operated only by a single compaction writer at a given point in time. With interposer consumer, multiple writers will concurrently operate on the same GC writer, leading to a race condition which can potentially result in use-after-free. Let's disable GC writer if interposer consumer is enabled. We're not losing anything because GC writer is currently only needed on strategies which don't implement an interposer consumer. Resharding will always disable GC writer, which is the expected behavior because it doesn't support incremental compaction yet. The proper fix, which allows GC writer and interposer consumer to work together, will require more time to implement and test, and for that reason, I am postponing it as #6472 is a showstopper for the current release. Fixes #6472. tests: mode(dev). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200526195428.230472-1-raphaelsc@scylladb.com>	2020-05-29 08:26:43 +02:00
Nadav Har'El	17649ad0b5	alternator: error when unimplemented ConditionalOperator is used The ScanFilter and QueryFilter features are only partially implemented. Most of their unimplemented features cause clear errors telling the user of the unimplemented feature, but one exception is the ConditionalOperator parameter, which can be used to "OR", instead of the default "AND", of several conditions. Before this patch, we simply ignored this parameter - causing wrong results to be returned instead of an error. In this patch, ScanFilter and QueryFilter parse, instead of ignoring, the ConditionalOperator. The common implementation, get_filtering_restrictions(), still does not implement the OR case, but returns an error if we reach this case instead of just ignoring it. There is no new test. The existing test_query_filter.py::test_query_filter_or xfailed before this patch, and continues to xfail after it, but the failure is different (you can see it by running the test with "--runxfail"): Before this patch, the failure was because of different results. After this patch, the failure is because of an "unimplemented" error message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200528214721.230587-2-nyh@scylladb.com>	2020-05-29 08:26:43 +02:00
Nadav Har'El	d200cde9d6	alternator: extract function for parsing ConditionalOperator The code for parsing the ConditionalOperator attribute was used once, for the "Expected" case, but we will also need it for the "QueryFilter" and "ScanFilter" cases, so let's extract it into a function, get_conditional_operator(). While doing this extraction, I also noticed a bug: when Expected is missing, ConditionalOperator should not be allowed. We correctly checked the case of an empty Expected, but forgot to also check the case of a missing Expected. So the new code also fixes this corner case, and we include a new test case for it (which passes on DynamoDB and used to fail in Alternator but passes after this patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200528214721.230587-1-nyh@scylladb.com>	2020-05-29 08:26:43 +02:00
Rafael Ávila de Espíndola	aa778ec152	configure: Reduce the dynamic linker path size gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path size. Fixes: #6494 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528202741.398695-2-espindola@scylladb.com>	2020-05-29 08:26:43 +02:00
Rafael Ávila de Espíndola	078c680690	configure: Implement get-dynamic-linker.sh directly in python For now it produces exactly the same output. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528202741.398695-1-espindola@scylladb.com>	2020-05-29 08:26:43 +02:00
Rafael Ávila de Espíndola	33e1ee024f	configure: Delete old seastar option The seastar we use doesn't have -DSeastar_STD_OPTIONAL_VARIANT_STRINGVIEW=ON anymore. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528203324.400141-1-espindola@scylladb.com>	2020-05-29 08:26:43 +02:00
Glauber Costa	44a0e40cb2	compaction: move compaction_strategy_type to its own header I just hit a circularity in header inclusion that I traced back to the fact that schema.hh includes compaction_strategy.hh. schema.hh is in turn included in lots of places, so a circularity is not hard to come by. The schema header really only needs to know about the compaction_type, so it can inform schema users about it. Following the trend in header cleanups, I am moving that to a separate header which will both break the circularity and make sure we include less stuff that is not needed. With this change, Scylla fails to compile due to a new missing forward declaration at index/secondary_index_manager.hh, so this is fixed. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200527172203.915936-1-glauber@scylladb.com>	2020-05-29 08:14:27 +03:00
Piotr Sarna	77e943e9a3	db,views: unify time points used for update generation Until now, view updates were generated with a bunch of random time points, because the interface was not adjusted for passing a single time point. The time points were used to determine whether cells were alive (e.g. because of TTL), so it's better to unify the process: 1. when generating view updates from user writes, a single time point is used for the whole operation 2. when generating view updates via the view building process, a single time point is used for each build step NOTE: I don't see any reliable and deterministic way of writing test scenarios which trigger problems with the old code. After #6488 is resolved and error injection is integrated into view.cc, tests can be added. Fixes #6429 Tests: unit(dev) Message-Id: <f864e965eb2e27ffc13d50359ad1e228894f7121.1590070130.git.sarna@scylladb.com>	2020-05-28 12:56:09 +03:00
Alejo Sanchez	bb08b5ad5a	utils: error injections provide error exceptions Provide non-timeout error exception to facilitate control flow in injected errors. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-28 11:13:55 +02:00
Pavel Solodovnikov	014883d560	failure_injector: implement CQL API for failure injector class The following UDFs are defined to control failure injector API usage: * enable_injection(name, args) * disable_injection(name) All arguments have string type. As currently function(terminal) is not supported by the parser, the arguments must come from selected rows. Added boost test for CQL API. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-28 11:13:55 +02:00
Alejo Sanchez	2c7e01a3b6	lwt: fix disabled error injection templates Fix disabled injection templates to match enabled ones. Fix corresponding test to not be a continuation. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-28 11:13:55 +02:00
Botond Dénes	e678f06a5e	querier_cache: get semaphore from querier Currently the `querier_cache` is passed a semaphore during its construction and it uses this semaphore to do all the inactive reader registering/unregistering. This is inaccurate as in theory cached reads could belong to different semaphores (although currently this is not yet the case). As all queriers store a valid permit now, use this permit to obtain the semaphore the querier is associated with, and register the inactive read with this semaphore.	2020-05-28 11:34:35 +03:00
Botond Dénes	3cd2598ab3	reader_permit: forbid empty permits Remove `no_reader_permit()` and all ways to create empty (invalid) permits. All permits are guaranteed to be valid now and are only obtainable from a semaphore. `reader_permit::semaphore()` now returns a reference, as it is guaranteed to always have a valid semaphore reference.	2020-05-28 11:34:35 +03:00
Botond Dénes	e40b1fc3c8	reader_permit: fix reader_resources::operator bool	2020-05-28 11:34:35 +03:00
Botond Dénes	d68ac8bf18	treewide: remove all uses of no_reader_permit()	2020-05-28 11:34:35 +03:00
Botond Dénes	0f55e8e30f	database: make_multishard_streaming_reader: pass valid permit to multi range reader The permit is not used by the mutation source passed to it, but empty permits will soon be forbidden, so we have to pass a valid one.	2020-05-28 11:34:35 +03:00
Botond Dénes	b5aa08ed77	sstables: pass valid permits to all internal reads We will soon require a valid permit for all reads, including low level index reads. The sstable layer has several internal reads which either cannot be associated with the user or the system read semaphores, or for which it would be very hard to obtain the correct semaphore, for limited/no gain. To be able to pass a valid permit still, we either expose a permit parameter so upper layers can pass down one, or create a local semaphore for these reads and use that to obtain a permit. The following methods now require a permit to be passed to them: * `sstables::sstable::read_data()`: only used in tests. The following methods use internal semaphores: * `sstables::sstable::generate_summary()`: used when loading an sstable. * `sstables::sstable::has_partition_key()`: used by a REST API method.	2020-05-28 11:34:35 +03:00
Botond Dénes	0952a43a9e	compaction: pass a valid permit to sstable reads Use the newly created compaction read concurrency semaphore to create and pass valid permits to all sstable reads done on behalf of compaction.	2020-05-28 11:34:35 +03:00
Botond Dénes	734e995639	database: add compaction read concurrency semaphore All reads will soon require a valid permit, including those done during compaction. To allow creating valid permits for these reads, create a compaction-specific semaphore. This semaphore is unlimited, as compaction concurrency is managed by a higher-level layer; we use it just for resource usage accounting.	2020-05-28 11:34:35 +03:00
Botond Dénes	992e697dd5	view: use valid permits for reads from the base table View update generation involves reading existing values from the base table, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to these reads. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the base table write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	639bbefcd3	database: use valid permit for counter read-before-write Counter writes involve a read-before-write, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to this read. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the counter write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	e4c591aa67	database: introduce make_query_class_config() And use it to obtain any query-class specific configuration that was obtained from `table::config` before, such as the read concurrency semaphore and the max memory limit for unlimited queries. As all users of these items get these from the query class config now, we can remove them from `table::config`.	2020-05-28 11:34:35 +03:00
Botond Dénes	f417b9a3ea	reader_concurrency_semaphore: remove wait_admission and consume_resources() Permits are now created with `make_permit()` and code is using the permit to do all resource consumption tracking and admission waiting, so we can remove these from the semaphore. This allows us to remove some now unused code from the permit as well, namely the `base_cost` which was used to track the resource amount the permit was created with. Now this amount is also tracked with a `resource_units` RAII object, returned from `reader_permit::wait_admission()`, so it can be removed. Curiously, this reduces the reader permit to a glorified semaphore pointer. Still, the permit abstraction is worth keeping, because it allows us to make changes to how the resource tracking part of the semaphore works, without having to change the huge number of code sites passing around the permit.	2020-05-28 11:34:35 +03:00
Botond Dénes	a08467da29	test: move away from reader_concurrency_semaphore::wait_admission() And use the reader_permit for this instead. This refactoring has revealed a pre-existing bug in the `test_lifecycle_policy`, which is also addressed in this patch. The bug is that said policy executes reader destructions in the background, and these are not waited for. For some reason, the semaphore -> permit transition pushes these races over the edge and we start seeing some of these destruction fibers still being unfinished when test scopes are exited, causing all sorts of trouble. The solution is to introduce a special gate that tests can use to wait for all background work to finish, before the test scope is exited.	2020-05-28 11:34:35 +03:00
Botond Dénes	bf4ade8917	reader_permit: resource_units: introduce add() Allows merging two resource_units into one.	2020-05-28 11:34:35 +03:00
Botond Dénes	4409579352	mutation_reader: restricted_reader: work in terms of reader_permit We want to refactor all read resource tracking code to work through the reader_permit, so refactor the restricted reader to also do so.	2020-05-28 11:34:35 +03:00
Botond Dénes	fe024cecdc	row_cache: pass a valid permit to underlying read All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the underlying reader when creating it. This means `row_cache::make_reader()` now also requires a permit to be passed to it.	2020-05-28 11:34:35 +03:00
Botond Dénes	9ede82ebf8	memtable: pass a valid permit to the delegate reader All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the delegate reader when creating it. This means `memtable::make_flat_reader()` now also requires a permit to be passed to it. Internally the permit is stored in `scanning_reader`, which is used both for flushes and normal reads. In the former case a permit is not required.	2020-05-28 11:34:35 +03:00
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Botond Dénes	d5ebd763ff	multishard_mutation_query: pass a valid permit to shard mutation sources In preparation of a valid permit being required to be passed to all mutation sources, create a permit before creating the shard readers and pass it to the mutation source when doing so. The permit is also persisted in the `shard_mutation_querier` object when saving the reader, which is another forward looking change, to allow the querier-cache to use it to obtain the semaphore the read is actually registered with.	2020-05-28 11:34:35 +03:00
Botond Dénes	bad53c4245	querier: add reader_permit parameter and forward it to the mutation_source In preparation of a valid permit being required to be passed to all mutation sources, also add a permit to the querier object, which is then passed to the source when it is used to create a reader.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Botond Dénes	0ee58d1d47	test: lib/reader_permit.hh: add make_query_class_config() To be used by tests to obtain a query_class_config to pass to APIs that require one. The class config contains the test semaphore.	2020-05-28 11:34:35 +03:00
Botond Dénes	308a162247	Introduce query_class_config This struct will serve as a container of all the query-class dependent configuration such as the semaphore to be used and the memory limit for unlimited queries. As there is no good place to put this, we create a separate header for it.	2020-05-28 11:34:35 +03:00
Botond Dénes	0b4ec62332	flat_mutation_reader: flat_multi_range_reader: add reader_permit parameter Mutation sources will soon require a valid permit so make sure we have one and pass it to the mutation sources when creating the underlying readers. For now, pass no_reader_permit() on call sites, deferring the obtaining of a valid permit to later patches.	2020-05-28 11:34:35 +03:00
Botond Dénes	97af2d98d2	test: lib: introduce reader_permit.{hh,cc} This contains a reader concurrency semaphore for the tests, that they can use to obtain a valid permit for reads. Soon we are going to start working towards a point where all APIs taking a permit will require a valid one. Before we start this work we must ensure test code is able to obtain a valid permit.	2020-05-28 11:34:35 +03:00
Botond Dénes	4d7250d12b	reader_permit: add wait_admission We want to make `reader_permit` the single interface through which reads interact with the concurrency limiting mechanism. So far it was only usable to track memory consumption. Add the missing `wait_admission()` and `consume_resources()` to the permit API. As opposed to the `reader_concurrency_semaphore::` equivalents, which returned a permit, the `reader_permit::` variants just return `reader_permit::resource_units`, which is an RAII holder for the acquired units. This also allows for the permit to be created earlier, before the reader is admitted, allowing for tracking pre-admission memory usage as well. In fact this is what we are going to do in the next patches. This patch also introduces a `broken()` method on the reader concurrency semaphore which resolves waiters with an exception. This method is also called internally from the semaphore's destructor. This is needed because the semaphore can now have external waiters, which have to be resolved before the semaphore itself is destroyed.	2020-05-28 11:34:35 +03:00
Botond Dénes	bd793d6e19	reader_permit: resource_units: work in terms of reader_resources Refactor resource_units semantically as well to work in terms of reader_resources, instead of just memory.	2020-05-28 11:34:35 +03:00
Botond Dénes	0f9c24631a	reader_permit: s/memory_units/resource_units/ We want to refactor reader_permit::memory_units to work in terms of reader_resources, as we are planning to use it for guarding count resources as well. This patch makes the first step: renames it from memory_units to resource_units. Since this is a very noisy change, we do it in a separate patch; the semantic change is in the next patch.	2020-05-28 11:34:35 +03:00
Botond Dénes	16d8cdadc9	messaging_service: introduce the tenant concept Tenants get their own connections for statement verbs and are further isolated from each other by different scheduling groups. A tenant is identified by a scheduling group and a name. When selecting the client index for a statement verb, we look up the tenant whose scheduling group matches the current one. This scheduling group is persisted across the RPC call, using the name to identify the tenant on the remote end, where a reverse lookup (name -> scheduling group) happens. Instead of a single scheduling group to be used for all statement verbs, messaging_service::scheduling_config now contains a list of tenants. The first among these is the default tenant, the one we use when the current scheduling group doesn't match that of any configured tenant. To make this mapping easier, we reshuffle the client index assignment, such that statement and statement-ack verbs have the idx 2 and 3 respectively, instead of 0 and 3. The tenant configuration is configured at message service construction time and cannot be changed after. Adding such capability should be easy but is not needed for query classification, the current user of the tenant concept. Currently two tenants are configured: $user (default tenant) and $system.	2020-05-28 11:34:32 +03:00
Avi Kivity	db8974fef3	messaging_service: de-static-ify _scheduling_info_for_connection_index Per-user SLA means we have connection classifications determined dynamically, as SLAs are added or removed. This means the classification information cannot be static. Fix by making it a non-static vector (instead of a static array), allowing it to be extended. The scheduling group member pointer is replaced by a scheduling group as a member pointer won't work anymore - we won't have a member to refer to.	2020-05-28 10:40:08 +03:00
Avi Kivity	10dd08c9b0	messaging_service: supply and interpret rpc isolation_cookies On the client side, we supply an isolation cookie based on the connection index. On the server side, we convert an isolation cookie back to a scheduling_group. This has two advantages: - rpc processes the entire connection using the scheduling group, so that code is also isolated and accounted for - we can later add per-user connections; the previous approach of looking at the verb to decide the scheduling_group doesn't help because we don't have a set of verbs per user With this, the main group sees <0.1% usage under simple read and write loads.	2020-05-28 10:40:08 +03:00
Avi Kivity	dbce57fa3c	messaging_service: extract connection_index -> scheduling_group translation Move it from a function-local static to a class static variable. We will want to extend it in two ways: - add more information per connection index (like the rpc isolation cookie) - support adding more connections for per-user SLA As a first step, make it an array of structures and make it accessible to all of messaging_service.	2020-05-28 10:40:08 +03:00
Botond Dénes	e0b98ba921	database: give system reads a concurrency boost during startup In the next patches we will match reads to the appropriate reader concurrency semaphore based on the scheduling group they run in. This will result in a lot of system reads that are executed during startup and that were up to now (incorrectly) using the user read semaphore to switch to the system read semaphore. This latter has a much more constrained concurrency, which was observed to cause system reads to saturate and block on the semaphore, slowing down startup. To solve this, boost the concurrency of the system read semaphore during startup to match that of the user semaphore. This is ok, as during startup there are no user reads to compete with. After startup, before we start serving user reads the concurrency is reverted back to the normal value.	2020-05-28 10:40:08 +03:00
Botond Dénes	521342f827	reader_concurrency_semaphore: expose signal/consume To allow the amount of available resource to be adjusted after creation.	2020-05-28 10:40:08 +03:00
Pavel Solodovnikov	d7fb51a094	cql3: remove unused functions `get_stored_prepared_statement*` These functions are not used anywhere, so no reason to keep them around. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200527172723.409019-1-pa.solodovnikov@scylladb.com>	2020-05-28 09:09:48 +02:00
Israel Fruchter	e5da79c211	scylla_current_repo: support different $PRODUCT Support pointing to the correct download URL for different Scylla products	2020-05-28 09:03:16 +03:00
Avi Kivity	9c26bdf944	Update seastar submodule * seastar 37774aa78...c97b05b23 (13): > test: futures: test async with throw_on_move arg > Merge 'fstream: close file if construction fails' from Botond > util: tmp_file: include <seastar/core/thread.hh> > test: file_io: test_file_stat_method: convert to use tmp_dir > reactor: don't mlock all memory at once > future: specify uninitialized_wrapper_base default constructors as noexcept > test: tls: ignore gate_closed_exception > rpc: recv_helper: ignore gate_closed_exception when replying to oversized requests > sharded: support passing arbitrary shard-dependent parameters to service constructors > Update circleci configuration for C++20 > treewide: deprecate seastar::apply() > Update README.md about c++ versions > cmake: Remove Seastar_STD_OPTIONAL_VARIANT_STRINGVIEW	2020-05-28 06:34:02 +03:00
Rafael Ávila de Espíndola	f274148be9	configure: Use seastar's api v2 No change right now as that is the current api version on the seastar we have, but being explicit will let us upgrade seastar and change the api independently. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200527235211.301654-1-espindola@scylladb.com>	2020-05-28 06:33:51 +03:00
Nadav Har'El	cd0fbb8d38	alternator test: add comprehensive tests for QueryFilter feature The QueryFilter parameter of Query is only partially implemented (issue #5028), and it lacks tests. In this patch, we add comprehensive tests for this feature and all its various operators, types, and corner cases. The tests cover both the parts we already implemented, and the parts we did not yet. As usual, all tests succeed on DynamoDB, but many still xfail on Alternator pending the complete implementation. Refs #5028. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200525141242.133710-1-nyh@scylladb.com>	2020-05-27 15:29:27 +02:00
Avi Kivity	829e2508d0	logalloc: fix entropy depletion in test_compaction_with_multiple_regions() test_compaction_with_multiple_regions() has two calls to std::shuffle(), one using std::default_random_engine() as the PRNG, but the other, later on, using the std::random_device directly. This can cause failures due to entropy pool exhaustion. Fix by making the `random` variable refer to the PRNG, not the random_device, and adjust the first std::shuffle() call. This hides the random_device so it can't be used more than once. Message-Id: <20200527124247.2187364-1-avi@scylladb.com>	2020-05-27 15:51:16 +03:00
Botond Dénes	3f1823a4f0	multishard_mutation_query_test: don't use boost test macros in multiple shards Boost test macros are not safe to use in multiple shards (threads). Doing so will result in their output being interwoven, making it unreadable and generating invalid XML test reports. There was a lot of back-and-forth on how to solve this, including introducing thread-safe wrappers of the boost test macros, that use locks. This patch does something much simpler: it defines a bunch of replacement utility functions for the used macros. These functions use the thread safe seastar logger to log messages and throw exceptions when the test has to be failed, which is pretty much what boost test does too. With this the previously seen complaint about invalid XML is gone. Example log messages from the utility functions: DEBUG 2020-05-27 13:32:54,248 [shard 1] testlog - check_equal(): OK @ validate_result() test/boost/multishard_mutation_query_test.cc:863: ckp{0004fe57c8d2} == ckp{0004fe57c8d2} DEBUG 2020-05-27 13:32:54,248 [shard 1] testlog - require(): OK @ validate_result() test/boost/multishard_mutation_query_test.cc:855 Fixes: #4774 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200527104426.176342-1-bdenes@scylladb.com>	2020-05-27 15:50:05 +03:00
Botond Dénes	caf21d7db9	test.py: disable boost test's colored output when stdout is not a tty Boost test uses colored output by default, even when the output of the test is redirected to a file. This makes the output quite hard to read for example in Jenkins. This patch fixes this by disabling the colored output when stdout is not a tty. This is in line with the colored output of configure.py itself, which is also enabled only if stdout is a tty. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200526112857.76131-1-bdenes@scylladb.com>	2020-05-27 14:20:12 +03:00
Avi Kivity	bdb5b11d19	treewide: stop using deprecated seastar::apply() seastar::apply() is deprecated in recent versions of seastar in favor of std::apply(), so stop including its header. Calls to unqualified apply(..., std::tuple<>) are resolved to std::apply() by argument dependent lookup, so no changes to call sites are necessary. This avoids a huge number of deprecation warnings with latest seastar. Message-Id: <20200526090552.1969633-1-avi@scylladb.com>	2020-05-27 14:07:35 +03:00
Nadav Har'El	51adaea499	alternator: use C++20 std::string_view::starts_with() We had to wait many years for it, but finally we have a starts_with() method in C++20. Let's use it instead of ugly substr()-based code. This is probably not a performance gain - substr() for a string_view was already efficient. But it makes the code easier to understand, and it allows us to rejoice in our decision to switch to C++20. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200526185812.165038-2-nyh@scylladb.com>	2020-05-27 08:14:12 +02:00
Nadav Har'El	b2ca7f6fc0	alternator: another check base64 begins_with without decoding In commit `cb7d3c6b55` we started to check if two base64-encoded strings begin with each other without decoding the strings first. However, we missed the check_BEGINS_WITH function which does the same thing. So this patch fixes this function as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200526185812.165038-1-nyh@scylladb.com>	2020-05-27 08:13:59 +02:00
Pekka Enberg	8721534dfb	Merge "tests: avoid exhausting random_device entropy" from Avi " In several tests we were calling random_device::operator() in a tight loop. This is a slow operation, and in gcc 10 can fail if called too frequently due to a bug [1]. Change to use a random_engine instead, seeded once from the random_device. Tests: unit (dev) [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087 " * 'entropy' of git://github.com/avikivity/scylla: tests: lsa_sync_eviction_test: don't exhaust random number entropy tests: querier_cache_test: don't exhaust random number entropy tests: loading_cache_test: don't exhaust random number entropy tests: dynamic_bitset_test: don't exhaust random number entropy	2020-05-27 08:40:06 +03:00
Botond Dénes	838b92f4b0	idl-compiler.py: don't use 'is not' for string comparison In python, `is` and `is not` check object identity, not value equivalence, yet in `idl-compiler.py` they are used to compare strings. Newer python versions (such as the one shipped in Fedora32) complain about this misuse, so this patch fixes it. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200526091811.50229-1-bdenes@scylladb.com>	2020-05-27 08:40:05 +03:00
Avi Kivity	427398641a	build: switch C++ dialect to C++20 This gives us access to std::ranges, the spaceship operator, and more. Note coroutines are not yet enabled (these require g++ -fcoroutines) as we are still working out problems with address sanitizer support. Tests: unit (dev, debug, release) Message-Id: <20200521092157.1460983-1-avi@scylladb.com>	2020-05-27 08:40:05 +03:00
Nadav Har'El	c3da9f2bd4	alternator: add mandatory configurable write isolation mode Alternator supports four ways in which write operations can use quorum writes or LWT or both, which we called "write isolation policies". Until this patch, Alternator defaulted to the most generally safe policy, "always_use_lwt". This default could have been overridden for each table separately, but there was no way to change this default for all tables. This patch adds a "--alternator-write-isolation" configuration option which allows changing the default. Moreover, @dorlaor asked that users must explicitly choose this default mode, and not get "always_use_lwt" without noticing. The previous default, "always_use_lwt", supports any workload correctly but because it uses LWT for all writes it may be disappointingly slow for users who run write-only workloads (including most benchmarks) - such users might find the slow writes so disappointing that they will drop Scylla. Conversely, a default of "forbid_rmw" will be faster and still correct, but will fail on workloads which need read-modify-write operations - and surprise users that need these operations. So Dor asked that none of the write modes be made the default, and users must make an informed choice between the different write modes, rather than being disappointed by a default choice they weren't aware of. So after this patch, Scylla refuses to boot if Alternator is enabled but a "--alternator-write-isolation" option is missing. The patch also modifies the relevant documentation, adds the same option to our docker image, and modifies the test-running script test/alternator/run to run Scylla with the old default mode (always_use_lwt), which we need because we want to test RMW operations as well. Fixes #6452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524160338.108417-1-nyh@scylladb.com>	2020-05-27 08:40:05 +03:00
Kamil Braun	7a98db2ab3	cdc: set ttl column in log rows which update only collections	2020-05-27 08:40:05 +03:00
Tomasz Grabiec	1424543e11	Merge "Move sstables_format on sstable_manager" from Pavel Emelyanov The format is currently sitting in storage_service, but the previous set patched all the users not to call it, instead they use sstables_manager to get the highest supported format. So this set finalizes this effort and places the format on sstables_manager(s). The set introduces the db::sstables_format_selector, that - starts with the lowest format (ka) - reads one on start from system tables - subscribes on sstables-related features and bumps up the selection if the respective feature is enabled During its lifetime the selector holds a reference to the sharded<database> and updates the format on it, the database, in turn, propagates it further to sstables_managers. The managers start with the highest known format (mc) which is done for tests. * https://github.com/xemul/scylla br-move-sstables-format-4: storage_service: Get rid of one-line helpers system_keyspace: Cleanup setup() from storage_service format_selector: Log which format is being selected sstables_manager: Keep format on format_selector: Make it standalone format_selector: Move the code into db/ format_selector: Select format locally storage_service: Introduce format_selector storage_service: Split feature_enabled_listener::on_enabled storage_service: Tossing bits around features: Introduce and use masked features features: Get rid of per-features booleans	2020-05-27 08:40:05 +03:00
Gleb Natapov	e3ff88e674	lwt: prune system.paxos table when quorum of replicas learned the value Instead of waiting for all replicas to reply execute prune after quorum of replicas. This will keep system.paxos smaller in the case where one node is down. Fixes #6330 Message-Id: <20200525110822.GC233208@scylladb.com>	2020-05-27 08:40:05 +03:00
Piotr Sarna	ca2b96661d	Update seastar submodule * seastar ee516b1c...37774aa7 (12): > task: specify the default constructor as noexcept > scheduling: scheduling_group: specify explicit constructor as noexcept > net: tcp: use var after std::move()ed > future: implement make_exception_future_with_backtrace > future: Add noexcept to a few functions > scheduling: Add noexcept to a couple of functions > future: Move current_exception_as_future out of internal > future: Avoid a call to std::current_exception > seastar.hh: fix typo in doxygen main page text > future: Replace a call to futurize_apply with futurize_invoke > rpc: document how isolation work > future: Optimize any::move_it	2020-05-27 08:40:05 +03:00
Raphael S. Carvalho	9ebf7b442e	timestamp_based_splitting_writer: fix use-after-move look-alike rt is moved before rt.tomb.timestamp is retrieved, so there's something that looks like use-after-move here (but really isn't). Found it while auditing the code. [avi: adjusted changelog to note that it's not really a use-after-move] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200525141047.168968-1-raphaelsc@scylladb.com>	2020-05-27 08:40:05 +03:00
Nadav Har'El	2eb929e89b	merge: Allow our users to shoot themselves in their feet Merge pull request https://github.com/scylladb/scylla/pull/6484 by Kamil Braun: Allow a node to join without bootstrapping, even if it couldn't contact other nodes. Print a BIG WARNING saying that you should never join nodes without bootstrapping (by marking it as a seed or using auto_bootstrap=off). Only the very first node should (must) be joined as a seed. If you want to have more seeds, first join them using the only supported way (i.e. bootstrap them), and only AFTER they have bootstrapped, change their configuration to include them in the seed list. Does not fix, but closes #6005. Read the discussion: it's enlightening. See scylladb/scylla-docs#2647 for the correct procedure of joining a node. Reverts `7cb6ac3`.	2020-05-27 08:40:05 +03:00
Nadav Har'El	b12265c2d5	alternator test: improve FilterExpression tests for "contains()" The tests for the contains() operator of FilterExpression were based on an incorrect understanding of what this operator does. Because the tests were (as usual) run against DynamoDB and passed, there was nothing wrong in the test per se - but it contains comments based on the wrong understanding, and also various corner cases which aren't as interesting as I thought (and vice versa - missed interesting corner cases). All these tests continue to pass on DynamoDB, and xfail on Alternator (because we didn't implement FilterExpression yet). Refs #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200525123812.131209-1-nyh@scylladb.com>	2020-05-27 08:40:05 +03:00
Avi Kivity	8d27e1b4a9	Merge 'Propagate tracing to materialized view update path' from Piotr S In order to improve materialized views' debuggability, tracing points are added to view update generation path. Example trace: ``` ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-27 13:13:46.834000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-04-27 13:13:46.834346 \| 127.0.0.1 \| 1 \| 127.0.0.1 Processing a statement [shard 0] \| 2020-04-27 13:13:46.834426 \| 127.0.0.1 \| 80 \| 127.0.0.1 Creating write handler for token: -3248873570005575792 natural: {127.0.0.1, 127.0.0.3} pending: {} [shard 0] \| 2020-04-27 13:13:46.834494 \| 127.0.0.1 \| 148 \| 127.0.0.1 Creating write handler with live: {127.0.0.3, 127.0.0.1} dead: {} [shard 0] \| 2020-04-27 13:13:46.834507 \| 127.0.0.1 \| 161 \| 127.0.0.1 Sending a mutation to /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.834519 \| 127.0.0.1 \| 173 \| 127.0.0.1 Executing a mutation locally [shard 0] \| 2020-04-27 13:13:46.834532 \| 127.0.0.1 \| 186 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 0] \| 2020-04-27 13:13:46.834570 \| 127.0.0.1 \| 224 \| 127.0.0.1 Reading key {{-3248873570005575792, pk{000400000002}}} from sstable /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db [shard 0] \| 2020-04-27 13:13:46.834608 \| 127.0.0.1 \| 262 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Index.db: scheduling bulk DMA read of size 8 at offset 0 [shard 0] \| 2020-04-27 13:13:46.834635 \| 127.0.0.1 \| 289 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Index.db: finished bulk DMA read of size 8 at offset 0, 
successfully read 8 bytes [shard 0] \| 2020-04-27 13:13:46.834975 \| 127.0.0.1 \| 629 \| 127.0.0.1 Message received from /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.834988 \| 127.0.0.3 \| 11 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db: scheduling bulk DMA read of size 41 at offset 0 [shard 0] \| 2020-04-27 13:13:46.835015 \| 127.0.0.1 \| 669 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 0] \| 2020-04-27 13:13:46.835020 \| 127.0.0.3 \| 44 \| 127.0.0.1 Generated 1 view update mutations [shard 0] \| 2020-04-27 13:13:46.835080 \| 127.0.0.3 \| 104 \| 127.0.0.1 Sending view update for ks.t_v2_idx_index to 127.0.0.2, with pending endpoints = {}; base token = -3248873570005575792; view token = 3728482343045213994 [shard 0] \| 2020-04-27 13:13:46.835095 \| 127.0.0.3 \| 119 \| 127.0.0.1 Sending a mutation to /127.0.0.2 [shard 0] \| 2020-04-27 13:13:46.835105 \| 127.0.0.3 \| 129 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 0] \| 2020-04-27 13:13:46.835117 \| 127.0.0.3 \| 141 \| 127.0.0.1 /home/sarna/.ccm/scylla-1/node1/data/ks/t-162ef290887811eaa4bf000000000000/mc-1-big-Data.db: finished bulk DMA read of size 41 at offset 0, successfully read 41 bytes [shard 0] \| 2020-04-27 13:13:46.835160 \| 127.0.0.1 \| 813 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.835164 \| 127.0.0.3 \| 188 \| 127.0.0.1 Mutation handling is done [shard 0] \| 2020-04-27 13:13:46.835177 \| 127.0.0.3 \| 201 \| 127.0.0.1 Generated 1 view update mutations [shard 0] \| 2020-04-27 13:13:46.835215 \| 127.0.0.1 \| 869 \| 127.0.0.1 Locally applying view update for ks.t_v2_idx_index; base token = -3248873570005575792; view token = 3728482343045213994 [shard 0] \| 2020-04-27 13:13:46.835226 \| 127.0.0.1 \| 880 \| 127.0.0.1 Successfully applied local view update for 127.0.0.1 and 0 remote endpoints [shard 0] \| 2020-04-27 13:13:46.835253 \| 
127.0.0.1 \| 907 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 0] \| 2020-04-27 13:13:46.835256 \| 127.0.0.1 \| 910 \| 127.0.0.1 Got a response from /127.0.0.1 [shard 0] \| 2020-04-27 13:13:46.835274 \| 127.0.0.1 \| 928 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-27 13:13:46.835276 \| 127.0.0.1 \| 930 \| 127.0.0.1 Mutation successfully completed [shard 0] \| 2020-04-27 13:13:46.835279 \| 127.0.0.1 \| 933 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-04-27 13:13:46.835286 \| 127.0.0.1 \| 941 \| 127.0.0.1 Message received from /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835331 \| 127.0.0.2 \| 14 \| 127.0.0.1 Sending mutation_done to /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835399 \| 127.0.0.2 \| 82 \| 127.0.0.1 Mutation handling is done [shard 0] \| 2020-04-27 13:13:46.835413 \| 127.0.0.2 \| 96 \| 127.0.0.1 Got a response from /127.0.0.2 [shard 0] \| 2020-04-27 13:13:46.835639 \| 127.0.0.3 \| 662 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-27 13:13:46.835640 \| 127.0.0.3 \| 664 \| 127.0.0.1 Successfully applied view update for 127.0.0.2 and 1 remote endpoints [shard 0] \| 2020-04-27 13:13:46.835649 \| 127.0.0.3 \| 673 \| 127.0.0.1 Got a response from /127.0.0.3 [shard 0] \| 2020-04-27 13:13:46.835841 \| 127.0.0.1 \| 1495 \| 127.0.0.1 Request complete \| 2020-04-27 13:13:46.834944 \| 127.0.0.1 \| 944 \| 127.0.0.1 ``` Fixes #6175 Tests: unit(dev), manual * psarna-propagate_tracing_to_more_write_paths: db,view: add tracing to view update generation path treewide: propagate trace state to write path	2020-05-27 08:40:05 +03:00
Takuya ASADA	287d6e5ece	dist/debian: drop dependency on pystache Same as `9d91ac345a`, drop the dependency on pystache since it is no longer present in Fedora 32. To implement this, the Debian package build process is simplified. The debian/ directory is now generated when building the relocatable package, so we just need to run debuild using the package. To generate the debian/ directory, this commit adds debian_files_gen.py, which constructs the whole directory, including the control and changelog files, from template files. Since we need to stop using pystache, these template files are switched to the string.Template class, which is included in the python3 standard library. See: https://github.com/scylladb/scylla/pull/6313	2020-05-27 08:40:05 +03:00
Amnon Heiman	3e5beba403	estimated_histogram: clean if0 and FIXME This patch cleans the estimated histogram implementation. It removes the FIXME that were left in the code from the migration time and the if0 commented out code. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-27 08:40:05 +03:00
Avi Kivity	3ead6feaf0	tests: lsa_sync_eviction_test: don't exhaust random number entropy We call shuffle() with a random_device, extracting a true random number in each of the many calls shuffle() will invoke. Change it to use a random_engine seeded by a random_device. This avoids exhausting entropy, see [1] for details. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087	2020-05-26 20:51:38 +03:00
Avi Kivity	11698aafc1	tests: querier_cache_test: don't exhaust random number entropy rand_int() re-creates a random device each time it is called. Change it to use a static random_device, and get random numbers from a random_engine instead of from the device directly. This avoids exhausting entropy, see [1] for details. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087	2020-05-26 20:51:16 +03:00
Avi Kivity	e2f4c689b1	tests: loading_cache_test: don't exhaust random number entropy rand_int() re-creates a random device each time it is called. Change it to use a static random_device, and get random numbers from a random_engine instead of from the device directly. This avoids exhausting entropy, see [1] for details. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087	2020-05-26 20:49:58 +03:00
Avi Kivity	85da266cf4	tests: dynamic_bitset_test: don't exhaust random number entropy tests_random_ops() extracts a real random number from a random_device. Change it to use a random number engine. This avoids exhausting entropy, see [1] for details. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087	2020-05-26 20:46:45 +03:00
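The four test fixes above share one pattern: read the entropy source once to seed a pseudo-random engine, then draw every later number from that engine. The original changes are in C++; the following is only a minimal Python analogue of the pattern, and the names `make_engine`, `rand_int` and `shuffled` are hypothetical, not taken from the Scylla tests.

```python
import os
import random


def make_engine() -> random.Random:
    """Create a pseudo-random engine seeded once from the OS entropy source."""
    seed = int.from_bytes(os.urandom(8), "little")
    return random.Random(seed)


# A single shared engine: the entropy source is read exactly once, at import.
_ENGINE = make_engine()


def rand_int(upper: int) -> int:
    """Return an integer in [0, upper] drawn from the shared engine."""
    return _ENGINE.randint(0, upper)


def shuffled(items):
    """Return a shuffled copy of items using the shared engine."""
    result = list(items)
    _ENGINE.shuffle(result)
    return result
```

Every call to `rand_int()` or `shuffled()` reuses `_ENGINE`, so the number of reads from the entropy source stays constant no matter how many random values a test needs.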
Pavel Emelyanov	ccdee822e1	storage_service: Get rid of one-line helpers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	3c2066bd78	system_keyspace: Cleanup setup() from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	0598b3a858	format_selector: Log which format is being selected Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:31 +03:00
Pavel Emelyanov	89a1b09214	sstables_manager: Keep format on Make the database be the format_selector target, so when the format is selected it's set on database which in turn just forwards the selection into sstables managers. All users of the format are already patched to read it from those managers. The initial value for the format is the highest, which is needed by tests. When scylla starts the format is updated by format_selector, first after reading from system tables, then by selecting it from features. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:17:28 +03:00
Pavel Emelyanov	a61f18ed64	format_selector: Make it standalone Remove the selector from storage_service and introduce an instance in main.cc that starts soon after the gossiper and feature_service, starts listening for features and sets the selected format on storage_service. This change includes - Removal of for_testing bit from format_selector constructor, now tests just do not use it - Adding a gate to selection routine to make sure on exit all the selection stuff is done. Although before the cluster join the selector waits for the feature listeners to finish (the .sync() method) this gate is still required to handle aborted start cases and wait for gossiper announcement from selector to complete. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:15:04 +03:00
Pavel Emelyanov	1692d94c9a	format_selector: Move the code into db/ This is just move, no changes in code logic. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 14:09:24 +03:00
Pavel Emelyanov	f13078ce80	format_selector: Select format locally Now format_selector uses storage_service as a place to keep the selected format. Change this by keeping the selected format on selector itself and after selection update one on the target. The selector starts with the lowest format and may bump it up later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:43:47 +03:00
Pavel Emelyanov	5eb37c3743	storage_service: Introduce format_selector The final goal is to have an entity that will - read the saved sstables format (if any) - listen for sstables format related features enabling - select the top-most format - put the selected format onto a "target" - spread the word about it (via gossiper) The target is the service from which the selected format is read (so the selector can be removed once features agreement is reached). Today it's the storage_service, but at the end of this series it will be sstables_manager. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:27:34 +03:00
Pavel Emelyanov	833aa91f77	storage_service: Split feature_enabled_listener::on_enabled The split is into two parts, the goal is to move the 2nd one (the selection logic itself) into another class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:24:10 +03:00
Pavel Emelyanov	70391feb8e	storage_service: Tossing bits around The goal is to have main.cc add code between prepare_to_join and join_token_ring. As a side effect this drives us closer to proper split of storage service into sharded service itself vs start/boot/join code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:21:08 +03:00
Pavel Emelyanov	d53d2bb664	features: Introduce and use masked features Nowadays the knowledge about known/supported features is scattered between feature_service and storage_service. The latter uses knowledge about the selected _sstables_format to alter the "supported" set. Encapsulate this knowledge inside the feature_service with the help of "masked_features" -- those that shouldn't be advertised to other nodes. The only maskable feature for today is the UNBOUNDED_RANGE_TOMBSTONES one. Nowadays it's reported as supported only if the sstables format is MC. With this patch it starts as masked and gets unmasked when the sstables format is selected to be MC, so the change is correct. This will make it possible to move sstables_format from storage service to anywhere else. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:21:07 +03:00
Pavel Emelyanov	bb3a71529a	features: Get rid of per-features booleans The set of bool enable_something-s on feature_config duplicates the disabled_features set on it, so remove the former and make full use of the latter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-05-25 13:09:12 +03:00
Nadav Har'El	bf7b5a0a0d	alternator test: add tests for Query's KeyConditions We had a very limited set of tests for the KeyConditions feature of Query, which missed some error cases as well as important use cases (such as bytes keys), leading to bugs #6490 and #6495 remaining undiscovered. This patch adds a comprehensive test for the KeyConditions and (hopefully) all its different combinations of operators, types, and many cases of errors. We already had a comprehensive test suite for the newer KeyConditionsExpression syntax, and this patch brings a similar level of coverage for the older KeyConditions syntax. Refs #6490 Refs #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-3-nyh@scylladb.com>	2020-05-25 09:59:06 +02:00
Nadav Har'El	f2eab853a5	alternator: improve Query's KeyConditions error message Improve error messages coming from Query's KeyCondition parameter when wrong ComparisonOperators were used (issue discovered by @Orenef11). At one point the error message was missing a parameter so resulted in an internal error, while in another place the message mentioned an unuseful number (enum) for the operator instead of its name. This patch fixes these error messages. Fixes #6490 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-2-nyh@scylladb.com>	2020-05-25 09:59:00 +02:00
Nadav Har'El	6b38126a8f	alternator: fix support for bytes type in Query's KeyConditions Our parsing of values in a KeyConditions parameter of Query was done naively. As a result, we got bizarre error messages "condition not met: false" when these values had incorrect type (this is issue #6490). Worse - the naive conversion did not decode base64-encoded bytes value as needed, so KeyConditions on bytes-typed keys did not work at all. This patch fixes these bugs by using our existing utility function get_key_from_typed_value(), which takes care of throwing sensible errors when types don't match, and decoding base64 as needed. Unfortunately, we didn't have test coverage for many of the KeyConditions features including bytes keys, which is why this issue escaped detection. A patch will follow with much more comprehensive tests for KeyConditions, which also reproduce this issue and verify that it is fixed. Refs #6490 Fixes #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-1-nyh@scylladb.com>	2020-05-25 09:58:37 +02:00
Nadav Har'El	5ef9854e86	alternator: better error messages when 'forbid_rmw' mode is on When the 'forbid_rmw' write isolation policy is selected, read-modify-write are intentionally forbidden. The error message in this case used to say: "Read-modify-write operations not supported" Which can lead users to believe that this operation isn't supported by this version of Alternator - instead of realizing that this is in fact a configurable choice. So in this patch we just change the error message to say: "Read-modify-write operations are disabled by 'forbid_rmw' write isolation policy. Refer to https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md#write-isolation-policies for more information." Fixes #6421. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200518125538.8347-1-nyh@scylladb.com>	2020-05-24 16:31:38 +02:00
Asias He	c02fea5f04	repair: Ignore table removed in sync_data_using_repair Commit `75cf255c67` (repair: Ignore keyspace that is removed in sync_data_using_repair) is not enough to fix the issue because when the repair master checks if the table is dropped, the table might not be dropped yet on the repair master. To fix, the repair master should check if the follower failed the repair because the table is dropped by checking the error returned from follower. With this patch, we would see WARN 2020-04-14 11:19:00,417 [shard 0] repair - repair id 1 on shard 0 completed successfully, keyspace=ks, ignoring dropped tables={cf} when the table is dropped during bootstrap. Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_test Fixes: #5942	2020-05-24 13:39:59 +03:00
Avi Kivity	5864bbcb52	cql3: untyped_result_set.hh: add missing include for column_specification Fails dev-headers build without it. Message-Id: <20200523061519.71855-1-avi@scylladb.com>	2020-05-24 12:28:03 +03:00
Avi Kivity	52e875430e	Merge "Pass --create-cc to seastar-json2code.py" from Rafael " This small series instructs seastar-json2code.py to also create a .cc file. This reduces header bloat and fixes the current stack usage warning in a dev build. " * 'espindola/json2code-cc' of https://github.com/espindola/scylla: configure.py: Pass --create-cc to seastar-json2code.py configure.py: Add a Source base class configure.py: Fix indentation	2020-05-24 11:27:41 +03:00
Piotr Sarna	629a965cbb	alternator-test: fix a test for large requests With required headers fixed by the previous commit, large requests test now returns a different error code (ClientError) when run with `--aws`. Message-Id: <d56142d1936164d22f457e30e37fd3e58cd52519.1590052823.git.sarna@scylladb.com>	2020-05-24 10:36:59 +03:00
Piotr Sarna	2adb17245b	alternator-test: add missing Content-Type header DynamoDB seems to have started refusing requests unless they include Content-Type header set to the following value: application/x-amz-json-1.0 In order to make sure that manual tests work correctly, let's add this header. Message-Id: <ae0edafa311bce27b27e9e72aa51bb9717c360f2.1590052823.git.sarna@scylladb.com>	2020-05-24 10:29:39 +03:00
Nadav Har'El	49fd0cc42f	docs/protocols.md: mention wireshark In docs/protocols.md, describing the protocols used by Scylla's (both inter-node protocols and client-facing protocols), add a paragraph about the ability to inspect most of these protocols, including Scylla's internal inter-node protocol, using wireshark. Link to Piotr Sarna's recent blog post about how to do this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524065248.76898-1-nyh@scylladb.com>	2020-05-24 09:54:49 +03:00
Avi Kivity	076c8317c7	streaming_histogram: add missing include for uint64_t Fails dev-headers build without it. Message-Id: <20200523061555.72087-1-avi@scylladb.com>	2020-05-23 11:09:10 +03:00
Raphael S. Carvalho	2f0f72025e	compaction: delete move ctor and assignment Compaction cannot be moved because its address is forwarded to members like garbage_collected_sstable_writer::data. Refs #6472. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200521193657.20782-1-raphaelsc@scylladb.com>	2020-05-22 17:07:43 +02:00
Kamil Braun	290d226034	storage_service: print a warning when joining a node improperly Print a BIG WARNING saying that you should never join nodes without bootstrapping (by marking it as a seed or using auto_bootstrap=off). Only the very first node should (must) be joined as a seed. If you want to have more seeds, first join them using the only supported way (i.e. bootstrap them), and only AFTER they have bootstrapped, change their configuration to include them in the seed list.	2020-05-22 16:46:39 +02:00
Kamil Braun	838f912ebf	storage_service: allow a node to join without bootstrapping ... even if it couldn't contact other nodes. This reverts `7cb6ac33f5`.	2020-05-22 16:46:30 +02:00
Tomasz Grabiec	5cbf0c5748	sstables: index_reader: Add trace-level logging to the index parser Tested against performance regression using: build/release/test/perf/perf_fast_forward --run-test=small-partition-skips -c1 I get similar results before and after the patch. Message-Id: <20200521213032.15286-1-tgrabiec@scylladb.com>	2020-05-22 13:54:47 +02:00
Avi Kivity	1c2f538eb3	tools: toolchain: dbuild: allow customization of docker arguments Introduce ~/.config/scylladb/dbuild configuration file, and SCYLLADB_DBUILD environment variables, that inject options into the docker run command. This allows adding bind mounts for ccache and distcc directories, as well as any local scripts and PATH or other environment configuration to suit the user's needs. Message-Id: <20200521133529.25880-1-avi@scylladb.com>	2020-05-22 13:52:21 +03:00
Asias He	81f0260816	range_streamer: Handle table of RF 1 in get_range_fetch_map After "Make replacing node take writes" series, with repair based node operations disabled, we saw the replace operation fail like: ``` [shard 0] init - Startup failed: std::runtime_error (unable to find sufficient sources for streaming range (9203926935651910749, +inf) in keyspace system_auth) ``` The reason is the system_auth keyspace has default RF of 1. It is impossible to find a source node to stream from for the ranges owned by the replaced node. In the past, the replace operation with keyspace of RF 1 passed, because the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) before streaming. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9021954492552185543, -9016289150131785593] exists on {127.0.0.6} ``` Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in range_streamer::get_range_fetch_map will pass if the source is the node itself. However, it will not stream from the node itself. As a result, the system_auth keyspace will not get any data. After the "Make replacing node take writes" series, the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) after the streaming finishes. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9049647518073030406, -9048297455405660225] exists on {127.0.0.5} ``` Since 127.0.0.5 was dead, the source node check failed, and so did the bootstrap operation. To fix, we ignore the keyspace of RF 1 when it is unable to find a source node to stream. Fixes #6351	2020-05-22 09:30:52 +08:00
Asias He	fa9ee234a0	streaming: Use separate streaming reason for replace operation Currently, replace and bootstrap share the same streaming reason, stream_reason::bootstrap, because they share most of the code in boot_strapper. In order to distinguish the two, we need to introduce a new stream reason, stream_reason::replace. It is safe to do so in a mixed cluster because current code only checks if the stream_reason is stream_reason::repair. Refs: #6351	2020-05-22 09:30:52 +08:00
Tomasz Grabiec	a6c87a7b9e	sstables: index_reader: Fix overflow when calculating promoted index end When index file is larger than 4GB, offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>	2020-05-21 21:24:05 +03:00
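The wraparound described in the commit above can be illustrated by emulating fixed-width unsigned arithmetic with bit masks. This is only a sketch of the arithmetic, not the index_reader code; `end_offset`, the 5 GiB start offset and the 4096-byte size are made-up illustration values.

```python
UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF
GIB = 1 << 30


def end_offset(start: int, size: int, mask: int) -> int:
    """Compute start + size as an unsigned integer of the width given by mask."""
    return (start + size) & mask


start = 5 * GIB  # an offset past the 4 GiB boundary (hypothetical value)
size = 4096      # hypothetical size of the region that follows

narrow = end_offset(start, size, UINT32_MAX)  # wraps: the high bits are dropped
wide = end_offset(start, size, UINT64_MAX)    # correct end offset

print(narrow, wide)
```

With a 32-bit result the computed end lands before the start, so any size derived from it by subtraction underflows; a 64-bit result keeps the end after the start.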
Pavel Emelyanov	2ac24d38fa	row-cache: Remove variadic future from range_populating_reader Replace it with std::tuple, introduce range_populating_reader::read_result type alias for less keystrokes. This makes row_cache.o compilation warn-less. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200518160511.26984-1-xemul@scylladb.com>	2020-05-21 19:29:39 +02:00
Avi Kivity	a61b3f2d78	tools: toolchain: rebase on Fedora 32 - base image changed from Fedora 31 to Fedora 32 - disambiguate base image to use docker.io registry - pystache and python-cassandra-driver are no longer available, so use pip3 to install them. Add pip3 to packages. - since pip3 installs commands to /usr/local/bin, update checks in build_deb to check for those too Fedora 32 packages gcc 10, which has support for coroutines. Message-Id: <20200521063138.1426400-1-avi@scylladb.com>	2020-05-21 18:27:50 +03:00
Piotr Sarna	032a531ea6	test: add unit tests for alternator base64 conversions The test cases verify that base64 operations encode and decode their data properly. Tests: unit(dev)	2020-05-21 18:26:59 +03:00
Piotr Sarna	e503075aac	alternator: apply the string_view helper function Explicit transformation from a JSON value to a string view can be replaced with a shorter helper function from rjson.hh.	2020-05-21 18:26:59 +03:00
Piotr Sarna	cb7d3c6b55	alternator: compute begins_with on base64 without decoding In order to remove a FIXME, the BEGINS_WITH relation between base64-encoded strings is now computed in a way which does not involve decoding the whole string. In case of padding, the remainders are still decoded, but their size is bounded by 3, which means they will be eligible for the small string optimization.	2020-05-21 18:26:59 +03:00
Piotr Sarna	511ce82bd2	alternator: extract base64-decoding code to a helper function In the future, a routine that decodes directly to std::string will be useful, so it's extracted out of a bigger function.	2020-05-21 18:26:59 +03:00
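One way to check a prefix relation on padded base64 strings without decoding them in full is sketched below. It is not the Alternator implementation (which is C++); the function name `base64_begins_with` is hypothetical, and the sketch assumes both inputs are standard, padded base64. Complete 4-character groups are compared as text, and only the final padded group of the prefix is decoded.

```python
import base64


def base64_begins_with(value_b64: str, prefix_b64: str) -> bool:
    """Check whether decoded value starts with decoded prefix.

    Only the last (padded) group of the prefix is ever decoded.
    """
    # A longer padded encoding always means a longer decoded prefix.
    if len(prefix_b64) > len(value_b64):
        return False
    # Characters of the prefix that form complete, unpadded groups.
    full = len(prefix_b64) - 4 if prefix_b64.endswith("=") else len(prefix_b64)
    # Each unpadded group encodes exactly 3 bytes, so text equality suffices.
    if value_b64[:full] != prefix_b64[:full]:
        return False
    if full == len(prefix_b64):
        return True
    # Decode just the trailing groups: at most 3 bytes on each side.
    prefix_tail = base64.b64decode(prefix_b64[full:full + 4])
    value_tail = base64.b64decode(value_b64[full:full + 4])
    return value_tail.startswith(prefix_tail)
```

For example, `base64_begins_with("YWJjZA==", "YWI=")` compares the encodings of `abcd` and `ab` and decodes only the groups `YWI=` and `YWJj`.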
Piotr Sarna	3148571834	alternator: compute decoded base64 size without actually decoding In order to get rid of a FIXME, the code which computes the size of decoded base64 string based only on encoded size + padding is added. The result is an O(1) function with just a couple of ops (15 when checking with godbolt and gcc9), so it's a general improvement over having to allocate a string and get its size.	2020-05-21 18:26:59 +03:00
Botond Dénes	06dd3d9077	queue_reader: push(): eliminate unneeded continuation on full buffer case Currently, push() attaches a continuation to the _not_full future, if push() is called when the buffer is already full. This is not needed as we can safely push the fragment even if the buffer is already full. Furthermore we can eliminate the possibility of push() being called when the buffer is full, by checking whether it is full after pushing the fragment, not before. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200521055840.376019-1-bdenes@scylladb.com>	2020-05-21 09:34:44 +03:00
Pekka Enberg	ed0d00f51e	Revert "Revert "schema: Default dc_local_read_repair_chance to zero"" This reverts commit `43b488a7bc`. The commit was originally reverted because a dtest was sensitive to the value. The dtest is fixed now, so let's revert the revert as requested by Glauber.	2020-05-21 08:05:13 +03:00
Botond Dénes	c29ccdea7e	repair: switch from queue+generating_reader to queue_reader The queue_reader was invented exactly to replace this construct and is more efficient than it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200520155618.369873-1-bdenes@scylladb.com>	2020-05-20 19:33:28 +03:00
Calle Wilund	7ce4a8b458	token_metadata: Prune empty racks on endpoint change Fixes #6459 When moving or removing endpoints, we should ensure that the set of available racks reflects the nodes known, i.e. matches what would be the result of a reboot + create sets initially. Message-Id: <20200519153300.15391-1-calle@scylladb.com>	2020-05-20 13:35:08 +02:00
Nadav Har'El	0673e44fc1	alternator test: small fix for Python 2 Although Python 2 is deprecated, some systems today still have "python" and "pytest" pointing to Python 2, so it would be convenient for the Alternator tests to work on both Python 2 and 3 if it's not too much of an effort. And it really isn't too much of an effort - they all work on both versions except for one problem introduced in the previous test patch: The syntax b'' for an empty byte array works correctly on Python 3 but incorrectly on Python 2: In Python 2, b'' is just a normal empty string, not byte array, which confuses Boto3 which refuses to accept a string as a value for a byte-array key. The trivial fix is to replace b'' by bytearray('', 'utf-8'). Uglier, but works as expected on both Python 2 and 3. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200519214321.25152-1-nyh@scylladb.com>	2020-05-20 07:56:16 +02:00
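The portable spelling mentioned in the commit above can be wrapped in a small helper. The name `empty_blob` is hypothetical and not taken from the Alternator tests.

```python
def empty_blob() -> bytearray:
    """Return an empty byte array in a spelling accepted by Python 2 and 3."""
    # b'' is a plain str on Python 2; bytearray('', 'utf-8') is a byte array on both.
    return bytearray('', 'utf-8')


print(type(empty_blob()).__name__, len(empty_blob()))
```

On either interpreter the helper returns a `bytearray` of length zero, so the value has the same byte-array type regardless of which Python runs the tests.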
Nadav Har'El	3ff4be966d	merge: alternator: allow empty strings Merged patch series from Piotr Sarna: Given the new update from DynamoDB: https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dynamodb-now-supports-empty-values-for-non-key-string-and-binary-attributes-in-dynamodb-tables/ ... empty strings are now allowed, so alternator and its tests are updated accordingly. Key values still cannot be empty, and the definition also expands to columns which act as keys in global or local secondary indexes. Fixes #6480 Tests: alternator(local, remote)	2020-05-20 00:10:12 +03:00
Avi Kivity	ecae7a7920	Update seastar submodule * seastar 92365e7b8...ee516b1cc (17): > build: use -fcommon compiler flag for dpdk > coroutines: reduce template bloat > thread: make async noexcept > file: specify methods noexcept > doc: drop grace period for old C++ standard revisions > semaphore: specify consume_units as noexcept > doc/tutorial.md: add short intro to seastar::sharded<> > future: Move promise_base move constructor out of line > coroutines: enable for C++20 > tutorial: adjust evaluation order warning to note it is C++14-only > rpc_test: Fix test_stream_connection_error with valgrind > file: Remove unused lambda capture > install-dependencies: add valgrind to arch > coroutines_test: Don't access a destroyed lambda > tutorial: warn about evaluation order pitfall > merge: apps: improvements in httpd and seawreck > file: Move functions out of line	2020-05-19 21:25:24 +03:00
Rafael Ávila de Espíndola	79117e1473	configure.py: Pass --create-cc to seastar-json2code.py This adds a Json2Code class now that both a .cc and a .hh are produced. Creating a .cc file reduces header bloat and fixes the current stack too large warning in a dev build. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-05-19 10:22:38 -07:00
Rafael Ávila de Espíndola	8238c9c9f1	configure.py: Add a Source base class This reduces a bit of code duplication among the Thrift and Antlr3Grammar classes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-05-19 10:21:38 -07:00
Rafael Ávila de Espíndola	caf82755fc	configure.py: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-05-19 10:20:21 -07:00
Botond Dénes	54a0d8536e	restricting_mutation_reader: include own buffer in buffer size calculation Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200519102902.231042-1-bdenes@scylladb.com>	2020-05-19 18:23:15 +03:00
Nadav Har'El	16b0680c40	alternator test: run Scylla with a different executable name The Alternator test (test/alternator/run) runs the real Scylla executable to test it. Users sometimes want to run Scylla manually in parallel (on different IP addresses, of course) and, when they use commands like "killall scylla" to stop it, may be surprised that this command will also unintentionally kill a running test. So what this patch does is to name the Scylla process used for the test with the name "test_scylla". It will be visible as "test_scylla" in top, and a "killall scylla" will not touch it. You can, of course, kill it with a "killall test_scylla" if you wish. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200519071604.19161-1-nyh@scylladb.com>	2020-05-19 18:23:15 +03:00
Piotr Sarna	d3e70356c5	alternator-test: add tests for empty indexed string values According to DynamoDB, string/binary blob keys cannot be empty and this definition affects secondary indexes as well. As a result, only nonempty strings/binary blobs are accepted as values for columns which form a GSI or LSI key.	2020-05-19 11:32:18 +02:00
Piotr Sarna	ada137b543	alternator-test: add tests for empty strings in keys Empty string/binary blob values are not accepted by DynamoDB, and we should follow suit.	2020-05-19 11:32:18 +02:00
Piotr Sarna	7006389f69	alternator: refuse empty strings/binary blobs in keys In order to be compatible with DynamoDB, we should refuse items whose keys contain empty strings or byte blobs.	2020-05-19 11:32:18 +02:00
Piotr Sarna	0d25427470	alternator-test: add a table with string sort key String sort key will be needed to ensure that empty string keys are not accepted.	2020-05-19 11:32:18 +02:00
Piotr Sarna	9f8202806a	alternator: allow empty strings in values Given the new update from DynamoDB: https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dynamodb-now-supports-empty-values-for-non-key-string-and-binary-attributes-in-dynamodb-tables/ ... empty strings are now allowed for non-key attributes, so alternator and its tests are updated accordingly. Fixes #6480 Tests: alternator(local, remote)	2020-05-19 11:32:18 +02:00
Nadav Har'El	cac9bcbbba	dist/docker: instructions how to build a docker image with your own executable Clarify in README.md that the instructions there will build a Docker image containing a Scylla executable downloaded from downloads.scylla.com - NOT the one you built yourself. The image is also CentOS based - not Fedora-based as claimed. In addition, a new dist/docker/redhat/README.md explains the steps needed to actually build a Docker image with the Scylla executable that you built. In the future, these steps should be automated (e.g., "ninja docker") but until then, let's at least document the process. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200518151123.11313-1-nyh@scylladb.com>	2020-05-19 09:23:37 +03:00
Piotr Sarna	1906e35d12	test: add --skip option The --skip option allows providing a pattern for tests which will not be run. Example usage: ./test.py --mode dev --skip alternator Tests: unit(dev), with `--skip alternator` and with no parameters Message-Id: <6970134d2bc15314f0e4944f3b167d0e105ea69b.1589811943.git.sarna@scylladb.com>	2020-05-19 08:14:32 +03:00
Tomasz Grabiec	3efef39e7e	Merge "lwt: fix batch validation crash and exception message case" from Alejo Fix a metadata crash and exception message casing consistency. Fixes #6332 * alejo/fix_issue_6332: lwt: validate before constructing metadata lwt: consistent exception message case	2020-05-19 08:14:32 +03:00
Avi Kivity	4d15aba7c0	commitlog: capture "this" explicitly in lambda C++20 deprecates capturing this in default-copy lambdas ([=]), with good reason. Move to explicit captures to avoid any ambiguity and reduce warning spew. Message-Id: <20200517150834.753463-1-avi@scylladb.com>	2020-05-19 08:14:32 +03:00
Piotr Sarna	18a37d0cb1	db,view: add tracing to view update generation path In order to improve materialized views' debuggability, tracing points are added to view update generation path. Sample info of an insert statement which resulted in producing local view updates which require read-before-write: activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-19 12:02:48.420000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-04-19 12:02:48.420674 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-04-19 12:02:48.420753 \| 127.0.0.1 \| 79 \| 127.0.0.1 Creating write handler for token: -6715243485458697746 natural: {127.0.0.1} pending: {} [shard 0] \| 2020-04-19 12:02:48.420815 \| 127.0.0.1 \| 141 \| 127.0.0.1 Creating write handler with live: {127.0.0.1} dead: {} [shard 0] \| 2020-04-19 12:02:48.420824 \| 127.0.0.1 \| 149 \| 127.0.0.1 Executing a mutation locally [shard 0] \| 2020-04-19 12:02:48.420830 \| 127.0.0.1 \| 155 \| 127.0.0.1 View updates for ks.t1 require read-before-write - base table reader is created [shard 0] \| 2020-04-19 12:02:48.420862 \| 127.0.0.1 \| 188 \| 127.0.0.1 Generated 2 view update mutations [shard 0] \| 2020-04-19 12:02:48.420910 \| 127.0.0.1 \| 235 \| 127.0.0.1 Locally applying view update for ks.t1_v_idx_index; base token = -6715243485458697746; view token = -4156302194539278891 [shard 0] \| 2020-04-19 12:02:48.420918 \| 127.0.0.1 \| 243 \| 127.0.0.1 Successfully applied local view update for 127.0.0.1 and 0 remote endpoints [shard 0] \| 2020-04-19 12:02:48.420971 \| 127.0.0.1 \| 297 \| 127.0.0.1 View updates for ks.t1 were generated and propagated [shard 0] \| 2020-04-19 12:02:48.420973 \| 127.0.0.1 \| 299 \| 127.0.0.1 Got a response from /127.0.0.1 [shard 0] \| 
2020-04-19 12:02:48.420988 \| 127.0.0.1 \| 314 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 0] \| 2020-04-19 12:02:48.420990 \| 127.0.0.1 \| 315 \| 127.0.0.1 Mutation successfully completed [shard 0] \| 2020-04-19 12:02:48.420994 \| 127.0.0.1 \| 320 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-04-19 12:02:48.421000 \| 127.0.0.1 \| 326 \| 127.0.0.1 Request complete \| 2020-04-19 12:02:48.420330 \| 127.0.0.1 \| 330 \| 127.0.0.1 Sample info for remote updates: activity \| timestamp \| source \| source_elapsed \| client --------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-04-26 16:19:47.691000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 1] \| 2020-04-26 16:19:47.691590 \| 127.0.0.1 \| 6 \| 127.0.0.1 Processing a statement [shard 1] \| 2020-04-26 16:19:47.692368 \| 127.0.0.1 \| 783 \| 127.0.0.1 Creating write handler for token: -3248873570005575792 natural: {127.0.0.3, 127.0.0.2} pending: {} [shard 1] \| 2020-04-26 16:19:47.694186 \| 127.0.0.1 \| 2598 \| 127.0.0.1 Creating write handler with live: {127.0.0.2, 127.0.0.3} dead: {} [shard 1] \| 2020-04-26 16:19:47.694283 \| 127.0.0.1 \| 2699 \| 127.0.0.1 Sending a mutation to /127.0.0.2 [shard 1] \| 2020-04-26 16:19:47.694591 \| 127.0.0.1 \| 3006 \| 127.0.0.1 Sending a mutation to /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.694862 \| 127.0.0.1 \| 3277 \| 127.0.0.1 Message received from /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.696358 \| 127.0.0.3 \| 40 \| 127.0.0.1 Message received from /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.696442 \| 127.0.0.2 \| 32 \| 127.0.0.1 View updates for ks.t require read-before-write - base table reader is created [shard 1] \| 2020-04-26 16:19:47.697762 \| 127.0.0.3 \| 1444 \| 127.0.0.1 View updates for ks.t 
require read-before-write - base table reader is created [shard 1] \| 2020-04-26 16:19:47.698120 \| 127.0.0.2 \| 1710 \| 127.0.0.1 Generated 1 view update mutations [shard 1] \| 2020-04-26 16:19:47.699107 \| 127.0.0.3 \| 2789 \| 127.0.0.1 Sending view update for ks.t_v2_idx_index to 127.0.0.4, with pending endpoints = {}; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] \| 2020-04-26 16:19:47.699345 \| 127.0.0.3 \| 3027 \| 127.0.0.1 Sending a mutation to /127.0.0.4 [shard 1] \| 2020-04-26 16:19:47.699614 \| 127.0.0.3 \| 3296 \| 127.0.0.1 Generated 1 view update mutations [shard 1] \| 2020-04-26 16:19:47.699824 \| 127.0.0.2 \| 3414 \| 127.0.0.1 Locally applying view update for ks.t_v2_idx_index; base token = -3248873570005575792; view token = 1634052884888577606 [shard 1] \| 2020-04-26 16:19:47.700012 \| 127.0.0.2 \| 3603 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 1] \| 2020-04-26 16:19:47.700059 \| 127.0.0.3 \| 3741 \| 127.0.0.1 Message received from /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.700958 \| 127.0.0.4 \| 37 \| 127.0.0.1 Successfully applied local view update for 127.0.0.2 and 0 remote endpoints [shard 1] \| 2020-04-26 16:19:47.701522 \| 127.0.0.2 \| 5112 \| 127.0.0.1 View updates for ks.t were generated and propagated [shard 1] \| 2020-04-26 16:19:47.701615 \| 127.0.0.2 \| 5206 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 1] \| 2020-04-26 16:19:47.701913 \| 127.0.0.3 \| 5595 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.702489 \| 127.0.0.3 \| 6171 \| 127.0.0.1 Got a response from /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.702667 \| 127.0.0.1 \| 11082 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 1] \| 2020-04-26 16:19:47.702689 \| 127.0.0.1 \| 11105 \| 127.0.0.1 Mutation successfully completed [shard 1] \| 2020-04-26 16:19:47.702784 \| 127.0.0.1 \| 11200 \| 127.0.0.1 Sending mutation_done to /127.0.0.1 [shard 1] \| 
2020-04-26 16:19:47.703016 \| 127.0.0.2 \| 6606 \| 127.0.0.1 Done processing - preparing a result [shard 1] \| 2020-04-26 16:19:47.703054 \| 127.0.0.1 \| 11470 \| 127.0.0.1 Sending mutation_done to /127.0.0.3 [shard 1] \| 2020-04-26 16:19:47.703720 \| 127.0.0.4 \| 2800 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.704527 \| 127.0.0.4 \| 3607 \| 127.0.0.1 Got a response from /127.0.0.4 [shard 1] \| 2020-04-26 16:19:47.704580 \| 127.0.0.3 \| 8262 \| 127.0.0.1 Delay decision due to throttling: do not delay, resuming now [shard 1] \| 2020-04-26 16:19:47.704606 \| 127.0.0.3 \| 8288 \| 127.0.0.1 Successfully applied view update for 127.0.0.4 and 1 remote endpoints [shard 1] \| 2020-04-26 16:19:47.704853 \| 127.0.0.3 \| 8535 \| 127.0.0.1 Mutation handling is done [shard 1] \| 2020-04-26 16:19:47.706092 \| 127.0.0.2 \| 9682 \| 127.0.0.1 Got a response from /127.0.0.2 [shard 1] \| 2020-04-26 16:19:47.709933 \| 127.0.0.1 \| 18348 \| 127.0.0.1 Request complete \| 2020-04-26 16:19:47.702582 \| 127.0.0.1 \| 11582 \| 127.0.0.1 Tests: unit(dev, debug)	2020-05-18 16:05:23 +02:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Piotr Jastrzebski	cd33b9f406	cdc: Tune expired sstables check frequency CDC Log is a time series which uses time window compaction with some time window. Data is TTLed with the same value. This means that sstable won't become fully expired more often than once per time window duration. This patch sets expired_sstable_check_frequency_seconds compaction strategy parameter to half of the time window. Default value of this parameter is 10 minutes which in most cases won't be a good fit. By default, we set TTL to 24h and time window to 1h. This means that with a default value of the parameter we would be checking every 10 minutes but new expired sstable would appear only every 60 minutes. The parameter is set to half of the time window duration because it's the expected time we have to wait for sstable to become fully expired. Half of the time we will wait longer and half of the time we will wait shorter. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-18 16:49:19 +03:00
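The arithmetic in the commit above can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name is hypothetical, and only the "half of the time window" rule and the 1h default window come from the commit message.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: derive the expired-sstable check period from the
// time window, using the "half of the window" rule described above.
inline std::chrono::seconds expired_sstable_check_frequency(std::chrono::seconds time_window) {
    return time_window / 2;
}
```

With the default 1h window this gives 1800 seconds, instead of the generic 10-minute default mentioned in the message.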
Alejo Sanchez	d1521e6721	lwt: validate before constructing metadata LWT batches conditions can't span multiple tables. This was detected in batch_statement::validate() called in ::prepare(). But ::cas_result_set_metadata() was built in the constructor, causing a bitset assert/crash in a reported scenario. This patch moves validate() to the constructor before building metadata. Closes #6332 Tested with https://github.com/scylladb/scylla-dtest/pull/1465 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-18 10:40:21 +02:00
Alejo Sanchez	74edb3f20b	lwt: consistent exception message case Fix case Batch -> BATCH to match similar exception in same file Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-18 10:40:06 +02:00
Avi Kivity	61a8c8c989	storage_proxy: capture "this" explicitly in lambda C++20 deprecates capturing this in default-copy lambdas ([=]), with good reason. Move to explicit captures to avoid any ambiguity and reduce warning spew. Message-Id: <20200517150921.754073-1-avi@scylladb.com>	2020-05-18 10:30:10 +03:00
Avi Kivity	2d933c62ec	thrift: capture "this" explicitly in lambda C++20 deprecates capturing this in default-copy lambdas ([=]), with good reason. Move to explicit captures to avoid any ambiguity and reduce warning spew. Message-Id: <20200517151023.754906-1-avi@scylladb.com>	2020-05-18 10:24:00 +03:00
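The kind of change made by the "capture "this" explicitly in lambda" commits can be sketched like this. The class is hypothetical and not taken from the Scylla sources; it only shows the capture style.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class used only to illustrate the capture style.
class counter {
    int _value = 0;
public:
    // With [=], `this` would be captured implicitly, which C++20 deprecates.
    // Naming `this` in the capture list makes it explicit that the lambda
    // refers to the object itself and copies only `step`.
    std::function<int()> make_incrementer(int step) {
        return [this, step] { _value += step; return _value; };
    }
    int value() const { return _value; }
};
```

The explicit form compiles without deprecation warnings in both C++17 and C++20.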
Rafael Ávila de Espíndola	311fbe2f0a	repair: Make sure sinks are always closed In a recent next failure I got the following backtrace #3 0x00007efd71251a66 in __GI___assert_fail (assertion=assertion@entry=0x2d0c00 "this->_con->get()->sink_closed()", file=file@entry=0x32c9d0 "./seastar/include/seastar/rpc/rpc_impl.hh", line=line@entry=795, function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101 #4 0x0000000001f5d2c3 in seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd>::~sink_impl (this=<optimized out>, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/future.hh:312 #5 0x0000000001f5d2f4 in seastar::shared_ptr_count_for<seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd> >::~shared_ptr_count_for (this=0x60100075b680, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/shared_ptr.hh:463 #6 seastar::shared_ptr_count_for<seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd> >::~shared_ptr_count_for (this=0x60100075b680, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/shared_ptr.hh:463 #7 0x000000000240f2e6 in seastar::shared_ptr<seastar::rpc::sink<repair_row_on_wire_with_cmd>::impl>::~shared_ptr (this=0x601003118590, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/future.hh:427 #8 seastar::rpc::sink<repair_row_on_wire_with_cmd>::~sink (this=0x601003118590, __in_chrg=<optimized out>) at ./seastar/include/seastar/rpc/rpc_types.hh:270 #9 <lambda(auto:134&)>::<lambda(const seastar::rpc::client_info&, uint64_t, seastar::rpc::source<repair_hash_with_cmd>)>::<lambda(std::__exception_ptr::exception_ptr)>::~<lambda> (this=0x601003118570, __in_chrg=<optimized out>) at repair/row_level.cc:2059 This patch changes a few functions to use finally to make sure the sink is always closed. 
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200515202803.60020-1-espindola@scylladb.com>	2020-05-18 08:13:42 +03:00
Avi Kivity	beaeda5234	database: remove variadic future from query() and query_mutations() Variadic futures are deprecated; replace with future<std::tuple<...>>. Tests: unit (dev)	2020-05-17 18:45:38 +02:00
Nadav Har'El	4cf44ddbdf	docs: update alternator.md Some statements made in docs/alternator/alternator.md on having a single keyspace, or recommending a DNS setup, are not up-to-date. So fix them. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200517132444.9422-1-nyh@scylladb.com>	2020-05-17 18:38:13 +02:00
Nadav Har'El	1b807a5018	alternator test: better recognition that Alternator failed to boot The test/alternator/run script starts Scylla to be tested. It waits until CQL is responsive and if Scylla dies earlier, recognizes the failure immediately. This is useful so we see boot errors immediately instead of waiting for the first test to timeout and fail. However, Scylla starts the Alternator service after CQL. So it is possible that after the "run" script found CQL to be up, Alternator couldn't start (e.g., bad configuration parameters) and Scylla is shut down, and instead of recognizing this situation, we start the actual test. The fix is simple: don't start the tests until verifying that Alternator is up. We verify this using the trivial healthcheck request (which is nothing more than an HTTP GET request). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200517125851.8484-1-nyh@scylladb.com>	2020-05-17 18:33:27 +02:00
Nadav Har'El	2b9437076f	README.md: update instructions for building docker image The instructions in README.md about building a docker image start with "cd dist/docker", but it actually needs to be "cd dist/docker/redhat". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200517152815.15346-1-nyh@scylladb.com>	2020-05-17 18:29:55 +03:00
Tzach Livyatan	82dfab0a54	Fix a link to contributor-agreement in the CONTRIBUTING page	2020-05-17 14:15:49 +03:00
Avi Kivity	513faa5c71	Merge 'Use http Stream for describe ring' from Amnon " This series changes the describe_ring API to use an HTTP stream instead of serializing the results and sending them as a single buffer. While testing the change I hit a 4-year-old issue inside service/storage_proxy.cc that causes a use after free, so I fixed it along the way. Fixes #6297 " * amnonh-stream_describe_ring: api/storage_service.cc: stream result of token_range storage_service: get_range_to_address_map prevent use after free	2020-05-17 14:05:26 +03:00
Amnon Heiman	7c4562d532	api/storage_service.cc: stream result of token_range The result of the get token range API can become big, which can cause large allocations and stalls. This patch replaces the implementation so that it streams the results using the HTTP stream capabilities instead of serializing them and sending one big buffer. Fixes #6297 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-17 13:56:05 +03:00
Amnon Heiman	69a46d4179	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour: when given an empty keyspace, it uses the first non-system keyspace ("first" here means simply whichever keyspace comes first). The current implementation has two issues. First, it uses a reference to a string that is held on the stack of another function. In other words, there is a use after free, and it is not clear why we never hit it. Second, it calls get_non_system_keyspaces twice. Though this is not a bug, it is redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues. First, it changes the implementation to hold a string instead of a reference to a string. Second, it stores the result of get_non_system_keyspaces and reuses it, which is more efficient and keeps the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-17 13:53:13 +03:00
Dejan Mircevski	8db7e4cc96	cql: Add test for invalid unbounded DELETE In `add40d4e59`, we relaxed the prohibition of unbounded DELETE and stopped testing the failure message. But there are still scenarios when unbounded DELETE is prohibited, so add a test to ensure we continue to catch it where appropriate. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-05-17 12:28:36 +03:00
Avi Kivity	b155eef726	Merge "allow early aborts through abort sources." from Glauber " The shutdown process of compaction manager starts with an explicit call from the database object. However, that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Tests: unit (debug), manual start+stop, manual drain + stop, previously failing dtests. "	2020-05-17 11:49:00 +03:00
Avi Kivity	777d5e88c3	types: support altering fixed-size integer types to varint Fixed-size integer types are legal varints - both are serialized as two's complement in network byte order. So tinyint, shortint, int, and bigint can all be interpreted as varints. Change is_compatible_with() to reflect that. Message-Id: <20200516115143.28690-2-avi@scylladb.com>	2020-05-17 11:31:00 +03:00
Avi Kivity	ff57e4d9a5	types: make short and byte types value-compatible with varint The short and byte types are two's complement network byte order, just like varint (except fixed size) and so varint can read them just fine. Mark them as value compatible like int32_type and long_type. A unit test is added. Message-Id: <20200516115143.28690-1-avi@scylladb.com>	2020-05-17 11:31:00 +03:00
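The compatibility claim in the two "types:" commits above can be checked with a small sketch. The helpers below are hypothetical and unrelated to Scylla's type classes; they assume the value fits in 64 bits and only demonstrate that a fixed-width big-endian two's complement integer decodes correctly when read as a variable-length one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: big-endian two's complement encoding in exactly
// `width` bytes (1 for byte, 2 for short, 4 for int, 8 for long).
inline std::vector<uint8_t> encode_fixed(int64_t v, int width) {
    std::vector<uint8_t> out(width);
    for (int i = 0; i < width; ++i) {
        out[width - 1 - i] = static_cast<uint8_t>(static_cast<uint64_t>(v) >> (8 * i));
    }
    return out;
}

// Hypothetical helper: decode a big-endian two's complement value of any
// length from 1 to 8 bytes, i.e. a varint small enough to fit in 64 bits.
inline int64_t decode_varint(const std::vector<uint8_t>& bytes) {
    // Start from all ones if the sign bit of the first byte is set, so that
    // the bits above the encoded width end up sign-extended.
    uint64_t acc = (!bytes.empty() && (bytes[0] & 0x80)) ? ~uint64_t(0) : 0;
    for (uint8_t b : bytes) {
        acc = (acc << 8) | b;
    }
    return static_cast<int64_t>(acc);
}
```

The same value encoded at any of the four widths decodes to the same number, which is why a varint reader can consume the fixed-size types unchanged.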
Benny Halevy	a96087165a	hints: get_device_id: use seastar file_stat This avoids potential use-after-move, since undefined c++ sequencing order may std::move(f) in the lambda capture before evaluating f.stat(). Also, this makes use of a more generic library function that doesn't require to open and hold on to the file in the application. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200514152054.162168-1-bhalevy@scylladb.com>	2020-05-15 10:11:45 +02:00
Asias He	b2c4d9fdbc	repair: Fix race between write_end_of_stream and apply_rows Consider: n1, n2, n1 is the repair master, n2 is the repair follower. === Case 1 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after row r1 is written. data: partition_start, r1 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream() data: partition_start, r1, partition_end 5) Step 2 resumes to apply the rows. data: partition_start, r1, partition_end, partition_end, partition_start, r2 === Case 2 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after partition_start for r2 is written but before _partition_opened is set to true. data: partition_start, r1, partition_end, partition_start 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream(). Since _partition_opened[node_idx] is false, partition_end is skipped, end_of_stream is written. data: partition_start, r1, partition_end, partition_start, end_of_stream This causes unbalanced partition_start and partition_end in the stream written to sstables. To fix, serialize the write_end_of_stream and apply_rows with a semaphore. Fixes: #6394 Fixes: #6296 Fixes: #6414	2020-05-14 18:15:01 +03:00
Pekka Enberg	96e35f841c	docs/redis: API reference documentation The Redis API in Scylla only supports a small subset of the Redis commands. Let's document what we support so people have the right expectations when they try it out.	2020-05-14 17:33:39 +03:00
Benny Halevy	0d4b93b11d	sstable: fix potential use-after-move sites Avoid `f(s).then([s = std::move(s)] {})` patterns, where the move into the lambda capture may potentially be sequenced by the compiler before passing `s` to function `f`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200514131701.140046-1-bhalevy@scylladb.com>	2020-05-14 16:06:07 +02:00
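A minimal sketch of the safer shape for the pattern named above, without Seastar. The names are hypothetical: `measure` stands in for `f(s)` and `keep_alive` for the continuation that takes ownership of `s`. The point is only that the read of `s` is completed in its own statement before the move.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the function that reads `s`.
inline std::size_t measure(const std::string& s) { return s.size(); }

// Hypothetical stand-in for the continuation that owns `s` afterwards.
struct keep_alive {
    std::string owned;
    std::size_t measured;
};

inline keep_alive measure_then_keep(std::string s) {
    // Read from `s` in a separate statement, so the read is sequenced
    // before the move regardless of how the compiler orders the operands
    // of a single expression.
    std::size_t n = measure(s);
    return keep_alive{std::move(s), n};
}
```

Splitting the expression costs one local variable and removes any dependence on evaluation order.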
Nadav Har'El	f3fd976120	docs, alternator: improve description of status of global tables support The existing text did not explain what happens if additional DCs are added to the cluster, so this patch improves the explanation of the status of our support for global tables, including that issue. Fixes #6353 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200513175908.21642-1-nyh@scylladb.com>	2020-05-14 08:03:16 +02:00
Glauber Costa	7423ccc318	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However, that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Glauber Costa	45dc9cc6e5	compaction_manager: carve out a drain method We want stop() to be callable just once. Having the compaction manager stopped twice is a potential indication that something is wrong. Still, there are places where we want to stop all ongoing compactions and prevent new ones from running - like the drain operation. Today the only operation that allows for cancellation of all existing compactions is stop(). To unweave this, we will split those two things. A drain operation is carved out, and it should be safe to be called many times. The compaction manager is usable after this, and new compactions can even be submitted if it happens to be enabled again (we currently don't do that). A stop operation, which includes a drain, will only be allowed once. After a stop() the compaction_manager object is no longer usable. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
Glauber Costa	e29701ca1c	compaction_manager: expand state to be able to differentiate between enabled and stopped We are having many issues with the stop code in the compaction_manager. Part of the reason is that the "stopped" state has its meaning overloaded to indicate both "compaction manager is not accepting compactions" and "compaction manager is not ready or destructed". In a later step we could default to enabled-at-start, but right now we maintain current behavior to minimize noise. It is only possible to stop the compaction manager once. It is possible to enable / disable the compaction manager many times. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-05-13 16:51:25 -04:00
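The lifecycle rules stated in the commit above (stop only once, enable and disable many times) could be modelled roughly as below. This is a hypothetical sketch: the class, the state names and the boolean return values are assumptions for illustration, not the compaction_manager interface.

```cpp
#include <cassert>

// Hypothetical model of the three-way state split described above.
class manager_lifecycle {
public:
    enum class state { disabled, enabled, stopped };

    // Enabling and disabling may be repeated, but not after stop().
    bool enable() {
        if (_state == state::stopped) return false;
        _state = state::enabled;
        return true;
    }
    bool disable() {
        if (_state == state::stopped) return false;
        _state = state::disabled;
        return true;
    }
    // stop() succeeds exactly once; a second call reports failure.
    bool stop() {
        if (_state == state::stopped) return false;
        _state = state::stopped;
        return true;
    }
    bool accepts_compactions() const { return _state == state::enabled; }

private:
    // Assumption: starts out disabled, matching "maintain current behavior".
    state _state = state::disabled;
};
```

Keeping "disabled" and "stopped" as distinct states is what lets a second stop() be detected instead of being silently treated as another disable.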
Nadav Har'El	62c00a3f17	merge: Use time window compaction strategy for CDC Log table Merged pull request https://github.com/scylladb/scylla/pull/6427 by Piotr Jastrzębski: CDC Log is a time series so it makes sense to use time window compaction strategy for it. Our support for time series is limited so we make sure that we don't create more than 24 sstables. If TTL is configured to 0, meaning data does not expire, we don't use time window compaction strategy. This PR also sets gc_grace_seconds to 0 when TTL is not set to 0.	2020-05-13 14:36:43 +03:00
Benny Halevy	94a558e9a8	test.py: print test command line and env to log Print the test command line and the UBSAN and ASAN env settings to the log so the run can be easily reproduced (optionally with providing --random-seed=XXX that is printed by scylla unit tests when they start). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200513110959.32015-1-bhalevy@scylladb.com>	2020-05-13 14:27:15 +03:00
Raphael S. Carvalho	c06cdcdb3c	table: Don't allow a shared SSTable to be selected for regular compaction After commit `88d2486fca`, removal of shared SSTables is not atomic anymore. They can first be removed from the list of shared SSTables and only later be removed from the SSTable set. That list is used to filter out shared SSTables from regular compaction candidates. So it can happen that regular compaction picks up a shared SSTable as a candidate after it was removed from that list but before it was removed from the set. To fix this, let's only remove a shared SSTable from the aforementioned list after it was successfully removed from the SSTable set, so that a shared SSTable cannot be selected for regular compaction anymore. Fixes #6439. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200512175224.114487-1-raphaelsc@scylladb.com>	2020-05-13 10:43:48 +03:00
Avi Kivity	fc5568167b	tests: like_matcher_test: adjust for C++20 char8_t C++20 makes string literals defined with u8"my string" as using a new type char8_t. This is sensible, as plain char might not have 8 bits, but conflicts with our bytes type. Adjust by having overloads that cast back to char*. This limits us to environments where char is 8 bits, but this is already a restriction we have. Reviewed-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20200512101646.127688-1-avi@scylladb.com>	2020-05-13 09:37:39 +03:00
Avi Kivity	33fda05388	counters: change deprecated std::is_pod<> to replacement C++20 deprecates std::is_pod<> in favor of the easier-to-type std::is_standard_layout<> && std::is_trivial<>. Change to the recommendation in order to avoid a flood of warnings. Reviewed-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200512092200.115351-1-avi@scylladb.com>	2020-05-13 09:36:52 +03:00
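The replacement for std::is_pod<> mentioned above can be written as a single trait. The variable template and the two structs are hypothetical names for illustration; only the combination of the two standard traits comes from the commit.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical convenience trait combining the two standard checks.
template <typename T>
inline constexpr bool is_pod_like_v =
    std::is_standard_layout_v<T> && std::is_trivial_v<T>;

// Plain aggregate of integers: standard layout and trivial.
struct plain_shard {
    long id;
    long value;
};

// A user-provided default constructor makes the type non-trivial.
struct with_ctor {
    with_ctor() {}
    long id;
};

static_assert(is_pod_like_v<plain_shard>);
static_assert(!is_pod_like_v<with_ctor>);
```

Both static_asserts hold at compile time, so the check costs nothing at run time.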
Avi Kivity	2afd40fe6f	tracing: use correct std::memory_order_* scoping std::memory_order is an unscoped enum, and so does not need its members to be prefixed with std::memory_order::, just std::. This used to work, but in C++20 it no longer does. Use the standard way to name these constants, which works in both C++17 and C++20. Reviewed-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200512092408.115649-1-avi@scylladb.com>	2020-05-13 09:36:23 +03:00
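The naming that works under both standards, as described in the commit above, looks like this in a small sketch. The function name is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: uses the namespace-level constant name, which is
// valid in both C++17 and C++20.
inline int load_relaxed(const std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);
}
```

Spelling the constant as `std::memory_order_relaxed` avoids depending on how the enumerators themselves are named in a given standard.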
Avi Kivity	8d4bdc49f1	tests: sstable_run_based_compaction_strategy_for_tests: adjust for C++20 pass-by-value in std::accumulate C++20 changed the parameter to the binary operation function in std::accumulate() to be passed by value (quite sensibly). Adjust the code to be compatible by using a #if. This will be removed once we switch over to C++20. Message-Id: <20200512105427.142423-1-avi@scylladb.com>	2020-05-12 20:41:16 +02:00
Avi Kivity	74c1db7f59	tests: like_matcher_test: add casts for utf8 string literals C++20 makes string literals defined with u8"foo" return a new char8_t. This is sensible but is noisy for us. Cast them to plain const char. Message-Id: <20200512104751.137816-1-avi@scylladb.com>	2020-05-12 20:41:02 +02:00
Avi Kivity	07061f9a00	duration: adjust for C++20 char8_t type C++20 makes string literals defined with u8"blah" return a new char8_t type, which is sensible but noisy here. Adjust for it by dropping an unneeded u8 in one place, and adding a cast in another. Message-Id: <20200512104515.137459-1-avi@scylladb.com>	2020-05-12 20:40:30 +02:00
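One way to absorb the char8_t change described in the three commits above is an overload that casts back to plain char, roughly as sketched here. The function name is hypothetical; `__cpp_char8_t` is the standard feature-test macro, and the sketch assumes 8-bit char, as the commit messages do.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper accepting u8"..." literals under either standard.
inline std::string from_utf8_literal(const char* s) {
    return std::string(s);
}

#ifdef __cpp_char8_t
// In C++20 a u8 literal is an array of char8_t; cast it back to char.
inline std::string from_utf8_literal(const char8_t* s) {
    return std::string(reinterpret_cast<const char*>(s));
}
#endif
```

In C++17 the first overload is selected, in C++20 the second, and callers can keep writing `u8"..."` unchanged.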
Avi Kivity	89ea879ba9	storage_proxy: adjust for C++20 std::accumulate() pass-by-value C++20 passes the input to the binary operation by value (which is sensible), but is not compatible with C++17. Add some #if logic to support both methods. We can remove the logic when we fully transition to C++20. Message-Id: <20200512101355.127333-1-avi@scylladb.com>	2020-05-12 20:39:21 +02:00
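The std::accumulate() difference mentioned above concerns how the running value reaches the binary operation. The commits handle it with #if logic; the sketch below shows a different, simpler technique that happens to compile under both standards, namely taking the accumulator parameter by value. The `join` function is a hypothetical example.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical example: concatenate all strings in `parts`.
inline std::string join(const std::vector<std::string>& parts) {
    return std::accumulate(parts.begin(), parts.end(), std::string(),
        // A by-value accumulator accepts the lvalue passed in C++17 as
        // well as the moved-from-init rvalue passed in C++20.
        [] (std::string acc, const std::string& part) {
            acc += part;
            return acc;
        });
}
```

Whether a by-value parameter is acceptable depends on the accumulator type; for move-only or expensive-to-copy types under C++17 the conditional approach from the commits may still be needed.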
Tomasz Grabiec	df4b698309	Merge "Add more defenses against empty keys" from Botond In theory we shouldn't have empty keys in the database, as we validate all keys that enter the database via CQL with `validation::validate_cql_keys()`, which will reject empty keys. In this context, empty means a single-component key, with its only component being empty. Yet recently we've seen empty keys appear in a cluster and wreak havoc on it, as they will cause the memtable flush to fail due to the sstable summary rejecting the empty key. This will cause an infinite loop, where Scylla keeps retrying to flush the memtable and failing. The immediate consequence of this is that the node cannot be shut down gracefully. The indirect consequence is possible data loss, as commitlog files cannot be replayed as they just re-insert the empty key into the memtable and the infinite flush retry cycle starts all over again. A workaround is to move problematic commitlog files away, allowing the node to start up. This can however lead to data loss, if multiple replicas had to move away commitlogs that contain the same data. To prevent the node getting into an unusable state and subsequent data loss, extend the existing defenses against invalid (empty) keys to the commitlog replay, which will now ignore them during replay. Fixes: #6106 * denesb/empty-keys/v5: commitlog_replayer: ignore entries with invalid keys test: lib/sstable_utils: add make_keys_for_shard validation: add is_cql_key_invalid() validation: validate_cql_key(): make key parameter a `partition_key_view` partition_key_view: add validate method	2020-05-12 20:36:40 +02:00
Avi Kivity	72172effc8	transport: stop using boost::bimap<> We use boost::bimap for bi-directional conversion from protocol type encodings to type objects. Unfortunately, boost::bimap isn't C++20-ready. Fortunately, we only used one direction of the bimap. Replace with plain old std::unordered_map<>. Message-Id: <20200512103726.134124-1-avi@scylladb.com>	2020-05-12 18:55:26 +03:00
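When only one direction of a bidirectional map is used, a plain hash map is enough, as the commit above notes. The sketch below is a hypothetical lookup from a numeric type id to a type name; the table contents and function name are illustrative and not copied from the transport code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical one-directional lookup: type id -> type name.
inline std::optional<std::string> type_name_for(uint16_t id) {
    // Illustrative entries only.
    static const std::unordered_map<uint16_t, std::string> by_id = {
        {0x0002, "bigint"},
        {0x0009, "int"},
        {0x000D, "varchar"},
    };
    auto it = by_id.find(id);
    if (it == by_id.end()) {
        return std::nullopt;  // unknown id
    }
    return it->second;
}
```

If the reverse direction were ever needed, a second map keyed by name would restore it without bringing back the boost dependency.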
Botond Dénes	74b020ad05	main: run redis service in the statement scheduling group Like all the other API services (CQL, thrift and alternator). Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200512145631.104051-1-bdenes@scylladb.com>	2020-05-12 18:01:27 +03:00
Piotr Dulikowski	0c5ac0da98	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433	2020-05-12 15:13:59 +02:00
Avi Kivity	05e19078f6	storage_proxy: replace removed std::not1() by replacement std::not_fn() C++17 deprecated std::not1() and C++20 removed it; replace with its successor. Message-Id: <20200512101205.127046-1-avi@scylladb.com>	2020-05-12 14:05:03 +03:00
Avi Kivity	e774ee06ed	Update seastar submodule * seastar e708d1df3a...92365e7b87 (11): > tests: distributed_test: convert to SEASTAR_TEST_CASE > Merge "Avoid undefined behavior on future self move assignments" from Rafael > Merge "C++20 support" from Avi > optimized_optional: don't use experimental C++ features > tests: scheduling_group_test: verify that later() doesn't modify the current group > tests: demos: coroutine_demo: add missing include for open_file_dma() > rpc: minor documentation improvements > rpc: Assert that sinks are closed > Merge "Fix most tests under valgrind" from Rafael > distributed_test: Fix it on slow machines > rpc_test: Make sure we always flush and close the sink loading_shard_values.hh: added missing include for gcc6-concepts.hh, exposed by the submodule update. Frozen toolchain updated for the new valgrind dependency.	2020-05-12 14:04:16 +03:00
Botond Dénes	6083ed668b	commitlog_replayer: ignore entries with invalid keys When replaying the commitlog, pass keys to `validation::validate_cql_key()`. Discard entries which fail validation and warn about it in the logs. This prevents invalid keys from getting into the system, possibly failing the commitlog replay and the successful boot of the node, preventing the node from recovering data.	2020-05-12 12:07:21 +03:00
Botond Dénes	e0f5ef5ef0	test: lib/sstable_utils: add make_keys_for_shard A variant of make_keys() which creates keys for the requested shard. As this version is more generic than the existing local_shards_only variant, the former is reimplemented on top of the latter.	2020-05-12 12:07:21 +03:00
Botond Dénes	dd76e8c8de	validation: add is_cql_key_invalid()	2020-05-12 12:07:00 +03:00
Botond Dénes	95bf3a75de	validation: validate_cql_key(): make key parameter a `partition_key_view` This is more general than the previous `const partition_key&` and allows for passing keys obtained from the likes of `frozen_mutation` that only have a view of the key. While at it also change the schema parameter from schema_ptr to const schema&. No need to pass a shared pointer.	2020-05-12 12:07:00 +03:00
Botond Dénes	84c47c4228	partition_key_view: add validate method We want to be able to pass `partition_key_view` to `validation::validate_cql_key()`. As the latter wants to call `validate()` on the key, replicate `partition_key::validate()` in `partition_key_view`.	2020-05-12 12:07:00 +03:00
Asias He	b744dba75a	repair: Abort the queue in write_end_of_stream in case of error In write_end_of_stream, it does: 1) Write write_partition_end 2) Write empty mutation_fragment_opt If 1) fails, 2) will be skipped, the consumer of the queue will wait for the empty mutation_fragment_opt forever. Found this issue when injecting random exceptions between 1) and 2). Refs #6272 Refs #6248	2020-05-12 10:50:52 +02:00
Avi Kivity	f1fde537a9	Merge 'Support Snapshot of multiple tables' from Amnon This series adds support for taking a snapshot of multiple tables. Fixes #6333 * amnonh-snapshot_keyspace_table: api/storage_service.cc: Snapshot, support multiple tables service/storage_service: Take snapshot of multiple tables	2020-05-12 11:34:09 +03:00
Piotr Jastrzebski	49b6010cb4	cdc: Use time window compaction strategy for CDC Log table CDC Log is a time series with data TTLed by default to 24 hours so it makes sense to use time window compaction for it. The window size is adjusted to the TTL configured for CDC Log so that no more than 24 sstables will be created. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-12 07:53:40 +02:00
Glauber Costa	70a89ab4ab	compaction: do not assume I/O priority class We shouldn't assume the I/O priority class for compactions. For instance, if we are dealing with offstrategy compactions we may want to use the maintenance group priority for them. For now, all compactions are put in the compaction class. Rewrite compactions (scrub, cleanup) could be maintenance, but we don't have clear access to the database object at this time to derive the equivalent CPU priority. This is planned to be changed in the future, and when we do change it, we'll adjust. Same goes for resharding: while we could change it at this point, we'd be risking memory pressure since resharding is run online and sstables are shared until resharding is done. When we move it to offline execution we'll do it with maintenance priority. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200512002233.306538-3-glauber@scylladb.com>	2020-05-12 08:23:19 +03:00
Glauber Costa	4234538292	compaction: pass descriptor all the way down to compaction object. To do that - and still avoid a copy - we need to add some fields to the compaction object that are exclusive to regular_compaction. Still, not only does this simplify the code, but resharding and regular compaction also look more and more alike. This is done now in preparation for another patch that will add more information to the descriptor. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200512002233.306538-2-glauber@scylladb.com>	2020-05-12 08:23:19 +03:00
Piotr Sarna	5f2eadce09	alternator: wait for schema agreement after table creation In order to be sure that all nodes acknowledged that a table was created, the CreateTable request will now only return after seeing that schema agreement was reached. Rationale: alternator users check if the table was created by issuing a DescribeTable request, and assume that the table was correctly created if it returns nonempty results. However, our current implementation of DescribeTable returns local results, which is not enough to judge if all the other nodes acknowledge the new table. CQL drivers are reported to always wait for schema agreement after issuing DDL-changing requests, so there should be no harm in waiting a little longer for alternator's CreateTable as well. Fixes #6361 Tests: alternator(local)	2020-05-11 21:51:12 +03:00
Piotr Jastrzebski	0cd0775a27	cdc: Set CDC Log gc_grace_seconds to 0 Data in CDC Log is TTLed and we want to remove it as soon as it expires. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-11 17:59:52 +02:00
Piotr Sarna	517f2c0490	alternator: unify error messages for existing tables/keyspaces Since alternator is based on Scylla, two "already exists" error types can appear when trying to create a table - that a table itself exists, or that its keyspace does. That's however an implementation detail, since alternator does not have a notion of keyspaces at all. This patch unifies the error message to simply mention that a table already exists, and comes with a more robust test case. If the keyspace already exists, table creation will still be attempted. Fixes #6340 Tests: alternator(local, remote)	2020-05-11 18:30:02 +03:00
Gleb Natapov	d555fb60d7	lwt: add counters for background and foreground paxos operations Paxos may leave an operation in the background after returning a result to the caller. Let's add a counter for background/foreground paxos handlers so that it will be easier to detect memory related issues. Message-Id: <20200510092942.GA24506@scylladb.com>	2020-05-11 14:37:00 +02:00
Avi Kivity	f4a703fc66	Merge "tools/scylla-types: add compound_type and validation support" from Botond " A good portion of the values that one would want to examine with scylla-tools will be partition or clustering keys. While examining them was possible before too, especially for single component keys, it required manually extracting the components from it, so they can be individually examined. This series adds support for working with keys directly, by adding prefixable and full compound type support. When passing --prefix-compound or --full-compound, multiple types can be passed, which will form the compound type. Example: $ scylla_types --print --prefix-compound -t TimeUUIDType -t Int32Type 0010d00819896f6b11ea00000000001c571b000400000010 (d0081989-6f6b-11ea-0000-0000001c571b, 16) Another feature added in this series is validation. For this, `compound_type::validate()` had to be implemented first. We already use this in our code, but it currently has a no-op body. Example: $ scylla-types --validate --full-compound -t TimeUUIDType -t Int32Type 0010d00819896f6b11ea00000000001c571b0004000000 0010d00819896f6b11ea00000000001c571b0004000000: INVALID - seastar::internal::backtraced<marshal_exception> (marshaling error: compound_type iterator - not enough bytes, expected 4, got 3 Backtrace: 0x1b2e30f 0x85c9d5 0x85cb07 0x85cc7b 0x85cd7c 0x85d2d7 0x844e03 0x84241b 0x84490b 0x844ae5 0x19c0362 0x19c0741 0x19c13d1 0x19c4b44 0x8aeb7a 0x8aeca7 0x19ebc90 0x19fb8d5 0x1a12b49 0x19c4376 0x19c47a6 0x19c4900 0x843373 /lib64/libc.so.6+0x271a2 0x84202d ) Tests: unit(dev) " * 'tools-scylla-types-compound-support/v1' of https://github.com/denesb/scylla: tools/scylla_types: add validation action tools/scylla_types: add compound_type support tools/scylla_types: single source of truth for actions compound_type: implement validate() compound_type: fix const correctness tools: mv scylla_types scylla-types	2020-05-11 15:28:33 +03:00
Juan Ramon Martin	9d0198140b	dist/docker: Add "--reserve-memory" command line option Fixes #6311	2020-05-11 13:34:42 +03:00
Piotr Dulikowski	85d5c3d5ee	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are being currently sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we will choose position of the earliest hint that failed to be sent, and will retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence the calculation of _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422	2020-05-11 11:33:31 +02:00
Avi Kivity	76d21a0c22	Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J. " We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR. This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry. Fixes #6098 Fixes #6146 Tests: unit(dev, release), manual " * haaawk-dont_cache_cdc: cdc: Don't cache CDC Log table table: invalidate disabled cache on memtable flush table: Add cache_enabled member function cf_prop_defs: persist caching_options in schema property_definitions: add get that returns variant feature: add PER_TABLE_CACHING feature caching_options: add enabled parameter	2020-05-10 15:39:42 +03:00
Avi Kivity	9d91ac345a	dist: redhat: drop dependency on pystache We use pystache to parametrize our scylla.spec, but pystache is not present in Fedora 32. Fortunately rpm provides its own template mechanism, and this patch switches to using it: - no longer install pystache - pass parameters via rpm "-D" options - use 0/1 for conditionals instead of true/false as per rpm conventions - sanitize the "product" variable to not contain dashes - change the .spec file to use rpm templating: %{...} and %if ... %endif instead of mustache templating	2020-05-10 14:42:31 +03:00
Avi Kivity	5b971397aa	Revert "compaction_manager: allow early aborts through abort sources." This reverts commit `e8213fb5c3`. It results in an assertion failure in remove_index_file_test. Fixes #6413.	2020-05-10 12:32:18 +03:00
Raphael S. Carvalho	88d2486fca	sstables: Synchronize deletion of SSTables in resharding with other operations Input SSTables of resharding are deleted at the coordinator shard, not at the shards they belong to. We're not acquiring deletion semaphore before removing those input SSTables from the SSTable set, so it could happen that resharding deletes those SSTables while another operation like snapshot, which acquires the semaphore, finds them deleted. Let's acquire the deletion semaphore so that the input SSTables will only be removed from the set, when we're certain that nobody is relying on their existence anymore. Now resharding will only delete input SSTables after they're safely removed from the SSTable set of all shards they belong to. unit: test(dev). Fixes #6328. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200507233636.92104-1-raphaelsc@scylladb.com>	2020-05-10 10:50:32 +03:00
Takuya ASADA	4d957eeda7	dist/redhat/python3: drop dependency on pystache Same as dist/redhat, stop using mustache since pystache is no longer available on Fedora 32. see: https://github.com/scylladb/scylla/pull/6313	2020-05-09 23:35:33 +03:00
Nadav Har'El	7da949026d	doc, alternator: shorten description of "tags" compatibility The "current compatibility with DynamoDB" section in alternator.md is where we should list very briefly our state of compatibility - it's not the right place to explain implementation details or track obscure bugs. I've significantly shortened the "Tags" section because, in brief, we do fully support tags and should say that we do. I moved the two bugs mentioned in the text into the bug tracker: Refs #6389 Refs #6391 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200507125022.22608-1-nyh@scylladb.com>	2020-05-07 17:48:34 +02:00
Tomasz Grabiec	2078016f84	test: memory_footprint: Avoid invalid identifiers as column names A column name should not start with a digit, as can be the case with random_string(). Message-Id: <1588860648-15796-1-git-send-email-tgrabiec@scylladb.com>	2020-05-07 17:33:34 +03:00
Pavel Emelyanov	ef181fb2d0	test: Add option to flush memtables for perf_simple_query The test in question measures the speed of memtables, not the row_cache. With this option it can do both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200507140603.12350-1-xemul@scylladb.com>	2020-05-07 16:09:40 +02:00
Botond Dénes	4e83cc4413	tools/scylla_types: add validation action Allow validating values according to their declared type.	2020-05-07 16:35:23 +03:00
Botond Dénes	4662ad111c	tools/scylla_types: add compound_type support Allow examining partition and clustering keys, by adding support for full and prefix compound types. The members of the compound type are specified by passing several types with --type on the command line.	2020-05-07 16:35:21 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint Those APIs allow to control and request the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user. Then the CompactionManager checks if it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with autocompaction control table switch. However, there is no reliable way to hook pending compaction task. The code assumed that with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In previous version the execution sometimes resulted into the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 \|\| cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and awaits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. 
The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Botond Dénes	70331bad6f	tools/scylla_types: single source of truth for actions Currently the available actions are documented in several different places: * code implementing them * description * documentation for --action * error message that validates value for --action This is guaranteed to result in incorrect, possibly self-contradicting documentation. Resolve by generating all documentation from the handler registry, which now also contains the description of the action. Also have a separate flag for each action, instead of --action=$ACTION.	2020-05-07 16:20:18 +03:00
Botond Dénes	84e38ae358	compound_type: implement validate() Validate the number of present components, then validate each of them. A unit test for both the prefix and full instances is also added.	2020-05-07 16:19:56 +03:00
Botond Dénes	3e400cf54e	compound_type: fix const correctness Make all methods that don't mutate members const.	2020-05-07 16:15:11 +03:00
Botond Dénes	7176660e12	tools: mv scylla_types scylla-types Using a hyphen seems to be the standard among executables.	2020-05-07 15:14:59 +03:00
Nadav Har'El	e9aa1173e0	doc, alternator: better documentation for write isolation policies Alternator supports four different write isolation policies, the default being to do all the writes with LWT, but these policies were only briefly explained in alternator.md. This patch significantly expands on this explanation, better explaining the tradeoffs involved in these four options, and when each might make sense (if at all). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506235152.18190-1-nyh@scylladb.com>	2020-05-07 13:59:38 +02:00
Nadav Har'El	f12989ff73	alternator/test: minor cleanup in test_key_condition_expression.py Some minor cleanups, mostly comments, in test_key_condition_expression.py Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506212849.16207-1-nyh@scylladb.com>	2020-05-07 13:58:44 +02:00
Botond Dénes	791acc7f38	sstables: sstable_reader: fix read range upper bound calculation for reverse slices The single-key sstable reader uses the clustering ranges from the slice to determine the upper bound of the disk read-range using the index. For this it simply uses the end bound of the last clustering range. For reverse reads however the clustering ranges in the slice are in reverse order, so this will in fact be the upper bound of the smallest range. Depending on whether the distance between the clustering ranges is big enough for the sstable reader to use the index to skip between them, this will lead to either reading too little data or an assert failure. This patch fixes the problematic function `get_slice_upper_bound()` to consider reverse reads as well. Initially I thought there would be more mishandling of reverse slices, but actually `mutation_fragment_filter`, the component doing the actual slicing of rows, is already reverse-slice aware. A unit test which reproduces the assert failure is also added. Fixes: #6171 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200507114956.271799-1-bdenes@scylladb.com>	2020-05-07 14:52:04 +03:00
Avi Kivity	bef8e5e930	Merge "Don't invalidate row cache when adding GC SStable to SSTable Set" from Raphael " Garbage collected SSTables, created by incremental compaction process, are being added to the SSTable set using a function that invalidates row cache using the range of the SSTable itself. That's incorrect because data in GC SSTables come from preexisting SSTables in set, meaning the state of data isn't changed and so no need for invalidation at all. Incorrect invalidation like this is a source of read performance issues. This problem is fixed by including GC SSTables to the descriptor which is used to specify changes to the SSTable set, which is the correct thing to do given that a midway failure could leave the set in an incorrect state. Fixes #5956. Fixes #6275. tests: unit(dev) " * 'fix_issue_5956_v4' of github.com:raphaelsc/scylla: sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set sstables/compaction: Change meaning of compaction_completion_desc input and output fields sstables/compaction: Clean up code around garbage_collected_sstable_writer	2020-05-07 14:10:49 +03:00
Glauber Costa	e8213fb5c3	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However, that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Tests: unit (dev), manual start+stop, manual drain + stop Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200506184749.98288-1-glauber@scylladb.com>	2020-05-07 13:24:47 +03:00
Asias He	71d0d58f8c	Revert "config: Do not enable repair based node operations by default" This reverts commit `b8ac10c451`. The repair based node operations will be enabled by default in 4.1. Revert the patch which disables it by default.	2020-05-07 13:17:35 +03:00
Avi Kivity	fbf2194b31	Merge 'cql3: Fix detection of bound variables in tuples' from Juliusz This is unrelated to counters, but happens to fix #4209 `tuple::delayed_value::contains_bind_marker` used to check that ALL terms are bound (not that ANY of them is bound). As a result, scylla would crash in prepare codepath for collections of tuples. After this fix `invalid_request_exception` is thrown instead. * jul-stas-4209-crash-on-counter-shards-set: boost/tests: test for bound variable in a list of tuple literals cql3: fix detection of bound variables in tuples	2020-05-07 13:13:51 +03:00
Botond Dénes	2e09a0317c	types, compound: pass std::current_exception() to on_internal_error() So that nested exceptions are not lost. Also, marshal exceptions, the ones we have in these places, already have a backtrace, so might as well use that, instead of creating a new one, losing unwound frames. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200507091405.244544-1-bdenes@scylladb.com>	2020-05-07 11:25:25 +02:00
Juliusz Stasiewicz	7b48d8c33c	boost/tests: test for bound variable in a list of tuple literals This test checks that the list literals of tuples with some (but not all!) bind markers are rejected.	2020-05-07 11:03:53 +02:00
Pavel Solodovnikov	55d89d2cbe	lwt: add cql tests to test delete+insert behavior on the same row in one batch Add a couple of cql tests regarding conditional batches: 1. Verify that "delete" takes priority over "insert" when applied to the same row within the same batch. 2. Test that a workaround for the issue works as expected (i.e. delete only individual cells instead of the full record). Tests: unit(dev) Fixes: #6273 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200506201200.176590-1-pa.solodovnikov@scylladb.com>	2020-05-07 10:53:22 +02:00
Tomasz Grabiec	b0f2d2bee0	Merge "lwt: fix linearisability issues with reads and writes with non met conditions" from Gleb Fixes #6299.	2020-05-07 10:49:01 +02:00
Juliusz Stasiewicz	b46d7cf8d1	cql3: fix detection of bound variables in tuples `tuple::delayed_value::contains_bind_marker` used to check that ALL terms are bound (not that ANY of them is bound). As a result, scylla would crash in prepare codepath for collections. After this fix `invalid_request_exception` is thrown instead. Fixes #4209	2020-05-07 10:44:52 +02:00
Benny Halevy	b2f50224d9	table: database_sstable_write_monitor: revert charges in destructor We must unregister the monitor upon destruction to prevent use-after-free from `compaction_backlog_tracker::backlog` path. This is similar to ~compaction_read_monitor as implemented in commit `ca284174d0` Fixes #6385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506214419.569655-1-bhalevy@scylladb.com>	2020-05-07 10:39:39 +02:00
Nadav Har'El	0214f0ad60	main: really enable the "--start-native-transport" option In commit `da3bf20e71` we supposedly enabled support for Cassandra's "start_native_transport" option which can be set to 0 to run Scylla without listening on the CQL port. This can be useful, for example, if a user only wants the DynamoDB or Redis APIs but not CQL. Unfortunately, the option was still marked "Unused", so it wasn't really enabled as a valid command line option. This patch fixes that, and documents the start_native_transport option in docs/protocols.md, where we document the different protocols, ports, and options to configure them. Fixes #6387. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506174850.13616-1-nyh@scylladb.com>	2020-05-07 11:09:18 +03:00
Avi Kivity	2b0c317dec	test: lib: exception_utils: fix crash with fmt-6.2.0 fmt, the formatting library we use, detects types with conversion to std::string_view (and formats them as strings) and types that support operator<<(std::ostream, const T&) (and performs custom formatting on them). However, if <fmt/ostream.h> is not included, the latter is not done. The problem happens with seastar::sstring, which implements both, and debug mode, which disables inlining. Some translation units do include <fmt/ostream.h>, and so generate code to do custom formatting. exception_utils.cc doesn't, and so generates code to format via string_view conversion. At link time, the compiler picks one of the generated functions and includes it in the final binary; it happened to pick one generated outside exception_utils.cc, using custom formatting. However, there is also code in fmt to encode which path fmt chose - string_view or custom. This code is constexpr and so is evaluated in exception_utils.cc. The result is that the function to perform formatting of seastar::sstring uses custom formatting, while the descriptor containing the method used says it is formatting via string_view. This is enough to cause a crash. The problem is limited to debug mode, since in other modes all this code is inlined, and so is consistent within the translation unit. We need a more general fix (hopefully in fmt), but for now a simple fix is to add the missing include. Ref https://github.com/fmtlib/fmt/issues/1662	2020-05-07 08:59:02 +03:00
Avi Kivity	6f1a8cfeea	Merge 'Use special partitioner for CDC Log' from Piotr " CDC has to create CDC streams that are co-located with corresponding BaseTable data. This is not always easy. Especially for small vnodes. This PR introduces new partitioner which allows us to easily find such stream ids that the stream belongs to a given vnode and shard. The idea is that a partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key. Tests: unit(dev), dtests(CDC) " * haaawk-cdc_partitioner: cdc:use CDCPartitioner for CDC Log dht: Add find_first_token_for_shard dht: use long_token in token::to_int64 cdc: add CDCPartitioner stream_id: add token_from_bytes static function i_partitioner: Stop distinguishing whether keys order is preserved	2020-05-06 20:29:27 +03:00
Piotr Jastrzebski	e3dd78b68f	cdc: Don't cache CDC Log table CDC writes are not expected to be read multiple times so it makes little sense to cache them. Moreover, CDC Log puts much bigger pressure on memory usage than Base Table because some updates to the Base Table override existing data while related CDC Log updates are always a new entry in a memtable. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Piotr Jastrzebski	38ede62a02	table: invalidate disabled cache on memtable flush table::update_cache has two branches of its logic. One when caching is enabled and the other when it's disabled. This patch adds unconditional cache invalidation to the second (disabled caching) branch. This is done for two purposes. First and foremost, it gives the guarantee that when we enable the cache later it will be in the right state and will be ready for usage. This is because any memtable flush that would logically invalidate the cache, actually physically does that too now. An additional benefit of this change is that disabled cache will be cleared during the next memtable flush that will happen after turning the switch off. Previously, the cache would also be emptied but it would take more time before all its elements are removed by eviction. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Piotr Jastrzebski	1a43849cd2	table: Add cache_enabled member function This function determines cache usage based both on table _config and dynamic schema information. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:39:01 +02:00
Piotr Jastrzebski	546dbf1fcc	cf_prop_defs: persist caching_options in schema Previously 'WITH CACHING =' was ignored both in CREATE TABLE and in ALTER TABLE statements. Now it will be persisted in schema so that it can be used later to control caching per table. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:38:37 +02:00
Piotr Jastrzebski	812dfd22bd	property_definitions: add get that returns variant Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-06 18:38:04 +02:00
Pavel Solodovnikov	1d3f9174c5	cql3: avoid using shared_ptr's in unrecognized_entity_exception Using shared_ptr's in `unrecognized_entity_exception` can lead to cross-cpu deletion of a pointer which will trigger an assert `_cpu == std::this_thread::get_id()` when shared_ptr is disposed. Copy `column_identifier` to the exception object and avoid using an instance of `cql3::relation`: just get a string representation from it since nothing more is used in associated exception handling code. Fixes: #6287 Tests: unit(dev, debug), dtest(lwt_destructive_ddl_test.py:LwtDestructiveDDLTest.test_rename_column) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200506155714.150497-1-pa.solodovnikov@scylladb.com>	2020-05-06 19:02:36 +03:00
Piotr Sarna	f48e414eab	db, view: remove duplicate entries from pending endpoints When generating view updates, an endpoint can appear both as a primary paired endpoint for the view update, and as a pending endpoint (due to range movements). In order not to generate the same update twice for the same endpoint, the paired endpoint is removed from the list of pending endpoints if present. Fixes #5459 Tests: unit(dev), dtest(TestMaterializedViews.add_dc_during_mv_insert_test)	2020-05-06 16:42:56 +03:00
Benny Halevy	682fb3acfd	api: storage_service: serialize true_snapshot_size Following up on `91b71a0b1a` We also need to serialize storage_service::true_snapshots_size with snapshot-modifying operations. It seems like it was assumed that get_snapshot_details is done under run_snapshot_list_operation, but the one called here is the table method, not the api::storage_service::get_snapshot_details. Fixes #5603 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506115732.483966-1-bhalevy@scylladb.com>	2020-05-06 15:33:38 +03:00
Pavel Solodovnikov	b183530f2c	cql3: use lw_shared_ptr instead of shared_ptr for column_condition Both `cql3::column_condition` and `cql3::column_condition::raw` classes are marked as `final`: it's safe to use lw_shared_ptr instead of generic `seastar::shared_ptr`. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200428202249.82785-1-pa.solodovnikov@scylladb.com>	2020-05-06 13:11:07 +03:00
Nadav Har'El	ddb483461a	test/alternator: xfailing tests for FilterExpression feature This patch adds a comprehensive, hopefully complete, test for the yet-unimplemented FilterExpression feature. FilterExpression is the modern syntax which allows filtering the results of Query and Scan requests. The patch includes 50 tests spanning more than 700 lines of code, testing (hopefully) all the various FilterExpression features, sub-cases, syntax peculiarities, and so on. As usual, all included tests pass when run against DynamoDB ("pytest --aws") and xfail when run against Scylla. This test should be helpful to understand how to implement FilterExpression correctly, as well as test the future implementation. Refs #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200503165639.15320-1-nyh@scylladb.com>	2020-05-06 12:56:20 +03:00
Botond Dénes	6de51db84a	tools: introduce scylla_types We often have to examine raw values, obtained from various sources, like sstables, logs and coredumps. For some types it is quite simple to convert raw hex values to human readable ones manually (integers), for others it is very hard or simply not practical. This command-line tool aims to ease working with raw values, by providing facilities to print them in human readable form and compare them. We can extend it with more functions as needed. Examples: $ scylla_types -a print -t Int32Type b34b62d4 -1286905132 $ scylla_types -a compare -t 'ReversedType(TimeUUIDType)' b34b62d46a8d11ea0000005000237906 d00819896f6b11ea00000000001c571b b34b62d4-6a8d-11ea-0000-005000237906 > d0081989-6f6b-11ea-0000-0000001c571b Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200505124914.104827-1-bdenes@scylladb.com>	2020-05-06 12:56:20 +03:00
Avi Kivity	bf2ab10b6a	Update seastar submodule * seastar 3c2e27811...e708d1df3 (10): > Merge "Fix a few issues found by clang's asan" from Rafael > seastar: app_template: allow a description to be provided for the app > membarrier: fix madvise(MADV_DONTNEED) failure and crash with --lock-memory Fixes #6346 > rpc::compressor: Fix static init fiasco with names > fair_queue: express all internal fair_queue quantities as fair_queue_tickets > net: remove API v1 compatibility layer (variadic future in networking) > testing: Move parts of the exchanger out of line > on_internal_error: add overload taking an std::exception_ptr > tuple_utils: Add a missing include > Merge "Fix use of uninitialized found by valgrind" from Rafael	2020-05-06 12:56:20 +03:00
Raphael S. Carvalho	a214ccdf89	sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set Garbage collected SSTable is incorrectly added to SSTable set with a function that invalidates row cache. This problem is fixed by adding GC SSTable to set using a mechanism which replaces old sstables with new sstables. Also, adding GC SSTable to set in a separate call is not correct. We should make sure that GC SSTable reaches the SSTable set at the same time its respective old (input) SSTable is removed from the set, and that's done using a single request call to table. Fixes #5956. Fixes #6275. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:19 -03:00
Raphael S. Carvalho	8f4458f1d5	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTable that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to SSTable set using the same call which replaces output SSTables by input SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:08 -03:00
Raphael S. Carvalho	cc5e0d8da8	sstables/compaction: Clean up code around garbage_collected_sstable_writer This cleanup allows us to get rid of the ugly compaction::create_new_sstable(), and reduce complexity by getting rid of observable. garbage_collected_sstable_writer::data is introduced to allow compaction to directly communicate with the GC writer, which is stored in mutation_compaction, making it unreachable after the compaction has started. By making compaction store GC writer's data and using that same data to create g__c__s__w, compaction is able to communicate with GC writer without the complexity of observable utility. This move is important for the subsequent work which will fix a couple of issues regarding management of GC SSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:02:41 -03:00
Piotr Sarna	b8df958811	alternator: deduplicate logs on boot Alternator server used to print a startup log line for each shard, which is redundant and creates churn for nodes with many cores. Instead of all that, a single line is now printed once alternator server properly boots. Fixes #6347 Tests: manual(boot), unit(dev)	2020-05-05 16:19:18 +03:00
Gleb Natapov	4622c61a37	lwt: linearise reads Currently the following scenario may happen: Consider 3 nodes A, B and C and a LWT failed write operation that managed to get V accepted on A. The value is read twice. The first read accesses B and C and returns nothing. The next one accesses A and B, notices the failed round and completes it, returning value V. Since two consecutive reads without any writes in the middle return different values, this breaks linearisability. This happens because a read does not do a full paxos round. The patch makes the read code reuse the same logic as a write by writing a dummy value, which ensures that a complete paxos round is used.	2020-05-05 15:37:42 +03:00
Amnon Heiman	ee7b40e31b	api/storage_service.cc: Snapshot, support multiple tables It is sometimes useful to take a snapshot of multiple tables inside a keyspace. This patch adds support for multiple table names when taking a snapshot. The change consists of splitting the table (column family) name and using the resulting array of tables instead of just one. After this patch this will be supported: curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=snapshottag&kn=system&cf=range_xfers,large_partitions' Fixes #6333 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-05 12:55:36 +03:00
Amnon Heiman	75e2a3b0e7	service/storage_service: Take snapshot of multiple tables This patch changes the table snapshot implementation to support multiple tables. The method for taking a snapshot using a single table was modified to use the new implementation. To support multiple tables, the method now takes a vector of tables and loops over it. Relates to #6333 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-05 12:55:16 +03:00
Gleb Natapov	0c2db6f42d	lwt: linearise unmet condition operations Currently the following scenario may happen: Consider 3 nodes A, B and C and a LWT failed write operation that managed to get V accepted on A. The next operation may be conditioned on the value being V, but it may access nodes B and C first and fail. Retrying the same operation without any writes in the middle may now access A and B and succeed since it will notice V and will complete the previous transaction. Having two different outcomes for the same operation without any writes in the middle breaks linearisability. This happens because when the condition is unmet we abandon the paxos round, so this patch makes us complete it with an empty value. Now if the first conditional write after the failure accesses B and C, it will write an accepted ballot there with a value greater than the one of V, and V will no longer be replayed ever.	2020-05-05 12:38:31 +03:00
Gleb Natapov	0fed86e4c6	lwt: change cas_request::apply signature Change the way query result is passed from getting a reference to a result to getting a foreign_ptr<lw_shared_ptr<query::result>>. This will allow cas_request to keep it without copying.	2020-05-05 12:38:23 +03:00
Benny Halevy	580d397d2e	test: database_test: do_with_some_data: retain tmpdir for test duration Currently, the test seems to use the tmpdir class in a wrong way, just to get a path to a temporary directory. It should keep the tmpdir object around for the duration of the test so the temporary directory will be automatically removed when the test completes. Refs #6344 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200504153810.202218-1-bhalevy@scylladb.com>	2020-05-05 11:37:18 +03:00
Piotr Jastrzebski	0475dab359	feature: add PER_TABLE_CACHING feature This feature will ensure that caching can be switched off per table only after the whole cluster supports it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-05 08:14:49 +02:00
Piotr Jastrzebski	2d727114ed	caching_options: add enabled parameter Scylla inherits from Origin two caching parameters (keys and rows_per_partition) that are ignored. This patch adds a new parameter called "enabled" which is true by default and controls whether cache is used for a selected table or not. If the parameter is missing in the map then it has the default value of true. To minimize the impact of this change, enabled == true is represented as an absence of this parameter. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-05-05 08:14:49 +02:00
Piotr Sarna	1c4e8f5030	alternator: fix checking max item depth Maximum item depth accepted by DynamoDB is 32, and alternator chose 39 as its arbitrary value in order to provide 7 shining new levels absolutely free of charge. Unfortunately, our code which checks the nesting level in rapidjson parsing bumps the counter by 2 for every object, which is due to rapidjson's internal implementation. In order to actually support at least 32 levels, the threshold is simply doubled. This commit comes with a test case which ensures that 32-nested items are accepted both by alternator and DynamoDB. The test case failed for alternator before the fix. Fixes #6366 Tests: unit(dev), alternator(local, remote)	2020-05-04 23:46:20 +03:00
Glauber Costa	c5cdd77f8e	gossip_test: start the compaction manager explicitly Right now the compaction_manager needs to be started explicitly. We may change it in the future, but right now that's how it is. Everything works now even without it, because compaction_manager::stop happens to work even if it was not started. But it is technically illegal. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200504143048.17201-1-glauber@scylladb.com>	2020-05-04 17:40:32 +03:00
Bentsi Magidovich	e77dad3adf	scylla_coredump_setup: Fix incorrect coredump directory mount The issue is that the mount is /var/lib/scylla/coredump -> /var/lib/systemd/coredump. But we need to do the opposite in order to save the coredump on the partition that Scylla is using: /var/lib/systemd/coredump -> /var/lib/scylla/coredump Fixes #6301	2020-05-04 15:47:45 +03:00
Avi Kivity	f3bcd4d205	Merge 'Support SSL Certificate Hot Reloading' from Calle " Fixes #6067 Makes the scylla endpoint initializations that support TLS use reloadable certificate stores, watching used cert + key files for changes, and reload iff modified. Tests in separate dtest set. " * elcallio-calle/reloadable-tls: transport: Use reloadable tls certificates redis: Use reloadable tls certificates alternator: Use reloadable tls certificates messaging_service: Use reloadable TLS certificates	2020-05-04 15:11:16 +03:00
Piotr Sarna	bec95a0605	treewide: use thread-safe variant of localtime In order to ensure thread-safety, all usages of localtime() are replaced with localtime_r(), which may accept a local buffer. Tests: unit(dev) Fixes #6364 Message-Id: <ad4a0c0e1707f0318325718715a3a647e3ebfdfe.1588592156.git.sarna@scylladb.com>	2020-05-04 14:46:08 +03:00
Calle Wilund	70aca26a3e	transport: Use reloadable tls certificates	2020-05-04 11:32:21 +00:00
Calle Wilund	bacf2fa981	redis: Use reloadable tls certificates	2020-05-04 11:32:21 +00:00
Calle Wilund	cc9bb6454c	alternator: Use reloadable tls certificates	2020-05-04 11:32:21 +00:00
Calle Wilund	08d069f78d	messaging_service: Use reloadable TLS certificates Changes messaging service rpc to use reloadable tls certificates iff tls is enabled. Note that this means that the service cannot start listening at construction time if TLS is active, and the user needs to call start_listen_ex to initialize and actually start the service. Since "normal" messaging service is actually started from gms, this route too is made a continuation.	2020-05-04 11:32:21 +00:00
Piotr Sarna	fb7fa7f442	alternator: fix signature timestamps Generating timestamps for auth signatures used a non-thread-safe ::gmtime function instead of thread-safe ::gmtime_r. Tests: unit(dev) Fixes #6345	2020-05-04 14:12:11 +03:00
Piotr Sarna	05ec95134a	clocks-impl: switch to thread-safe time conversion std::gmtime() has a sad property of using a global static buffer for returning its value. This is not thread-safe, so its usage is replaced with gmtime_r, which can accept a local buffer. While no regressions were observed in this particular area of code, a similar bug caused failures in alternator, so it's better to simply replace all std::gmtime calls with their thread-safe counterpart. Message-Id: <39e91c74de95f8313e6bb0b12114bf12c0e79519.1588589151.git.sarna@scylladb.com>	2020-05-04 14:11:38 +03:00
Takuya ASADA	57f3f82ed1	redis: add EX option for set command Add EX option for SET command, to set TTL for the key. The behavior of SET EX is the same as the SETEX command; only the syntax differs. see: https://redis.io/commands/set	2020-05-04 13:58:18 +03:00
Eliran Sinvani	a346e862c1	Auth: return correct error code when role is not found Scylla returns the wrong error code (0000 - server internal error) in response to trying to do authentication/authorization operations that involve a non-existing role. This commit changes those cases to return error code 2200 (invalid query) which is the correct one and also the one that Cassandra returns. Tests: Unit tests (Dev) All auth and auth_role dtests	2020-05-04 12:57:27 +03:00
Glauber Costa	55f5ca39a9	sstable_test: rework test to use a thread The compaction_manager test lives inside a thread and it is not taking advantage of it, with continuations all over. One of the side effects of it is that the test is calling stop() twice on the compaction_manager. While this works today, it is not good practice. A change I am making is just about to break it. This patch converts the test to fully use .get() instead of chained continuations and in doing so also guarantees that the compaction manager will be RAII-stopped just once, from a defer object. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200503161420.8346-2-glauber@scylladb.com>	2020-05-03 19:54:04 +03:00
Piotr Sarna	bf5f247bc5	db: set gc grace period to 0 for local system tables Local system tables from `system` namespace use LocalStrategy replication, so they do not need to be concerned about gc grace period. Some system tables already set gc grace period to 0, but other ones, including system.large_partitions, did not. That may result in millions of tombstones being needlessly kept for these tables, which can cause read timeouts. Fixes #6325 Tests: unit(dev), local(running cqlsh and playing with system tables)	2020-05-03 17:41:50 +03:00
Avi Kivity	9952cdfec1	Merge "scylla-gdb.py: improve finding references to intrusive container elements" from Botond " Intrusive containers often have references between containers elements that point to some non-first word of the element. These references currently fly below the radar of `scylla find` and `scylla generate-object-graph`, as they are looking for references to only the first word of the objects. So objects that are members of an intrusive container often appear to have no inbound references at all. This patch-set improves support for finding such references by looking for references to non-first words of objects. It also includes some generic, minor improvements to scylla generate_object_graph. " * 'scylla-gdb.py-scylla-generate-object-graph-linked-lists/v1' of https://github.com/denesb/scylla: scylla-gdb.py: scylla generate_object_graph: make label of initial vertice bold scylla-gdb.py: scylla generate_object_graph: remove redundant lookup scylla-gdb.py: scylla generate_object_graph: print "to" offsets scylla-gdb.py: scylla generate-object-graph: use value-range to find references scylla-gdb.py: scylla find: allow finding ranges of values scylla-gdb.py: find_in_live(): return pointer_metadata instances	2020-05-03 16:22:22 +03:00
Glauber Costa	70e5252a5d	table: no longer accept online loading of SSTable files in the main directory Loading SSTables from the main directory is possible, to be compatible with Cassandra, but extremely dangerous and not recommended. From the beginning, we recommend using a separate, upload/ directory. In all this time, perhaps due to how the feature's usefulness is reduced in Cassandra due to the possible races, I have never seen anyone coming from Cassandra doing procedures involving refresh at all. Loading SSTables from the main directory forces us to disable writes to the table temporarily until the SSTables are sorted out. If we get rid of this, we can get rid of the disabling of the writes as well. We can't do it now because if we want to be nice to the odd user that may be using refresh through the main directory without our knowledge we should at least error out. This patch, then, does that: it errors out if SSTables are found in the main directory. It will not proceed with the refresh, and direct the user to the upload directory. The main loop in reshuffle_sstables is left in place structurally for now, but most of it is gone. The test for it is deleted. After a period of deprecation we can start ignoring these SSTables and get rid of the lock. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200429144511.13681-1-glauber@scylladb.com>	2020-05-03 08:40:38 +03:00
Glauber Costa	e44b2826ab	compaction: avoid abandoned futures when using interposers When using interposers, cancelling compactions can leave futures that are not waited for (resharding, twcs) The reason is when consume_end_of_stream gets called, it tries to push end_of_stream into the queue_reader_handle. Because cancelling a compaction is done through an exception, the queue_reader_handle is terminated already at this time. Trying to push to it generates another exception and prevents us from returning the future right below it. This patch adds a new method is_terminated() and if we detect that the queue_reader_handle is already terminated by this point, we don't try to push. We call it is_terminated() because the check is to see if the queue_reader_handle has a _reader. The reader is also set to null on successful destruction. Signed-off-by: Glauber Costa <glauber@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200430175839.8292-1-glauber@scylladb.com>	2020-05-01 16:30:23 +03:00
Avi Kivity	122f57871d	Update seastar submodule * seastar 0523b0fac...3c2e27811 (2): > future: Add a futurizer::satisfy_with_result_of > future: Move concept definitions earlier	2020-05-01 12:55:48 +03:00
Tomasz Grabiec	d78fbf7c16	Merge "storage_service: Make replacing node take writes" from Asias Background: Replace operation is used to replace a dead node in the cluster. Currently during replace operation, the replacing node does not take any writes. As a result, new writes to a range after the sync for that range is done, e.g., after streaming for that range is finished, will not be synced to the replacing node. Hinted hand off or repair after the replacing operation will help. But it is better if we can make the writes to the replacing node to avoid any post replacing operation actions. After this series and repair based node operation series, the replace operation will guarantee the replacing node has all the latest copy of data including the new writes during the replace operation. In short, no more repairs before or after the replacing operation. Just replacing the node is enough. Implementation: Filter the node being replaced out of the natural endpoints in storage_proxy, so that: The node being replaced will not be selected as the target for normal write or normal read. Do not depend on the gossip liveness to avoid selecting replacing node for normal write or normal read when the replacing node has the same ip address as the node being replaced. No more special handling for hibernate state in gossip which makes it simpler and more robust. Replacing node will be marked as UP. Put the replacing node in the pending list, so that: Replacing node will take writes but writes to the replacing node will not be counted towards CL. Replacing node will not take normal read. Example: For example, with RF = 3, n1, n2, n3 in the cluster, n3 is dead and being replaced by node n4. When n4 starts: writes to nodes {n1, n2, n3} are changed to normal_replica_writes = {n1, n2} and pending_replica_writes= {n4}. reads to nodes {n1, n2, n3} are changed to normal_replica_reads = {n1, n2} only. This way, the replacing node n4 now takes writes but does not take reads. 
Tests: Measure the number of writes during pending period that is the replacing starts and finishes the replace operation. Start 5 nodes, n1 to n5. Stop n5 Start write in the background Start n6 to replace n5 Get scylla_database_total_writes metrics when the replacing node announces HIBERNATE (replacing) and NORMAL status. Before: 2020-02-06 08:35:35.921837 Get metrics when other knows replacing node = HIBERNATE 2020-02-06 08:35:35.939493 scylla_database_total_writes: node1={'scylla_database_total_writes': 15483} 2020-02-06 08:35:35.950614 scylla_database_total_writes: node2={'scylla_database_total_writes': 15857} 2020-02-06 08:35:35.961820 scylla_database_total_writes: node3={'scylla_database_total_writes': 16195} 2020-02-06 08:35:35.978427 scylla_database_total_writes: node4={'scylla_database_total_writes': 15764} 2020-02-06 08:35:35.992580 scylla_database_total_writes: node6={'scylla_database_total_writes': 331} 2020-02-06 08:36:49.794790 Get metrics when other knows replacing node = NORMAL 2020-02-06 08:36:49.809189 scylla_database_total_writes: node1={'scylla_database_total_writes': 267088} 2020-02-06 08:36:49.823302 scylla_database_total_writes: node2={'scylla_database_total_writes': 272352} 2020-02-06 08:36:49.837228 scylla_database_total_writes: node3={'scylla_database_total_writes': 274004} 2020-02-06 08:36:49.851104 scylla_database_total_writes: node4={'scylla_database_total_writes': 262972} 2020-02-06 08:36:49.862504 scylla_database_total_writes: node6={'scylla_database_total_writes': 513} Writes = 513 - 331 After: 2020-02-06 08:28:56.548047 Get metrics when other knows replacing node = HIBERNATE 2020-02-06 08:28:56.560813 scylla_database_total_writes: node1={'scylla_database_total_writes': 290886} 2020-02-06 08:28:56.573925 scylla_database_total_writes: node2={'scylla_database_total_writes': 310304} 2020-02-06 08:28:56.586305 scylla_database_total_writes: node3={'scylla_database_total_writes': 304049} 2020-02-06 08:28:56.601464 
scylla_database_total_writes: node4={'scylla_database_total_writes': 303770} 2020-02-06 08:28:56.615066 scylla_database_total_writes: node6={'scylla_database_total_writes': 604} 2020-02-06 08:29:10.537016 Get metrics when other knows replacing node = NORMAL 2020-02-06 08:29:10.553257 scylla_database_total_writes: node1={'scylla_database_total_writes': 336126} 2020-02-06 08:29:10.567181 scylla_database_total_writes: node2={'scylla_database_total_writes': 358549} 2020-02-06 08:29:10.581939 scylla_database_total_writes: node3={'scylla_database_total_writes': 351416} 2020-02-06 08:29:10.595567 scylla_database_total_writes: node4={'scylla_database_total_writes': 350580} 2020-02-06 08:29:10.610548 scylla_database_total_writes: node6={'scylla_database_total_writes': 45460} Writes = 45460 - 604 As we can see the replacing node did not take write before and take write after the patch. Check log of writer handler in storage_proxy storage_proxy - creating write handler for token: -2642068240672386521, keyspace_name=ks, original_natrual={127.0.0.1, 127.0.0.5, 127.0.0.2}, natural={127.0.0.1, 127.0.0.2}, pending={127.0.0.6} The node being replaced, n5=127.0.0.5, is filtered out and the replacing node, n6=127.0.0.6 is in the pending list. * asias/replace_take_writes: storage_service: Make replacing node take writes repair: Use token_metadata with the replacing node in do_rebuild_replace_with_repair abstract_replication_strategy: Add get_ranges which takes token_metadata abstract_replication_strategy: Add get_natural_endpoints_without_node_being_replaced abstract_replication_strategy: Add allow_remove_node_being_replaced_from_natural_endpoints token_metadata: Calculate pending ranges for replacing node storage_service: Unify handling of replaced node removal from gossip storage_service: Update tokens and replace address for replace operation	2020-04-30 19:28:35 +02:00
Pavel Emelyanov	513ce1e6a5	storage_proxy_stats: Make get_ep_stat() noexcept The .get_ep_stat(ep) call can throw when registering metrics (we have issue for it, #5697). This is not expected by its callers, in particular abstract_write_response_handler::timeout_cb breaks in the middle and doesn't call the on_timeout() and the _proxy->remove_response_handler(), which results in not removed and not released response handler. In turn not released response handler doesn't set the _ready future on which response_wait() waits -> stuck. Although the issue with .get_ep_stat() should be fixed, an exception in it mustn't lead to deadlocks, so the fix is to make the get_ep_stat() noexcept by catching the exception and returning a dummy stat object instead to let caller(s) finish. Fixes #5985 Tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200430163639.5242-1-xemul@scylladb.com>	2020-04-30 19:40:08 +03:00
Avi Kivity	88224619b6	Update seastar submodule * seastar d0cbf7d1e8...0523b0fac4 (1): > Merge "Fix issues found by valgrind" from Rafael	2020-04-30 19:20:37 +03:00
Asias He	b8ac10c451	config: Do not enable repair based node operations by default Give it some more time to mature. Use the old stream plan based node operations by default. Fixes: #6305 Backports: 4.0	2020-04-30 12:37:24 +03:00
Avi Kivity	8925e00e96	Merge 'Fix hang in multishard_writer' from Asias " This series fixes a hang in multishard_writer when an error happens. It contains - multishard_writer: Abort the queue attached to consumers when producer fails - repair: Fix hang when the writer is dead Fixes #6241 Refs: #6248 " * asias-stream_fix_multishard_writer_hang: repair: Fix hang when the writer is dead mutation_writer_test: Add test_multishard_writer_producer_aborts multishard_writer: Abort the queue attached to consumers when producer fails	2020-04-30 12:27:55 +03:00
Avi Kivity	280854ab46	Merge " Avoid use-after-free of sstable writer" from Rafael " The backlog_controller has a timer that periodically accesses the sstable writers of ongoing writes. This patch series makes sure we remove entries from the list of ongoing writes before the corresponding sstable writer is destroyed. Fixes #6221. " * 'espindola/fix-6221-v5' of https://github.com/espindola/scylla: sstables: Call revert_charges in compaction_write_monitor::write_failed sstables: Call monitor->write_failed earlier. sstables: Add write_failed to the write_monitor interface	2020-04-30 12:21:27 +03:00
Pekka Enberg	5c6265d14b	Merge 'redis: add setex and ttl commands' from Takuya "Enabling TTL feature, add setex and ttl commands to use it." * 'redis_setex_ttl' of git://github.com/syuu1228/scylla: redis: add test for setex/ttl redis: add ttl command redis: add setex command	2020-04-30 09:39:48 +03:00
Pekka Enberg	d4c0d80f13	Merge 'redis: add lolwut test' from Takuya "Add test for lolwut command, and also fix a bug on lolwut found by the test." * 'redis_lolwut_test' of git://github.com/syuu1228/scylla: redis: lolwut parameter fix redis-test: add lolwut test	2020-04-30 09:30:43 +03:00
Piotr Sarna	c7c8bd0978	Update seastar submodule * seastar 8fae03c2...d0cbf7d1 (6): > tests: restore compatibility with C++14 (broken due to std::filesystem) > http: make headers case-insensitive > on_internal_error: add scoped_no_abort_on_internal_error > Merge "make when_all functions noexcept" from Benny > chunked_fifo: fix underflow in reserve() > doc: document compatibility promises Fixes #6319	2020-04-30 07:29:23 +02:00
Asias He	7d86a3b208	storage_service: Make replacing node take writes Background: Replace operation is used to replace a dead node in the cluster. Currently during replace operation, the replacing node does not take any writes. As a result, new writes to a range after the sync for that range is done, e.g., after streaming for that range is finished, will not be synced to the replacing node. Hinted hand off or repair after the replacing operation will help. But it is better if we can make the writes to the replacing node to avoid any post replacing operation actions. After this series and repair based node operation series, the replace operation will guarantee the replacing node has all the latest copy of data including the new writes during the replace operation. In short, no more repairs before or after the replacing operation. Just replacing the node is enough. Implementation: 1) Filter the node being replaced out of the natural endpoints in storage_proxy, so that: - The node being replaced will not be selected as the target for normal write or normal read. - Do not depend on the gossip liveness to avoid selecting replacing node for normal write or normal read when the replacing node has the same ip address as the node being replaced. No more special handling for hibernate state in gossip which makes it simpler and more robust. Replacing node will be marked as UP. 2) Put the replacing node in the pending list, so that: - Replacing node will take writes but writes to the replacing node will not be counted towards CL. - Replacing node will not take normal read. Example: For example, with RF = 3, n1, n2, n3 in the cluster, n3 is dead and being replaced by node n4. When n4 starts: - writes to nodes {n1, n2, n3} are changed to normal_replica_writes = {n1, n2} and pending_replica_writes= {n4}. - reads to nodes {n1, n2, n3} are changed to normal_replica_reads = {n1, n2} only. This way, the replacing node n4 now takes writes but does not take reads. 
Tests: 1) Measure the number of writes during the pending period, that is, between the time the replacing node starts and finishes the replace operation. - Start 5 nodes, n1 to n5. - Stop n5 - Start write in the background - Start n6 to replace n5 - Get scylla_database_total_writes metrics when the replacing node announces HIBERNATE (replacing) and NORMAL status. Before: 2020-02-06 08:35:35.921837 Get metrics when other knows replacing node = HIBERNATE 2020-02-06 08:35:35.939493 scylla_database_total_writes: node1={'scylla_database_total_writes': 15483} 2020-02-06 08:35:35.950614 scylla_database_total_writes: node2={'scylla_database_total_writes': 15857} 2020-02-06 08:35:35.961820 scylla_database_total_writes: node3={'scylla_database_total_writes': 16195} 2020-02-06 08:35:35.978427 scylla_database_total_writes: node4={'scylla_database_total_writes': 15764} 2020-02-06 08:35:35.992580 scylla_database_total_writes: node6={'scylla_database_total_writes': 331} 2020-02-06 08:36:49.794790 Get metrics when other knows replacing node = NORMAL 2020-02-06 08:36:49.809189 scylla_database_total_writes: node1={'scylla_database_total_writes': 267088} 2020-02-06 08:36:49.823302 scylla_database_total_writes: node2={'scylla_database_total_writes': 272352} 2020-02-06 08:36:49.837228 scylla_database_total_writes: node3={'scylla_database_total_writes': 274004} 2020-02-06 08:36:49.851104 scylla_database_total_writes: node4={'scylla_database_total_writes': 262972} 2020-02-06 08:36:49.862504 scylla_database_total_writes: node6={'scylla_database_total_writes': 513} Writes = 513 - 331 After: 2020-02-06 08:28:56.548047 Get metrics when other knows replacing node = HIBERNATE 2020-02-06 08:28:56.560813 scylla_database_total_writes: node1={'scylla_database_total_writes': 290886} 2020-02-06 08:28:56.573925 scylla_database_total_writes: node2={'scylla_database_total_writes': 310304} 2020-02-06 08:28:56.586305 scylla_database_total_writes: node3={'scylla_database_total_writes': 304049} 2020-02-06 08:28:56.601464 
scylla_database_total_writes: node4={'scylla_database_total_writes': 303770} 2020-02-06 08:28:56.615066 scylla_database_total_writes: node6={'scylla_database_total_writes': 604} 2020-02-06 08:29:10.537016 Get metrics when other knows replacing node = NORMAL 2020-02-06 08:29:10.553257 scylla_database_total_writes: node1={'scylla_database_total_writes': 336126} 2020-02-06 08:29:10.567181 scylla_database_total_writes: node2={'scylla_database_total_writes': 358549} 2020-02-06 08:29:10.581939 scylla_database_total_writes: node3={'scylla_database_total_writes': 351416} 2020-02-06 08:29:10.595567 scylla_database_total_writes: node4={'scylla_database_total_writes': 350580} 2020-02-06 08:29:10.610548 scylla_database_total_writes: node6={'scylla_database_total_writes': 45460} Writes = 45460 - 604 As we can see, the replacing node did not take writes before the patch and takes writes after it. 2) Check the log of the write handler in storage_proxy storage_proxy - creating write handler for token: -2642068240672386521, keyspace_name=ks, original_natrual={127.0.0.1, 127.0.0.5, 127.0.0.2}, natural={127.0.0.1, 127.0.0.2}, pending={127.0.0.6} The node being replaced, n5=127.0.0.5, is filtered out and the replacing node, n6=127.0.0.6, is in the pending list. Fixes: #5482	2020-04-30 10:22:30 +08:00
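The endpoint filtering described in the commit above can be illustrated with a minimal Python sketch. The function name `split_replica_targets` and its parameters are hypothetical and are not taken from the Scylla source; only the RF = 3 example (n3 replaced by n4) comes from the commit message.

```python
def split_replica_targets(natural, node_being_replaced, replacing_node):
    """Return (normal_writes, pending_writes, normal_reads) for one token.

    Hypothetical helper for illustration only, not the Scylla implementation.
    """
    # Filter the node being replaced out of the natural endpoints.
    normal = [n for n in natural if n != node_being_replaced]
    # The replacing node goes on the pending list: it receives writes,
    # but those writes are not counted toward the consistency level.
    pending = [replacing_node]
    # Reads go only to the remaining natural endpoints.
    return normal, pending, list(normal)


normal_writes, pending_writes, normal_reads = split_replica_targets(
    ["n1", "n2", "n3"], node_being_replaced="n3", replacing_node="n4")
print(normal_writes, pending_writes, normal_reads)
```

With the inputs from the commit message, the replacing node n4 appears only in the pending write list and never in the read list.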
Asias He	e3fbc8fba1	repair: Use token_metadata with the replacing node in do_rebuild_replace_with_repair We will change the update of tokens in token_metadata in the next patch so that the tokens of the replacing node are updated to token_metadata only after the replace operation is done. In order to get the correct ranges for the replacing node in do_rebuild_replace_with_repair, we need to use a copy of token_metadata that contains the tokens of the replacing node. Refs: #5482	2020-04-30 10:22:30 +08:00
Asias He	b640614aa6	abstract_replication_strategy: Add get_ranges which takes token_metadata It is useful when the caller wants to calculate ranges using a custom token_metadata. It will be used soon in do_rebuild_replace_with_repair for replace operation. Refs: #5482	2020-04-30 10:22:30 +08:00
Asias He	37d3d3e051	abstract_replication_strategy: Add get_natural_endpoints_without_node_being_replaced Similar to natural_endpoints but with the node being replaced filtered out. Refs: #5482	2020-04-30 10:22:30 +08:00
Asias He	1a75a60cfc	abstract_replication_strategy: Add allow_remove_node_being_replaced_from_natural_endpoints Decide if the replication strategy allows removing the node being replaced from the natural endpoints when a node is being replaced in the cluster. LocalStrategy is not allowed to do so because it always returns the node itself as the natural_endpoints and the node will not appear in the pending_endpoints. It is needed by the "Make replacing node take writes" work. Refs: #5482	2020-04-30 10:22:30 +08:00
Pekka Enberg	eac9e253e7	sstables: Fix open-coded version parsing in make_descriptor() The make_descriptor() function parses a string representation of sstable version using a ternary operator. Clean it up by using sstables::from_string(), which is future-proof when we add support for later sstable formats. Message-Id: <20200429082126.15944-1-penberg@scylladb.com>	2020-04-29 16:25:12 +02:00
Asias He	bd6691301e	token_metadata: Calculate pending ranges for replacing node It will be needed soon for making replace node take writes. Refs: #5482	2020-04-29 16:02:10 +08:00
Asias He	75cf1d18b5	storage_service: Unify handling of replaced node removal from gossip Currently, after the replacing node finishes the replace operation, it removes the node being replaced from gossip directly in storage_service::join_token_ring() with gossiper::replaced_endpoint(), so the gossip states for the replaced node are gone. When other nodes know the replace operation is done, they will call storage_service::remove_endpoint() and gossiper::remove_endpoint() to quarantine the node but keep the gossip states. To prevent the replacing node from learning the state of the replaced node again from existing nodes, the replacing node uses 2X quarantine time. This makes the gossip states for the replaced node different on other nodes and on the replacing node. The discrepancy between nodes makes it harder to reason about the gossip states. To fix, we unify the handling of the replaced node on both the replacing node and other nodes. On all the nodes, once the replacing node becomes NORMAL status, we remove the replaced node from token_metadata and quarantine it but keep the gossip state. Since the replaced node is no longer a member of the cluster, the fatclient timer will count and expire and remove the replaced node from gossip. Refs: #5482	2020-04-29 16:02:10 +08:00
Asias He	66c1907524	storage_service: Update tokens and replace address for replace operation The motivation is to make the replacing node have the same view of the token ring as the rest of the cluster. If the replacing node has the same ip address as the node being replaced, we should update the tokens in token_metadata when the replace operation starts, so that this replacing node and the rest of the cluster see the same token ring. If the replacing node has a different ip address from the node being replaced, we should update the tokens in token_metadata only when the replace operation is done, because the other nodes will update the replacing node's token in token_metadata when the replace operation is done. Refs: #5482	2020-04-29 16:02:00 +08:00
Nadav Har'El	ff5615d59d	alternator test: drastically reduce time to boot Scylla The alternator test, test/alternator/run, runs Scylla and runs the various tests against it. Before this patch, just booting Scylla took about 26 seconds (for a dev build, on my laptop). This patch reduces this delay to less than one second! It turns out that almost the entire delay was artificial, two periods of 12 seconds "waiting for the gossip to settle", which are completely unnecessary in the one-node cluster used in the Alternator test. So a simple "--skip-wait-for-gossip-to-settle 0" parameter eliminates these long delays completely. Amusingly, the Scylla boot is now so fast, that I had to change a "sleep 2" in the test script to "sleep 1", because 2 seconds is now much more than it takes to boot Scylla :-) Fixes #6310. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200428145035.22894-1-nyh@scylladb.com>	2020-04-29 07:55:03 +02:00
Benny Halevy	3b31acfa80	exceptions: drop OVERFLOW_ERROR cql binary protocol extension Client drivers act differently on error codes they don't recognize. Adding new error codes is considered a protocol extension and should be negotiated with the client. This change keeps `overflow_error_exception` internally but uses the INVALID cql error code to return the error message back to the client, similar to keyspace_not_defined_exception. We (and cassandra) already use `invalid_request_exception` extensively to return various errors related to invalid values or types in the query. Fixes #6264 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Gleb Natapov <gleb@scylladb.com> Message-Id: <20200422130011.108003-1-bhalevy@scylladb.com>	2020-04-28 12:16:00 +03:00
Piotr Sarna	09e4f3b917	alternator: implement ScanIndexForward The ScanIndexForward parameter is now fully implemented and can accept ScanIndexForward=false in order to query the partitions in reverse clustering order. Note that reading partition slices in reverse order is less efficient than forward scans and may put a strain on memory usage, especially for large partitions, since the whole partition is currently fetched in order to be reversed. Fixes #5153	2020-04-28 11:44:46 +03:00
Piotr Sarna	be5d3f4733	Merge 'A bunch of refactors in versioned_value and gossiper' from Kamil 1. Remove the `versioned_value::factory` class, it didn't add any value. It just forced us to create an object for making `versioned_value`s, for no sensible reason. 2. Move some `versioned_value` deserialization code (string -> internal data structures) into the versioned_value module. Previously, it was scattered all around the place. 3. Make `gossiper::get_seeds` const and return a const reference. I needed these refactors for a PR I was preparing to fix an issue with CDC. The attempt of fixing the issue failed (I'm trying something different now), but the refactors might be useful anyway. * kbr--vv-refactor: gossiper: make `get_seeds` method const and return a const ref versioned_value: remove versioned_value::factory class gms: move TOKENS string deserialization code into versioned_value	2020-04-28 10:27:45 +02:00
Pavel Solodovnikov	ed7a7554b8	storage_proxy: allow cas() to accept nullptr read_command This patch allows users of storage_proxy::cas() to supply nullptr as `query::read_command` which is supposed to skip the procedure of reading the existing value. The feature is used in alternator code for Read-Modify-Write operations: some of them don't require reading previous item values before updating. Move `read_nothing_read_command` from alternator code to storage_proxy layer and fabricate a new no-op command from it when storage_proxy::cas() is used with nullptr read_command. This allows to avoid sprinkling if-else branches all over the code in order to check for null-equality of `cmd`. We return from storage_proxy::query() very early with an empty result in case we're given an empty partition_slice (which resides inside the passed `read_command`) so this approach should be perfectly fine. Expand documentation for the `cas()` function to cover new possible value for `cmd` argument. Fixes: #6238 Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200428065235.5714-1-pa.solodovnikov@scylladb.com>	2020-04-28 10:44:19 +03:00
Asias He	35c5ef78b9	repair: Fix hang when the writer is dead Consider: When repair master gets data from repair follower: 1) apply_rows_on_master_in_thread is called 2) a repair writer is created with _repair_writer.create_writer 3) the repair writer fails 4) data is written to the queue _mq[node_idx]->push_eventually attached with the writer Since the writer is dead, no one is going to fetch data from the _mq queue. The apply_rows_on_master_in_thread will block forever. To fix, when the writer fails, we should abort the _mq queue. Refs: #6248	2020-04-28 12:14:32 +08:00
Raphael S. Carvalho	02e046608f	api/service: fix segfault when taking a snapshot without keyspace specified If no keyspace is specified when taking snapshot, there will be a segfault because keynames is unconditionally dereferenced. Let's return an error because a keyspace must be specified when column families are specified. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>	2020-04-27 23:37:00 +03:00
Pekka Enberg	3a10bddd7d	configure.py: Add '--with-seastar' option This patch adds a '--with-seastar=<PATH>' option to configure.py, which allows user to override the default seastar submodule path. This is useful when building packages from source tarballs, for example. Message-Id: <20200427165511.6448-1-penberg@scylladb.com>	2020-04-27 20:01:35 +03:00
Rafael Ávila de Espíndola	c7d74a59f5	sstables: Call revert_charges in compaction_write_monitor::write_failed We still call it in the destructor to cover the successful case. We can't do that in on_data_write_completed because it is too early. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-04-27 08:58:31 -07:00
Rafael Ávila de Espíndola	95ee54f3cc	sstables: Call monitor->write_failed earlier. A writer is destroyed just before consume_in_thread returns, since the adapter takes ownership of it. The problem is that a monitor can keep a reference to a writer_offset_tracker that is owned by that writer. The monitor is accessed periodically via backlog_controller::_update_timer. This means we have to deregister from the list of ongoing writes before the writer is destroyed. If the write fails, the deregistration happens in write_failed, but it is currently called after the writer is destroyed. This patch moves the call to write_failed to the writer destructor as I could not find a convenient location to put it. Since the writer is destroyed in consume_in_thread, we could call it there, but then we also have to update consume. There is a similar problem with the case where the sstable is written correctly. That will be fixed in the next patch. Fixes #6221. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-04-27 08:58:31 -07:00
Rafael Ávila de Espíndola	95acfd1d58	sstables: Add write_failed to the write_monitor interface Only database_sstable_write_monitor needs it so far, but the call needs to be moved earlier, which requires calling it in code paths that don't know about database_sstable_write_monitor. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-04-27 08:58:31 -07:00
Raphael S. Carvalho	5ac0d31323	test: perf_simple_query: fix test with smp count > 1 That code doesn't run under a thread, so let's futurize it. The code worked with a single CPU because get() returns right away when there is no deferring point. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427155303.82763-1-raphaelsc@scylladb.com>	2020-04-27 18:58:25 +03:00
Pavel Emelyanov	108a944e7b	ring_position_ext: Add formatter It's not currently used, but helped when debugging reworked row cache lookups. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200427144712.14794-1-xemul@scylladb.com>	2020-04-27 18:01:01 +03:00
Avi Kivity	50e82f523a	Update seastar submodule * seastar 37a22d9de6...8fae03c22d (5): > Merge "Reloadable TLS certificates" from Calle > future: improve variadic future warning > io_queue: deprecated request tracking > test: futures_test: adjust to make_ready_future noexcept > future: specify make_ready_future as noexcept	2020-04-27 16:35:05 +03:00
Nadav Har'El	858a12755b	test.py: run Alternator test with the correct Scylla binary The Alternator test's run script, test/alternator/run, runs Scylla. By default, it chooses the last built Scylla executable build/*/scylla. However, test.py has a "mode" option, that should be able to choose which build mode to run. Before this patch, this mode option wasn't honored by the Alternator test, so a "test.py alternator/run" would run the same Scylla binary (the one last built) three times, instead of running each of the three build modes. We fix this in this patch: test.py now passes the "SCYLLA" environment variable to the test/alternator/run script, indicating the location of the Scylla binary with the appropriate build mode. The script already supported this environment variable to override its default choice of Scylla binary. In test.py, we add to the run_test() function an optional "env" parameter which can be used to pass additional environment variables to the test. Fixes #6286 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200427131958.28248-1-nyh@scylladb.com>	2020-04-27 16:23:58 +03:00
Pekka Enberg	fad6712673	dbuild: Improve error message if Docker is not installed If you run "dbuild" on a freshly installed machine, the error message is not the most helpful one. Fix it up. Before: $ ./tools/toolchain/dbuild ./tools/toolchain/dbuild: line 113: docker: command not found ./tools/toolchain/dbuild: line 156: docker: command not found After: $ ./tools/toolchain/dbuild dbuild: Please install Docker on this machine to run dbuild. Run `./tools/toolchain/dbuild --help' to print the full help message. Message-Id: <20200426192746.11034-1-penberg@scylladb.com>	2020-04-27 16:22:18 +03:00
Calle Wilund	040ffa6e64	distributed_loader: Add concurrency control override for named keyspaces Fixes #6202 Distributed loader sstable opening is gated through the database::sstable_load_concurrency_sem() semaphore (at a concurrency of 3). This is (according to creation comment) to reduce memory footprint during bootstrap, by partially serializing the actual opening of existing sstables. However, in certain versions of the product, there exist circular dependencies between data in some sstables and the ability to actually read others. Thus when gated as above, we can end up with the dependents acquiring the semaphore fully, and once stuck waiting for population of their dependency effectively blocking this from ever happening. Since we probably do not want to remove the concurrency control, and increasing it would only push the problem further away, we solve the issue by adding the ability to mark certain keyspaces as "prioritized" (pre-bootstrap), and allow them to populate outside the normal concurrency control semaphore. Concurrency increase is however limited to one extra sstable per shard and prio keyspace. Message-Id: <20200415102431.20816-1-calle@scylladb.com>	2020-04-27 16:21:13 +03:00
Piotr Sarna	d3aba44aea	Merge 'cdc: fix the "NoHostAvailable" client error when CL is not met' from Juliusz. CL of LOCAL_QUORUM used to be hardcoded into CDC preimage query and led to an error every time the number of replicas was lower than CL could require. The solution here is to link the CLs of writes to base table with the CLs of CDC reads, so the client will get the (limited) control over the consistency of preimage SELECTs (instead of constant misleading errors). The algorithm is as follows: 1. If write that caused CDC activity was done with CL = ANY, then do preimage read with CL = ONE. 2. If write that caused CDC activity was done with CL = ALL, then do preimage read with CL = QUORUM. 3. SERIAL and LOCAL_SERIAL writes cause preimage read with QUORUM and LOCAL_QUORUM, respectively. 4. In other cases do preimage read with the same CL as base write. To further mitigate the incomprehensible error being sent to client, I wrapped the preimage's SELECT query in try-catch and intercept the `unavailable_exception`, which was manifesting as `NoHostAvailable` in Python and Java drivers. Now client gets a new error code and a message specific to the issue of CL not being met by the preimage query. Fixes #5746 * jul-stas-5746-cdc-replication-factor: cdc: fix the "NoHostAvailable" client error when CL is not met cdc: CL for preimage select is calculated from base write CL	2020-04-27 14:24:12 +02:00
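The four-step consistency level mapping listed in the merge message above can be written as a small lookup. This is a minimal Python sketch, assuming consistency levels are represented as plain strings; the function name `preimage_read_cl` is hypothetical and the real change lives in Scylla's C++ CDC code.

```python
# Write CLs that need a different CL for the CDC preimage read
# (rules 1-3 from the commit message).
_PREIMAGE_CL_OVERRIDES = {
    "ANY": "ONE",
    "ALL": "QUORUM",
    "SERIAL": "QUORUM",
    "LOCAL_SERIAL": "LOCAL_QUORUM",
}


def preimage_read_cl(write_cl):
    """Return the CL to use for the preimage read, given the base write CL."""
    # Rule 4: in all other cases, read with the same CL as the base write.
    return _PREIMAGE_CL_OVERRIDES.get(write_cl, write_cl)


for cl in ("ANY", "ALL", "SERIAL", "LOCAL_SERIAL", "LOCAL_QUORUM", "ONE"):
    print(cl, "->", preimage_read_cl(cl))
```

A dictionary with a pass-through default keeps rule 4 ("same CL as base write") implicit, so only the exceptional cases need to be listed.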
Juliusz Stasiewicz	d37b3f34f1	cdc: fix the "NoHostAvailable" client error when CL is not met This commit resolves the client-observable effect of CDC read consistencies. I wrapped the preimage's SELECT query in try-catch to intercept the `unavailable_exception`, which led to misleading `NoHostAvailable` in Python and Java drivers. Now client gets a new error code and a message specific to the issue of CL not being met by the preimage query. Fixes #5746	2020-04-27 13:56:57 +02:00
Piotr Sarna	c32faee657	Merge 'counters: Fix filtering of counters' from Juliusz Queries with `ALLOW FILTERING` and constraints on counter values used to be rejected as "unimplemented". The reason was a missing tri-comparator, which is added in this patch. Fixes #5635 * jul-stas-5635-filtering-on-counters: cql/tests: Added test for filtering on counter columns counters: add comparator and remove `unimplemented` from restrictions	2020-04-27 13:53:34 +02:00
Juliusz Stasiewicz	afee590ed7	cql/tests: Added test for filtering on counter columns Tested predicates: IN, EQ, GE, GT, LE, LT. Untouched counters are expected to evaluate as 0. Deleted counters are expected not to appear at all.	2020-04-27 13:36:16 +02:00
Juliusz Stasiewicz	cf2d81bb12	counters: add comparator and remove `unimplemented` from restrictions CQL `counter_type_impl` is now made comparable by deserializing it as an `int64_t`. It allows the use of counters in statement restrictions.	2020-04-27 13:27:48 +02:00
Avi Kivity	1f902302ad	build: replace xxhash submodule with OS package The xxhash library has been packaged by Fedora, so we can use it instead of carrying the submodule. This allows us to receive updates as the OS packages are updated. Build time will not be reduced since it is a header-only library. xxhash preserves the hash results across versions so rolling upgrades will still work. The frozen toolchain is updated with the new package. Tests: unit (dev)	2020-04-27 14:00:31 +03:00
Mike Goltsov	068bb3a5bf	fix error in fstrim service (scylla_util.py) On a CentOS 7 machine, fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation. When trying to run the scylla-fstrim service manually you get an error: Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module> main() File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main cfg = parse_scylla_dirs_with_default(conf=args.config) File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default if key not in y or not y[k]: NameError: name 'k' is not defined It is caused by an error in scylla_util.py. Fixes #6294.	2020-04-27 13:32:11 +03:00
Pavel Solodovnikov	f6e765b70f	cql3: pass `column_specification` via lw_shared_ptr `column_specification` class is marked as "final": it's safe to use non-polymorphic pointer "lw_shared_ptr" instead of a more generic "shared_ptr". tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200427084016.26068-1-pa.solodovnikov@scylladb.com>	2020-04-27 12:47:42 +03:00
Takuya ASADA	811b256f2b	redis: add test for setex/ttl	2020-04-27 13:58:33 +09:00
Takuya ASADA	d845fde560	redis: add ttl command Add ttl command that returns remaining TTL of the key. See: https://redis.io/commands/ttl	2020-04-27 13:58:33 +09:00
Takuya ASADA	98cae802c0	redis: add setex command Add setex to set key with TTL. See: https://redis.io/commands/setex	2020-04-27 13:58:33 +09:00
Pekka Enberg	7304a795e5	scripts/jobs: Keep memory reserve when calculating parallelism The "jobs" script is used to determine the amount of compilation parallelism on a machine. It attempts to ensure each GCC process has at least 4 GB of memory per core. However, in the worst case scenario, we could end up having the GCC processes take up all the system memory, forcing swapping or the OOM killer to kick in. For example, on a 4 core machine with 16 GB of memory, this worst case scenario seems easy to trigger in practice. Fix up the problem by keeping a 1 GB memory reserve for other processes and calculating parallelism based on that. Message-Id: <20200423082753.31162-1-penberg@scylladb.com>	2020-04-26 19:38:47 +03:00
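The arithmetic behind the memory reserve can be sketched as follows. This is an illustrative Python calculation, not the actual `scripts/jobs` script: the function name, the integer-GB inputs, and the floor of one job are assumptions, while the 4 GB per process and 1 GB reserve figures come from the commit message.

```python
def compile_jobs(cpu_count, total_mem_gb, per_job_gb=4, reserve_gb=1):
    """Number of parallel compile jobs, bounded by CPU count and by memory.

    Hypothetical helper: memory left after the reserve is divided among
    jobs that are each assumed to need per_job_gb.
    """
    by_memory = (total_mem_gb - reserve_gb) // per_job_gb
    # Never go below one job (an assumption of this sketch).
    return max(1, min(cpu_count, by_memory))


# The 4-core, 16 GB machine from the commit message.
without_reserve = compile_jobs(4, 16, reserve_gb=0)
with_reserve = compile_jobs(4, 16)
print(without_reserve, with_reserve)
```

Under these assumptions the machine from the example drops from four jobs, which could claim all 16 GB, to three jobs, leaving headroom for other processes.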
Piotr Sarna	e17c237feb	alternator: fix integer overflow warning in token generation When generating tokens for parallel scan, debug mode undefined behavior sanitizer complained that integer overflow sometimes happens when multiplying two big values - delta and segment number. In order to mitigate this warning, the multiplication is now split into two smaller ones, and the generated machine code remains identical (verified on gcc and clang via compiler explorer). Fixes #6280 Tests: unit(dev)	2020-04-26 19:06:07 +03:00
Piotr Sarna	c66661c582	table: bypass cache when generating view updates from streaming There's no indication that data needed for generating view updates from staging sstables is going to be immediately useful for the user, and a large amount of it can push hot rows out of the cache, thus deteriorating performance. Fixes #6233 Tests: unit(dev)	2020-04-26 15:43:02 +03:00
Rafael Ávila de Espíndola	0d89bbd57f	row_cache_alloc_stress_test: Make sure GCC can't delete a new We want to test that a std::bad_alloc is thrown, but GCC 10 has a new optimization (-fallocation-dce) that removes dead allocations. This patch assigns the value returned by new to a global so that GCC cannot delete it. With this all tests in a dev build pass with GCC 10. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200424201531.225807-1-espindola@scylladb.com>	2020-04-26 15:22:04 +03:00
Rafael Ávila de Espíndola	543a9ebd9b	tests: Wait for a few futures GCC 10 now warns on these. This fixes the dev build with gcc 10. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200424161006.17857-1-espindola@scylladb.com>	2020-04-26 15:20:40 +03:00
Takuya ASADA	df4fac2849	dist: add scylla_memory_setup Ask the user whether the host is shared with other services, then set "--lock-memory 1" if it is not shared. Fixes #1393	2020-04-26 13:34:05 +03:00
Rafael Ávila de Espíndola	ac3c1f6c0f	configure: Don't use -static-libgcc The configure option is --static-stdc++, so it is surprising that it also enables -static-libgcc. Also, -static-libgcc doesn't seem to work with debug builds. This patch removes -static-libgcc which fixes debug builds with --static-stdc++. Such builds are convenient for testing new versions of gcc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200424214117.257195-1-espindola@scylladb.com>	2020-04-25 19:47:36 +03:00
Tomasz Grabiec	31ccd3750b	Update seastar submodule * seastar b5fb927...37a22d9 (19): > io_queue: bring capacity back > tls_test: Remove redundant move > httpd_test: Remove unused fields > everywhere: Remove unused lambda captures > rpc: add Doxygen documentation the protocol class > build: Pass --create-cc to seastar-json2code.py > seastar-json2code: Add a --create-cc option > future: move some static_assert()ions from future.hh to future.cc > http server: fix date function on non-English locales > everywhere: Add messages to static_assert > http server: fix "Date" header format > future: Fix invalid static_assert > fair_queue: remove legacy capacity configuration > reactor: fix private 'pollfn' alias > defer: include std headers > spinlock: add try_lock method > testing: Add missing <iostream> include to seastar_test.cc > rpc: Avoid excessive number of reallocations when reading compressed frames > timer: document	2020-04-23 20:50:27 +02:00
Pavel Emelyanov	98635b74a6	main: Keep feature_service for storage_proxy Fixes #6250 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423165608.32419-1-xemul@scylladb.com>	2020-04-23 20:46:36 +02:00
Pavel Emelyanov	83fe0427d2	api/cache_service: Relax getting partitions count This patch has two goals -- speed up the total partitions calculations (walking databases is faster than walking tables), and get rid of the row_cache._partitions.size() call, which will not be available on the new _partitions collection implementation. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423133900.27818-1-xemul@scylladb.com>	2020-04-23 17:47:58 +02:00
Pavel Emelyanov	6ede253479	api/cache_service: Fix get_row_capacity calculation Current code gets table->row_cache->cache_tracker->region and sums up the region's used space for all tables found. The problem is that all row_cache-s share the same cache_tracker object from the database, thus the resulting number is not correct. Fix this by walking cache_tracker-s from databases instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423133755.27187-1-xemul@scylladb.com>	2020-04-23 17:05:52 +03:00
Pavel Emelyanov	d3b6f66f50	row_cache: Remove unused invalidate_unwrapped() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423133557.27053-1-xemul@scylladb.com>	2020-04-23 17:04:31 +03:00
Rafael Ávila de Espíndola	e6f4996e44	atomic_vector: Don't pass references to callbacks This is more strict than it needs to be, but it avoids any bugs like the one fixed by the previous patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200422182304.120906-2-espindola@scylladb.com>	2020-04-23 16:06:37 +03:00
Rafael Ávila de Espíndola	d8555513a9	gms: Don't keep references to reallocated vector entries These callbacks can block a seastar thread and the underlying vector can be reallocated concurrently. This is no different than if it was a plain std::vector and the solution is similar: use values instead of references. Fixes #6230 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200422182304.120906-1-espindola@scylladb.com>	2020-04-23 16:06:36 +03:00
Rafael Ávila de Espíndola	fbcf741c2d	cql functions: Use switch to find the cast function to use This produces more compact code and avoids the anti-pattern of building a map with statically known values. If the values are given to GCC via a switch statement it can do a much better job at compile time than libstdc++ can at runtime. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200422224905.198794-1-espindola@scylladb.com>	2020-04-23 11:46:09 +03:00
Nadav Har'El	1f75efb556	alternator: use RF=3 even if some nodes are temporarily down Alternator is supposed to use RF=3 for new tables. Only when the cluster is smaller than 3 nodes do we use RF=1 (and warn about it) - this is useful for testing. However, our implementation incorrectly tested the number of live nodes in the cluster instead of the total number of nodes. As a result, if a 3-node cluster had one node down, and a new table was created, it was created with RF=1, and immediately could not be written because when RF=1, any node down means part of the data is unavailable. This patch fixes this: The total number of nodes in the cluster - not the number of live nodes - is consulted. The three-node-cluster-with-a-dead-node setup above creates the table with RF=3, and it can be written because two living nodes out of three are enough when RF=3 and we do quorum writes and reads. We have a dtest to reproduce this bug (and its fix), and it's also easy to reproduce manually by starting a 3-node cluster, killing one of the nodes, and then running "pytests". Before this patch, the tests can create tables but then fail to write to them. After this patch, the tests succeed on the same cluster with the dead node. Fixes #6267 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422182035.15106-2-nyh@scylladb.com>	2020-04-23 08:23:05 +02:00
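The difference between counting live nodes and counting all nodes can be shown with a short Python sketch. The function names below are hypothetical and only model the rule stated in the commit message (RF=3 when the cluster has at least 3 nodes, RF=1 otherwise); the real logic is in Alternator's C++ code.

```python
def choose_replication_factor(node_count):
    """RF for a new Alternator table, given the node count consulted."""
    return 3 if node_count >= 3 else 1


def quorum(rf):
    """Number of replicas that must respond for a quorum operation."""
    return rf // 2 + 1


# A 3-node cluster with one node down, as in the commit message.
live_nodes, down_nodes = 2, 1

rf_before = choose_replication_factor(live_nodes)               # live nodes only (the bug)
rf_after = choose_replication_factor(live_nodes + down_nodes)   # all nodes (the fix)

print(rf_before, rf_after, quorum(rf_after) <= live_nodes)
```

With RF=3 the quorum is two replicas, so the two live nodes are enough for quorum writes and reads, which matches the behavior the commit describes after the fix.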
Nadav Har'El	08c39bde1a	gossiper: add convenience function for getting number of nodes The gossiper has the convenience functions get_up_endpoint_count() and get_down_endpoint_count(), but strangely no function to get the total number. Even though it's easy to calculate the total by summing up their results, it is inefficient and also inconvenient because each of these functions returns a future. So let's add another function, get_all_endpoint_count(), to get the total number of nodes. We will use this function in the next patch. Signed-off-by: Nadav Har'El <n...@scylladb.com> Message-Id: <20200422182035.15106-1-nyh@scylladb.com>	2020-04-23 08:23:05 +02:00
Nadav Har'El	86fadd700f	docs: Alternator parallel scan is supported now After fixing issue #6260, the "parallel scan" feature in Alternator is supported, so drop the sentence in alternator.md saying that it isn't. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422090738.21648-1-nyh@scylladb.com>	2020-04-23 08:16:16 +02:00
Nadav Har'El	92e36c5df5	test/alternator: increase timeout on Scylla boot The Alternator test boots Scylla to test against it. We set an arbitrary timeout for this boot to succeed: 100 seconds. These 100 seconds are significantly more than the 25 seconds it takes on my laptop, and I thought we'd never reach it. But it turns out that in some setups - running the very slow debug build on slow and overcommitted nodes - 100 seconds is not enough. So this patch doubles the timeout to 200 seconds. Note that this "200 seconds" is just a timeout, and doesn't affect normal runs: Both a successful boot and a failed boot are recognized as soon as they happen, and we never unnecessarily wait the entire 200 seconds. Fixes #6271. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422193920.17079-1-nyh@scylladb.com>	2020-04-23 07:47:21 +02:00
Piotr Jastrzebski	0416d70c9f	cdc:use CDCPartitioner for CDC Log This will allow deterministic stream_id generation and would remove the risk of not being able to generate a stream id for some vnode. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-22 18:25:51 +02:00
Piotr Jastrzebski	1d1c6af72a	dht: Add find_first_token_for_shard This new function finds the first token in range (start, end] that belongs to given shard. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-22 18:24:54 +02:00
Piotr Jastrzebski	c82adb7906	dht: use long_token in token::to_int64 Previous implementation of to_int64 wasn't handling dht::minimum_token and dht::maximum_token. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-22 16:12:00 +02:00
Tomasz Grabiec	c59ec8d97f	Merge "Avoid some memory copies in lwt" from Gleb * seastar-dev.git gleb/lwt-shared-proposal: lwt: pass paxos::proposal as a shared pointer everywhere lwt: do not copy proposal in paxos_state::accept lwt: make load_paxos_state to take partition_key_view instead of a deference	2020-04-22 13:43:03 +02:00
Gleb Natapov	97af6bb0bd	lwt: make load_paxos_state to take partition_key_view instead of a deference Some callers have a partition_key_view, but not a partition_key, so they need to create a temporary and copy just to pass a reference. Change it by accepting a view.	2020-04-22 13:51:43 +03:00
Gleb Natapov	c970da3811	lwt: do not copy proposal in paxos_state::accept A proposal is passed as a reference and all callers have it in stable memory until the call ends, so it is safe to use the reference everywhere.	2020-04-22 13:51:43 +03:00
Gleb Natapov	fbb04698d0	lwt: pass paxos::proposal as a shared pointer everywhere paxos::proposal reference is passed into a lot of functions and sometimes it has to be copied to prolong its lifetime. Create it as a shared pointer and pass it everywhere to avoid those copies.	2020-04-22 13:51:43 +03:00
Calle Wilund	525b283326	commitlog::read_log_file: Preserve subscription across reading Fixes #6265 Return type for read_log_file was previously changed from subscription to future<>, returning the previously returned subscription's result of done(). But it did not preserve the subscription itself, which in turn will cause us to (in work::stream) call back into a deleted object. Message-Id: <20200422090856.5218-1-calle@scylladb.com>	2020-04-22 12:12:11 +03:00
Asias He	8b7189f2dd	mutation_writer_test: Add test_multishard_writer_producer_aborts Without the patch "multishard_writer: Abort the queue attached to consumers when producer fails", the test would hang forever. Fixes #6241	2020-04-22 16:28:07 +08:00
Piotr Sarna	dbb9574aa2	alternator: allow parallel scan Parallel scans can be performed by providing Segment and TotalSegments attributes to a Scan request, which can be used to split the work among many workers. This patch makes the parallel scan test succeed, so the xfail is removed. Fixes #5059	2020-04-22 11:06:15 +03:00
Botond Dénes	e778b072b1	read_command: use bool_class for is_first_page parameter The constructor of `read_command` is used both by IDL and clients in the code. However, this constructor has a parameter that is not used by IDL: `read_timestamp`. This requires that this parameter is the very last in the list and that new parameters that are used by IDL are added before it. One such new parameter was `bool is_first_page`. Adding this parameter right before the read timestamp one created a situation where the last parameter (read_timestamp) implicitly converts to the one before it (is_first_page). This means that some call sites passing `read_timestamp` were now silently converting this to `is_first_page`, effectively dropping the timestamp. This patch aims to rectify this, while also avoiding similar accidents in the future, by making `is_first_page` a `bool_class` which doesn't have any implicit conversions defined. This change does not break the ABI as `bool_class` is also sent as a `bool` on the wire. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Tests: unit(dev) Message-Id: <20200422073657.87241-1-bdenes@scylladb.com>	2020-04-22 11:01:22 +03:00
Rafael Ávila de Espíndola	45ee52724c	cql functions: Don't use a std::function for casts Casts only depend on their operands, so a plain function pointer is sufficient. This allows replacing all the make_castas_* functions that return a lambda with plain castas_* functions that do the casting. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200413162014.23884-2-espindola@scylladb.com>	2020-04-22 10:44:56 +03:00
Glauber Costa	1f9c37fb5e	view_updating_consumer: move reference to a pointer It is currently not possible to wrap the view_updating_consumer in an std::optional. I intend to do it to allow for compactions to optionally generate view updates. The reason for that is that view_updating_consumer has a reference as a member, which makes the move assignment constructor not be implicitly generated. This patch fixes it by keeping a pointer instead of a reference. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200421123648.8328-1-glauber@scylladb.com>	2020-04-22 10:05:35 +03:00
Botond Dénes	7dabf75682	service: messaging_service: resolve rpc set_logger deprecation warning Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200407091413.310764-1-bdenes@scylladb.com>	2020-04-22 10:05:35 +03:00
Piotr Jastrzebski	7884eada1a	cdc: add CDCPartitioner This is a special partitioner that will be used by CDC Log. It works only with partition key that is blob composed of two ints. The first int is a token this partitioner will map the key to. The second int is there to make it possible to create multiple keys that are different from each other but map to the same token. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-21 15:50:22 +02:00
Piotr Jastrzebski	330cd162f0	stream_id: add token_from_bytes static function This function will be used by CDCPartitioner to extract token from partition key. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-21 15:50:22 +02:00
Piotr Jastrzebski	ae1f14095f	i_partitioner: Stop distinguishing whether keys order is preserved Scylla inherited a concept of partitioners that preserve order of keys from the origin but it is not used for anything. Moreover, none of the existing partitioners preserves keys order. The only partitioner that did this in the past was ByteOrderedPartitioner and Scylla does not support it any more. For a partitioner to preserve the order of the keys means that if there are two keys A and B such that A < B then token(A) < token(B), where token(X) is the token the partitioner assigns to key X. This patch removes dht::i_partitioner::preserves_order with all its overrides. The only place that was using this member function was a check in thrift server and it is safe to remove the check because the check was only done to differentiate the error message for partitioners that do and do not preserve the order of the keys. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-21 15:50:22 +02:00
Botond Dénes	c9d3053e91	test/boost: castas_fcts_test: add test for identity casts `aa9a582f4` allowed all types to be cast to themselves, but didn't add a unit test for this. This patch rectifies this. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200421125902.1709684-1-bdenes@scylladb.com>	2020-04-21 15:10:28 +02:00
Juliusz Stasiewicz	c70311f73e	cdc: CL for preimage select is calculated from base write CL CL of LOCAL_QUORUM used to be hardcoded into CDC preimage query and led to an error when number of replicas was lower than CL would require. The solution here is to link the CLs of writes to base table with the CLs of CDC reads, so the client will get the (limited) control over the consistency of preimage SELECTs (instead of getting error every time). The algorithm is as follows: 1. If write that caused CDC activity was done with CL = ANY, then do preimage read with CL = ONE. 2. If write that caused CDC activity was done with CL = ALL, then do preimage read with CL = QUORUM. 3. SERIAL and LOCAL_SERIAL writes cause preimage read with QUORUM and LOCAL_QUORUM, respectively. 4. In other cases do preimage read with the same CL as base write.	2020-04-21 14:33:36 +02:00
Avi Kivity	2482e53de9	test: alternator: configure scylla for test environment in terms of cpu and disk Currently, the alternator tests configure scylla to use all the logical cores in the host system, but only 1GB of RAM. This can lead to a small amount of memory per core. It also uses the default disk configuration, which is safe, but can be very slow on mechanical or non-enterprise disks. Change to use a fixed --smp 2 configuration, and add --overprovisioned for maximum flexibility (no spinning). Use --unsafe-bypass-fsync for faster performance on non-enterprise or mechanical disks, assuming that the test data is not important. Fixes #6251. Message-Id: <20200420154112.123386-1-avi@scylladb.com>	2020-04-20 18:50:46 +03:00
Nadav Har'El	44a1daf025	merge: Allow accessing Scylla system tables from alternator Merged patch series from Piotr Sarna: This series allows reading rows from Scylla's system tables via alternator by using a virtual interface. If a Query or Scan request intercepts a table name with the following pattern: .scylla.alternator.KEYSPACE_NAME.TABLE_NAME, it will read the data from Scylla's KEYSPACE_NAME.TABLE_NAME table. The interface is expected to only return data for Scylla system tables and trying to access regular tables via this interface is expected to return an error. This series comes with tests (alternator-test, scylla_only). Fixes #6122 Tests: alternator-test(local,remote (to verify that scylla_only works) Piotr Sarna (5): alternator: add fallback serialization for all types alternator: add fetching static columns if they exist alternator: add a way of accessing system tables from alternator alternator-test: add scylla-only test for querying system tables docs: add an entry about accessing Scylla system tables alternator-test/test_system_tables.py \| 61 +++++++++++++++++++++++++++ alternator/executor.cc \| 38 ++++++++++++++++- alternator/executor.hh \| 1 + alternator/serialization.cc \| 11 +++-- docs/alternator/alternator.md \| 15 +++++++ 5 files changed, 122 insertions(+), 4 deletions(-) create mode 100644 alternator-test/test_system_tables.py	2020-04-20 18:21:20 +03:00
Piotr Sarna	03f41b9d96	db: remove trailing whitespace Found when backporting a patch to 3.3. Message-Id: <fa406597deaacff56dbba99fa167715b041bbb52.1587375123.git.sarna@scylladb.com>	2020-04-20 12:58:55 +02:00
Kamil Braun	d73a21057a	gossiper: make `get_seeds` method const and return a const ref	2020-04-20 12:57:16 +02:00
Kamil Braun	1f7290a0ff	versioned_value: remove versioned_value::factory class If there was a Most Useless Abstraction award, this would be a good candidate.	2020-04-20 12:57:16 +02:00
Kamil Braun	113384b6f8	gms: move TOKENS string deserialization code into versioned_value And do the same with CDC_STREAMS_TIMESTAMP. The code that took a list of tokens represented as a string inside versioned_value (for gossiping) and deserialized it into an `unordered_set<dht::token>` lived in the storage_service module, while the code that did the serializing (set -> string) lived in versioned_value. There was a similar situation with the CDC generation timestamp. To increase maintainability and reusability, the deserialization code is now placed next to the serialization code in versioned_value. Furthermore, the `make_full_token_string`, `make_token_string`, and `make_cdc_streams_timestamp_string` (serialization functions) are moved out of versioned_value::factory and made static methods of versioned_value instead.	2020-04-20 12:57:13 +02:00
Tomasz Grabiec	e648e314e5	Merge "Drop only learnt value on PRUNE" from Gleb It is unsafe to remove entire row, so only drop learn value from system.paxos table. Fixes: #6154	2020-04-20 12:06:04 +02:00
Asias He	d86958d3b2	multishard_writer: Abort the queue attached to consumers when producer fails We have this in multishard_writer: future<uint64_t> multishard_writer::operator()() { return distribute_mutation_fragments().finally([this] { return wait_pending_consumers(); }).then([this] { return _consumed_partitions; }); } The wait_pending_consumers which waits for the consumers to finish is called even when distribute_mutation_fragments fails. When distribute_mutation_fragments fails and the failure is due to the producer failing, consumers can wait for data which will never come because the producer has failed already. This can cause a deadlock. To fix, when distribute_mutation_fragments fails, we should abort the queues that are attached to the readers used by the consumers. Fixes #6241	2020-04-20 14:53:24 +08:00
Piotr Jastrzebski	2aaf81bf7c	dht: Exclude -2^63 value from get_random_token -2^63 is a value reserved for min/max token boundaries and shouldn't be used for regular tokens. This patch fixes get_random_token to never create token with value -2^63. On the way dht::get_random_number template method is removed because it was exclusively used by get_random_token. Also use uniform_int_distribution with int64_t instead of uint64_t by using correct constructor parameter that guarantees values between -2^63+1 and 2^63-1 inclusively. Tests: unit(dev) Fixes #6237. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0a1a939355f5005039d5c2c7c513bad94cf60be2.1587302093.git.piotr@scylladb.com>	2020-04-19 18:17:35 +03:00
Gleb Natapov	73391420fb	lwt: drop only most recently learnt value during prune. It turned out we cannot drop the information about the most recent commit entirely since it is used to cut off already outdated accepted values. Otherwise the following scenario can happen: 1. cas1 prepares on A, B, C, gets one accept from A 2. cas2 prepares on B, C, gets 2 accepts on B and C, learns on B, C 3. cas3 initiates a prepare on A, learns about cas1's accept, 4. cas2 learns on A, prunes on A, B, C Now cas3 will reply cas1's value because it does not know that it is less than the already committed one (removed during step 4). The patch drops only the committed value and keeps the information about the latest committed ballot. Fixed #6154	2020-04-19 17:12:15 +03:00
Gleb Natapov	d3d31d66d4	lwt: treated accepted ballot as a promised PAXOS node is allowed to accept a proposal without promising it first as long as its ballot is greater than already promised one. Treat such accepted ballot as promised since 'learn' stage removes accepted ballot, but we still want to remember it as the latest promised one. The goal is to be closer to formal PAXOS specification.	2020-04-19 17:12:03 +03:00
Raphael S. Carvalho	c350b864e8	compaction: Short-circuit TWCS interposer if only a single time window is needed If we know in advance that only a single window is needed, the TWCS interposer can be short-circuited. perf_sstable shows up to ~14% performance regression in compaction with interposer enabled for a table with schema containing 10 columns. no interposer (50k partitions) 81090.77 +- 33.82 partitions / sec (100 runs, 1 concurrent ops) TWCS interposer (50k partitions) 71149.80 +- 26.06 partitions / sec (100 runs, 1 concurrent ops) no interposer (100k partitions) 83791.13 +- 22.65 partitions / sec (100 runs, 1 concurrent ops) TWCS interposer (100k partitions) 72147.81 +- 13.39 partitions / sec (100 runs, 1 concurrent ops) command used: ./build/dev/test/perf/perf_sstable --num_columns 10 --partitions 100000 \ --iterations 100 --mode compaction --sstables 1 --testdir /home/fedora/xfs \ --smp 1 --cpuset 3-3 --poll-mode Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200409194235.6004-3-raphaelsc@scylladb.com>	2020-04-19 17:06:05 +03:00
Raphael S. Carvalho	3edff36cd2	compaction: Fix partition estimation with TWCS interposer Max and min windows are microsecond timestamps, which should be divided by window size in microseconds to properly estimate window count based on provided mutation_source_metadata. Found this problem after properly setting mutation_source_metadata with min and max metadata on behalf of regular compaction. Fixes #6214. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200409194235.6004-2-raphaelsc@scylladb.com>	2020-04-19 17:04:48 +03:00
Avi Kivity	1e2b3f7eb4	Merge "memory_footprint_test improvements" from Tomasz " Includes: - code cleanups - support for measuring data stores with more than one partition - measure sstable footprint for all supported formats - less verbose mode by default " * tag 'memory-footprint-test-improvement-v2' of github.com:tgrabiec/scylla: test: memory_footprint: Silence logging by default test: memory_footprint: Introduce --partition-count option test: memory_footprint: Run under a cql_test_env test: memory_footprint: Calculate sstable size for each format version sstables: Move all_sstable_versions to version.hh	2020-04-19 17:03:02 +03:00
Piotr Sarna	9c15604659	treewide: deprecate passing explicit order in schema building In order to avoid confusion with regard to whose responsibility it is to sort the key columns (see #5856), the interface which allows adding columns to the builder with explicit column id is moved to a private function. An internal with_column_ordered() overload is maintained to be used for internal operations, but it's encouraged to use simpler with_column() in new code. Fixes #6235 Tests: unit(dev)	2020-04-19 16:19:17 +03:00
Botond Dénes	a4aa753f0f	schema: schema(): use std::stable_sort() to sort key columns When multiple key columns (clustering or partition) are passed to the schema constructor, all having the same column id, the expectation is that these columns will retain the order in which they were passed to `schema_builder::with_column()`. Currently however this is not guaranteed as the schema constructor sorts key columns by column id with `std::sort()`, which doesn't guarantee that equally comparing elements retain their order. This can be an issue for indexes, the schemas of which are built independently on each node. If there is any room for variance in the key column order, this can result in different nodes having incompatible schemas for the same index. The fix is to use `std::stable_sort()` which guarantees that the order of equally comparing elements won't change. This is a suspected cause of #5856, although we don't have hard proof. Fixes: #5856 Signed-off-by: Botond Dénes <bdenes@scylladb.com> [avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes unstable at 17 elements, and the failing schema had a clustering key with 23 elements] Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>	2020-04-19 13:42:44 +03:00
Nadav Har'El	7e7c688946	docs/alternator/alternator.md: fix typos Fix a couple of typos in the Alternator documentation. Fixes scylladb/scylla-doc-issues#280 Fixes scylladb/scylla-doc-issues#281 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200419091900.23030-1-nyh@scylladb.com>	2020-04-19 11:19:26 +02:00
Piotr Sarna	a6cf0bfa7d	table: switch to correct io_priority for streaming view updates The io_priority parameter used when generating view updates from streaming is used by the sstable reader, so it should use the I/O priority for streaming read operations, not streaming write operations. Fixes #6231 Tests: unit(dev)	2020-04-19 09:56:43 +03:00
Rafael Ávila de Espíndola	f3fd466156	dht: Use get_random_number<uint64_t> instead of int64_t in token::get_random_token I bisected the opposite change in `9c202b52da` as the cause of issue 6193. I don't know why. Maybe get_random_number<signed_type> is buggy? In any case, reverting to uint64_t solves the issue. Fixes #6193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200418001611.440733-1-espindola@scylladb.com>	2020-04-19 09:46:06 +03:00
Alejo Sanchez	bd849764e0	utils: error injection sleep add support for manual_clock Requested by @tgrabiec in previous patch (already merged). Adds support for sleep using manual clock. Add test. NOTE: Removes system_clock support (and test) as sleep is not explicitly instantiated in seastar/src/core/reactor.cc Branch URL: https://github.com/alecco/scylla/tree/error_injection_5_manual_clock Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200417081518.868900-1-alejo.sanchez@scylladb.com>	2020-04-17 11:45:05 +02:00
Tomasz Grabiec	92771e904a	test: memory_footprint: Silence logging by default	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	1df63b60c3	test: memory_footprint: Introduce --partition-count option	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	7c2f6dd75e	test: memory_footprint: Run under a cql_test_env	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	04c093cbec	test: memory_footprint: Calculate sstable size for each format version	2020-04-17 11:34:12 +02:00
Tomasz Grabiec	3e74dd4df3	sstables: Move all_sstable_versions to version.hh	2020-04-17 11:34:02 +02:00
Rafael Ávila de Espíndola	3586324a61	sstables: Delete never overwritten methods Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200417012330.246071-1-espindola@scylladb.com>	2020-04-17 09:16:16 +03:00
Avi Kivity	2039b79664	commitlog: filter out files in the commitlog directory which don't have the correct prefix Commitlog replay is given a filename prefix to filter files against, but it ignores it. As a result we will replay anything in that directory, including recycled segments, which is wasteful. Fix by adding a check for the prefix. Tests: unit (dev), manual test that regular commitlog files are not filtered. Message-Id: <20200416174542.133230-1-avi@scylladb.com>	2020-04-17 08:44:32 +03:00
Kamil Braun	3d811e2f95	sstables: freeze types nested in collection types in legacy sstables Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect serialization headers, which don't wrap frozen UDTs nested inside collections with the FrozenType<...> tag. When reading such SSTable, Scylla would detect a mismatch between the schema saved in schema tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema from the serialization header (which doesn't have these tags). SSTables created in Scylla versions 3.1 and above, in particular in Scylla versions that contain this commit, create correct serialization headers (which wrap UDTs in the FrozenType<...> tag). This commit does two things: 1. for all SSTables created after this commit, include a new feature flag, CorrectUDTsInCollections, presence of which implies that frozen UDTs inside collections have the FrozenType<...> tag. 2. when reading a Scylla SSTable without the feature flag, we assume that UDTs nested inside collections are always frozen, even if they don't have the tag. This assumption is safe to be made, because at the time of this commit, Scylla does not allow non-frozen (multi-cell) types inside collections or UDTs, and because of point 1 above. There is one edge case not covered: if we don't know whether the SSTable comes from Scylla or from C*. In that case we won't make the assumption described in 2. Therefore, if we get a mismatch between schema and serialization headers of a table which we couldn't confirm to come from Scylla, we will still reject the table. If any user encounters such an issue (unlikely), we will have to use another solution, e.g. using a separate tool to rewrite the SSTable. Fixes #6130.	2020-04-16 18:44:56 +03:00
Avi Kivity	141bd44982	Update seastar submodule * seastar f846a348b...b5fb92739 (3): > Merge 'file utils infrastructure' from Benny > future: future_state: make exception constructors noexcept > timer: add scheduling_group awareness Fixes #6170.	2020-04-16 15:20:50 +03:00
Nadav Har'El	606ae0744c	docs, alternator: alternator.md cleanup Clean up the alternator.md document, by: * Updating out-of-date information that outstayed its welcome. * When Scylla does have a feature but it's just not supported via the DynamoDB API (e.g., CDC and on-demand backups) mention that. * Remove mention of Alternator being experimental and users should not store important data on it :-) * Miscellaneous cleanups. Fixes #6179. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200412094641.27186-1-nyh@scylladb.com>	2020-04-16 13:39:28 +02:00
Rafael Ávila de Espíndola	3b8e84731b	configure: Make the stack usage warning more strict All the dev and release warning at the previous level have been fixed, so tighten the warning a bit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200413212241.365022-1-espindola@scylladb.com>	2020-04-16 09:02:22 +03:00
Vlad Zolotarov	b83e84b467	db::hints:: optimize with_file_update_mutex() Avoid extra shared_ptr copy. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200311214313.2988-1-vladz@scylladb.com>	2020-04-16 09:01:40 +03:00
Piotr Sarna	71ac6ebcc5	Merge 'prepare the view building generator to work through a compaction' from Glauber There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately * glommer-view_build: staging: potentially read many SSTables at the same time view_build_test: make sure it works with smp > 1	2020-04-15 18:07:09 +02:00
Glauber Costa	4e6400293e	staging: potentially read many SSTables at the same time There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-15 11:26:44 -04:00
Glauber Costa	94d6b75a27	view_build_test: make sure it works with smp > 1 This test doesn't work with higher smp counts, because it relies on dealing with keys named 'a' and 'b' and creates SSTables containing one of them manually. This throws an exception if we happen to execute on a shard that doesn't own the tokens corresponding to those keys. This patch avoids that problem by pre-selecting keys that we know to belong to the current shard in which the test is executed. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-15 10:53:32 -04:00
Konstantin Osipov	18b9bb57ac	lwt: rename metrics to match accepted terminology Rename inherited metrics cas_propose and cas_commit to cas_accept and cas_learn respectively. A while ago we made a decision to stick to widely accepted terms for Paxos rounds: prepare, accept, learn. The rest of the code is using these terms, so rename the metrics to avoid confusion/technical debt. While at it, rename a few internal methods and functions. Fixes #6169 Message-Id: <20200414213537.129547-1-kostja@scylladb.com>	2020-04-15 12:20:30 +02:00
Piotr Jastrzebski	20bc93b941	cdc: Stop storing CDC options in scylla tables Initially we were storing CDC options in scylla tables but then we realized that we can use schema extensions. Extensions are more flexible and cause less problems with schema digest. The transition was done in 4.0 and with that we stopped reading 'cdc' column in scylla tables. Commit `861c7b5626` removed the code that used to read 'cdc' column. Since no Scylla node should be reading 'cdc' column, we can always keep it empty now. This will allow removal of schema::cdc_options in the future. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-15 06:56:44 +02:00
Benny Halevy	35892e4557	db::commitlog: close file if wrapping failed When an I/O error (e.g. EMFILE / ENOSPC) happens we hit an assert in ~append_challenged_posix_file_impl(): Assertion `_closing_state == state::closed' failed. Commit `6160b9017d` added close on failure of the lambda defined in allocate_segment_ex, but it doesn't handle an error after the file is opened/created while it is wrapped with commitlog_file_extensions. Refs #5657 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Calle Wilund <calle@scylladb.com> Message-Id: <20200414115231.298632-1-bhalevy@scylladb.com>	2020-04-14 16:14:28 +03:00
Calle Wilund	a62d75fed5	commitlog_test: Ensure "when_over_disk_limit" reads segment list only once Fixes #6195 test_commitlog_delete_when_over_disk_limit reads current segment list in flush handler, to compare with result after allowing deletion of segment. However, it might be called more than once in rare cases, because of timing and us using rather small sizes. Reading the list the second time however is not a good idea, because it might just very well be exactly the same as what we read in the test check code, and we actually overwrite the list we want to check against. Because callback is on timer. And test is not. Message-Id: <20200414114322.13268-1-calle@scylladb.com>	2020-04-14 15:31:08 +03:00
Avi Kivity	40459fea0e	Merge "compound-compat: composite::iterator: cover error paths with `on_internal_error()`" from Botond " This is a continuation of recent efforts to cover more and more internal de-serialization paths with `on_internal_error()`. Errors like this should always be investigated but this can only be done with a core. This patch covers the error paths of `composite::iterator` with `on_internal_error()`. As we need this patch to investigate a 4.0 blocker issue (#6121) it only does the minimal amount of changes needed to allow generating a core for de-serialization failures of composites. There are a few FIXMEs left in the code that I plan to address in a follow-up. Ref: #6121 " * 'compound-on-internal-error/v1' of https://github.com/denesb/scylla: compound_compat: composite::iterator cover error-paths with on_internal_error() compound_compat: composite_view: add is_valid()	2020-04-14 14:06:54 +03:00
Avi Kivity	ba6653f60c	Update seastar submodule * seastar cce2ddac1...f846a348b (3): > rpc: always shutdown socket when stopping a client Fixes #6060. > reactor: Deprecate cpu_id > httpd: switch main() to use seastar::async	2020-04-14 13:31:48 +03:00
Piotr Dulikowski	ff80b7c3e2	cdc: do not change frozen list type in cdc log table For a column of type `frozen<list<T>>` in base table, a corresponding column of type `frozen<map<timeuuid, T>>` is created in cdc log. Although a similar change of type takes place in case of non-frozen lists, this is unneeded in case of frozen lists - frozen collections are atomic, therefore there is no need for complicated type that will be able to represent a column update that depends on its previous value (e.g. appending elements to the end of the list). Moreover, only cdc log table creation logic performs this type change for frozen lists. The logic of `transformer::transform`, which is responsible for creation of mutations to cdc log, assumes that atomic columns will have their types unchanged in cdc log table. It simply copies new value of the column from original mutation to the cdc log mutation. A serialized frozen list might be copied to a field that is of frozen map type, which may cause the field to become impossible to deserialize. This patch causes frozen list base table columns to have a corresponding column in cdc log with the same type. A test is added which asserts that the type of cdc log columns is not changed in the case of frozen base columns. Tests: unit(dev) Fixes #6172	2020-04-14 09:44:22 +02:00
Piotr Sarna	0638699ffd	Merge 'test.py: run Alternator tests' from Nadav We have in alternator-test a set of over 340 functional tests for Alternator. These tests are written in Python using the pytest framework, expect Scylla to be running and connect to it using the DynamoDB API with the "boto3" library (the AWS SDK for Python). We have a script alternator-test/run which does everything needed to run all these tests: Starts Scylla with the appropriate parameters in a temporary directory, runs all the tests against it, and makes sure the temporary directory is removed (regardless of whether the tests succeeded or failed). The goal of this small patch series is to integrate these Alternator tests into test.py in a simple way. The idea is that we add one test which just runs the aforementioned "run" script which does its own business. The changes we needed to do in this series to achieve this are: 1. Make the alternator-test/run script pick a unique IP address on which to listen, instead of always using 127.0.0.1. This allows running this test in parallel with dtest tests, or even parallel to itself. 2. Move the alternator-test directory to test/alternator. This is the directory where test.py expects all the tests to live in. It also makes sense - since we already have multiple subdirectories in test/, to put the Alternator tests there too. 3. Add a new test suite type, "Run". A "Run" suite is simply a directory with a script called "run", and this script is run to run the entire suite, and this script does its own business. 4. Tests (such as the new "Run" ones) that can be killed gently and clean up after themselves, should be killed with SIGTERM instead of SIGKILL. After this series, to run the Alternator tests from test.py, do: ./test.py --mode dev alternator Note that in this version, the "--mode" has no effect - test/alternator/run always runs the latest compiled Scylla, regardless of the chosen mode. This can be fixed later. The Alternator tests can still be run manually and individually against a running Scylla or DynamoDB as before - just go to the test/alternator directory and run "pytest" with the desired parameters. Fixes #6046 * nyh/alternator-test-v3: alternator-test: make Alternator tests runnable from test.py test.py: add xunit XML output file for "Run" tests test.py: add new test type "Run" test.py: flag for aborting tests with SIGTERM, not SIGKILL alternator-test: change "run" script to pick random IP address alternator-test: add "--url" option to choose Alternator's URL	2020-04-14 07:56:37 +02:00
Kamil Braun	5a454663fd	sstables: move definition of column_translation::state::build to a .cc file	2020-04-13 17:45:25 +03:00
Asias He	13a9c5eaf7	repair: Send reason for node operations Since `956b092012` (Merge "Repair based node operation" from Asias), repair is used by other node operations like bootstrap, decommission and so on. Send the reason for the repair, so that we can handle the materialized view update correctly according to the reason of the operation. We want to trigger the view update only if the repair is used by repair operation. Otherwise, the view table will be handled twice, 1) when the view table is synced using repair 2) when the base table is synced using repair and view table update is triggered. Fixes #5930 Fixes #5998	2020-04-13 13:47:26 +03:00
Takuya ASADA	f24c13f2d1	redis: lolwut parameter fix Currently, lolwut with some parameters outputs a broken square, such as "lolwut 10 1 1": 127.0.0.1:6379> lolwut 10 1 1 ⠀⡤⠤⠤⠤⠤⠤⠤⠤⠤ ⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀ This happens because we pass incorrect parameters to draw_schotter().	2020-04-13 10:46:45 +09:00
Takuya ASADA	b37ea9c27f	redis-test: add lolwut test Add test for lolwut command.	2020-04-13 10:46:45 +09:00
Calle Wilund	a14a28cdf4	gms::inet_address: Fix sign extension error in custom address formatting Fixes #5808 Seems some gcc:s will generate the code as sign extending. Mine does not, but this should be more correct anyhow. Added small stringify test to serialization_test for inet_address	2020-04-12 17:48:44 +03:00
Avi Kivity	a4a5b77bd5	Merge 'Match Cassandra's null prohibitions' from Dejan " We currently allow null on the right-hand side of certain relations, while Cassandra prohibits it. Since our handling of these null values is mostly incorrect, it's better to match Cassandra in prohibiting it. See the discussion (https://github.com/scylladb/scylla/pull/5763#discussion_r405557323). NB: any reverse mismatch (Scylla prohibiting something that Cassandra allows) is left remaining. For example, we forbid null bounds on clustering columns, which Cassandra allows. Tests: unit (dev) " * dekimir-match-cass-null: restrictions: Forbid null bound for nonkey columns restrictions: Forbid null equality	2020-04-12 17:44:31 +03:00
Nadav Har'El	4e2bf28b84	alternator-test: make Alternator tests runnable from test.py To make the tests in alternator-test runnable by test.py, we need to move the directory alternator-test/ to test/alternator, because test.py only looks for tests in subdirectories of test/. Then, we need to create a test/alternator/suite.yaml saying that this test directory is of type "Run", i.e., it has a single run script "run" which runs all its tests. The "run" script had to be slightly modified to be aware of its new location relative to the source directory. To run the Alternator tests from test.py, do: ./test.py --mode dev alternator Note that in this version, the "--mode" has no effect - test/alternator/run always runs the latest compiled Scylla, regardless of the chosen mode. The Alternator tests can still be run manually and individually against a running Scylla or DynamoDB as before - just go to the test/alternator directory (instead of alternator-test previously) and run "pytest" with the desired parameters. Fixes #6046 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:27:45 +03:00
Nadav Har'El	0cccb5a630	test.py: add xunit XML output file for "Run" tests Assumes that "Run" tests can take the --junit-xml=<path> option, and pass it to ask the test to generate an XML summary of the run to a file like testlog/dev/xml/run.1.xunit.xml. This option is honored by the Alternator tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:26:50 +03:00
Nadav Har'El	0ae3136900	test.py: add new test type "Run" This patch adds a new test type, "Run". A test subdirectory of type "Run" has a script called "run" which is expected to run all the tests in that directory. This will be used, in the next patch, by the Alternator functional tests. These tests indeed have a "run" script, which runs Scylla and then runs all of Alternator's tests, finishing fairly quickly (in less than a minute). All of that will become one test.py test. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:26:50 +03:00
Nadav Har'El	36e44972f1	test.py: flag for aborting tests with SIGTERM, not SIGKILL Today, if test.py is interrupted with SIGINT or SIGTERM, the ongoing test is killed with SIGKILL. Some types of tests - such as Alternator's test - may depend on being killed politely (e.g., with SIGTERM) to clean up files. We cannot yet change the signal to SIGTERM for all tests, because Seastar tests often don't deal well with signals, but we can at least add a flag that certain test types - that know they can be killed gently - will use. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:26:50 +03:00
Nadav Har'El	24fcc0c0ff	alternator-test: change "run" script to pick random IP address Before this patch, the Alternator tests "run" script ran Scylla on a fixed listening address, 127.0.0.1. There is a problem that there might be other concurrent runs of Scylla using the same IP address - e.g., CCM (used by dtest) uses exactly this IP address for its first node. Luckily, Linux's loopback device actually allows us to pick any of over a million addresses in 127.0.0.0/8 to listen on - we don't need to use 127.0.0.1 specifically. So the code in this patch picks an address in 127.1.., so it cannot collide with CCM (which uses 127.0.0.* for up to 255 nodes). Moreover, the last two bytes of the listen address are picked based on the process ID of the run script; This allows multiple copies of this script to run concurrently - in case anybody wishes to do that. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:26:31 +03:00
Nadav Har'El	1aec4baa51	alternator-test: add "--url" option to choose Alternator's URL The "--aws" and "--local" test options chooses between two useful default URLs - Amazon's, or http://localhost:8000 for a local installation. However, sometimes one wants to run Scylla on a different IP address or port, so in this patch we add a "--url" option to choose a specific URL to connect to. For example, "--url http://127.1.2.3:1234". We will later use this option in the alternator-test/run script, to pick a random IP address on which to run Scylla, and then run the test against this address. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-04-12 16:25:04 +03:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit to avoid blocking next promotion failing from time to time.	2020-04-11 17:56:02 +03:00
Takuya ASADA	679fb5887a	redis: add exists command Add exists command that returns key availability. see: https://redis.io/commands/exists	2020-04-11 12:45:54 +02:00
Israel Fruchter	e3d764bb58	dist/docker: make docker-entrypoint.py pass signals to supervisord Stopping docker currently didn't pass the signals to supervisord, hence scylla wasn't shut down gracefully. Fixes #6150	2020-04-11 12:45:54 +02:00
Piotr Sarna	ea827d42b9	test: move config to heap in config_test ... in order to get rid of a large stack warning. Tests: unit(dev) Message-Id: <010517a6029a70de069d5952cc853f5724280eea.1586422630.git.sarna@scylladb.com>	2020-04-09 11:22:49 +02:00
Piotr Sarna	dea5bc41ff	docs: add an entry about accessing Scylla system tables A paragraph explaining how to access Scylla system tables via alternator HTTP(S) interface is added.	2020-04-09 09:41:30 +02:00
Piotr Sarna	e4b1da4047	alternator-test: add scylla-only test for querying system tables The first test case checks that system tables are readable via Scan/Query requests. The second test case checks that it's not possible to read user tables by using the virtual interface. The third test case checks that creating a table which looks like an internal system table pattern (.scylla.alternator.KS_NAME.TABLE_NAME) is not possible and returns a validation error.	2020-04-09 09:41:30 +02:00
Piotr Sarna	53bbef1e6c	alternator: add a way of accessing system tables from alternator Scylla's system tables often provide interesting information for clients. In order to be able to access this information without CQL, a notion of virtual tables is introduced to alternator. When a table named .scylla.alternator.KS_NAME.TABLE_NAME is accessed with read-only operation - Query or Scan, Scylla's internal KS_NAME.TABLE_NAME table will be queried instead. For instance, if a user wants to read about system_auth.roles, the Scan request should target the following table: ".scylla.alternator.system_auth.roles". Fixes #6122	2020-04-09 09:41:30 +02:00
Piotr Sarna	09d09ddefb	alternator: add fetching static columns if they exist Until now, the list of static column ids was always empty for alternator tables anyway, so the list wasn't fetched. However, with the virtual interface of fetching Scylla internal tables, we need to list the ids of selected static columns explicitly to avoid segfaults - since we select the whole row, static columns included.	2020-04-09 09:41:30 +02:00
Piotr Sarna	df02fc6b06	alternator: add fallback serialization for all types While most types (e.g. boolean) are not valid key types for alternator users, system tables derived from Scylla may still use this type for keys, e.g. system_auth.roles. Note that types which are not directly supported by alternator (e.g. double) will not be representable out-of-the-box - instead, they simply fall back to string, which is both human-readable and supported by alternator.	2020-04-09 09:41:30 +02:00
Dejan Mircevski	1ab04ac861	restrictions: Forbid null bound for nonkey columns Cassandra prohibits null bounds for non-key columns. Match that prohibition. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-04-08 16:35:47 -04:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds API endpoint /column_family/autocompaction/{name} that listens to GET and POST requests to pick and control table background compactions. To implement that the patch introduces "_compaction_disabled_by_user" flag that affects if CompactionManager is allowed to push background compactions jobs into the work. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Dejan Mircevski	4f262e31d2	restrictions: Forbid null equality Cassandra prohibits `=null` for both column values and map values. Match that prohibition. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-04-08 13:57:49 -04:00
Botond Dénes	aa9a582f4a	cql3: functions/castas_fcts: allow self-casting any type Casting a type to itself doesn't make sense, but it is harmless so allow it instead of reporting a confusing error message that makes even less sense: InvalidRequest: Error from server: code=2200 [Invalid query] message="org.apache.cassandra.db.marshal.BooleanType cannot be cast to org.apache.cassandra.db.marshal.BooleanType" Note that some types already supported self-casting, this patch just extends this to all types in a forward compatible way. Fixes: #5102 Tests: unit(dev), manual test casting boolean to boolean. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408135041.854981-1-bdenes@scylladb.com>	2020-04-08 18:52:36 +03:00
Piotr Sarna	123edfc10c	alternator: fix failure on incorrect table name with no indexes If a table name is not found, it may still exist as a local index, but the check tried to fetch a local index name regardless if it was present in the request, which was a nullptr dereference bug. Fixes #6161 Tests: alternator-test(local, remote) Message-Id: <428c21e94f6c9e450b1766943677613bd46cbc68.1586347130.git.sarna@scylladb.com>	2020-04-08 15:33:48 +03:00
Botond Dénes	196dd5fa9b	treewide: throw std::bad_function_call with backtraces We typically use `std::bad_function_call` to throw from mandatory-to-implement virtual functions, that cannot have a meaningful implementation in the derived class. The problem with `std::bad_function_call` is that it carries absolutely no information w.r.t. where it was thrown from. I originally wanted to replace `std::bad_function_call` in our codebase with a custom exception type that would allow passing in the name of the function it is thrown from to be included in the exception message. However after I ended up also including a backtrace, Benny Halevy pointed out that I might as well just throw `std::bad_function_call` with a backtrace instead. So this is what this patch does. All users are various unimplemented methods of the `flat_mutation_reader::impl` interface. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>	2020-04-08 13:54:06 +02:00
Avi Kivity	a490cb669b	Update seastar submodule * seastar fd9af3a26...cce2ddac1 (6): > rpc: fix build failures in C++14 mode due to std::string_view > util/backtrace: introduce make_backtraced_exception_ptr() > future: make do_for_each noexcept > fair_queue rename the fair_queue_descriptor and change its default init > future: do_with: make noexcept > io_queue: batch communication with the fair_queue for ready requests	2020-04-08 13:54:06 +02:00
Botond Dénes	f0530c7d41	configure.py: add {mode}-test, {mode}-check, test and check targets The test target builds all tests and runs them. The check target compiles all the headers in addition to this. The {mode} variants do these just for the respective mode. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200407132641.598412-1-bdenes@scylladb.com> Reviewed-by: Pekka Enberg <penberg@scylladb.com>	2020-04-08 13:54:06 +02:00
Calle Wilund	65a6ebbd73	cdc: Postimage must check iff we have (pre-)image row data for non-touched columns Fixes #6143 When doing post-image generation, we also write values for columns not in delta (actual update), based on data selected in pre-image row. However, if we are doing initial update/insert with only a subset of columns, when the pre-image result set is nil, this cannot be done. Adds check to non-touched column post-image code. Also uses the pre-image value extractor to handle non-atomic sets properly. Tests updated.	2020-04-08 13:48:54 +02:00
Tomasz Grabiec	55240e9db2	Merge "Fix open-ended tombstone issues in alternator" from Piotr Sarna This miniseries provides workarounds for open-ended range tombstones reportedly appearing in alternator tables. The issue was that row tombstones created for tables without clustering keys look like open-ended range tombstones, which confuses the LA/KA format writer. Tests: alternator-test(local) Fixes #6035 Refs #6157	2020-04-08 13:43:40 +02:00
Pavel Solodovnikov	3206c1bf66	paxos_state: introduce error injections for testing timeouts in paxos stages The following sleep injections are added to paxos_state: * paxos_state_prepare_timeout (timeouts in paxos_state::prepare) * paxos_state_accept_timeout (timeouts in paxos_state::accept) * paxos_state_learn_timeout (timeouts in paxos_state::learn) Tests: unit ({dev}), unit ({debug}) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200403092107.181057-1-alejo.sanchez@scylladb.com>	2020-04-08 10:47:15 +02:00
Piotr Sarna	a4da07f8b3	alternator-test: mark identical gsi test as skipped Creating an index on a table with only the partition key can lead to open-ended range tombstones appearing, if the indexed column is also the very same partition key - which is quite a useless case, but it's allowed both by alternator and DynamoDB. In order to make the tests pass when KA/LA sstables are used, this test case is hereby skipped until further notice. Refs #6157	2020-04-08 08:11:39 +02:00
Piotr Sarna	0a2d7addc0	alternator: use partition tombstone if there's no clustering key As @tgrabiec helpfully pointed out, creating a row tombstone for a table which does not have a clustering key in its schema creates something that looks like an open-ended range tombstone. That's problematic for KA/LA sstable formats, which are incapable of writing such tombstones, so a workaround is provided in order to allow using KA/LA in alternator. Fixes #6035	2020-04-08 08:08:45 +02:00
Glauber Costa	54a0928a85	systemd: disable start timeout I am about to change resharding to block the start of the node. Being a somewhat slow operation, the timeout of 900 sec is guaranteed to trigger in large nodes with lots of data. This patch effectively disables the start timeout, while keeping the stop timeout unchanged. My preference would have been to use a timeout extension mechanism during resharding. Systemd actually has such a mechanism, where we can send a message through sd_notify asking the timeout to be extended. However such a mechanism is not present in SystemD v219, used by RHEL7. That means for RHEL7 we need a different way to deal with the timeout anyway. The second preference is also obviously to write "infinity" as the timeout value. But guess what? SystemD v219 also has a bug in which infinity is interpreted as zero (https://bugzilla.redhat.com/show_bug.cgi?id=1446015) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200407155754.10020-1-glauber@scylladb.com>	2020-04-08 08:14:35 +03:00
Botond Dénes	e17d8af3c6	compound_compat: composite::iterator cover error-paths with on_internal_error() But only non-validation error paths. When validating we do expect it to maybe fail, so we don't want to generate cores for validation. Validation is in fact a de-serialization pass with some additional checks. To be able to keep reusing the same code for de-serialization and validation just with different error handling, introduce a `strict_mode` flag that can be passed to `composite::iterator` constructor. When in strict mode (the default) the iterator will convert any `marshal_exception` thrown during the de-serialization to `on_internal_error()`. We don't want anybody to use the iterator in non-strict mode, besides validation, so the iterator constructors are made private. This is standard practice for iterators anyway.	2020-04-07 13:18:03 +03:00
Botond Dénes	16246d1c99	frozen_schema: make freezing constructor explicit Freezing is an expensive operation, that involves serializing the entire mutation. Having an implicit freezing constructor means this can happen as part of an implicit type conversion without the programmer even noticing, even when this is not really necessary. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200407080245.234021-1-bdenes@scylladb.com>	2020-04-07 12:00:36 +03:00
Botond Dénes	e0e9b6d9b0	compound_compat: composite_view: add is_valid() Until now this was open-coded in `sstables::validate_min_max_metadata()`. We want to cover non-validation compound de-serialization error-paths with `on_internal_error()` and so we need more control over how compounds are validated. As a first step we want to centralize validation in the class itself as in the next patches they will use private APIs to bypass `on_internal_error()` in the error paths during validation.	2020-04-07 11:45:45 +03:00
Benny Halevy	89b3974e56	sstables: print invalid boundary type as unsigned int Otherwise it prints a binary value to the log, corrupting it. Seen when testing scrub with randomly-corrupted sstable using scrub_with_one_node_expect_data_loss_test as of https://github.com/scylladb/scylla-dtest/pull/1414 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200407055617.1045977-1-bhalevy@scylladb.com>	2020-04-07 10:18:19 +02:00
Benny Halevy	a20c85713b	storage_proxy: paxos_response_handler::prune: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200405115046.733450-2-bhalevy@scylladb.com>	2020-04-07 08:47:38 +03:00
Benny Halevy	4e37aee8a2	storage_proxy: paxos_response_handler::prune: no need for futurize_apply parallel_for_each already futurize_invoke's the lambda passed to it since seastar commit c5e158e5f173e25a62308997a3da4348053b2a0f Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200405115046.733450-1-bhalevy@scylladb.com>	2020-04-07 08:47:38 +03:00
Raphael S. Carvalho	044f80b1b5	cql3: don't reset default TTL when not explicitly specified in alter table statement Any alter table statement that doesn't explicitly set the default time to live will reset it to 0. That can be very dangerous for time series use cases, which rely on all data being eventually expired, and a default TTL of 0 means data never being expired. Fixes #5048. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com>	2020-04-07 08:47:38 +03:00
Avi Kivity	0bc90756db	tools: toolchain: add note explaining how to use podman to build images podman is compatible with docker, but by default emits a manifest format that is not understood by old docker clients, so give it an extra flag to generate the old format instead. Message-Id: <20200406134526.21521-1-avi@scylladb.com>	2020-04-07 08:47:38 +03:00
Glauber Costa	80f414ed6e	sstables: restore indent Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200401162722.28780-3-glauber@scylladb.com>	2020-04-06 16:02:31 +03:00
Glauber Costa	463d0ab37c	compaction: move rewrite_sstables to the compaction_manager There is no reason why the table code has to be aware of the efforts of rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it. Rewrite is special, because we need to do it one SSTable at a time, without lumping it together. However, the compaction manager is totally capable of doing that itself. If we do that, the special "table::rewrite_sstables" can be killed. This code would maybe be better off as a thread, where we wouldn't need to keep state. However there are some methods like maybe_stop_on_error() that expect a future so I am leaving this be for now. This is a cleanup that can be done later. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200401162722.28780-2-glauber@scylladb.com>	2020-04-06 16:02:30 +03:00
Nadav Har'El	ac43a9e2aa	merge: Fix generating base keys from empty indexing paging state Merged pull request https://github.com/scylladb/scylla/pull/6136 from Piotr Sarna: An empty partition/clustering key pair is a valid state of the query paging state. Unfortunately, recent attempts at debugging a flaky test (#5856) resulted in introducing an assertion (`7616290`) which breaks when trying to generate a key from such a pair. In order to keep the assertion (since it still makes sense in its scope), but at the same time translate empty keys properly, empty keys are now explicitly processed at the beginning of the function. This behaviour was 100% reproducible in a secondary index dtest below. Fixes #6134 Refs #5856 Tests: unit(dev), dtest(TestSecondaryIndexes.test_truncate_base)	2020-04-06 15:23:39 +03:00
Takuya ASADA	3ce6cdc6d8	install.sh: support --upgrade To use install.sh as Scylla install script w/o using .rpm/.deb package, we need to provide a way to upgrade Scylla version, not just install. With --upgrade option, install.sh does not overwrite config files. It will install a <filename>.new file in the same directory when the old config file and the new config file do not contain the same data. If the old one and the new one are exactly the same, it will do nothing. To implement this, rewriting api_ui_dir/api_doc_dir path on scylla.yaml moved from .rpm/.deb scriptlet to install.sh. Fixes #5874	2020-04-06 15:07:28 +03:00
Takuya ASADA	5f18964763	dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test On some environments systemd-coredump does not work with a symlinked directory, so we can use bind-mount instead. Also, it's better to check that systemd-coredump is working by generating a coredump. To fix #5916, drop scylla_coredump_setup from .rpm %post scriptlet. Fixes #5753 Fixes #5916	2020-04-06 15:03:11 +03:00
Avi Kivity	e9e2b75a76	Merge "Allow Major compactions for TWCS" from Glauber " This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: Telling users not to ever major compact is easier said than done: in practice due to a variety of circumstances it might end up being done in which case data will have a hard time expiring later. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects In order to achieve that, we start by changing the way resharding works: it will now work with a read interposer, similar to the one TWCS uses for streaming data. Once we do that, a lot of assumptions that exist in the compaction code can be simplified and supporting TWCS major compactions becomes a matter of simply enabling its interposer in the compaction code as well. There are many further simplifications that this work exposes: The compaction method create_new_sstable seems out of place. It is not used by resharding, and it seems duplicated for normal compactions. We could clean it up with more refactoring in a later patch. The whole logic of the feed_writer could be part of the consumer code. Testing details: scylla unit tests (dev, release) sstable_datafile_test (debug) dtests (resharding_test.py) manual scylla resharding Fixes #1431 " Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> * 'twcs-major-v3' of github.com:glommer/scylla: compaction: make major compaction time-aware with TWCS compaction: do resharding through an interposer mutation_writer: introduce shard_based splitting writer mutation_writer: factor out part of the code for the timestamp splitter compaction: abort if create_new_sstable is called from resharding	2020-04-06 12:54:08 +03:00
Gleb Natapov	e5f7ccc4c8	lwt: fix possible leak of "prune" counter If get_schema_for_read() fails "prune" counter will not be decremented. The patch fixes it by creating RAII object earlier. Also return releasing of a mutation in release_mutation() which was dropped by mistake. Fixes #6124 Message-Id: <20200405080233.GA22509@scylladb.com>	2020-04-06 11:30:38 +02:00
Nadav Har'El	d9d50362af	alternator: remove mentions of experimental status of LWT Since commit `9948f548a5`, the LWT no longer requires an "experimental" flag, so Alternator documents and scripts which referred to the need for enabling experimental LWT, are fixed here to no longer do that. Fixes #6118. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200405143237.12693-1-nyh@scylladb.com>	2020-04-06 12:12:08 +03:00
Piotr Sarna	8fea5075f2	test: fix manual gossip test When trying to get rid of a large stack warning for gossip test, I found out that it actually does not run at all for multiple reasons: 1. It segfaults due to wrong initialization order 2. After fixing that, it segfaults on use-after-free (due to capturing a shared pointer by reference instead of by copy) 3. After that, cleanups are in order: * seastar thread does not need to be spawned inside another thread; * default captures are harmful, so they're made explicit instead; * db::config is moved to heap, to finally get rid of the warning. Tests: manual(gossip) Message-Id: <feaca415d0d29a16c541f9987645365310663630.1585128338.git.sarna@scylladb.com>	2020-04-06 11:07:10 +02:00
Piotr Sarna	88913e9d44	test: add cases for empty paging state for index queries In order to check regressions related to #6136 and similar issues, test cases for handling paging state with empty partition/clustering key pair are added.	2020-04-06 08:59:40 +02:00
Piotr Sarna	45751ee24f	cql3: fix generating base keys from empty index paging state An empty partition/clustering key pair is a valid state of the query paging state. Unfortunately, recent attempts at debugging a flaky test resulted in introducing an assertion which breaks when trying to generate a key from such a pair. In order to keep the assertion (since it still makes sense in its scope), but at the same time translate empty keys properly, empty keys are now explicitly processed at the beginning of the function. This behaviour was 100% reproducible in a secondary index dtest below. Fixes #6134 Refs #5856 Tests: unit(dev), dtest(TestSecondaryIndexes.test_truncate_base)	2020-04-06 07:49:06 +02:00
Avi Kivity	4e6f543676	tools: toolchain: use "docker build --pull" in instructions for building an image Specify --pull in order to refresh the base image (some Fedora release). Usually this is not important, because we run `dnf update`. But if the cached image happens to be a pre-release version of Fedora, the image will have the updates-testing repository enabled, and we may get some unwanted updates. It's sad that we need two separate flags for correctness (the other is --no-cache). Message-Id: <20200405164227.8210-1-avi@scylladb.com>	2020-04-05 19:48:25 +03:00
Piotr Sarna	0bb211a65f	alternator: defuse a serialization path time bomb The default serialization path for items was subtly broken - instead of parsing JSON string representation of objects, it tried to parse a regular string implementation - which is often also a valid JSON, but nothing guarantees that it actually is. Tests: alternator-test(local) Message-Id: <e1668bf4e9029f2675a4ac28bb4598714575efeb.1586096732.git.sarna@scylladb.com>	2020-04-05 18:55:54 +03:00
Nadav Har'El	c1a7a071ea	merge: Remove most inclusions of reactor.hh Merged patch series from Avi Kivity: This patchset removes most inclusions of reactor.hh, by switching to new namespace-scoped API:s instead of those using engine() as a way to get the reactor. With this, we are down to 12 translation units depending on reactor.hh, mostly for deprecated API:s like reactor::at_exit(). Avi Kivity (3): logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope test: sstable-utils: deinline do_make_keys() treewide: replace calls to engine().some_api() with some_api() configure.py \| 14 +++----- auth/common.hh \| 3 +- checked-file-impl.hh \| 4 +-- db/system_keyspace_view_types.hh \| 2 +- flat_mutation_reader.hh \| 1 + lister.hh \| 2 +- message/messaging_service.hh \| 2 +- redis/server.hh \| 2 +- sstables/compress.hh \| 2 +- sstables/integrity_checked_file_impl.hh \| 2 +- test/lib/sstable_utils.hh \| 35 ++++--------------- test/lib/test_services.hh \| 2 +- thrift/server.hh \| 2 +- transport/server.hh \| 2 +- utils/error_injection.hh \| 3 +- utils/joinpoint.hh \| 2 +- utils/loading_cache.hh \| 2 +- utils/logalloc.hh \| 6 ++-- utils/rate_limiter.hh \| 2 +- api/system.cc \| 1 + auth/default_authorizer.cc \| 2 +- auth/password_authenticator.cc \| 2 +- database.cc \| 1 + db/commitlog/commitlog.cc \| 4 +-- db/hints/resource_manager.cc \| 3 +- db/system_distributed_keyspace.cc \| 2 +- dht/i_partitioner.cc \| 2 +- gms/feature_service.cc \| 3 +- lister.cc \| 4 +-- locator/ec2_snitch.cc \| 3 +- locator/gce_snitch.cc \| 1 + main.cc \| 1 + reader_concurrency_semaphore.cc \| 2 +- redis/server.cc \| 4 +-- sstables/sstables.cc \| 11 +++--- table.cc \| 3 +- test/boost/commitlog_test.cc \| 2 +- test/boost/database_test.cc \| 2 +- test/boost/flush_queue_test.cc \| 2 +- test/boost/gossip_test.cc \| 2 +- .../gossiping_property_file_snitch_test.cc \| 1 + test/boost/loading_cache_test.cc \| 2 +- test/boost/sstable_3_x_test.cc \| 1 + 
test/boost/sstable_datafile_test.cc \| 1 + test/boost/sstable_test.cc \| 1 + test/lib/sstable_utils.cc \| 26 ++++++++++++++ test/manual/gossip.cc \| 2 +- test/manual/hint_test.cc \| 2 +- test/manual/sstable_scan_footprint_test.cc \| 2 +- test/perf/perf_mutation.cc \| 1 + test/perf/perf_row_cache_update.cc \| 1 + test/perf/perf_sstable.cc \| 1 + test/tools/cql_repl.cc \| 2 +- thrift/server.cc \| 2 +- transport/server.cc \| 4 +-- utils/config_file.cc \| 3 +- utils/file_lock.cc \| 2 +- utils/logalloc.cc \| 14 ++++---- utils/updateable_value.cc \| 2 +- 59 files changed, 119 insertions(+), 98 deletions(-)	2020-04-05 13:47:39 +03:00
Nadav Har'El	dcfdd917e1	merge: Guard against potential races in view builder Merge patch series from Piotr Sarna: This series adds extra precautions against potential races in view building. In particular, it was based on the following scenario: 1. View builder detects that a view V is no longer here, so it schedules removing its info from bookkeeping, without any semaphores, and this continuation gets preempted immediately. 2. A view is deleted and recreated with the same name - V. 3. View V building is finished. 4. The continuation from (1.) is finally executed, and it removes old view V info from bookkeeping - which is a problem, since view building bookkeeping is based on names, not uuids - consequently, the new view bookkeeping info is erroneously removed. The issue is solved by putting startup code (which also does cleanup from point (1.)) under the same semaphore as other bookkeeping operations. With that, it will be impossible to execute step (2.) before (1.) ends, which effectively prevents the race. Refs #6094 (possibly fixes it too, but since I could not reproduce the issue...) Tests: unit(dev) Piotr Sarna (4): db,view: fix waiting for a view building future db,view: remove unneeded implicit capture-by-reference db,view: nitpick: change & operator to && for booleans db,view: guard view builder startup with a semaphore db/view/view.cc \| 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)	2020-04-05 13:19:23 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Avi Kivity	5e32ecb514	test: sstable-utils: deinline do_make_keys() This hides a call to engine_is_ready() which is only available in reactor.hh. Dependencies are adjusted so tests link. Ref #1.	2020-04-05 12:46:04 +03:00
Avi Kivity	1799cfa88a	logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope This allows us to drop a #include <reactor.hh>, reducing compile time. Several translation units that lost access to required declarations are updated with the required includes (this can be an include of reactor.hh itself, in case the translation unit that lost it got it indirectly via logalloc.hh) Ref #1.	2020-04-05 12:45:08 +03:00
Piotr Sarna	1a9083b342	db,view: guard view builder startup with a semaphore The startup routine performs some bookkeeping operations on views, and so do these events: - on_create_view; - on_drop_view; - on_update_view. Since the above events are guarded with a semaphore, the startup routine should also take the same semaphore - in order to ensure that all bookkeeping operations are serialized. Refs #6094	2020-04-05 11:41:26 +02:00
Piotr Sarna	8da4a5b78c	db,view: nitpick: change & operator to && for booleans Although it's technically correct to use the bitwise and operator on booleans as well, it's slightly confusing for the reader.	2020-04-05 11:41:25 +02:00
Piotr Sarna	e49805b7b8	db,view: remove unneeded implicit capture-by-reference The lambda does not use any other captures, so it does not need to implicitly capture anything by reference.	2020-04-05 11:41:25 +02:00
Piotr Sarna	3f19865493	db,view: fix waiting for a view building future The future was marked with a `FIXME: discarded future`, but there's really no reason not to wait for it, and it was probably meant to be waited for since its implementation.	2020-04-05 11:41:25 +02:00
Piotr Sarna	76969ea619	test: move config to heap in gossip_test ... in order to get rid of a large stack warning. Tests: unit(dev) Message-Id: <da4349b89554265ec419544b63ce084eab25ac0f.1586068467.git.sarna@scylladb.com>	2020-04-05 10:18:14 +03:00
Rafael Ávila de Espíndola	c59a307f17	table_helper: Use CanInvoke instead of CanApply The CanApply predicate is deprecated. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403225907.7910-1-espindola@scylladb.com>	2020-04-05 08:36:29 +02:00
Tomasz Grabiec	df48b5ec9d	gossip: Fix a confusing parameter name Message-Id: <1585940635-1194-1-git-send-email-tgrabiec@scylladb.com>	2020-04-05 08:24:51 +02:00
Piotr Jastrzebski	a15b32c9d9	token: relax the condition of the sanity check When we switched token representation to int64_t we added some sanity checks that byte representation is always 8 bytes long. It turns out that for token_kind::before_all_keys and token_kind::after_all_keys bytes can sometimes be empty because for those tokens they are just ignored. The check introduced with the change is too strict and sometimes throws the exception for tokens before/after all keys created with empty bytes. This patch relaxes the condition of the check and always uses 0 as value of _data for special before/after all keys tokens. Fixes #6131 Tests: unit(dev, sct) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-04 15:50:10 +03:00
Rafael Ávila de Espíndola	4db4237310	configure: Delete dead options These options are not used anywhere. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403173458.119939-1-espindola@scylladb.com>	2020-04-04 14:52:24 +03:00
Rafael Ávila de Espíndola	a10bdb17b3	user_function_test: Test UDF without the corresponding experimental flag The existing test was not using the db::config it was creating. Use it and test the produced exception. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403170235.113558-2-espindola@scylladb.com>	2020-04-03 20:00:24 +02:00
Rafael Ávila de Espíndola	3f3634ece1	test: Use feature_config_from_db_config to setup feature_config This reduces code duplication and uses the same code path that is used in scylla itself. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403170235.113558-1-espindola@scylladb.com>	2020-04-03 19:59:00 +02:00
Tomasz Grabiec	4578031bd6	Update seastar submodule * seastar 41c83ec...fd9af3a (7): > stall_detector: Delete unused member variable > future: Avoid a move in finally_body > Merge "Followup cleanups for the apply/invoke split" from Rafael > Merge "make trivial future related functions noexcept" from Benny > rpc_test: silence depreceted lambda logger warning > rpc_demo: stop using variadic futures > future: Move two static_asserts to the top	2020-04-03 19:48:00 +02:00
Botond Dénes	9e1d6ada0f	types: compare(): cover more paths with on_internal_error() Currently we call `on_internal_error()` if `tri_compare()` throws `marshal_exception`. Some compare paths however might go around `tri_compare()` and call `abstract_type::compare()` directly. Move the check there to cover these cases too. Tests: dev Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200403162530.1175801-1-bdenes@scylladb.com>	2020-04-03 18:35:30 +02:00
Rafael Ávila de Espíndola	8d0e40e37b	service: Replace engine().cpu_id() with this_shard_id() Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403160915.59481-1-espindola@scylladb.com>	2020-04-03 18:18:25 +02:00
Rafael Ávila de Espíndola	891f3f44ee	tombstone: Move can_gc_fn to a .cc This reduces the total size reported by $ find . -name *.hh.o \| xargs du -bc by 1.3%, from 49911928 to 49249680 bytes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200403153241.34400-1-espindola@scylladb.com>	2020-04-03 18:17:31 +02:00
Glauber Costa	098b215b0d	compaction: make major compaction time-aware with TWCS This patch makes major compaction aware of time buckets for TWCS. That means that calling a major compaction with TWCS will not bundle all SSTables together, but rather split them based on their timestamps. There are two motivations for this work: 1. Telling users not to ever major compact is easier said than done: in practice due to a variety of circumstances it might end up being done in which case data will have a hard time expiring later. 2. We are about to start working with offstrategy compactions, which are compactions that work in parallel with the main compactions. In those cases we may be converting SSTables from one format to another and it might be necessary to split a single big STCS SSTable into something that TWCS expects. With the motivation out of the way, let's talk about the implementation: The implementation is quite simple and builds upon the previous patches. It simply specializes the interposer implementation for regular compaction with a table-specific interposer. Fixes #1431 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-03 10:10:10 -04:00
Glauber Costa	55a8b6e3c9	compaction: do resharding through an interposer Our resharding code is complex, since the compaction object has to keep track of many output SSTables, the current shard being written. When implementing TWCS streaming writers, we ran away from such write-side complexity by implementing an interposer: the interposer consumes the flat_mutation_reader stream, creating many different writer streams. We can do a similar thing for resharding SSTables and have each writer be guaranteed to contain keys for only a specific source shard. As we do that, we can move the SSTable and sstable_writer information to the compacting_sstable_writer object. The compaction object will no longer be responsible for it and can be simplified, paving the way for TWCS-major, which will go through an interposer as well. Note that the compaction_writer, which now holds both the SSTable pointer and the sstable_writer still needs to be optional. This is because LCS (and potentially others) still want to create more than one SSTable per source stream. That is done to guarantee that each SSTable complies with the max_sstable_size parameter, which is information available in the sstable_writer that is not present at the level of the flat_mutation_reader. We want to keep it in the writer side. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-03 10:10:10 -04:00
Pavel Emelyanov	86296ba557	main: Do not destroy token_metadata The storage_proxy instances hold references to token_metadata ones and leave unwaited futures continuing to its query_partition_key_range_concurrent method. The latter is called from do_query so it's not that easy to find out who is leaking. Keep the tokens not freed for a while. Fixes: #6093 Test: manual start-stop Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200402183538.9674-1-xemul@scylladb.com>	2020-04-03 16:00:08 +02:00
Rafael Ávila de Espíndola	8da235e440	everywhere: Use futurize_invoke instead of futurize<T>::invoke No functionality change, just simpler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200330165308.52383-1-espindola@scylladb.com>	2020-04-03 15:53:35 +02:00
Gleb Natapov	36a24bbb70	storage_proxy: limit read repair only to replicas that answered during speculative reads Speculative reader has more targets than needed for CL. In case there is a digest mismatch the repair runs between all of them, but that violates provided CL. The patch makes it so that repair runs only between replicas that answered (there will be CL of them). Fixes #6123 Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200402132245.GA21956@scylladb.com>	2020-04-02 17:32:08 +03:00
Avi Kivity	a6156a9caf	build: make headers check compatible with distcc distcc doesn't like the -x c++ flag, so create an empty.cc file for this purpose and compile it. Also drop the "=" from "--include=", which is also disliked by distcc. Message-Id: <20200402124312.48963-1-avi@scylladb.com>	2020-04-02 16:39:30 +03:00
Glauber Costa	8fe10863f4	mutation_writer: introduce shard_based splitting writer This is similar to the timestamp based splitting writer, except that it splits data based on the shard where the partition key is supposed to be placed. It is similar to the multishard_writer, in the sense that it creates n streams for n shards, but it does not want to process the streams in the owner shards. We want to use that in processes like resharding where it is fine for a foreign shard to deal with a mutation. One option would be to augment the multishard_writer to optionally achieve these properties, but having a separate splitter is both simpler and faster. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-02 08:55:16 -04:00
Glauber Costa	a258f111c7	mutation_writer: factor out part of the code for the timestamp splitter I am about to introduce a new splitter. Therefore, move parts of it that are common to its own file. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-02 08:55:16 -04:00
Glauber Costa	a2d7a9c230	compaction: abort if create_new_sstable is called from resharding I am about to get rid of the _shard attribute in the compaction object, as I will create different streams of writers for different shards. In preparation for that, remove the arbitrary _shard reference. Raphael confirms that resharding should never be calling this, as this method is used exclusively for garbage collection component of run-based compaction. Therefore we'll just throw in this case and remove the shard reference. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-02 08:55:16 -04:00
Glauber Costa	375cb8a32b	compaction: pass current shard to sstable creation function The shard parameter is ignored for SSTable creation on regular compaction. It is still good practice and good future proofing to pass something meaningful here instead of zero. This patch passes the id of the current shard. Thanks Botond for pointing that out. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200402122212.12218-1-glauber@scylladb.com>	2020-04-02 14:43:35 +02:00
Botond Dénes	240b5e0594	frozen_schema: key() remove unused schema parameter Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>	2020-04-02 14:43:35 +02:00
Pekka Enberg	75b55cea88	Merge "Resharding through compact sstables" from Glauber " This patchseries is part of my effort to make resharding less special - and hopefully less problematic. The next steps are a bit heavy, so I'd like to, if possible, get this out of the way. After these two patches, there is no more need to ever call reshard_sstables: compact_sstables will do, and it will be able to recognize resharding compactions. To do that we need to unify the creator function, which is trivially done by adding a shard parameter to regular compactions as well: they can just ignore it. I have considered just making the compaction_descriptor have a virtual create() function and specializing it, but because we have to store the creator in the compaction object I decided to keep the virtual function for now. In a later cleanup step, if we can for instance store the entire compaction_descriptor object in the compaction object we could do that. Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Tests: unit tests (dev), dtest (resharding.py) " * 'resharding-through-compact-sstables' of github.com:glommer/scylla: resharding: get rid of special reshard_sstables compaction: enhance compaction_descriptor with creator and replace function	2020-04-02 14:43:35 +02:00
Pekka Enberg	43b488a7bc	Revert "schema: Default dc_local_read_repair_chance to zero" This reverts commit `fdd2d9de3d` because it breaks one heat-weighted load balancing dtest: FAIL: heat_weighted_load_balancing_cl_QUORUM_test (heat_weighted_load_balancing_test.HeatWeightedLB) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 182, in heat_weighted_load_balancing_cl_QUORUM_test self.run_heat_weighted_load_balancing('QUORUM') File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 165, in run_heat_weighted_load_balancing self.verify_metrics(metrics, cached=False) File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 73, in verify_metrics mean_avg, node_mean_avg, key)) AssertionError: 19.0 not found in range(3, 13) : Cache difference between nodes is less then expected: 6469.6/328.2, metric scylla_storage_proxy_coordinator_reads_local_node I am reverting because it's a test issue, and we should bring this commit back once the test is fixed. Gleb Natapov explains: "dtest result directly depends on replicas we contact. Glauber's patch make us contacts less replicas, so numbers differ."	2020-04-02 13:43:29 +03:00
Nadav Har'El	55f02c00f2	alternator-test: run: use the Python driver, not cqlsh The "run" script for the Alternator tests needs to set a system table for authentication credentials, so we can test this feature. So far we did this with cqlsh, but cqlsh isn't always installed on build machines. But install-dependencies.sh already installs the Cassandra driver for Python, so it makes more sense to use that, so this patch switches to use it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200331131522.28056-1-nyh@scylladb.com>	2020-04-02 13:43:29 +03:00
Nadav Har'El	8627ae42a6	install-dependencies.sh: add dependencies for Alternator tests To run Alternator tests, only two additional dependencies need to be added to install-dependencies.sh: pytest, and python3-boto3. We also need python3-cassandra-driver, but this dependency is already listed. This patch only updates the dependencies for Fedora, which is what we need for dbuild and our Jenkins setups. Tested by building a new dbuild docker image and verifying that the Alternator tests pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> [avi: update toolchain image; note this upgrades gcc to 9.3.1] Message-Id: <20200330181128.18582-1-nyh@scylladb.com>	2020-04-02 13:43:16 +03:00
Piotr Sarna	b3fdb742ae	cql3,index: add panic checks to base key generation In order to be extra sure that we always generate proper base partition/clustering keys from paging info when executing an indexed query, additional checks are added - if any of them triggers, an exception will be thrown. Created in order to help debug an existing issue: Refs #5856 Tests: unit(dev)	2020-04-01 18:27:07 +03:00
Gleb Natapov	4d9d226596	lwt: fix cas_now_pruning counter Due to c&p error cas_now_pruning counter is increased instead of decreased after an operation completes. Fix it. Fixes #6116 Message-Id: <20200401142859.GA16953@scylladb.com>	2020-04-01 17:18:33 +02:00
Alejo Sanchez	3a4dd0a856	utils: error injection inject() returning a future Make inject() return a future. Suggested by Gleb. Botond helped on dealing with complex function/lambda overload. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-7-alejo.sanchez@scylladb.com>	2020-04-01 16:22:52 +02:00
Alejo Sanchez	8bae38cef9	utils: error injection support multiple clocks Use template to support multiple clock classes for time point for deadline injection. Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-6-alejo.sanchez@scylladb.com>	2020-04-01 16:22:45 +02:00
Alejo Sanchez	71f2f423bc	utils: error injection reorder args for exceptions Move exception factory to end of argument list. Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-5-alejo.sanchez@scylladb.com>	2020-04-01 16:22:38 +02:00
Alejo Sanchez	fd1eb6a466	utils: error injection simplify API Split error injection C++ API to have 1. sleep duration 2. sleep to deadline (timeout) TODO: support multiple types of clocks Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-4-alejo.sanchez@scylladb.com>	2020-04-01 16:22:30 +02:00
Avi Kivity	5671b3d7d3	Update seastar submodule * seastar 36e8dfc89...41c83ec55 (3): > api: add file_type() global function > json: Add backtrace information for json generation exceptions > scheduling: avoid defining friend namespace qualified function scheduling_group_key_id()	2020-04-01 11:16:30 +03:00
Konstantin Osipov	9948f548a5	lwt: remove Paxos from experimental list Always enable lightweight transactions. Remove the check for the command line switch from the feature service, assuming LWT is always enabled. Remove the check for LWT from Alternator. Note that in order for the cluster to work with LWT, all nodes need to support it. Rename LWT to UNUSED in db/config.hh, to keep accepting lwt keyword in --experimental-features command line option, but do nothing with it. Changes in v2: * remove enable_lwt feature flag, it's always there Closes #6102 test: unit (dev, debug) Message-Id: <20200401071149.41921-1-kostja@scylladb.com>	2020-04-01 09:12:21 +02:00
Glauber Costa	87dd23db03	compaction: use a larger min_threshold during bootstrap, replace During bootstrap and replace operations the node can't take reads and we'd like to see the process ending ASAP. This is because until the process ends, we keep having to duplicate writes to an extended set. Not to mention, in the case of a cluster expansion users want to use the added capacity sooner rather than later. Streaming generates a lot of compaction activity, that competes with the bootstrap itself, slowing it down. Long term, we are moving to treat those compactions differently and maybe postpone them altogether. However for now we can reduce the amount of compactions by increasing the minimum threshold of SSTables that have to accumulate before they are selected for compactions. The default is 2, meaning we will trigger a compaction every time 2 SSTables of about the same size are found (for STCS, others follow a similar pattern). Until we have offstrategy infrastructure we don't want the compactions to stop happening altogether so the reads, when they start, don't suffer. This patch sets the minimum threshold to 16 (for the default max_threshold of 32), meaning we will generate a lot less compaction activity during streaming. Once streaming is done we revert it to its original. Unfortunately there isn't much we can do at the moment about decommission. During decommission the nodes receiving data are also taking reads and we don't want SSTables to accumulate. Fixes #5109 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-01 10:06:27 +03:00
Glauber Costa	fdd2d9de3d	schema: Default dc_local_read_repair_chance to zero dc_local_read_repair_chance is a legacy of old times: Cassandra itself now defaults to zero, and we should look into that too. Most serious production clusters are either repaired through our asynchronous repair, or don't need repair at all. Synchronous read repair can help things converging, but it implies an impact at query time. For clusters that are on an asynchronous repair schedule this should not be needed. Fixes #6109 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200331183418.21452-1-glauber@scylladb.com>	2020-04-01 08:27:49 +02:00
Glauber Costa	05efd6a5e9	resharding: get rid of special reshard_sstables There is a method, reshard_sstables(), whose sole purpose is to call a resharding compaction. There is nothing special about this method: all the information it needs is now present in the compaction_descriptor. This patch extends the compaction_options class to recognize resharding compactions as well, and uses that so that make_compaction() can also create resharding compactions. To make that happen we have to create a compaction_descriptor object in the resharding method. Note however that resharding works by passing an object very close to the compaction_descriptor around. Once this patch is merged, a logical next step is to reuse it, and avoid creating the descriptor right before calling compact_sstables(). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-31 19:57:53 -04:00
Glauber Costa	e8801cd77b	compaction: enhance compaction_descriptor with creator and replace function There are many differences between resharding and compaction that are artificial, arising more from the way we ended up implementing it than necessity. This patch attempts to pass the creator and replacer functions through the compaction_descriptor. There is a difference between the creator function for resharding and regular compaction: resharding has to pass the shard number on behalf of which the SSTable is created. However regular compactions can just ignore this. No need to have a special path just for this. After this is done, the constructor for the compaction object can be greatly simplified. In further patches I intend to simplify it a bit further, but some more cleanup has to happen first. To make that happen we have to construct a compaction_descriptor object inside the resharding function. This is temporary: resharding currently works with a descriptor, but at some point that descriptor is lost and broken into pieces to be passed to this function. The overarching goal of this work is exactly to be able to keep that descriptor for as long as possible, which should simplify things a lot. Callers are patched, but there are plenty for sstable_datafile_test.cc. For their benefit, a helper function is provided to keep the previous signature (test only). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-31 19:41:25 -04:00
Avi Kivity	dee0b68347	Merge 'Separate sharding and partitioning logic' from Piotr J " Currently, both sharding and partitioning logic is encapsulated into partitioners. This is not desirable because these two concepts are totally independent and shouldn't be coupled together in such a way. This PR separates sharding and partitioning. Partitioning will still live in i_partitioner class and its subclasses. Sharding is extracted to a new class called sharding_info. Both partitioners and sharding_info are still managed by schema class. Partitioner can be accessed with schema::get_partitioner while sharding_info can be accessed with schema::get_sharding_info. The transition is done in steps: 1. sharding_info class is defined and all the sharding logic is extracted from partitioner to the new class. Temporarily sharding_info is still embedded into i_partitioner and all sharding related functions in i_partitioner delegate to the embedded sharding_info object. 2. All calls to i_partitioner functions that are related to sharding are gradually switched to calls to sharding_info equivalents. 3. Once everything uses sharding_info, all sharding logic is dropped from i_partitioner. Tests: unit(dev, release) " * haaawk-sharding_info: (32 commits) dummy_sharder: rename dummy_sharding_info.* to dummy_sharder.* sharding_info: rename the class to sharder i_partitioner:remove embeded sharding_info i_partitioner: remove unused get_sharding_info schema: remove incorrect comment schema: make it possible to set sharding_info per schema i_partitioner: remove unused shard_count multishard_writer: stop calling i_partitioner::shard_count i_partitioner: remove sharding_ignore_msb partitioner_test: test ranges and sharding_infos i_partitioner: remove unused split_ranges_to_shards i_partitioner: remove unused shard_of function sstable-utils: use sharding_info::shard_of create_token_range_from_keys: use sharding info for shard_of multishard_mutation_query_test: use sharding info for shard_of distribute_reader_and_consume_on_shards: use sharding_info::shard_of multishard_mutation_query: use sharding_info::shard_of dht::shard_of: use schema::get_sharding_info i_partitioner: remove unused token_for_next_shard split_range_to_single_shard: use sharding info instead of partitioner ...	2020-03-31 13:40:51 +03:00
Alejo Sanchez	4a3b98facc	utils: error injection fix deadline test timeout Rafael reported test_inject_future_sleep_timeout_short failed sometimes as limit is too close. Bump limit. Refs #3295 (closed) Repro: ./test.py --mode=dev -v boost/error_injection_test --repeat 300 Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200328204454.1326514-3-alejo.sanchez@scylladb.com>	2020-03-31 11:58:38 +02:00
Alejo Sanchez	e5a2ba32b9	utils: error injection allocate string for remote invoke Allocate string before sending to other shards. Reported by Pavel Solodovnikov. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200328204454.1326514-2-alejo.sanchez@scylladb.com>	2020-03-31 11:58:27 +02:00
Nadav Har'El	fe6cecb26d	alternator-test: comment out an error-path test that doesn't work on newer boto3 Unfortunately, the boto3 library doesn't allow us to check some of the input error cases because it unnecessarily tests its input instead of just passing it to Alternator and allowing Alternator to report the error. In this patch we comment out a test case which used to work fine - i.e., the error was reported by Alternator - until recent changes to boto3 made it catch the problem without passing it to Alternator :-( Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200330190521.19526-2-nyh@scylladb.com>	2020-03-31 07:58:01 +02:00
Nadav Har'El	db7cebd663	alternator-test: skip one test in test_tag.py if botocore is too old One of the Alternator tests in test_tag.py checks the feature of creating a table with a set of tags (as opposed to adding tags to an existing table). This is a relatively new DynamoDB feature, only added in April 2019, so if the botocore library is too old, it cannot test this feature, and we have to skip the test. Alternator developers should make an effort to keep the botocore library up-to-date and test the latest DynamoDB features, but it is less important if some test environments (like Jenkins) cannot verify this specific test until its distro gets updated - it is more important that the vast majority of the tests, which do not rely on very new features, get tested. After this patch, if running on Fedora 30 with python3-botocore-1.12.101-2.fc30.noarch installed, we get the following skip message: $ pytest-3 -rs test_tag.py ... test_tag.py ..s..x [100%] =================================================== short test summary info =================================================== SKIP [1] /home/nyh/scylla/test/alternator/test_tag.py:114: Botocore version 1.12.136 or above required to run this test Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200330190521.19526-1-nyh@scylladb.com>	2020-03-31 07:57:53 +02:00
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of PAXOS protocol leaves behind an entry in system.paxos table with the last learned value (which can be large). In case not all participants learned it successfully next round on the same key may complete the learning using this info. But if all nodes learned the value the entry does not serve useful purpose any longer. The patch adds another round, "prune", which is executed in background (limited to 1000 simultaneous instances) and removes the entry in case all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so not to interfere with the next round. Since deletion happens very close to previous writes it will likely happen in memtable and will never reach sstable, so that reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Piotr Jastrzebski	c44f019eee	dummy_sharder: rename dummy_sharding_info.* to dummy_sharder.* Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	2e850421a0	i_partitioner:remove embeded sharding_info sharding_info embedded into partitioner is no longer used anywhere and can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	b46b35c55a	i_partitioner: remove unused get_sharding_info Previous patches have removed all the usages of this function. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	92cdc21123	schema: remove incorrect comment partitioner is actually part of schema digest and is stored locally in internal tables. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	7bd2b8d73f	schema: make it possible to set sharding_info per schema Previously schema::get_sharding_info was obtaining sharding_info from the partitioner but we want to remove sharding_info from the partitioner so we need a place in schema to store it there instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	79adee2fae	i_partitioner: remove unused shard_count Previous patches have switched all the calls to i_partitioner::shard_count to sharding_info::shard_count and this function can now be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	db3d7df893	multishard_writer: stop calling i_partitioner::shard_count Replace it with sharding_info::shard_count. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	b7834634ee	i_partitioner: remove sharding_ignore_msb Every place that has previously called this method is now using sharding_info::sharding_ignore_msb and this function can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	fb89841cc5	partitioner_test: test ranges and sharding_infos Turn test_something_with_some_interesting_ranges_and_partitioners into test_something_with_some_interesting_ranges_and_sharding_info. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	2aaa33d02e	i_partitioner: remove unused split_ranges_to_shards The function is never called so it can be safely removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	bdb7e89048	i_partitioner: remove unused shard_of function Previous patches switched all the places that called i_partitioner::shard_of to use sharding_info::shard_of so i_partitioner::shard_of is no longer used and can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	14ad965733	sstable-utils: use sharding_info::shard_of Create sharding_info with the same parameters as the partitioner and use it instead of the partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	dc2e060313	create_token_range_from_keys: use sharding info for shard_of Replace i_partitioner::shard_of with sharding_info::shard_of Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	c50f7f8143	multishard_mutation_query_test: use sharding info for shard_of Uses sharding_info::shard_of instead of i_partitioner::shard_of. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	8aabba6041	distribute_reader_and_consume_on_shards: use sharding_info::shard_of Switches all uses of i_partitioner::shard_of to sharding_info::shard_of. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	d8ac8fd6e8	multishard_mutation_query: use sharding_info::shard_of This patch replaces all the uses of i_partitioner::shard_of with sharding_info::shard_of in read_context. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	88364b6c30	dht::shard_of: use schema::get_sharding_info i_partitioner::shard_of will be removed so we should use sharding_info::shard_of instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	8b6be90310	i_partitioner: remove unused token_for_next_shard Previous patches have switched all the places that were using i_partitioner::token_for_next_shard to sharding_info::token_for_next_shard. Now the function can be removed from i_partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	8a6c377352	split_range_to_single_shard: use sharding info instead of partitioner The function relies only on i_partitioner::shard_count and i_partitioner::token_for_next_shard. Both are really implemented in sharding_info so the method can use them directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	c5d0887471	schema_builder: remove unused with_partitioner_for_tests_only After previous patches that switched some tests to use sharding_info instead of i_partitioner, we now don't need with_partitioner_for_tests_only and the function can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	41591f15d2	tests: rename dummy_partitioner.* to dummy_sharding_info.* dummy_partitioner was renamed to dummy_sharding_info in the previous patch. This patch cleans up the names of files. It's done in a separate patch to not obstruct the diff of previous patch. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	031f589dba	multishard_combining_reader: use token_for_next_shard from sharding info not partitioner Previously this function was accessing sharding logic through partitioner obtained from the schema. While converting tests, dummy_partitioner is turned into dummy_sharding_info. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:25 +02:00
Tomasz Grabiec	f2b091967b	Merge "migration_manager: Make sync_schema return error when node is down" from Asias sync_schema is supposed to make sure that this node knows about all schema changes known by "nodes" that were made prior to this call. Currently, when a node is down, the sync is silently skipped. To fix, add a flag to migration_task::run_may_throw to indicate that it should fail if a node is down. Fixes #4791	2020-03-30 17:31:57 +02:00
Gleb Natapov	b3db6f5b04	lwt: rename "in_progress_ballot" cell to "promise" in system.paxos table The value that is stored in "in_progress_ballot" cell is the value of promised ballot, so call the cell accordingly to avoid confusion especially as we have a notion of "in progress" proposal in the code which is not the same as in_progress_ballot here. We can still do it without care about backwards compatibility since LWT is still marked as experimental. Fixes #6087. Message-Id: <20200326095758.GA10219@scylladb.com>	2020-03-30 12:01:55 +03:00
Avi Kivity	fba6db4a43	Update seastar submodule * seastar 06a8c8f6e...36e8dfc89 (1): > reactor: decouple idle cpu handler from reactor Ref #1.	2020-03-30 10:49:12 +03:00
Piotr Jastrzebski	274a045649	partitioner_test: use token_for_next_shard from sharding info not partitioner partitioner_test contains test_partitioner_sharding function which this patch renames to test_sharding and makes it use sharding_info instead of the partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:37:48 +02:00
Piotr Jastrzebski	a3262a2cb2	repair: depend only on sharding logic not on partitioner repair does not use partitioner and only uses sharding logic. This means it does not have to depend on i_partitioner and can instead operate on sharding_info. This has an important consequence of allowing the repair of multiple tables having different partitioners at the same time. All tables repaired together still have to use the same sharding logic. To achieve this the change: 1. Removes partitioner field from repair_info 2. repair_info has access to sharding_info through schema objects of repaired tables 3. partitioner name is removed from shard_config 4. local and remote partitioners are removed from repair_meta. Remote sharding_info is used instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:37:48 +02:00
Piotr Jastrzebski	dffa9fc880	dht: remove unimplemented split_range_to_single_shard This method is not implemented anywhere, let alone used. The only reasonable thing is to remove it instead of keeping an unused and unimplemented declaration. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:36:22 +02:00
Piotr Jastrzebski	94ff653b99	selective_token_range_sharder: replace i_partitioner with sharding_info The class does not depend on partitioning logic but only uses sharding logic. This means it is possible and desirable to limit its dependency to only sharding_info. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:36:22 +02:00
Piotr Jastrzebski	ecff322fd5	ring_position_range_vector_sharder: replace i_partitioner with sharding_info ring_position_range_vector_sharder does not depend on partitioning logic. It only uses sharding logic so it is not necessary to store i_partitioner in the class. Reference to sharding_info is enough. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:35:27 +02:00
Piotr Jastrzebski	8a4c1be129	ring_position_range_sharder: replace i_partitioner with sharding_info ring_position_range_sharder does not depend on partitioning at all. It only uses sharding so it is enough for the class to take sharding_info instead of a whole i_partitioner. This patch changes ring_position_range_sharder class to contain const sharding_info& instead of const i_partitioner&. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:35:27 +02:00
Piotr Jastrzebski	52fe241311	dht: remove unused ring_position_exponential_sharder The class is not used anywhere so it can be safely removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:35:27 +02:00
Piotr Jastrzebski	8d81a2498f	schema: add get_sharding_info At the moment, we have a single sharding logic per node but we want to be able to set it per table in the future. To make it easy to change in the future sharding_info will be managed inside schema and all the other code will access it through schema::get_sharding_info function. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:35:27 +02:00
Piotr Jastrzebski	ca07f8e84d	partitioner: extract sharding fields to a class This patch creates a new class called sharding_info. This new class will now be responsible for all the sharding logic that before was a part of the partitioner. In the end, sharding and partitioning logic will be fully separated but this patch starts with just extracting sharding logic to sharding_info and embedding it into i_partitioner class. All sharding functions are still present in i_partitioner but now they just delegate to the corresponding functions of the embedded sharding_info object. Following patches will gradually switch all uses of the following i_partitioner member functions to their equivalents in sharding_info: 1. shard_of 2. token_for_next_shard 3. sharding_ignore_msb 4. shard_count After that, sharding_info will be removed from i_partitioner and the two classes will be totally independent. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 09:35:27 +02:00
Asias He	ef64f52152	migration_manager: Do not swallow exception in migration_task::run_may_throw The user migration_manager::submit_migration_task needs to know if migration_task::run_may_throw is successful or not. Do not swallow exception. Fixes #4791	2020-03-30 14:50:01 +08:00
Avi Kivity	68750b777e	priority_manager: deinline constructor Make the constructor out-of-line and clean up includes made redundant. This removes an include of Seastar's heavy reactor.hh from a header. Ref #1 Message-Id: <20200329173711.16949-1-avi@scylladb.com>	2020-03-30 09:34:18 +03:00
Avi Kivity	3159ad4484	Update seastar submodule * seastar c7b6b84e5...06a8c8f6e (12): > scheduling_group_specific: remove inclusion of reactor.hh > future: Delete void_futurize_helper > future: Delete unused do_void_futurize_helper instantiation > core: remove io_queue queued requests metric > future: Add assert to set_urgent_state > future: Add a comment to set_urgent_state > future: Use placement new instead of operator= in set_urgent_state > file: use correct io_queue in dup()d files > io_queue: fix miscalculation of sizes when I/O queue is not configured. > merge: Add log levels to RPC loggers > reactor: Replace a call to cpu_id with this_shard_id() > reactor: Drop a few redundant calls to engine()	2020-03-29 15:37:45 +03:00
Botond Dénes	0d224210bb	database: apply_in_memory(): don't look-up the column-family twice The column-family is already looked up as the first line in the method. No need to repeat that lookup in the lambda passed to `run_when_memory_available()`, we can just capture the reference to the already obtained column-family object. These objects are safe to reference, they don't just disappear in the middle of an operation. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200327140827.128647-1-bdenes@scylladb.com>	2020-03-27 15:19:32 +01:00
Asias He	743b529c2b	gossip: Add an option to force gossip generation Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation number g1, g2, g3. n1, n2, n3 running scylla version with commit `0a52ecb6df` (gossip: Fix max generation drift measure) One year later, user wants to upgrade n1, n2, n3 to a new version. When n3 does a rolling restart with a new version, n3 will use a generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's gossip update and mark n3 as down. Such unnecessary marking of node down can cause availability issues. For example: DC1: n1, n2 DC2: n3, n4 When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which causes the whole DC2 to be unavailable. To fix, we can start the node with a gossip generation within MAX_GENERATION_DIFFERENCE difference for the new node. Once all the nodes run the version with commit `0a52ecb6df`, the option is no longer needed. Fixes #5164	2020-03-27 12:15:21 +01:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Nadav Har'El	c639a5ec6f	merge: fix two CDC bugs with preimage/postimage Merged pull request https://github.com/scylladb/scylla/pull/6078 from Calle Wilund, fixing two CDC preimage/postimage bugs: Fixes #6073. Fixes #6070.	2020-03-26 17:38:18 +02:00
Alejo Sanchez	cb26de89a1	tests: port Cassandra CQL tests to cql repl Port CQL only tests to cql repl from: cassandra-dtest/cql_test.py cassandra/test/unit/org/apache/cassandra/cql3/validation/operations/BatchTest.java Refs #5792 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200326103223.1097192-2-alejo.sanchez@scylladb.com>	2020-03-26 15:19:38 +02:00
Alejo Sanchez	febcced4f1	utils: error injection with timeout/deadline Most of Scylla code runs with a user-supplied query timeout, expressed as absolute clock (deadline). When injecting test sleeps into such code, we most often want to not sleep beyond the user supplied deadline. Extend error injection API to optionally accept a deadline, and, if it is provided, sleep no more than up to the deadline. If current time is beyond deadline, sleep injection is skipped altogether. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200326091600.1037717-2-alejo.sanchez@scylladb.com>	2020-03-26 12:41:10 +01:00
Piotr Sarna	6bcc46b08a	cql3: add missing error message context to query processor When caching a prepared statement fails, an error is logged, but due to a typo it only prints "failed to cache the entry", ignoring the specific error message - which this patch fixes. Message-Id: <9c3c1d9c11d559815268fa977c1fb80b8c4459ca.1585213673.git.sarna@scylladb.com>	2020-03-26 12:46:03 +02:00
Piotr Sarna	1178ac5564	test: move config to heap in sstable_resharding_test ... in order to get rid of a large stack warning. Tests: unit(dev) Message-Id: <bca0f854f4e338316c109364257a740a36821b0a.1585129083.git.sarna@scylladb.com>	2020-03-25 14:58:16 +01:00
Piotr Sarna	5ef9dbfa8a	test: move config to heap in schema_registry_test ... in order to get rid of a large stack warning. Tests: unit(dev) Message-Id: <82b55e8440ade8a3d81880dd66127776b2661112.1585128726.git.sarna@scylladb.com>	2020-03-25 14:19:30 +01:00
Nadav Har'El	a0f025f4ce	sstable: LA format is the default, so ignore "LA_SSTABLE" feature flag The previous patch made the LA format the default. We no longer need to choose between writing the older KA format or LA, so the LA_SSTABLE cluster feature has become unnecessary. Unfortunately, we cannot completely remove this feature: Since commit `4f3ce42163` we cannot remove cluster features because this node will refuse to join a cluster which already agreed on features that it lacks - thinking it is an old node trying to join a new cluster. So the LA_SSTABLE feature flag remains, and we continue to advertise that our node supports it. We just no longer care about what other nodes advertised for it, so we can remove a bit of code that cared. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200324232607.4215-3-nyh@scylladb.com>	2020-03-25 13:00:28 +01:00
Nadav Har'El	91aba40114	sstable: default to LA format instead of KA format Over the years, Scylla updated the sstable format from the KA format to the LA format, and most recently to the MC format. On a mixed cluster - as occurs during a rolling upgrade - we want all the nodes, even new ones, to write sstables in the format preferred by the old version. The thinking is that if the upgrade fails, and we want to downgrade all nodes back to the older version, we don't want to lose data because we already have too-new sstables. So the current code starts by selecting the oldest format we ever had - KA, and only switching this choice to LA and MC after we verify that all the nodes in the cluster support these newer formats. But before an agreement is reached on the new format, sstables may already be created in the antique KA format. This is usually harmless - we can read this format just fine. However, the KA format has a problem that it is unable to represent table names or keyspaces with the "-" character in them, because this character is used to separate the keyspace and table names in the file name. For CQL, a "-" is not allowed anyway in keyspace or table names; but for Alternator, this character is allowed - and if a KA table happens to be created by accident (before the LA or MC formats are chosen), it cannot be read again during boot, and Scylla cannot reboot. The solution that this patch takes is to change Scylla's default sstable format to LA (and, as before, if the entire cluster agrees, the newer MC format will be used). From now on, new KA tables will never be written. But we still fully support reading the KA format - this is important in case some very old sstables never underwent compaction. The old code had, confusingly, two places where the default KA format was chosen. This patch fixes it so the new default (LA) is specified in only one place. Fixes #6071. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200324232607.4215-2-nyh@scylladb.com>	2020-03-25 13:00:28 +01:00
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Avi Kivity	088660680c	Update seastar submodule * seastar 92c488706...c7b6b84e5 (6): > semaphore: Use futurize_invoke instead of futurize_apply > future: specify futurize::make_exception_future as noexcept > future: Move ignore out of line > future: Split then and then_impl to enable NRVO > semaphore_units: allow getting the number of units held > Merge "Split futurize::apply into invoke(...) and apply(tuple)" from Rafael	2020-03-25 08:48:00 +02:00
Asias He	7ba821cbc0	migration_manager: Make sync_schema return error when node is down sync_schema is supposed to make sure that this node knows about all schema changes known by "nodes" that were made prior to this call. Currently, when a node is down, the sync is silently skipped. To fix, add a flag to migration_task::run_may_throw to indicate that it should fail if a node is down. Fixes #4791	2020-03-25 10:59:13 +08:00
Calle Wilund	532a8634c6	cdc::log: Only generate pre/post-image when enabled Fixes #6073 The logic with pre/post image was tangled into looking at "rs" and would cause pre-image info to be stored even if only post-image data was enabled. Now only generate keys (and rows for them) iff explicitly enabled. And only generate pre-image key iff we have pre-image data.	2020-03-24 15:32:30 +00:00
Calle Wilund	881ebe192b	cdc::log: Handle non-atomic column assignments broken into two Fixes #6070 When mutation splitting was added, non-atomic column assignments were broken into two invocations of transform. This means the second (actual data assignment) does not know about the tombstone in the first one -> postimage is created as if we were _adding_ to the collection, not replacing it. While not pretty, we can handle this knowing that we always get invoked in timestamp order -> tombstone first, then assign. So we simply keep track of non-atomic columns deleted across calls and filter out preimage data post this. Added test cases for all non-atomics	2020-03-24 14:07:13 +00:00
Botond Dénes	0418a74fa9	querier: consume_page(): resolve FIXME related to non-movable consumer Now that #3158 is fixed, we can move the consumer to its place after the `compaction_mutation_state::start_new_page()` call. No need to keep it as `std::unique_ptr<>`. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310185147.207665-1-bdenes@scylladb.com>	2020-03-24 15:28:42 +02:00
Avi Kivity	a314283469	Merge "Minor cleanups to cql3 code regarding shared_ptr's" from Pavel S " This small series consists of several changes that aim to reduce the number of shared_ptr's in cql3 code. Also it contains a patch that makes CqlParser::query to return std::unique_ptr<> instead of seastar::shared_ptr<>, which leads to more understandable code and lays foundation for further optimizations (e.g. possibly eliminating shared_ptr's in `prepared_statement` and just moving raw statements in `prepare` without copying them). Tests: unit(dev, debug) " * 'feature/cql_cleanups_9' of https://github.com/ManManson/scylla: cql3: return raw::parsed_statement as unique_ptr cql3: de-pointerize arguments to some of CQL grammar rules and definitions. cql3: make abstract_marker::make_in_receiver accept cref to column_specification	2020-03-24 14:51:49 +02:00
Calle Wilund	9fee712d62	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zero so reader can have an empty region after last used chunk as end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach end of segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent on how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write trailing zero if we are below the max_size file size when terminating Adds a small size check to commitlog test to verify size bounds. (Which breaks without the patch) v2: - Fix test to take into account that files might be deleted behind our backs. v3: - Fix test better, by doing verification _before_ segments are queued for delete. Message-Id: <20200226121601.15347-2-calle@scylladb.com> Message-Id: <20200324100235.23982-1-calle@scylladb.com>	2020-03-24 11:31:55 +01:00
Pavel Solodovnikov	adc6a98b59	cql3: return raw::parsed_statement as unique_ptr Change CQL parsing routine to return std::unique_ptr instead of seastar::shared_ptr. This can help reduce redundant shared_ptr copies even further. Make some supplementary changes necessary for this transition: * Remove enable_shared_from_this base class from the following classes: truncate_statement, authorization_statement, authentication_statement: these were previously constructing prepared_statement instance in `prepare` method using `shared_from_this`. Make `prepare` methods implementation of inheriting classes mirror implementation from other statements (i.e. create a shallow copy of the object when preparing into `prepared_statement`; this could be further refactored to avoid copies as much as possible). * Remove unused fields in create_role_statement which led to error while using compiler-generated copy ctor (copying uninitialized bool values via ctor). Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-03-23 23:19:21 +03:00
Pavel Solodovnikov	df1d687fc6	cql3: de-pointerize arguments to some of CQL grammar rules and definitions. Make the following rules and definitions accept a reference instead of shared_ptr's: * cfamDefinition * cfamColumns * pkDef * typeColumns * ksName * cfName * idxName * properties * property This will reduce a bit the number of countless shared_ptr copies and moves all over the place in cql3 code. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-03-23 23:19:21 +03:00
Pavel Solodovnikov	279b52f275	cql3: make abstract_marker::make_in_receiver accept cref to column_specification These methods just extract some info out of column_specification, so no need have another copy of shared_ptr since it's not stored anywhere inside. Transform abstract_marker::in_raw::make_in_receiver as well following the call chain. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-03-23 23:19:21 +03:00
Nadav Har'El	f1aaa91e21	merge: add metrics Merged pull request https://github.com/scylladb/scylla/pull/6030 from Piotr Dulikowski: Adds CDC-related metrics. Following counters are added, both for total and failed operations: Total number of CDC operations that did/did not perform splitting, Total number of CDC operations that touched a particular mutation part. Total number of preimage selects. Fixes #6002. Tests: unit(dev, debug) * 'cdc-metrics' of github.com:piodul/scylla: storage_proxy: track CDC operations in LWT flow storage_proxy: track CDC operations in logged batches storage_proxy: track CDC operations in standard flow storage_proxy: add cdc tracker hooks to write response handlers storage_proxy: move "else if" remainder into "else" block cdc: create an operation_result_tracker object cdc: add an object for tracking progress of cdc mutations cdc: count touched mutation parts in transformer::transform cdc: track preimage selects in metrics cdc: register metric counters cdc: fix non-atomic updates in splitting	2020-03-23 21:55:58 +02:00
Botond Dénes	ec36c7cb2f	test: random_schema: remove redundant gc grace period from tombstone expiry Compaction automatically adds gc grace period to expiry times already, no need to add it when creating the tombstones. Remove the redundant additions from the code. The direct impact is really minor as this is only used in tests, but it might confuse readers who are looking at how tombstones are created across the codebase. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200323120948.92104-1-bdenes@scylladb.com>	2020-03-23 15:12:25 +02:00
Piotr Dulikowski	736c1c6056	storage_proxy: track CDC operations in LWT flow Register cdc operation result tracker during LWT flow.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	f7fd6f4607	storage_proxy: track CDC operations in logged batches Register cdc operation result tracker in logged batch flow.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	ef1c62aa04	storage_proxy: track CDC operations in standard flow Register cdc operation result tracker for write response handlers coming from the usual write requests.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	cccc33f0fd	storage_proxy: add cdc tracker hooks to write response handlers Adds a field to abstract_write_response_handler that points to the cdc operation result tracker, and a function for registering the tracker in the handlers that currently write to a CDC log table.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	dc05d30fd3	storage_proxy: move "else if" remainder into "else" block In the following commit, more code will be added to the newly created "else" block.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	5a5cc57878	cdc: create an operation_result_tracker object An `operation_result_tracker` object is now returned as a second return value from the `augment_mutation_call` function.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	1b92cbeabe	cdc: add an object for tracking progress of cdc mutations CDC metrics, apart from tracking "total" metrics for all performed CDC operations, also track metrics for "failed" operations. Because the result of the CDC operation depends on whether all CDC mutations were written successfully by storage_proxy, checking for failure and incrementing appropriate counters is deferred after all write response handlers finish. The `cdc::operation_result_tracker` object was created for that purpose. It contains all the details needed to accurately update the metrics based on what actually happened in the `augment_mutation_call` function, and holds a flag which tells if any of write response handlers failed. This object is supposed to be referenced by write response handlers for CDC mutations created after the same `augment_mutation_call`. After all write response handlers are destroyed, the destructor of `operation_result_tracker` will update appropriate metrics. Actual creating and attaching this object to write response handlers will be done in subsequent commits.	2020-03-23 14:05:25 +01:00
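The deferred-accounting idea in commit `1b92cbeabe` can be sketched with a reference-counted tracker whose destructor updates the counters once the last write response handler lets go of it. This is a minimal illustration under stated assumptions, not Scylla's implementation: `cdc_stats_sketch`, `result_tracker_sketch`, `write_handler_sketch` and `run_operation_sketch` are hypothetical names, and `std::shared_ptr` stands in for whatever ownership mechanism the real code uses.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the CDC metrics object (not Scylla's real type).
struct cdc_stats_sketch {
    std::uint64_t operations_total = 0;
    std::uint64_t operations_failed = 0;
};

// Hypothetical tracker shared by every write handler of one CDC operation.
class result_tracker_sketch {
    cdc_stats_sketch& _stats;
    bool _failed = false;
public:
    explicit result_tracker_sketch(cdc_stats_sketch& stats) : _stats(stats) {}
    // Called by a handler whose CDC mutation was not written successfully.
    void on_mutation_failed() { _failed = true; }
    // Runs when the last handler is destroyed: only now is the outcome known.
    ~result_tracker_sketch() {
        ++_stats.operations_total;
        if (_failed) {
            ++_stats.operations_failed;
        }
    }
};

// Hypothetical write response handler holding a reference to the tracker.
struct write_handler_sketch {
    std::shared_ptr<result_tracker_sketch> tracker;
    void fail() { if (tracker) { tracker->on_mutation_failed(); } }
};

// Simulates one CDC operation with `handlers` write handlers; the handler at
// index `failing_handler` fails (pass -1 for "none fail").
inline cdc_stats_sketch run_operation_sketch(int handlers, int failing_handler) {
    cdc_stats_sketch stats;
    {
        auto tracker = std::make_shared<result_tracker_sketch>(stats);
        std::vector<write_handler_sketch> hs(handlers, write_handler_sketch{tracker});
        tracker.reset();  // from here on, only the handlers keep the tracker alive
        if (failing_handler >= 0 && failing_handler < handlers) {
            hs[failing_handler].fail();
        }
    }  // handlers destroyed here, so the tracker destructor updates `stats`
    return stats;
}
```

Calling `run_operation_sketch(3, 1)` counts one operation in total and one failed operation, no matter how many of the three handlers succeeded.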
Piotr Dulikowski	98e5fdc7ac	cdc: count touched mutation parts in transformer::transform Modifies the transformer::transform so that it also returns a set of flags indicating what parts of the mutation (e.g. rows, tombstones, collections, etc.) were processed during transforming.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	53570d8657	cdc: track preimage selects in metrics This commit causes preimage select counter to be increased after performing this operation.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	e7062de02b	cdc: register metric counters This patch defines a CDC metrics object and registers all of its counters. storage_proxy is chosen as the owner of the metrics object. Because in subsequent commits it will become possible for CDC metrics to be updated after a write operation ends, and because the cdc_service has shorter lifetime than storage_proxy, we could risk a use-after-free if we placed this object inside cdc_service.	2020-03-23 14:05:25 +01:00
Piotr Dulikowski	338e473946	cdc: fix non-atomic updates in splitting This patch fixes a bug in mutation splitting logic of CDC. In the part that handles updates of non-atomic clustering columns, the column definition was fetched from a static column of the same id instead of the actual definition of the clustering column. It could cause the value to be written to a wrong column. Tests: unit(dev)	2020-03-23 13:47:23 +01:00
Ivan Prisyazhnyy	5ec7e77b2e	api: /column_family/major_compaction/{keyspace:table} implementation This implements support for triggering major compactions through the REST API. Please note that "split_output" is not supported and Glauber Costa confirmed that this is fine: "We don't support splits, nor do I think we should." Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>	2020-03-23 13:48:29 +02:00
Avi Kivity	0d885dbb00	Merge "Make all headers standalone" from Botond " Make sure all headers compile on their own, without requiring any additional includes externally. Even though this requirement is not documented in our coding guides it is still quasi enforced and we semi-regularly get and merge patches adding missing includes to headers. This patch-set fixes all headers and adds a `{mode}-headers` target that can be used to verify each header. This target should be built by promotion to ensure no new non-conforming code sneaks in. Individual headers can be verified using the `build/dev/path/to/header.hh.o` target, that is generated for every header. The majority of the headers was just missing `seastarx.hh`. I think we should just include this via a compiler flag to remove the noise from our code (in a followup). " * 'compiling-headers/v2' of https://github.com/denesb/scylla: configure.py: add {mode}-headers phony target treewide: add missing headers and/or forward declarations test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstables: size_tiered_backlog_tracker: move methods out-of-line sstables: date_tiered_compaction_strategy.hh: move methods out-of-line	2020-03-23 13:09:09 +02:00
Avi Kivity	c6a441f9c2	Update seastar submodule * seastar 3c498abcab...92c488706c (14): > dpdk: restore including reactor.hh > tests: distributed_test: add missing #include <mutex> > reactor: un-static-ify make_pollfn() > merge: Reduce inclusions of reactor.hh A few #includes added to compensate for this > sharded: delete move constructor > future: Avoid a move constructor call > future: Erase types a bit more in then_wrapped > memory: Drop a never nullopt optional > semaphore: specify get_units and with_semaphore as noexcept > spinlock.hh: Add include for <cassert> header > dpdk: Avoid a variable sized array > future: Add an explicit promise member to continuation > net: remove smart pointer wrappers around pollable_fd > Merge "cleanup reactor file functions" from Benny	2020-03-23 11:59:30 +02:00
Piotr Dulikowski	a693e6ff6c	cdc: fix non-atomic updates in splitting This patch fixes a bug in mutation splitting logic of CDC. In the part that handles updates of non-atomic clustering columns, the schema for serializing that column was looked up incorrectly in the table schema - instead of a `regular_column`, a `static_column` was looked up. Due to how the `column_at` function works, a correct schema was always returned if the table had no static columns. Therefore, in order for this bug to manifest, a table with a static column and a regular column with non-atomic collection was needed.	2020-03-23 10:20:24 +01:00
Piotr Sarna	602a771105	Merge 'utils: error injector API' from Alejo Closes #3295 The error_injection class allows injecting custom handlers into normal control flow at pre-determined injection points. This is especially useful in various testing scenarios: * Throwing an exception at some rare and extreme corner-cases * Injecting a delay to test for timeouts to be handled correctly * More advanced uses with custom lambda as an injection handler Injection points are defined by `inject` calls. Enabling and disabling injections are done by the corresponding `enable` and `disable` calls. A REST frontend API is provided for convenience. Branch URL: https://github.com/alecco/scylla/tree/as_error_injection Tests: unit {{dev}}, unit {{debug}} * 'as_error_injection' of github.com:alecco/scylla: api: add error injection to REST API utils: add error injection	2020-03-23 08:39:22 +01:00
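The `inject` / `enable` / `disable` pattern described in merge `602a771105` can be sketched as below. This is a simplified, synchronous illustration only: `error_injection_sketch` is a hypothetical class, and its signatures are assumptions rather than the real `utils::error_injection` interface (which, per the commit text, is compiled in only for some build modes).

```cpp
#include <functional>
#include <set>
#include <string>

// Hypothetical, synchronous sketch of named injection points.
class error_injection_sketch {
    std::set<std::string> _enabled;  // names of currently enabled injections
public:
    void enable(const std::string& name) { _enabled.insert(name); }
    void disable(const std::string& name) { _enabled.erase(name); }
    void disable_all() { _enabled.clear(); }
    bool is_enabled(const std::string& name) const {
        return _enabled.count(name) != 0;
    }
    // An injection point: the handler runs only if `name` is enabled, so the
    // normal control flow is untouched when nothing is injected.
    void inject(const std::string& name, const std::function<void()>& handler) const {
        if (is_enabled(name)) {
            handler();
        }
    }
};
```

A test would enable a named point, run the code under test, and let the handler throw or record that it fired; production paths pay only for the lookup.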
Botond Dénes	5174acb359	configure.py: add {mode}-headers phony target	2020-03-23 09:29:45 +02:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Botond Dénes	575466b2cf	test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstable_test.hh started as collection of utilities shared between the various `_sstable_test.cc` files. Predictably other tests started using it as well, among them some that are non boost unit tests. This poses a problem as if we add the missing boost/test/unit_test.hpp include to sstable_test.hh these tests will suddenly have missing symbols from boost::test. To avoid linking boost::test into all these users, extract utilities more widely used into sstable_utils.hh	2020-03-23 09:29:45 +02:00
Botond Dénes	84329a16ee	sstables: size_tiered_backlog_tracker: move methods out-of-line	2020-03-23 09:29:45 +02:00
Botond Dénes	d58ec632e3	sstables: date_tiered_compaction_strategy.hh: move methods out-of-line	2020-03-23 09:26:19 +02:00
Glauber Costa	dd65f7dcbb	tests: move token_generation_for_shard to common code We now have a utils file for SSTables. This is potentially useful for other tests. As a matter of fact, this function is repeated right now for the resharding test. And to add insult to injury, the version in the resharding test has the parameters shard and number of tokens flipped, which although extremely confusing is the predictable outcome of such repetition Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-22 19:00:26 +02:00
Asias He	be1a196988	repair: Handle keyspace with zero table The following error was seen in materialized_views_test.py:TestMaterializedViews.decommission_node_during_mv_insert_4_nodes_test INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=started repair/repair.cc:662:36: runtime error: member call on null pointer of type 'const struct schema' Aborting on shard 0. The problem is in the test a keyspace was created without creating any table. Since db19a76b1f(selective_token_range_sharder: stop calling global_partitioner()), in get_partitioner_for_tables, we access nullptr when no table is present. schema_ptr last_s; for (auto t: tables) { // set last_s } last_s->get_partitione() To fix: 1) Skip the repair in sync_data_using_repair if there is no table in the keyspace 2) Throw if no schema_ptr is found in get_partitioner_for_tables. Be defensive. After: INFO [shard 0] repair - decommission_with_repair: started with keyspace=ks, leaving_node=127.0.0.2, nr_ranges=744 INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=started WARN [shard 0] repair - repair id 3 to sync data for keyspace=ks, no table in this keyspace INFO [shard 0] repair - repair id 3 completed successfully INFO [shard 0] repair - repair id 3 to sync data for keyspace=ks, status=succeeded Tests: materialized_views_test.py:TestMaterializedViews.decommission_node_during_mv_insert_4_nodes_test Fixes: #6022	2020-03-22 13:46:36 +02:00
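The defensive check from fix (2) in commit `be1a196988` can be sketched as follows. The types here are hypothetical stand-ins (`schema_sketch`, `schema_ptr_sketch`, `partitioner_for_tables_sketch`), not the real `schema_ptr` or `get_partitioner_for_tables`; the point is only that an empty table list must produce an exception instead of a null dereference.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real schema types.
struct schema_sketch {
    std::string partitioner_name;
};
using schema_ptr_sketch = std::shared_ptr<const schema_sketch>;

// Returns the partitioner of the last table seen, mirroring the loop quoted in
// the commit message, but throws instead of dereferencing a null pointer when
// the keyspace has no tables.
inline std::string partitioner_for_tables_sketch(const std::vector<schema_ptr_sketch>& tables) {
    schema_ptr_sketch last_s;
    for (const auto& t : tables) {
        if (t) {
            last_s = t;
        }
    }
    if (!last_s) {
        throw std::runtime_error("no schema found: keyspace has no tables");
    }
    return last_s->partitioner_name;
}
```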
Avi Kivity	d310e7c7ea	Merge 'repair: Ignore keyspace that is removed in sync_data_using_repair' from Asias repair: Ignore keyspace that is removed in sync_data_using_repair When a keyspace is removed during node operations, we should not fail the whole operation. Ignore the keyspace that is removed. Fixes #5942 * asias-repair_fix_5942: repair: Stop the nodes that have run repair_row_level_start repair: Ignore keyspace that is removed in sync_data_using_repair	2020-03-22 13:19:51 +02:00
Takuya ASADA	005211bad6	redis: add lolwut command Add lolwut command that shows redis version and ascii art. see: https://redis.io/commands/lolwut	2020-03-22 13:16:20 +02:00
Takuya ASADA	2ab366e653	install.sh: create user/group correctly on redhat variants Seems like adduser in redhat variants and debian variants are incompatible, and there is no addgroup in redhat variants. Since the adduser call in install.sh is written for debian variants, it does not work on redhat-compatible distributions. To fix this we need to use 'useradd' / 'groupadd' instead. Fixes #6018	2020-03-22 13:13:00 +02:00
Avi Kivity	7ed083a6a7	Merge "test.py: Allow to change the tests starting order" from Pavel E " In debug mode some tests take veeery looong time to finish, those tests are better to be started first. This set adds this by marking such long tests in suite.yaml files. Tests: unit(dev) " * 'br-split-unit-tests-sorting-2' of https://github.com/xemul/scylla: test.py: Mark some tests as "run_first" test.py: Generate list with short names test.py: Rename "long" to "skip_in_debug_mode"	2020-03-21 19:53:23 +02:00
Rafael Ávila de Espíndola	482fbfcfdb	build: Use more strict stack frame limits A recent seastar update has resolved the worst offenders, so we can lower the limit a bit to warn on the next set of functions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317183209.1664860-1-espindola@scylladb.com>	2020-03-21 19:51:57 +02:00
Rafael Ávila de Espíndola	01ac4aef3a	everywhere: Use futurize_apply instead of futurize<void>::apply No functionality change, just simpler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200318234149.283090-1-espindola@scylladb.com>	2020-03-21 19:51:38 +02:00
Rafael Ávila de Espíndola	0d7281ca06	sstable: Move sstables_manager constructor out of line There is no reason to have it in a header. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200320005225.178381-1-espindola@scylladb.com>	2020-03-21 19:47:29 +02:00
Piotr Dulikowski	6c5c745e25	cdc: add cdc log schema test	2020-03-21 07:33:35 +01:00
Piotr Dulikowski	3bfb044bf1	cdc: do not create cdc$deleted columns for pks and cks Primary key and clustering key column should not have a corresponding "cdc$deleted_<name>" column in cdc log table, because it does not make sense to delete such a column from a row. Fixes: #6049 Tests: unit(dev)	2020-03-21 07:33:23 +01:00
Pekka Enberg	6b2cd1bd7d	Revert "db::commitlog: Don't write trailing zero block unless needed" This reverts commit `0b34d88957`. According to Rafael Avila de Espindola: "I have bisected the recent failures [in commitlog_test] on next to this patch."	2020-03-20 22:30:58 +02:00
Pekka Enberg	12b6092ac2	Revert "sstables: Fix incorrect calculation of Compaction Backlog" This reverts commit `458ef4bb06`. According to Glauber Costa: "It may give us the illusion that fixes something for a particular case but this fix is wrong. I am trying to help Raphael figure out why the backlog is wrong but this patch is not the answer."	2020-03-20 22:28:57 +02:00
Piotr Sarna	331ddf41e5	api: add error injection to REST API Simple REST API for error injection is implemented. The API allows the following operations: * injecting an error at given injection name * listing injections * disabling an injection * disabling all injections Currently the API enables/disables on all shards. Closes #3295 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 20:49:03 +01:00
Pavel Solodovnikov	057adc8b4d	utils: add error injection Error injection class is implemented in order to allow injecting various errors (exceptions, stalls, etc.) in code for testing purposes. Error injection is enabled via compile flag SCYLLA_ENABLE_ERROR_INJECTION TODO: manage shard instances Enable error injection in debug/dev/sanitize modes. Unit tests for error injection class. Closes #3295 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 19:37:48 +01:00
Rafael Ávila de Espíndola	9445608df6	gms: Add a default constructor to feature_config Also move it out of line while at it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200316180321.45914-1-espindola@scylladb.com>	2020-03-20 13:34:26 +01:00
Nadav Har'El	df8b3cd5dc	alternator-test: a "run" script Running the Alternator tests is easy after you manually run Scylla, but sometimes it's convenient to have a script which just does everything automatically: start Scylla in a temporary directory, set it up properly for the tests (especially the authentication), run all the tests, and remove the temporary directory. This is what this alternator-test/run script does. This script can be run by Jenkins, for example, to check all the Alternator tests. The script assumes some things (including cqlsh, pytest and the boto3 library) are already installed, and that Scylla has been compiled - by default it takes the latest built build/*/scylla, but this can be overridden by a command like SCYLLA=build/release/scylla alternator-test/run Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200311091918.16170-1-nyh@scylladb.com>	2020-03-19 15:49:46 +01:00
Botond Dénes	82a019b6e2	scylla-gdb.py: scylla generate_object_graph: make label of initial vertex bold So it is easily identifiable. Also, generally improve the readability of labels by moving type names into a new line.	2020-03-19 16:04:03 +02:00
Botond Dénes	a4eb9b8559	scylla-gdb.py: scylla generate_object_graph: remove redundant lookup Currently the initial vertex of the graph is resolved in both `_traverse_object_graph_breadth_first()` and its caller `_do_generate_object_graph()`. This is redundant, so remove the resolving in the latter.	2020-03-19 16:04:03 +02:00
Botond Dénes	7cb3cc23e6	scylla-gdb.py: scylla generate_object_graph: print "to" offsets Currently, for edges, only the "from" offset is printed, that is the offset of the reference in the originating object. Now that we also scan the non-first words of objects for references to them, we can have references pointing to the non-first word of objects. To make these apparent, also print the "to" offset on edges, that is the offset into the target object where the reference points to. So now edges have tuple labels: (from, to).	2020-03-19 16:01:59 +02:00
Nadav Har'El	2deba4035a	merge: Hook alternator to admission control Merged patch series from Piotr Sarna: This series hooks alternator to admission control, similarly to how CQL server uses it. The estimated memory consumption is set to 2x raw JSON request, since that seems to be the upper limit of how much more memory rapidjson allocates during parsing. Note that since Seastar HTTP currently reads the whole contents upfront, there's no easy way to apply admission control before reading the request - that would involve some changes to our HTTP API. Note 2: currently, admission control in CQL does not properly pass memory consumption information for requests that are bounced to another shard - that would require either transferring semaphore units between shards or keeping a foreign pointer to the original units. As a result, alternator also does not pass correct admission control info between shards, and all places in code which do that are marked with clear FIXMEs. Fixes #5029 Piotr Sarna (5): storage_service: add memory limiter semaphore getter alternator: add service permit to callbacks alternator: add memory limiter to alternator server alternator: add addmission control stats entry alternator: hook admission control to alternator server alternator/executor.cc \| 113 ++++++++++++++++++++++-------------- alternator/executor.hh \| 32 +++++----- alternator/rmw_operation.hh \| 1 + alternator/server.cc \| 83 +++++++++++++++----------- alternator/server.hh \| 8 ++- alternator/stats.cc \| 2 + alternator/stats.hh \| 1 + main.cc \| 3 +- service/storage_service.hh \| 4 ++ 9 files changed, 149 insertions(+), 98 deletions(-)	2020-03-19 15:51:17 +02:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Botond Dénes	d2dfb6509c	scylla-gdb.py: scylla generate-object-graph: use value-range to find references When looking for references to an object in the graph, look for references to any part of the object, using the new `value_range` parameter of `scylla_find.find()`. This way, the graph can be extended beyond objects that are members of intrusive containers, or just generally don't have any references to their very first byte. Allow the user to specify a value-range different than the size of the object. This is useful if it is known that references to the object will point to the first N bytes.	2020-03-19 15:41:48 +02:00
Botond Dénes	326c2a408a	scylla-gdb.py: scylla find: allow finding ranges of values One of the most common use-cases of find is finding references to an object. This works great for normal objects, however not for all of them, a prominent example being objects that are members of intrusive collections. These objects will have pointers to them that don't point to their first byte, instead they point to somewhere in the middle of the object. To help find such references, find now supports searching for a range of values. If the new `--value-range` option is used, it will start searching for the value itself, and if no usages are found it will increment it with the specified size-class, and search again. This is repeated until some usages are found or the range is depleted. `scylla_find.find()` now returns the offset to the value, of which usages were found. Alternatively one can scan the entire value-range using the `--find-all` option. When this is used, `scylla_find` will not stop on the first offset for which references are found.	2020-03-19 15:41:48 +02:00
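The search strategy described in commit `326c2a408a` is implemented in Python inside `scylla-gdb.py`; the sketch below only illustrates the loop in C++, with memory modelled as a plain vector of words. Everything here is an assumption made for illustration: `find_in_range_sketch`, its parameters, and the `step` increment are hypothetical and do not mirror the real `scylla_find.find()` signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One hit: (offset into the object, index of the referencing word).
using hit_sketch = std::pair<std::size_t, std::size_t>;

// Looks for words equal to `value + offset`, for offsets 0, step, 2*step, ...
// below `value_range`. Stops at the first offset that has any hits unless
// `find_all` is set, in which case the whole range is scanned.
inline std::vector<hit_sketch> find_in_range_sketch(
        const std::vector<std::uintptr_t>& words,
        std::uintptr_t value,
        std::size_t value_range,
        std::size_t step,
        bool find_all) {
    std::vector<hit_sketch> hits;
    if (step == 0) {
        step = 1;  // avoid an endless loop on a zero step
    }
    // Offset 0 (the value itself) is always searched, even with no range.
    const std::size_t limit = value_range ? value_range : 1;
    for (std::size_t off = 0; off < limit; off += step) {
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (words[i] == value + off) {
                hits.emplace_back(off, i);
            }
        }
        if (!hits.empty() && !find_all) {
            break;  // first offset with references found: stop here
        }
    }
    return hits;
}
```

Returning the offset alongside each hit matches the commit's note that the caller learns which offset into the object the references point at.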
Botond Dénes	6bf3a0ae8a	scylla-gdb.py: find_in_live(): return pointer_metadata instances find_in_live() currently parses back the output of `scylla ptr`, to return the address to the beginning of the object and the offset. All its current callers do the call to `scylla ptr` again to obtain further information about the object. To avoid this duplicated effort, return `pointer_metadata` instances from `find_in_live()`, obtained via `scylla_ptr.analyze()` which is the python API to `scylla ptr`.	2020-03-19 15:41:47 +02:00
Piotr Dulikowski	59727fb34b	cdc: remove result_callback The `result_callback` was a callback returned by `augment_mutation_call` that was supposed to be used in the CDC postimage implementation. Because CDC postimage was implemented without using this callback, and currently a no-op function is always returned, this callback can safely be removed.	2020-03-19 14:55:07 +02:00
Pavel Emelyanov	7af3bbd57b	test.py: Mark some tests as "run_first" Those tests take long time to finish, so it makes sense to start them earlier than others. The provided list of long tests consists of those running more than 10 minutes in debug mode. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 12:52:18 +03:00
Rafael Ávila de Espíndola	e28b17de88	auth: Make create_metadata_table_if_missing noexcept It returns a future, so converting an exception to an exceptional future simplifies error handling in the caller. Without this, code like the one in standard_role_manager::create_metadata_tables_if_missing has surprising behavior, since it might not wait for both futures: return when_all_succeed( create_metadata_table_if_missing(...), create_metadata_table_if_missing(...)); We could use the lambda version of when_all_succeed, but changing create_metadata_table_if_missing seems a nice API improvement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-4-espindola@scylladb.com>	2020-03-19 10:22:50 +01:00
Piotr Sarna	0c11e07faf	view,table: fix waiting for view updates during building View updates sent as part of the view building process should never be ignored, but `fd49fd7` introduced a bug which may cause exactly that: the updates are mistakenly sent to background, so the view builder will not receive negative feedback if an update failed, which will in turn not cause a retry. Consequently, view building may report that it "finished" building a view, while some of the updates were lost. A simple fix is to restore previous behaviour - all updates triggered by view building are now waited for. Fixes #6038 Tests: unit(dev), dtest: interrupt_build_process_with_resharding_low_to_half_test	2020-03-19 10:50:54 +02:00
Pavel Emelyanov	59bc116695	test.py: Generate list with short names The list will be sorted a bit differently, for this I will need the shortname at once Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 11:46:02 +03:00
Pavel Emelyanov	30c540aae1	test.py: Rename "long" to "skip_in_debug_mode" The "long" test will mean that it is to be started first, not skipped, so rename "long" to avoid additional confusion Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-19 11:45:55 +03:00
Piotr Sarna	62c34a9085	cql: fix qualifying indexed columns for filtering When qualifying columns to be fetched for filtering, we also check if the target column is not used as an index - in which case there's no need of fetching it. However, the check was incorrectly assuming that any restriction is eligible for indexing, while it's currently only true for EQ. The fix makes a more specific check and contains many dynamic casts, but these will hopefully be gone once our long planned "restrictions rewrite" is done. This commit comes with a test. Fixes #5708 Tests: unit(dev)	2020-03-19 10:34:16 +02:00
Tomasz Grabiec	5fe626a887	sstables: Release reserved space for sharding metadata The intention of the code was to clear sharding metadata chunked_vector so that it doesn't bloat memory. The type of c is `chunked_vector*`. Assigning `{}` clears the pointer while the intended behavior was to reset the `chunked_vector` instance. The original instance is left unmodified with all its reserved space. Because of this, the previous fix had no effect because token ranges are stored entirely inline and popping them doesn't release memory. Fixes #4951 Tests: - sstable_mutation_test (dev) - manual using scylla binary on customer data on top of 2019.1.5 Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>	2020-03-19 09:46:27 +02:00
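The pointer-versus-instance mistake described in commit `5fe626a887` can be reproduced with `std::vector` standing in for `chunked_vector`. This is a sketch under that assumption, with hypothetical function names. One deliberate deviation: for `std::vector`, `*c = {}` selects the `initializer_list` assignment, which typically keeps the capacity, so the "fixed" variant below swaps with an empty temporary instead of mirroring the exact statement used in Scylla.

```cpp
#include <cstddef>
#include <vector>

// Buggy variant: `c` is a pointer, so `c = {}` only nulls the local pointer.
// The vector it pointed at keeps all of its reserved space.
inline std::size_t capacity_after_buggy_clear(std::vector<int>& v) {
    std::vector<int>* c = &v;
    c = {};   // resets the pointer, not the vector
    (void)c;  // silence unused-variable warnings
    return v.capacity();
}

// Fixed variant: operate on the instance behind the pointer. Swapping with an
// empty temporary hands the old buffer to the temporary, which frees it.
inline std::size_t capacity_after_fixed_clear(std::vector<int>& v) {
    std::vector<int>* c = &v;
    std::vector<int>().swap(*c);
    return v.capacity();
}
```

After `reserve(1024)`, the buggy variant still reports a capacity of at least 1024, while the fixed variant reports a smaller one (zero on common standard library implementations).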
Pekka Enberg	0d2b70798f	reloc/build_reloc.sh: Remove unused functions The is_redhat_variant() and is_debian_variant() functions are not used so let's remove them. Message-Id: <20200317155740.12916-1-penberg@scylladb.com>	2020-03-19 08:39:57 +01:00
Rafael Ávila de Espíndola	7401a63e92	auth: Handle permission cache not being initialized auth::service::start can fail before _permissions_cache is initialized, so we should not assume that it is always set. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-3-espindola@scylladb.com>	2020-03-18 20:21:24 +01:00
Rafael Ávila de Espíndola	3c2851aafc	test: Make sure auth_service is always stopped An exception thrown after the start of auth_service and before init_server_without_the_messaging_service_part returns would cause the sharded<auth_service> destructor to assert. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200317002051.117832-2-espindola@scylladb.com>	2020-03-18 20:17:55 +01:00
Botond Dénes	e6e894d871	scylla-gdb.py: introduce scylla small-objects When investigating OOM related cores, a common thing to do is trying to identify the objects in a particularly heavily populated size-class. This command is meant to help with that, providing a way to list the objects in any size-class, in a paginated way. Traversing the objects of a pool is done through a `small_object_iterator` object which is also exposed to python code, to be used in custom scripts wanting to scan all objects belonging to a pool. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200318085437.452906-1-bdenes@scylladb.com>	2020-03-18 13:33:59 +02:00
Raphael S. Carvalho	0df8faeaa2	sstables: make delete_atomically() work with empty set If delete_atomically() is called with an empty set for any reason, it fails because it relies on one of the sstables in the set to get the sstable directory. This will be needed, in the future, when using sstable replacement function only with new sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200305144657.9440-1-raphaelsc@scylladb.com>	2020-03-18 13:29:42 +02:00
Pavel Emelyanov	da3bf20e71	main: Respect config start_native_transport option There's such an option, and it's not taken into account on scylla start. There's a symmetrical start_rpc one, which is, so make both act similarly. The default value for the option is true, so default set-ups will not get broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200310140518.29410-1-xemul@scylladb.com>	2020-03-18 11:17:56 +02:00
Avi Kivity	164881696b	Merge "scylla-gdb.py: scylla_memory: handle per-sg coordinator stats" from Botond " Since `b783d40aa` storage-proxy maintains separate coordinator stats per scheduling group. This broke scylla_memory, which was still trying to access the old global stats. This mini-series updates it to be able to handle per-sg coordinator stats, while preserving backward compatibility with older versions still using global stats. " * 'scylla-memory-per-sg-coordinator-stats/v1' of https://github.com/denesb/scylla: scylla-gdb.py: scylla_memory: update w.r.t. per-sg coordinator stats scylla-gdb.py: scylla_memory: move coordinator code to print_coordinator_stats()	2020-03-18 12:38:44 +02:00
Avi Kivity	c766f50491	Merge "Split some unit tests into smaller pieces" from Pavel E " The debug mode unit tests take ~half-an-hour to complete. Here's the tests run-times top list Test: Time (seconds): ... steady tail goes here ... test/boost/user_function_test 496 test/boost/row_cache_test 502 test/boost/view_schema_test 932 test/boost/cql_query_test 997 test/boost/mutation_reader_test 1048 test/boost/sstable_mutation_test 1417 test/boost/secondary_index_test 1468 Splitting the spike (top-5) is the primary goal. However, the distribution of test-cases in 3 of those tests is also _very_ non-uniform, so just cutting it into equal parts doesn't work. For example, the test_index_with_paging from the slowest one takes ~14 minutes on its own and is the slowest test-case out there. So the set does this: - moves the champion test_index_with_paging into separate file - detaches the most heavy parts from sstable_mutation_test and mutation_reader_test into own tests too. The resulting split is still non-uniform, but it's 4 tests that run notably less than the 14 minutes record each - splits the cql_query_test and view_schema_test into several parts in a wildcard manner to run out of the 14 min threshold - moves some shared code into lib/ As the result, the debug mode test run takes 14.5 minutes =) which is almost 2 times faster than it was. The dev mode run time is not affected noticeably. Test: well, unit(debug) and unit(dev) " * 'br-split-unit-tests-3-next' of https://github.com/xemul/scylla: test: Split view_schema_test test: Split cql_query_test test: Split mutation_reader_test test: Split sstable_mutation_test test: Split secondary_index test	2020-03-18 12:19:32 +02:00
Pavel Emelyanov	96e3d0fa36	mutation_partition: Debloat header form others Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200317191051.12623-1-xemul@scylladb.com>	2020-03-18 11:53:36 +02:00
Asias He	cdcedf5eb9	gossip: Make is_safe_for_bootstrap more strict Consider 1. Start n1, n2 in the cluster 2. Stop n2 and delete all data for n2 3. Start n2 to replace itself with replace_address_first_boot: n2 4. Kill n2 before n2 finishes the replace operation 5. Remove replace_address_first_boot: n2 from scylla.yaml of n2 6. Delete all data for n2 7. Start n2 At step 7, n2 will be allowed to bootstrap as a new node, because the application state of n2 in the cluster is HIBERNATE which is not rejected in the check of is_safe_for_bootstrap. As a result, n2 will replace n2 with different tokens and a different host_id, as if the old n2 node was removed from the cluster silently. Fixes #5172	2020-03-17 17:37:16 +01:00
Tomasz Grabiec	488482c55a	Merge "lwt: ensure unqualified SELECT works with SERIAL cl" from Kostja Ensure unqualified SELECT throws an appropriate exception with SERIAL consistency level. Since such query touches multiple partitions, we don't support it in SERIAL mode. Branch URL: https://github.com/kostja/scylla/tree/gh-6016-crash-lwt-select	2020-03-17 17:24:06 +01:00
Konstantin Osipov	4978bb513d	test: add a test case for SERIAL read consistency Pass custom query options to execute_prepared and add a test case for custom SERIAL consistency.	2020-03-17 18:58:12 +03:00
Konstantin Osipov	f5180725df	lwt: check SELECT restricts partition key before accessing it Check that SELECT statement checks there is a partition key before accessing it when determining the shard to execute the query on. Essentially move the check for properly restricted partition key from storage_proxy.cc to select_statement.cc, now that we access it earlier in the call stack. Keep the check in storage_proxy.cc since storage_proxy::query() has other call sites (views), which today should never use serial consistency for its queries, but this can change in the future. Please note that Cassandra only partially enforce SERIAL consistency and can silently downgrade SERIAL consistency to the default non-serial one when doing unbounded SELECTS ( https://issues.apache.org/jira/browse/CASSANDRA-15641) Fixes #6016	2020-03-17 16:55:11 +03:00
Pavel Emelyanov	86c712a340	test: Split view_schema_test Detach partition_key and clustering_key ones into own files. The resulting 2 tests run ~4 minutes each, the leftover ones complete within 11 minutes. The same -- the goal to run out of 14 minutes is reached, further splitting needs more thinking than just wildcarding. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:45 +03:00
Pavel Emelyanov	e848d63510	test: Split cql_query_test This detaches like_operator, group_by, functions and large cases into own files. The split is not uniform -- the resulting 4 tests run less than 3 minutes each, what's left in the origin runs ~11 minutes. But since the goal was to get out of 14 minutes threshold and this file contains 126 cases (the champion) so I just did "wildcard" selection that worked. It also required moving require_rows() helpers into a local header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:45 +03:00
Pavel Emelyanov	3fbd88b226	test: Split mutation_reader_test Detach test_multishard_combining_reader_as_mutation_source into individual file. This particular test runs ~13 minutes. What's left in the origin completes a bit faster. The split also requires moving the reader_lifecycle_policy and the dummy_partitioner into lib/ Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:27:44 +03:00
Pavel Emelyanov	3577fa2bb8	test: Split sstable_mutation_test Detach test_schema_changes and test_sstable_conforms_to_mutation_source into individual files. These two take ~10 minutes each, what's left in origin finishes within 4 minutes altogether. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:26:34 +03:00
Pavel Emelyanov	5b86f4be9a	test: Split secondary_index test Detach test_index_with_paging into individual file. This particular test-case is the longest one in the suite, it takes ~14 minutes to run, further splitting of this test is pointless (for now) and all subsequent splits in this set just make the resulting times less than this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-16 20:26:34 +03:00
Pavel Emelyanov	14de126ff8	migration_manager: Run background schema merge in gate The call for merge_schema_from in some cases is run in the background and thus is not aborted/waited on shutdown. This may result in use-after-free one of which is merge_schema_from -> read_schema_for_keyspace -> db::system_keyspace::query -> storage_proxy::query -> query_partition_key_range_concurrent in the latter function the proxy._token_metadata is accessed, while the respective object can be already free (unlike the storage_proxy itself that's still leaked on shutdown). Related bug: #5903, #5999 (cannot reproduce though) Tests: unit(dev), manual start-stop dtest(consistency.TestConsistency, dev) dtest(schema_management, dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Pekka Enberg <penberg@scylladb.com> Message-Id: <20200316150348.31118-1-xemul@scylladb.com>	2020-03-16 17:41:23 +01:00
Avi Kivity	342c967b6a	Merge "Introduce compacting reader" from Botond " Allow adding compacting to any reader pipeline. The intended users are streaming and repair, with the goal to prevent wasting transfer bandwidth with data that is purgeable. No current user in the tree. Tests: unit(dev), mutation_reader_test.compacting_reader_(debug) " 'compacting-reader/v3' of https://github.com/denesb/scylla: test: boost/mutation_reader_test: add unit test for compacting_reader test: lib/flat_mutation_reader_assertions: be more lenient about empty mutations test: lib/mutation_source_test: make data compaction friendly test: random_mutation_generator: add generate_uncompactable mode mutation_reader: introduce compacting_reader	2020-03-16 16:41:50 +02:00
Botond Dénes	837b79c265	test: boost/mutation_reader_test: add unit test for compacting_reader	2020-03-16 13:58:13 +02:00
Botond Dénes	3b482af33d	test: lib/flat_mutation_reader_assertions: be more lenient about empty mutations When expecting a mutation that compacts to an empty one, allow it to be not produced at all. After all, compaction normally doesn't even emit empty partitions.	2020-03-16 13:58:13 +02:00
Botond Dénes	1ab45e15a0	test: lib/mutation_source_test: make data compaction friendly Currently the mutation source test suite may generate data that is compactable. This poses a problem for the next patch, where we want to use it to test `compacting_reader` a reader which compacts data as it reads it. When the input is compactable, this will introduce artificial differences, failing the tests. To allow also testing such readers, make sure data is not compactable, i.e. compacting it will not change it. The goal of the mutation source test suite is not to exercise compaction logic, so this will not take anything away from its value.	2020-03-16 13:58:13 +02:00
Botond Dénes	c4fab16723	test: random_mutation_generator: add generate_uncompactable mode The random mutation generator currently generates data and tombstones with random timestamps selected from a pre-determined range. This results in mutations where tombstones often cover each other and data. There is nothing wrong with this, as this is how real data is too. However for certain tests this is problematic, as compacting the mutations will result in different mutations. To cater for these users too, introduce a `generate_uncompactable` option. When set to `yes`, the generated mutations will be uncompactable, i.e. no tombstone will cover lower-level tombstones and no tombstone will cover data. The mutations will not change after being compacted.	2020-03-16 13:58:13 +02:00
Botond Dénes	8286a0b1bd	mutation_reader: introduce compacting_reader Compacting reader compacts the output of another reader on-the-fly. Performs compaction-type compaction (`compact_for_sstables::yes`). It will be used in streaming and repair to eliminate purgeable data from the stream, thus prevent wasting transfer bandwidth.	2020-03-16 13:58:13 +02:00
Nadav Har'El	35d95d6887	merge: Add postimage implementation Merged pull request https://github.com/scylladb/scylla/pull/5996 from Calle Wilund: Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains all columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 13:42:07 +02:00
Calle Wilund	0a3383c090	cdc: Add postimage implementation Fixes #4992 Implements post-image support by synthesizing it from pre-image + delta. Post-image data differs from the delta data in two ways: 1.) It merges non-atomics into an actual result value 2.) It contains _all_ columns of the row, not just those affected by the update. For a non-atomic field, the post-image value of a column is either the pre-image or the delta (maybe null) Tested by adding post-image checks to pre-image test and collection/udt tests	2020-03-16 09:21:06 +00:00
Calle Wilund	40114f8233	cql3::untyped_result_set: Add bytes_view_opt access to fields For quick access and convenient live-checks	2020-03-16 09:21:06 +00:00
Calle Wilund	ca7046256f	schema: Add "columns" accessor for columns by kind To prevent switch-code everywhere.	2020-03-16 09:21:06 +00:00
Avi Kivity	ee9df91a76	Merge "Allow setting partitioner per table" from Piotr " This PR makes it possible to enable the usage of different partitioner for each table. If no table-specific partitioner is set for a given table then a default partitioner is used. The PR is composed of the following parts: - Introduction of schema::get_partitioner that still returns dht::global_partitioner - Replacement of all the usage of dht::global_partitioner with schema::get_partitioner - Making it possible to set table-specific partitioner in a schema_builder - Remove all the places that were setting default partitioner except for main.cc (mostly tests) - Move default partitioner from i_partitioner to schema.cc and hide it from the rest of the codebase - Remove dht::global_partitioner After this PR there's no such thing as global partitioner at all. There is only a default partitioner but it still has to be accessed through schema::get_partitioner. There are some intermediate states in which i_partitioner is stored as shared_ptr in the schema but the final version keeps it by const&. The PR does not enable per table partitioner end-to-end. Just the internals of the single node are covered. I still have to deal with: - Making sure a table has the same partitioner on each node - Allowing user to set up a table-specific partitioner on table - Signal driver about what partitioner is used by a given table - Persist partitioner info for each table that does not use default partitioner. 
Fixes #5493 Tests: unit(dev, release, debug), dtest(byo) " * 'per_table_partitioner' of https://github.com/haaawk/scylla: schema: drop optional from _partitioner field make_multishard_combining_reader: stop taking partitioner split_range_to_single_shard: stop taking partitioner as argument tests: remove unused murmur3 includes partitioner: move default_partitioner to schema.cc partitioner: hide dht::default_partitioner schema: include partitioner name in scylla tables mutation schema: make it possible to set custom partitioner scylla_tables: add partitioner column schema_features: add PER_TABLE_PARTITIONERS feature features: add PER_TABLE_PARTITIONERS feature	2020-03-16 11:13:47 +02:00
Avi Kivity	cb523c48cd	Update seastar submodule * seastar 47d929dd1...3c498abca (5): > reactor: Use do_with to save stack space > reactor: Extract code into a schedule_retry helper > reactor: Move an io_event buffer out of the stack > temporary_buffer: fix typo in argument type in comparison operators > tests: tls_test: add missing include <iostream>	2020-03-16 11:02:50 +02:00
Rafael Ávila de Espíndola	69874f4330	feature_service: Remove default constructor This makes sure that feature_config_from_db_config is used for both tests and main.cc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200312153453.37282-2-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	7c26eb61a3	feature_service: Initialize local variable The use of an uninitialized variable was not being noticed because this is only used by main.cc. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200312153453.37282-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	517a01a3f6	utils: Use sstring as keys in nonstatic_class_registry Now that seastar::string::compare has been updated, it is possible to use sstring for this. This reverts commit `01fe766f1f`. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200311005219.280737-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Rafael Ávila de Espíndola	624573a219	configure: Warn on large stacks This adds a warning with a different limit in each mode. The limit is picked as 1KiB lower than the value where no warning would be printed. This makes it easy to spot the worst offender. With that we can either fix it or silence the warning once we are sure we can handle large frames in that context. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200311205300.324383-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Piotr Sarna	f43e68b383	alternator: hook admission control to alternator server From now on, alternator requests use the memory limiter semaphore to control the amount of memory used by alternator requests.	2020-03-16 08:43:49 +01:00
Piotr Sarna	7eb6d5545d	alternator: add admission control stats entry The entry will be bumped if admission control was forced to block the request from being served.	2020-03-16 07:44:26 +01:00
Piotr Sarna	a1ea650d83	alternator: add memory limiter to alternator server With the memory limiter semaphore, the server will be able to apply admission control to alternator requests.	2020-03-16 07:44:26 +01:00
Piotr Sarna	781fbe8070	alternator: add service permit to callbacks As a first step towards introducing admission control, the API of alternator callbacks is extended with an additional 'permit' parameter.	2020-03-16 07:44:25 +01:00
Piotr Sarna	cb5fded9c2	storage_service: add memory limiter semaphore getter The memory limiter semaphore is going to be useful for limiting alternator memory as well, so it's hereby exposed via a getter.	2020-03-16 07:34:23 +01:00
Raphael S. Carvalho	458ef4bb06	sstables: Fix incorrect calculation of Compaction Backlog The bug is that we failed to implement this part of the formula: (T - C) * log4(T) We were incorrectly implementing it as: (T - C) * log4(T - C) So it could result in a backlog being calculated as negative when it should actually be positive, or backlog being lower than expected. BTW, we do protect against negative backlog after commit `3e08bd17f0`. Given that STCS backlog tracker is inherited by TWCS and LCS trackers, all compaction strategies are affected. The formula to calculate the aggregate backlog is: A = (T - C) * log4(T) - Sum(i = 0...N) { (Si - Ci) * log4(Si) }. For example, negative backlog is calculated on a tested scenario where T was 3129, C was 2337 and Sum(i = 0...N) { (Si - Ci) * log4(Si) } resulted in 4222.53. (T - C) * log4(T - C) = (3129 - 2337) * log4(3129 - 2337) = 3813.23 So backlog is negative because A = 3813.23 - 4222.53 = -409.302 But it should actually be calculated as follows: (T - C) * log4(T) = (3129 - 2337) * log4(3129) = 4598.15 And the correct backlog is positive, as A = 4598.15 - 4222.53 = 375.621 Fixes #6021. tests: unit(dev) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200315153711.23302-1-raphaelsc@scylladb.com>	2020-03-15 18:16:01 +02:00
Kamil Braun	aa72a1c556	cql3: when altering table, keep old values of unchanged extensions When the user performed alter ks.t with compaction = {...} the values of most other options, which were not specified in the statement, e.g. compression, were left unchanged. That wasn't true for extension options however: for example, the "cdc" option was removed. This commit fixes the behavior to keep the old values of extension options not specified in the alter statement.	2020-03-15 17:45:30 +02:00
Piotr Dulikowski	b1e8170bf9	cdc: add tracing Adds information about the stages of CDC mutation augmentation to tracing sessions.	2020-03-15 11:54:10 +01:00
Asias He	7ac9e0f2a1	gossip: Print CDC_STREAMS_TIMESTAMP correctly I saw UNKNOWN application state in the logs: INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=CACHE_HITRATES, versioned_value=Value(,14) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=SCHEMA_TABLES_VERSION, versioned_value=Value(3,15) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=RPC_READY, versioned_value=Value(0,16) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(,17) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=SHARD_COUNT, versioned_value=Value(1,30) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=IGNOR_MSB_BITS, versioned_value=Value(12,31) INFO 2020-03-06 11:09:48,931 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=UNKNOWN, versioned_value=Value(1583371936128,20) It turned out it was CDC_STREAMS_TIMESTAMP. $ nodetool gossipinfo\|grep 1583371936128 X8:1583371936128 X8:1583371936128 Fixes #5992	2020-03-15 11:51:35 +01:00
Piotr Jastrzebski	5bbb826c49	schema: drop optional from _partitioner field Always set the field to the default value if no table specific partitioner has been set. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:21 +01:00
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	4b7fb323c3	split_range_to_single_shard: stop taking partitioner as argument The function already takes schema so we don't need partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	f99fd35f53	tests: remove unused murmur3 includes Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	22daa262ee	partitioner: move default_partitioner to schema.cc Make it inaccessible to other compilation units. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	7064f6b831	partitioner: hide dht::default_partitioner Remove last usage of this global outside i_partitioner.cc and hide it inside the compilation unit. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	57b69fb804	schema: include partitioner name in scylla tables mutation There are two results of this patch: 1. New partitioner name column is persisted on node's disk in scylla_tables 2. New partitioner name column is included into schema digest This is achieved by including this new column in scylla tables mutation. For that we: 1. Add partitioner name to the result of make_scylla_tables_mutation. If table does not have a specific partitioner set and uses default partitioner then we don't include the name of such default partitioner. Only the name of custom partitioner is added if a table has one. 2. In create_table_from_mutations we check whether scylla tables mutation has a partitioner name set. If so then we use it as a parameter for schema_builder. Note that previous patches have ensured that this new column will be included into schema digest only after the whole cluster supports per table partitioners. Before that, during rolling upgrade, new partitioner name column is hidden and not shared with other nodes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	1d6cec1b0a	schema: make it possible to set custom partitioner schema_builder::with_partitioner can be used now to set custom partitioner on a table. If no such partitioner is set, global partitioner is still used. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	f83ff8fda1	scylla_tables: add partitioner column Following commits make it possible to set a specific partitioner for a table. We want to persist that information and include it into schema digest. For that a new column in scylla_tables is needed. This commit adds such column. We add the new column to scylla_tables because it's a Scylla specific extension. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	782f2caf41	schema_features: add PER_TABLE_PARTITIONERS feature With per table partitioners, partitioner name will be a part of table schema. To allow rolling upgrade we need to perform special logic that hides new partitioner name schema column during the upgrade. This commit adds new schema feature that controls this logic. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	90df9a44ce	features: add PER_TABLE_PARTITIONERS feature This new feature is required because we now allow setting partitioner per table. This will influence the digest of table schema so we must not include partitioner name into the digest unless we know that the whole cluster already supports per table partitioners. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Botond Dénes	5207f530ba	scylla-gdb.py: scylla smp-queues: ignore unresolvable/unmatching symbols Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160444.320253-1-bdenes@scylladb.com>	2020-03-15 10:41:16 +02:00
Botond Dénes	a85c3aa839	scylla-gdb.py: introduce sharded_local convenience function To conveniently retrieve the local instance of a sharded object. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160106.319743-1-bdenes@scylladb.com>	2020-03-15 10:41:16 +02:00
Botond Dénes	0e9df01ba3	scylla-gdb.py: downcast_vptr(): make compatible with python < 3.6 Subscript operation `__getitem__()` was only added to re.match objects in 3.6. To support previous versions, use `groups()` method to obtain the desired group. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200313160025.319464-1-bdenes@scylladb.com>	2020-03-15 10:41:15 +02:00
Nadav Har'El	635e6d887c	materialized views: fix corner case of view updates used by Alternator While CQL does not allow creation of a materialized view with more than one base regular column in the view's key, in Alternator we do allow this - both partition and clustering key may be a base regular column. We had a bug in the logic handling this case: If the new base row is missing a value for one of the view key columns, we shouldn't create a view row. Similarly, if the existing base row was missing a value for one of the view key columns, a view row does not exist and doesn't need to be deleted. This was done incorrectly, and made decisions based on just one of the key columns, and the logic is now fixed (and I think, simplified) in this patch. With this patch, the Alternator test which previously failed because of this problem now passes. The patch also includes new tests in the existing C++ unit test test_view_with_two_regular_base_columns_in_key. This test was already supposed to be testing various cases of two-new-key-columns updates, but missed the cases explained above. These new tests failed badly before this patch - some of them had clean write errors, others caused crashes. With this patch, they pass. Fixes #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312162503.8944-1-nyh@scylladb.com>	2020-03-15 07:57:33 +01:00
Avi Kivity	07ddbf6e54	Merge "Reduce our dependence on sstring" from Rafael " It doesn't look like we will be able to switch to std::string just yet, but when it is not too inconvenient we can try to reduce our dependence so that attempting the switch again in the future is easier. " * 'espindola/sstring-api' of https://github.com/espindola/scylla: redis: Use scattered_message::append(std::string_view) everywhere: Use uninitialized_string instead of sstring::initialized_later compressor: Add an explicit cast to const sstring& everywhere: Be more explicit that we don't want std::make_shared cql3: Don't use sstring::reset everywhere: Don't assume sstring::begin() and sstring::end() are pointers	2020-03-14 16:29:42 +02:00
Avi Kivity	6b747f4673	database: avoid creating thread in make_directory_for_column_family() make_directory_for_column_family() is used in a parallel_for_each() in parse_system_tables(). Because parallel_for_each does not preempt in the initial execution of its input function, and because each thread allocates 128k for the stack, we end up allocating many hundreds of megabytes if there are many tables. This happens early during boot and will only cause problems if there are 5,000 tables per gigabyte of shard memory, an unlikely combination that will probably fail later, but still it is better to avoid unnecessary large allocations. This was developed in order to fix #6003, until it was discovered that `c020b4e5e2` ("logalloc: increase capacity of _regions vector outside reclaim lock") is the real fix. Message-Id: <20200313093603.1366502-1-avi@scylladb.com>	2020-03-13 13:46:45 +02:00
Rafael Ávila de Espíndola	a1ca83b067	gms: Fix static initialization order problem In test_services.cc there is gms::feature_service test_feature_service; And the feature_service constructor has , _lwt_feature(*this, features::LWT) But features::LWT is a global sstring constructed in another file. Solve the problem by making the feature strings constexpr std::string_view. I found the issue while trying to benchmark the std::string switch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200309225749.36661-1-espindola@scylladb.com>	2020-03-13 12:37:22 +02:00
Botond Dénes	13e20fe6be	scylla-gdb.py: scylla_memory: update w.r.t. per-sg coordinator stats Since `b783d40aa` storage-proxy maintains separate coordinator stats per scheduling group. This broke scylla_memory, which was still trying to access the old global stats. Update it to print the new per-scheduling group stats when they are available and the old global ones when not. Scheduling groups for which all relevant metrics are 0 are omitted from the printout to reduce noise.	2020-03-13 10:57:51 +02:00
Botond Dénes	ca84c2f566	scylla-gdb.py: scylla_memory: move coordinator code to print_coordinator_stats() This code will have to be revamped. While at it move it to its own method to reduce the clutter in `invoke()`.	2020-03-13 10:54:01 +02:00
Avi Kivity	7311d1b177	Update seastar submodule * seastar 664c911b4c...47d929dd1b (6): > sstring: Simplify operator= > sstring: Deprecate reset > sstring: Pass string_view to compare > sstring: Move exception code out of line > reactor: remove unused variable > reactor: always initialize smp_poller	2020-03-12 21:37:05 +01:00
Piotr Sarna	e8871181eb	scripts: add a script for pulling GitHub pull requests In order to avoid the UI merge button which tends to mess up commit authors, a simple script for pulling a PR from GitHub is added. Example usage: $ git fetch; git checkout origin/next $ ./scripts/pull_github_pr.sh 6007 Message-Id: <1fa79c8be47b5660fc24a81fc0ab381aa26d98af.1584014944.git.sarna@scylladb.com>	2020-03-12 21:37:05 +01:00
Raphael S. Carvalho	34426d1497	sstables: Fix off-by-one when checking for max_data_segregation_window_count Make sure max size of known windows will respect max_d_s_w_c. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200305165014.16022-1-raphaelsc@scylladb.com>	2020-03-12 14:11:18 +02:00
Nadav Har'El	8e4520b2b3	alternator-test: add xfailing test for issue 6008 This patch adds a test, test_gsi.py::test_gsi_missing_attribute_3, reproducing issue #6008. The issue is about a GSI with two regular base columns becoming key columns in a view, and we have a write failure when writing an item with one of these attributes missing. The test passes on DynamoDB, currently xfails on Alternator. Refs #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312064131.16046-1-nyh@scylladb.com>	2020-03-12 10:07:58 +01:00
Nadav Har'El	77444a38a1	alternator: allow consistent reads on LSI - but not on GSI Recently, Materialized Views were modified (see issue #4365) so that local view updates (when both base and view replicas are the same node) are synchronous. In particular, when the view's partition key is the same as the base table's, view writes are synchronous: A write now only returns after CL copies of the view data have been written. Alternator's LSI have exactly this case (same partition key as the base). This makes strongly-consistent (CL=LOCAL_QUORUM) reads in Alternator work correctly, so we update the documentation accordingly to no longer say that we don't support this DynamoDB feature. However unlike LSIs, for GSIs strongly-consistent reads are still not supported, and should not be supported (they are also not supported by DynamoDB). Such reads should generate an error. So this patch fixes this too. A GSI test which tested that strongly consistent reads are forbidden, which used to xfail, now passes so the patch removes the "xfail". Finally, we can simplify the LSI tests by using consistent reads instead of eventually-consistent reads with retries. Beyond simplifying the test, it's also an opportunity to use strongly-consistent reads and make sure that they work (while, as mentioned above, similar reads for GSIs are refused). Fixes #5007 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200311170446.28611-1-nyh@scylladb.com>	2020-03-12 09:18:00 +01:00
Takuya ASADA	086f0ffd5a	scylla_raid_setup: create missing directories We need to create hints, view_hints, saved_caches directories on RAID volume. Fixes #5811	2020-03-12 09:29:29 +02:00
Takuya ASADA	399ff24efd	docker: apply scylla-jmx sysconfig file on scylla-jmx service Apply scylla-jmx sysconfig file on scylla-jmx service, to allow customizing jmx parameters. Fixes #5939	2020-03-12 09:27:23 +02:00
Avi Kivity	86415cf98a	Update seastar submodule * seastar 95f4277c16...664c911b4c (4): > tls_test: Use uninitialized_string instead of initialized_later > tls: Fix race and stale memory use in delayed shutdown Fixes #5759 (maybe) > tls: Re-enable TLS test and fix build+run > tls: Set server name for client connection if available	2020-03-11 19:25:36 +02:00
Avi Kivity	c020b4e5e2	logalloc: increase capacity of _regions vector outside reclaim lock Reclaim consults the _regions vector, so we don't want it moving around while allocating more capacity. For that we take the reclaim lock. However, that can cause a false-positive OOM during startup: 1. all memory is allocated to LSA as part of priming (`2baa16b371`) 2. the _regions vector is resized from 64k to 128k, requiring a segment to be freed (plenty are free) 3. but reclaiming_lock is taken, so we cannot reclaim anything. To fix, resize the _regions vector outside the lock. Fixes #6003. Message-Id: <20200311091217.1112081-1-avi@scylladb.com>	2020-03-11 12:29:31 +02:00
Botond Dénes	931d2fca45	scylla-gdb.py: std_list: __len__(): support C++11 ABI In theory the C++11 ABI should already have a size field but it does not in the version of the C++ standard library shipped with scylla 2019.1. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225162337.112582-1-bdenes@scylladb.com>	2020-03-11 10:51:05 +02:00
Botond Dénes	0909dd3d11	scylla-gdb.py: scylla_sstables: fix copypasta in name passed to argparse The description is probably from the command this snippet was copied from originally. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310141025.90051-1-bdenes@scylladb.com>	2020-03-11 10:49:34 +02:00
Botond Dénes	10944689bc	scylla-gdb.py: resolve(): don't attempt to match failed symbols Currently if `startswith` is passed to `resolve()` it will unconditionally try to match the resolved symbol name against it. This will of course fail when the symbol fails to resolve and `name` is `None`. Return early when this happens to prevent the unnecessary prefix matching. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310140918.88928-1-bdenes@scylladb.com>	2020-03-11 10:48:44 +02:00
Botond Dénes	0da517ca93	scylla-gdb.py: get_text_range(): make compatible with >=3.0 The current method of obtaining the text range based on a known vptr (`reactor::_backend`) was based on branch-2019.1, where `reactor::_backend` is a value member. However in >=3.0 `reactor::_backend` is a `std::unique_ptr<>`. Adapt the code to work for both. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200310135957.86261-1-bdenes@scylladb.com>	2020-03-11 10:46:40 +02:00
Nadav Har'El	8d161cac87	merge: Allow synchronous view updates for local views Merged patch series by Piotr Sarna: This series makes view updates synchronous, as long as the update is going to be applied locally. With this feature, local secondary indexes and, more generally, materialized views with partition keys same as in the base table could enjoy more robust consistency. This series comes with a cql test, not common for materialized views, which usually require eventual consistency checks. With synchronous updates however, the test can simply check view values right after updating the base table. Fixes #4365 Refs #5007 Tests: unit(dev), manually via inserting sleeps and debug messages, to make sure that local view updates are actually waited for Piotr Sarna (4): db,view: drop default parameter for mutate_MV::allow_hints db,view: move putting view updates to background to mutate_MV db,view: perform local view updates synchronously test: add a simple test for synchronous local view updates	2020-03-11 10:29:16 +02:00
Piotr Sarna	8d2555673f	test: add a simple test for synchronous local view updates With synchronous local view updates enabled, local materialized views can be queried right after base table insertions, without the risk of reading stale values.	2020-03-11 09:15:57 +01:00
Piotr Sarna	2061e6a9cc	db,view: perform local view updates synchronously Local view updates (updates applied to a local node, without remote communication) are from now on performed synchronously - which adds consistency guarantees, as a local write failure will be returned to the client instead of being silently ignored.	2020-03-11 09:05:56 +01:00
Piotr Sarna	fd49fd773c	db,view: move putting view updates to background to mutate_MV Currently, launching view updates as an asynchronous background job is done via not waiting for mutate_MV() future in table::generate_and_propagate_view_updates. That has a big downside, since mutate_MV() handles all view updates for all views of a table, so it's not possible to wait for each view independently. Per-view granularity is required in order to implement synchronous view updates of local views - because then we'll synchronously wait for all views that write to a local node (due to having a matching partition key with the base), while remote view updates will still be sent asynchronously. In order to do that, instead of not waiting for mutate_MV, we do wait for it properly, but instead launch the asynchronous, unwaited-for futures inside mutate_MV. Effectively that means no changes for view updates so far - all updates will be fired in the background. Later, another patch will introduce a way to wait for selected updates to finish.	2020-03-11 09:05:56 +01:00
Piotr Sarna	3b3659e8cd	db,view: drop default parameter for mutate_MV::allow_hints Default parameters are considered harmful, and as part of a cleanup before editing view.cc code, a default value for allow_hints parameter is removed.	2020-03-11 09:05:56 +01:00
Rafael Ávila de Espíndola	d5bcb5a974	redis: Use scattered_message::append(std::string_view) This just moves the copy to append instead of doing it in the caller. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:18:54 -07:00
Rafael Ávila de Espíndola	80d969ce31	everywhere: Use uninitialized_string instead of sstring::initialized_later This is just a trivial wrapper over initialized_later when using sstring, but also works when std::string is used. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:17:49 -07:00
Rafael Ávila de Espíndola	76f4fee65b	compressor: Add an explicit cast to const sstring& Some difference on how exactly the operator== is declared for sstring versus std::string requires this change if we convert from sstring to std::string. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Rafael Ávila de Espíndola	c0072eab30	everywhere: Be more explicit that we don't want std::make_shared If sstring is made an alias to std::string ADL causes std::make_shared to be found. Explicitly ask for ::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Rafael Ávila de Espíndola	ad9f17bd92	cql3: Don't use sstring::reset There is no reset in std::string, so don't depend on a sstring only feature. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Rafael Ávila de Espíndola	caef2ef903	everywhere: Don't assume sstring::begin() and sstring::end() are pointers If we switch to using std::string we have to handle begin and end returning iterators. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Avi Kivity	0cb7182768	Update seastar submodule * seastar 5eaec672a2...95f4277c16 (1): > Merge "Add an option for making sstring an alias to std::string" from Rafael	2020-03-10 18:38:37 +02:00
Gleb Natapov	cd73f552b9	storage_service, database: do not move sharded services It may be not safe to move sharded services, so it will be prohibited in the future seastar update. Remove all current cases where we do it. Fixes #5814. Message-Id: <20200301095423.GY434@scylladb.com>	2020-03-10 12:51:02 +02:00
Tomasz Grabiec	3548e85ff7	Merge "features: Properly resolve when_enabled futures on stop" from Pavel E. If the feature service is stopped without enabling some features, the latter may end up with "broken promise" exception on futures attached to the _pr promise. Fix this by switching the only user of it onto 'listener' API and remove future-based one. Tests: unit(debug), manual start-stop and aborted-start	2020-03-10 10:09:24 +02:00
Juliusz Stasiewicz	3cc3233281	test/cdc: test that LWT generates CDC logs Tests #5952 Refs #5869	2020-03-10 08:33:49 +01:00
Raphael S. Carvalho	899bb230e2	sstable_resharding_test: fix sstable_resharding_strategy_tests with odd smp count leveled_compaction_strategy_strategy::get_resharding_jobs() returns compaction jobs, each containing at most smp::count ssts, so calculation is wrong if smp count is an odd number. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Acked-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200305161003.14424-1-raphaelsc@scylladb.com>	2020-03-09 17:52:53 +02:00
Raphael S. Carvalho	d895f5e131	sstables/stcs: kill FIXME For the purpose of determining size tiers, it doesn't matter whether bytes_on_disk() or data_size() is used. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200302142513.10136-1-raphaelsc@scylladb.com>	2020-03-09 15:47:48 +02:00
Avi Kivity	8af6dabbf0	Merge "Decouple cql_config from storage_service" from Pavel E " The cql_config is needed by storage_service to feed it to thrift/transport servers. These servers, in turn, put the config onto query_options. The final goal of this config reference is the guts of query_processor (but currently it's only used by restrictions) This way is rather long and confusing. It seems more natural to keep the cql_config on its main "user" -- query processor. This patch set does so. However, in order to push the config into its current usage places a huge refactoring is needed -- most of the classes in cql3/statements and cql3/restrictions. It's much more handy to continue keeping it via query_options, so the query_processor is equipped with the method to return the reference on the config to those initializing query_options. Tests: unit(debug) " * 'br-clean-client-services-from-cql-config-2' of https://github.com/xemul/scylla: storage_service: Forget cql_config transport: Forget cql_config thrift: Forget cql_config query_processor: Carry reference on cql_config	2020-03-09 15:06:59 +02:00
Calle Wilund	5c743bfd53	cdc: rename inner "process_cells" to avoid confusion Two lambdas should not share name in same function.	2020-03-09 13:06:32 +00:00
Konstantin Osipov	9c009441e0	test.py: do not override environment options Do not reset user-defined environment options for ASAN with test.py flags. Message-Id: <20200306135714.3380-1-kostja@scylladb.com>	2020-03-09 14:56:09 +02:00
Piotr Dulikowski	5f652e58c1	cdc: allow dropping manually created tables with cdc log suffix The is_log_for_some_table function incorrectly assumed that database::find_schema would return a null pointer in case the queried schema does not exist. This patch fixes that, and now this function checks for existence of the schema using database::has_schema. Tests: unit(dev)	2020-03-09 12:17:13 +01:00
Asias He	6a7c3f0af0	repair: Stop the nodes that have run repair_row_level_start It is ok to run repair_row_level_stop unconditionally. The node that hasn't received the repair_row_level_start will simply return an error that the repair_meta_id is not found. To avoid the unnecessary repair_row_level_stop verb, we can stop only the nodes that have run repair_row_level_start. This also makes the error message less confusing. For example: Before: INFO 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 on shard 0 failed: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist) INFO 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist)}) WARN 2020-03-09 15:55:43,369 [shard 0] repair - repair id 1 to sync data for keyspace=ks, status=failed, keyspace does not exist any more, ignoring it: std::runtime_error ({shard 0: std::runtime_error (get_repair_meta: repair_meta_id 8 for node 127.0.0.4 does not exist)}) After: INFO 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 on shard 0 failed: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376]) INFO 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376])}) WARN 2020-03-09 16:09:09,217 [shard 0] repair - repair id 1 to sync data for keyspace=ks, status=failed, keyspace does not exist any more, ignoring it: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=ks, cf=cf, range=(9041860168177642466, 9044815446631222376])}) Refs #5942	2020-03-09 18:24:02 +08:00
Asias He	75cf255c67	repair: Ignore keyspace that is removed in sync_data_using_repair When a keyspace is removed during node operations, we should not fail the whole operation. Ignore the keyspace that is removed. Fixes #5942	2020-03-09 18:24:02 +08:00
Pavel Emelyanov	0298a6270e	storage_service: Forget cql_config It needs the config purely to feed one into thrift/transport server, since the latter two no longer need one, neither does the former. As a nice side effect -- some tests no longer have to carry the cql_config on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:58:06 +03:00
Pavel Emelyanov	1af8ab80eb	transport: Forget cql_config The cql_server already works with query_processor from which it can get the cql_config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:30 +03:00
Pavel Emelyanov	d551f0323a	thrift: Forget cql_config The thrift handlers already mess with query_processor which has the config in question. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:30 +03:00
Pavel Emelyanov	0a9a5a2dd7	query_processor: Carry reference on cql_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-09 11:57:28 +03:00
Pavel Emelyanov	7f2fc837cb	config: Place timeout_config() into own .cc file It's a generic helper that's used by transport, thrift and redis (this guy has own copy of the code). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200306114022.8070-1-xemul@scylladb.com>	2020-03-08 17:57:58 +02:00
Avi Kivity	de1b20ff7c	Update seastar submodule * seastar affc3a5107...5eaec672a2 (12): > test_thread_custom_stack_size_failure: Use a larger custom stack > test_thread_custom_stack_size: Use a larger custom stack > log: correct help message > perftune.py: verify NIC existence > Merge "Fix various memory issues in http" from Rafael > build: Fix IN_LIST usage > future: Disable -Wuninitialized on a particular memcpy > build: use IN_LIST for shorter cmake > build: check support of "-fstack-clash-protection" before using it > configure.py: Add "--verbose" flag > configure.py: Make "cmake" command line human-readable > net: dynamically adjust buffer sizes for posix connected_socket read operations	2020-03-08 17:34:16 +02:00
Benny Halevy	a89fb0abd9	main: log "Startup failed" message as error To make it stand out and be detectable by dtests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303160725.235959-1-bhalevy@scylladb.com>	2020-03-08 17:33:50 +02:00
Konstantin Osipov	ac6f64a885	locator: correctly select endpoints if RF=0 SimpleStrategy creates a list of endpoints by iterating over the set of all configured endpoints for the given token, until we reach keyspace replication factor. There is a trivial coding bug when we first add at least one endpoint to the list, and then compare list size and replication factor. If RF=0 this never yields true. Fix by moving the RF check before at least one endpoint is added to the list. Cassandra never had this bug since it uses a less fancy while() loop. Fixes #5962 Message-Id: <20200306193729.130266-1-kostja@scylladb.com>	2020-03-08 16:53:01 +02:00
Calle Wilund	0b34d88957	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zero so reader can have an empty region after last used chunk as end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach end of segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent on how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write trailing zero if we are below the max_size file size when terminating Adds a small size check to commitlog test to verify size bounds. (Which breaks without the patch) Message-Id: <20200226121601.15347-2-calle@scylladb.com>	2020-03-08 16:51:53 +02:00
Konstantin Osipov	b4b08be0e1	test: add a test case for rare replication configurations Introduce a test which checks how different CQL features (DML, LWT, MV) work when no replicas are available (e.g. because they are all in an unavailable data center). Specifically the test checks that when we SELECT with IN clause and there are no available replicas, there is no crash (#5935). Message-Id: <20200306192521.73486-3-kostja@scylladb.com>	2020-03-08 15:11:08 +02:00
Konstantin Osipov	9827efe554	storage_proxy: do not touch all_replicas.front() if it's empty. The list of all endpoints for a query can be empty if we have replication_factor 0 or there are no live endpoints for this token. Do not access all_replicas.front() in this case. Fixes #5935. Message-Id: <20200306192521.73486-2-kostja@scylladb.com>	2020-03-08 15:11:02 +02:00
Nadav Har'El	6febd4199e	merge: cdc: on row delete, show the whole row as preimage Merged pull request https://github.com/scylladb/scylla/pull/5980 by Piotr Jastrzębski, based on https://github.com/scylladb/scylla/pull/5976 by Juliusz Stasiewicz: "If base mutation has at least one row tombstone, its preimage log entry displays all the base columns." Fixes #5709 Tests: unit(dev)	2020-03-08 14:54:59 +02:00
Juliusz Stasiewicz	49f1a24472	tests/cdc: test preimage on row delete Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-08 13:27:49 +01:00
Juliusz Stasiewicz	68071d35ce	cdc: on row delete display the entire row as preimage If base mutation has at least one row tombstone, its preimage log entry is constructed from all the base columns. Fixes #5709 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-08 12:11:07 +01:00
Piotr Dulikowski	0e413efb48	cdc: correct static row preimage for case with no clustering row In case a static and a clustering row is written at the same time, but a clustering row with given key was not present, the preimage query was incorrectly configured and no rows were returned. This resulted in an empty preimage, while a preimage for static row should be present. This patch fixes this and now the static row is correctly written to cdc log in the case above. Tests: unit(dev)	2020-03-08 09:25:45 +01:00
Piotr Sarna	395c7eeb98	Merge ' cdc: disallow creating nested cdc logs' from Piotr This change disallows creating CDC log tables for already existing CDC log tables. CDC logs nested in that way are not really useful and do not work at the moment, therefore disallowing their creation prevents confusion. Fixes #5967 Tests: unit(dev) * piodul/5967-disallow-nested-cdc-logs: cdc: disallow creating nested CDC logs cql_repl: register schema extensions	2020-03-08 09:22:59 +01:00
Juliusz Stasiewicz	e2b76fd559	cdc: move the extractor of `pirow` columns into separate method Because it will be used more than once.	2020-03-06 17:54:42 +01:00
Piotr Sarna	be293523bd	Merge 'Replace dht::global_partitioner() calls with... ... schema::get_partitioner and make schema::get_partitioner return const&' from Piotr Partitioners returned from get_partitioner are shared and not supposed to be changed so let's use the type system to enforce that. dht::global_partitioner() is deprecated and will be removed as soon as custom partitioners are implemented so it's best to replace it with schema::get_partitioner. Tests: unit(dev) * hawk/global_partitioner_cleanup: schema: get_partitioner return const& compaction_manager: stop calling dht::global_partitioner() sstable_datafile_test: stop calling dht::global_partitioner()	2020-03-06 14:36:03 +01:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	22fac03184	compaction_manager: stop calling dht::global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	08ebf1f69d	sstable_datafile_test: stop calling dht::global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Piotr Jastrzebski	968177da04	cdc: store tokens in cdc description as longs Previously the tokens were stored as strings because token could have been represented in multiple ways. Now token representation is always int64_t so we can store them as ints in cdc description as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 11:59:59 +01:00
Piotr Dulikowski	f317283578	cdc: disallow creating nested CDC logs This change disallows creating CDC log tables for already existing CDC log tables. CDC logs nested in that way are not really useful and do not work at the moment, therefore disallowing their creation prevents confusion.	2020-03-06 10:47:13 +01:00
Piotr Dulikowski	75284eb2a5	cql_repl: register schema extensions Alternator and CDC, apart from enabling their experimental features, need to have their schema extensions registered. This patch adds missing registration of schema extensions to cql_repl, so that cql tests written with Alternator or CDC in mind will properly work.	2020-03-06 10:31:07 +01:00
Piotr Sarna	d1db198211	Merge ' Allow repeated LIKE on same column' from Dejan Fixes #5902 by making the LIKE restriction keep a vector of matchers and apply them all to the column value. Tests: unit (dev) * dekimir/multiple-likes: cql3: Allow repeated LIKE on same column cql3: Forbid calling LIKE::values() cql3: Move LIKE::_last_pattern to matcher	2020-03-06 09:55:54 +01:00
Piotr Sarna	22798f7b7b	locator: fix validating replication factor In order to properly validate not only network topology strategy, but also other strategies, the checks are moved straight to validate_replication_factor(). Also, the test case is extended with a too long integer and a check for SimpleStrategy replication factor. Fixes #3801 Tests: unit(dev) Message-Id: <e0c3c3c36c589e1d440c9708a6dce820c111b8da.1583483602.git.sarna@scylladb.com>	2020-03-06 10:39:34 +02:00
Konstantin Osipov	848195125c	test.py: check test xml output Check that XML output of a test is valid and warn otherwise. The following tests currently produce a warning: boost/multishard_mutation_query_test Message-Id: <20200305213501.52279-2-kostja@scylladb.com>	2020-03-06 10:05:28 +02:00
Piotr Sarna	6df132436f	cql3: disallow range deletions for specific columns Range deletions of specific columns are not well-defined (range tombstones cover entire rows) and are forbidden in Cassandra, so we follow suit. This commit comes with a simple test. Fixes #5728 Tests: unit(dev) Message-Id: <896264f5f5790b9f96fcc18655ac3248a6abf37a.1583424131.git.sarna@scylladb.com>	2020-03-06 10:04:05 +02:00
Piotr Sarna	5b7a35e02b	network_topology_strategy: validate integers In order to prevent users from creating a network topology strategy instance with invalid inputs, it's not enough to use std::stol() on the input: a string "3abc" still returns the number '3', but will later confuse cqlsh and other drivers, when they ask for topology strategy details. The error message is now more human readable, since for incorrect numeric inputs it used to return a rather cryptic message: ServerError: stol() This commit fixes the issue and comes with a simple test. Fixes #3801 Tests: unit(dev) Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>	2020-03-06 09:50:33 +02:00
Piotr Sarna	30d2826358	Merge 'cdc: use `cdc` schema extension for storing... ... and reading cdc metadata' from Piotr Currently, information on what cdc options are enabled in a table - cdc metadata in short - is stored in two places: In cdc column of the system_schema.scylla_tables, In a cdc schema extension. The former is used as a source of truth, i.e. a node reads cdc metadata from that column, while the latter is used for cosmetic purposes (e.g. cqlsh displays info on cdc based on this extension) and is only written, but never read by the node. Introducing the cdc column to scylla_tables made the logic of schema agreement more complicated. As a first step of removing this column, this PR makes the cdc schema extension as the "source of truth" - a node will from now on read cdc metadata from that extension. The cdc column will be deprecated and removed in subsequent releases, but it is left for now and will still be written to in order not to break the logic of schema agreement. Acked-by: Nadav Har-El <nyh@scylladb.com> Refs: #5737 Tests: unit(dev), 2-node cluster upgrade under write load to a cdc-enabled table * piodul/5737-cdc-schema-extension: schema: get cdc options from schema extensions alter_table_statement: fix indentation cf_prop_defs: initialize schema extensions externally cf_prop_defs: move checking of cdc support to ::validate cf_prop_defs: pass database& to ::validate, not db::extensions& unit tests: register cdc extension before tests cdc: construct cdc_options directly inside cdc_extension db::extensions: add shorthands for add_schema_extension	2020-03-05 16:31:40 +01:00
Piotr Dulikowski	861c7b5626	schema: get cdc options from schema extensions Removes logic responsible for setting cdc_options from dedicated column in scylla_tables, and uses the "cdc" schema extension instead.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	e98766dd81	alter_table_statement: fix indentation	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	828077be5e	cf_prop_defs: initialize schema extensions externally Moves initialization of schema extensions outside of cf_prop_defs. This allows constructing these extensions once, and using them several times in cf_prop_defs' methods without caching or recalculating them several times.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	0bdc22e33b	cf_prop_defs: move checking of cdc support to ::validate Validation of CDC options fits better into the `validate` method rather than `apply_to_builder`.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	260c47d758	cf_prop_defs: pass database& to ::validate, not db::extensions& Changes cf_prop_defs::validate function to take database& as an argument instead of db::extensions&. This change will allow us to move the check which asserts that the cluster supports CDC from `apply_to_builder` to `validate` method.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	38b7f1ad45	unit tests: register cdc extension before tests In the following commits, using cdc in tests will require registering cdc extension explicitly in db config.	2020-03-05 16:11:20 +01:00
Piotr Dulikowski	0f4f48ef76	cdc: construct cdc_options directly inside cdc_extension Instead of storing a raw map of options inside `cdc_extension`, the extension now converts them into `cdc_options` directly on construction. This removes the need to construct `cdc_options` object multiple times.	2020-03-05 16:09:44 +01:00
Piotr Dulikowski	6895b0e395	db::extensions: add shorthands for add_schema_extension This abstracts away a pattern used everywhere when adding a schema extension.	2020-03-05 16:09:44 +01:00
Piotr Sarna	c35160457b	Merge 'Clean up stream_id representation' from Piotr With #5950 we changed the representation of stream_id in CDC Log from two int columns to a single blob column. This PR cleans up stream_id representation internally. Now stream_id is stored as blob both in-memory and in internal CDC tables. Tests: unit(dev) * hawk/stream_id_representation: cdc: store stream_ids as blobs in internal tables cdc: improve do_update_streams_description cdc: Fix generate_topology_description cdc: add stream_id::operator< cdc: change stream_id representation	2020-03-05 14:14:29 +01:00
Tomasz Grabiec	d5557023f6	Merge "Stop using BOOST_TEST_MESSAGE() in unit tests" from Kostja Stop using BOOST_TEST_MESSAGE() in unit tests, it bloats test XML output. Use Scylla logger instead. Test: unit (debug, dev, release)	2020-03-05 13:27:30 +01:00
Calle Wilund	b48255a4cd	db::commitlog: Only zero disk blocks not already allocated in segment Fixes #5891 Refs #5899 When creating segments with the o_dsync option active, we write max_size zeros to disk, to ensure actual disk blocks are allocated. However, if we recycle a segment, we should, when not actually creating a new file, check the existing size on disk, and only zero any blocks not already allocated (i.e. if recycled file was smaller than max_size, due to segment truncation on close). test: unit Message-Id: <20200226121601.15347-1-calle@scylladb.com>	2020-03-05 13:27:08 +01:00
Piotr Sarna	875d230298	Merge "CDC: use a single `cdc$time` value for a batch of changes" from Kamil. If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the cdc$time column, distinguished by different cdc$batch_seq_no values. Fixes #5953. Tests: unit(dev) * haaawk/splitbatch: cdc: use a single timeuuid value for a batch of changes cdc: replace `split` with `for_each_change`	2020-03-05 13:17:34 +01:00
Pavel Emelyanov	7bc34c17eb	range-streamer: Tune the progress message Now it will show the full info about range being streamed, like range_streamer - Rebuild with 127.0.0.2 for keyspace=ks2, streaming [72, 96) out of 248 ranges The [x, y) range is semi-open one, the full streaming progress then can be logged like ... streaming [0, 16) out of 36 ranges <- first send ... streaming [16, 24) out of 36 ranges ... streaming [24, 36) out of 36 ranges <- last send Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200304101505.5506-1-xemul@scylladb.com>	2020-03-05 12:56:29 +01:00
Kamil Braun	3200d415da	cdc: use a single timeuuid value for a batch of changes If a batch update is performed with a sequence of changes with a single timestamp, they will now show up in CDC with a single timeuuid in the `time` column, distinguished by different `batch_seq_no` values. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:32:57 +01:00
Konstantin Osipov	94ee511f6a	lwt: implement cas_failed_read_round_optimization metric Presently lightweight transactions piggy back the old row value on prepare round response. If one of the participants did not provide the old value or the values from peers don't match, we perform a full read round which will repair the Paxos table and the base table, if necessary, at all participants. Capture the fact that read optimization has failed in a metric. Message-Id: <20200304192955.84208-2-kostja@scylladb.com>	2020-03-05 12:20:45 +01:00
Kamil Braun	292eba9da0	cdc: replace `split` with `for_each_change` `for_each_change` is like `split` but it doesn't return a vector of mutations representing each change; instead, it takes as a parameter a function which gets called on each mutation. This reduces memory usage and allows preserving common context when handling each change (will be useful in next commits). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 12:05:08 +01:00
Pekka Enberg	0beb45faf3	build: Use reloc dynamic linker unconditionally The relocatable package requires a magic dynamic linker path for "patchelf" to work correctly. Therefore, use the "get-dynamic-linker.sh" script to unconditionally define a magic dynamic linker path to ensure that building the relocatable package with ninja build ("ninja-build build/<mode>/scylla-package.tar.gz") is always correct. Although the path looks odd with a lot of leading slashes, it works outside relocatable package too. Message-Id: <20200305091919.6315-2-penberg@scylladb.com>	2020-03-05 12:53:28 +02:00
Pekka Enberg	8a810cc41a	reloc: Move dynamic linker magic to get-dynamic-linker.sh In preparation for moving dynamic linker flags to ninja build, move the magic dynamic linker path generation to "reloc/get-dynamic-linker.sh" script that configure.py can call. Message-Id: <20200305084331.5339-1-penberg@scylladb.com>	2020-03-05 12:53:22 +02:00
Konstantin Osipov	ac0717fb64	test: consistently use a global testlog object in all tests Use test/lib/log.hh in all tests now that we have it.	2020-03-05 13:34:24 +03:00
Piotr Jastrzebski	57cfe6d0e1	cdc: store stream_ids as blobs in internal tables In new CDC Log format stream_id is represented by a single blob column so it makes sense to store it in the same form everywhere - including internal CDC tables. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	b2acdc9307	cdc: improve do_update_streams_description Use std::set::insert that takes range instead of looping through elements and adding them one by one. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	446722d6ed	cdc: Fix generate_topology_description In new CDC Log format we store only a single stream_id column. This means generate_topology_description has to use appropriate schema for generating tokens for stream_ids. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Piotr Jastrzebski	9a212dcaef	cdc: add stream_id::operator< Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:21 +01:00
Piotr Jastrzebski	f317a659d9	cdc: change stream_id representation New CDC Log format stores stream ids as blobs. It makes sense to keep them internally in the same form. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:30:10 +01:00
Piotr Sarna	f21bd57058	Merge "cdc: log static rows correctly" from Piotr Currently, writes to a static row in a base table are not reflected at all in the corresponding cdc log. This patch causes such writes to be properly logged. Fixes: #5744 Tests: unit(dev) * piodul/5744-handle-static-row-correctly-in-cdc: cdc_test: add tests for handling static row cdc: fix indentation in transformer::transform cdc: handle static rows separately in transformer::transform cdc: move process_cells higher (and fix captured variables) cdc: reduce dependencies on captured variables in process_cells cdc: fix preimage query for static rows	2020-03-05 10:42:15 +01:00
Nadav Har'El	96ca5ac2c8	alternator: use separate smp_service_group for bouncing requests Until this patch, we used the default_smp_service_group() when bouncing Alternator requests between shards (which is needed for LWT). This patch creates a new smp_service_group for this purpose, which is limited to 5000 concurrent requests (the same limit used for CQL's bounce_request_smp_service_group). The purpose of this limit is to avoid many shards admitting a huge number of requests and bouncing all of them to the same shard who now can't "unadmit" these requests. Fixes #5664. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200304170825.27226-1-nyh@scylladb.com>	2020-03-05 10:17:51 +01:00
Konstantin Osipov	ff3f9cb7cf	test: stop using BOOST_TEST_MESSAGE() for logging We use boost test logging primarily to generate nice XML xunit files used in Jenkins. These XML files can be bloated with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes of build archives, on every build. Let's use seastar logger for test logging instead, reserving the use of boost log facilities for boost test markup information.	2020-03-05 11:38:11 +03:00
Juliusz Stasiewicz	c8527f20b0	CDC+LWT: fix missing CDC entries for successful LWTs Now, if CDC is enabled, `paxos_response_handler::learn_decision()` augments the base table mutation. The differences in logic between: (1) `mutate_internal<std::vector<mutation>>()` and (2) `mutate_internal<std::vector<std::tuple<paxos::proposal, schema_ptr, ...>>>()` make it necessary to separate "CDC mutations" from "base mutation" and send them, respectively, to (1) and (2). Gleb explained in #5869 why it became necessary to add CDC code to LWT writes specifically, instead of doing it somewhere central that affects all writes: "All paths that do write goes through mutate_internally() eventually so it would have been best to do augmentations there, but cdc chose to log only certain writes and not others (unlike MV that does not care how write happened) and mutate_internal have no idea which is which so I do not have other choice but code duplication. ... paxos_response_handler::learn_decision is probably the place to add cdc augmentation." Fixes #5869	2020-03-05 09:49:19 +02:00
Piotr Dulikowski	204e204586	cdc: do not attempt to log empty mutations It is possible to produce an empty mutation using CQL. For example, the following query: DELETE FROM ks.tbl WHERE pk = 0 AND ck < 1 AND ck > 2; will attempt to delete from an empty range of rows. This is translated to the following mutation: {ks.tbl {key: pk{000400000000}, token:-3485513579396041028} {mutation_partition: static: cont=1 {row: }, clustered: {}}} Such mutation does not contain any timestamp, therefore it is difficult to determine what timestamp was used while making the query. This is problematic for CDC, because an entry in CDC log should be written with the same timestamp as a part of the mutation. Because an empty mutation does not modify the table in any way, we can safely skip logging such mutations in CDC and still preserve the ability to reconstruct the current state of the base table from full CDC log. Tests: unit(dev)	2020-03-05 08:32:54 +01:00
Piotr Dulikowski	e6751fad62	cdc_test: add tests for handling static row	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	39519ce923	cdc: fix indentation in transformer::transform	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	0d05b17881	cdc: handle static rows separately in transformer::transform Before this patch, `transform` did not generate any log rows about static row change. This commit fixes that - now, a log row is created if a static row is changed, and this row is separate from the rows that describe changes to the clustering rows.	2020-03-05 00:16:17 +01:00
Piotr Dulikowski	6a0b0b5786	cdc: move process_cells higher (and fix captured variables) The `process_cells` lambda is moved outside the loop, because it will be used by other code in subsequent commits.	2020-03-05 00:15:57 +01:00
Piotr Dulikowski	f136f6e02c	cdc: reduce dependencies on captured variables in process_cells This is a preparation for moving the lambda outside the for loop. - `log_ck`, `pikey`, `pirow` are now passed as arguments, - `value` is now a variable local to the lambda, - `ttl` is now a variable local to the lambda that is returned.	2020-03-05 00:14:05 +01:00
Piotr Dulikowski	a7f51449c3	cdc: fix preimage query for static rows For static rows, we need to fetch at least one row from its partition in order to compute its preimage.	2020-03-04 18:43:55 +01:00
Botond Dénes	8b908a9aba	test: lib/mutation_source_test: log the name of the test-method Most test-methods log a message with their names upon entering them. This helps in identifying the test-method a failure happened in in the logs. Two methods were missing this log line, so add it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200304155235.46170-1-bdenes@scylladb.com>	2020-03-04 18:16:21 +02:00
Pekka Enberg	7fde2e28da	dist/redhat: Specify files once in scylla.spec file Silences the following warnings when building an RPM: warning: File listed twice: /opt/scylladb/scripts/libexec/hex2list.py warning: File listed twice: /opt/scylladb/scripts/libexec/node_exporter_install warning: File listed twice: /opt/scylladb/scripts/libexec/perftune.py warning: File listed twice: /opt/scylladb/scripts/libexec/scylla-blocktune warning: File listed twice: /opt/scylladb/scripts/libexec/scylla-housekeeping warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_bootparam_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_config_get.py warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_coredump_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_cpuscaling_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_cpuset_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_dev_mode_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_ec2_check warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_fstrim warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_fstrim_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_io_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_kernel_check warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_ntp_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_prepare warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_raid_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_selinux_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_setup warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_stop warning: File listed twice: /opt/scylladb/scripts/libexec/scylla_sysconfig_setup warning: File listed twice: /opt/scylladb/scripts/libexec/seastar-addr2line warning: File listed twice: 
/opt/scylladb/share/doc/scylla/licenses warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/LICENSE-crc32-vpmsum.TXT warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/README.md warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/apache-license-2.0.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/boost-license-1.0.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/date-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/git-archive-all-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/libdeflate-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/xxhash-license.txt warning: File listed twice: /opt/scylladb/share/doc/scylla/licenses/zstd-license.txt I verified that the files are in the generated RPMs after the change: [penberg@nero scylla]$ rpm -ql build/dist/dev/redhat/RPMS/x86_64/scylla-server-666.development-0.20200304.2bc700b008.x86_64.rpm \| grep scripts.*libexec /opt/scylladb/scripts/libexec /opt/scylladb/scripts/libexec/hex2list.py /opt/scylladb/scripts/libexec/node_exporter_install /opt/scylladb/scripts/libexec/perftune.py /opt/scylladb/scripts/libexec/scylla-blocktune /opt/scylladb/scripts/libexec/scylla-housekeeping /opt/scylladb/scripts/libexec/scylla_bootparam_setup /opt/scylladb/scripts/libexec/scylla_config_get.py /opt/scylladb/scripts/libexec/scylla_coredump_setup /opt/scylladb/scripts/libexec/scylla_cpuscaling_setup /opt/scylladb/scripts/libexec/scylla_cpuset_setup /opt/scylladb/scripts/libexec/scylla_dev_mode_setup /opt/scylladb/scripts/libexec/scylla_ec2_check /opt/scylladb/scripts/libexec/scylla_fstrim /opt/scylladb/scripts/libexec/scylla_fstrim_setup /opt/scylladb/scripts/libexec/scylla_io_setup /opt/scylladb/scripts/libexec/scylla_kernel_check /opt/scylladb/scripts/libexec/scylla_ntp_setup /opt/scylladb/scripts/libexec/scylla_prepare 
/opt/scylladb/scripts/libexec/scylla_raid_setup /opt/scylladb/scripts/libexec/scylla_selinux_setup /opt/scylladb/scripts/libexec/scylla_setup /opt/scylladb/scripts/libexec/scylla_stop /opt/scylladb/scripts/libexec/scylla_sysconfig_setup /opt/scylladb/scripts/libexec/seastar-addr2line [penberg@nero scylla]$ rpm -ql build/dist/dev/redhat/RPMS/x86_64/scylla-server-666.development-0.20200304.2bc700b008.x86_64.rpm \| grep license /opt/scylladb/share/doc/scylla/licenses /opt/scylladb/share/doc/scylla/licenses/LICENSE-crc32-vpmsum.TXT /opt/scylladb/share/doc/scylla/licenses/README.md /opt/scylladb/share/doc/scylla/licenses/apache-license-2.0.txt /opt/scylladb/share/doc/scylla/licenses/boost-license-1.0.txt /opt/scylladb/share/doc/scylla/licenses/date-license.txt /opt/scylladb/share/doc/scylla/licenses/git-archive-all-license.txt /opt/scylladb/share/doc/scylla/licenses/libdeflate-license.txt /opt/scylladb/share/doc/scylla/licenses/xxhash-license.txt /opt/scylladb/share/doc/scylla/licenses/zstd-license.txt Message-Id: <20200304150057.2621-1-penberg@scylladb.com>	2020-03-04 17:25:53 +02:00
Tomasz Grabiec	da4bd3d2e6	Merge "Clean cql3 usage of storage_proxy and _service" from Pavel E. This set removes _all_ mentionings of storage_service and _all_ calls for global storage_proxy instances from cql3/ code. Tests: unit(dev)	2020-03-04 15:20:24 +01:00
Raphael S. Carvalho	3ba3ee2a7b	distributed_loader: trigger regular compaction on resharding completion Regular compaction relies on compaction manager to run compaction jobs until compaction strategy is satisfied. Resharding, on the other hand, is a one-off operation which runs only once in compaction manager, and leaves the sstable set in such a way that the strategy is very likely unsatisfied. We need to trigger regular compaction whenever a resharding job replaces a shared sstable by an unshared sstable, so that compaction will not fall way behind due to lots of new sstables created by resharding process. Fixes #5262. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200217144946.20338-1-raphaelsc@scylladb.com>	2020-03-04 16:08:13 +02:00
Nadav Har'El	f67a402c48	merge: Remove treewide dependency on boost/multiprecision Merged patch series from Avi Kivity: boost/multiprecision is a heavyweight library, pulling in 20,000 lines of code into each header that depends on it. It is used by converting_mutation_partition_applier and types.hh. While the former is easy to put out-of-line, the latter is not. All we really need is to forward-declare boost::multiprecision::cpp_int, but that is not easy - it is a template taking several parameters, among which are non-type template parameters also defined in that header. So it's quite difficult to disentangle, and fragile wrt boost changes. This patchset introduces a wrapper type utils::multiprecision_int which _can_ be forward declared, and together with a few other small fixes, manages to uninclude boost/multiprecision from most of the source files. The total reduction in number of lines compiled over a full build is 324 * 23,227 or around 7.5 million. Tests: unit (dev) Ref #1 https://github.com/avikivity/scylla uninclude-boost-multiprecision/v1 Avi Kivity (5): converting_mutation_partition_applier: move to .cc file utils: introduce multiprecision_int tests: cdc_test: explicitly convert from cdc::operation to uint8_t treewide: use utils::multiprecision_int for varint implementation types: forward-declare multiprecision_int configure.py \| 2 + concrete_types.hh \| 2 +- converting_mutation_partition_applier.hh \| 163 ++------------- types.hh \| 12 +- utils/big_decimal.hh \| 3 +- utils/multiprecision_int.hh \| 256 +++++++++++++++++++++++ converting_mutation_partition_applier.cc \| 188 +++++++++++++++++ cql3/functions/aggregate_fcts.cc \| 10 +- cql3/functions/castas_fcts.cc \| 28 +-- cql3/type_json.cc \| 2 +- lua.cc \| 38 ++-- mutation_partition_view.cc \| 2 + test/boost/cdc_test.cc \| 6 +- test/boost/cql_query_test.cc \| 16 +- test/boost/json_cql_query_test.cc \| 12 +- test/boost/types_test.cc \| 58 ++--- test/boost/user_function_test.cc \| 2 +- 
test/lib/random_schema.cc \| 14 +- types.cc \| 20 +- utils/big_decimal.cc \| 4 +- utils/multiprecision_int.cc \| 37 ++++ 21 files changed, 627 insertions(+), 248 deletions(-) create mode 100644 utils/multiprecision_int.hh create mode 100644 converting_mutation_partition_applier.cc create mode 100644 utils/multiprecision_int.cc	2020-03-04 15:13:42 +02:00
Avi Kivity	5dee627f73	types: forward-declare multiprecision_int This reduces the number of translation units that depend on boost/multiprecision from 354 to 30, and reduces the size of database.i (as an example) from 406160 to 382933 (smaller files will benefit more, relatively). Ref #1	2020-03-04 13:28:16 +02:00
Avi Kivity	3c772757c0	treewide: use utils::multiprecision_int for varint implementation The goal is to forward-declare utils::multiprecision_int, something beyond my capabilities for boost::multiprecision::cpp_int, to reduce compile time bloat. The patch is mostly search-and-replace, with a few casts added to disambiguate conversions the compiler had trouble with.	2020-03-04 13:28:16 +02:00
Avi Kivity	874f65c58c	tests: cdc_test: explicitly convert from cdc::operation to uint8_t After the varint data type starts using the new multiprecision_int type, this code fails to compile. I expect that somehow the conversion from enum class to cpp_int was allowed to succeed, and we ended up with a data_value of type varint. The tests succeeded because the serialized representation happened to be the same.	2020-03-04 13:28:16 +02:00
Piotr Jastrzebski	354e3c34c8	cdc log: merge stream_id columns into a single column Previously we had stream_id_1 and stream_id_2 columns of type long each. They were forming a partition key. In a new format we want a single stream_id column that forms a partition key. To be able to still store two longs, the new column will have type blob and its value will be concatenated bytes of two longs that partition key is composed of. We still want partition key to logically be two longs because those two values will be used by a custom partitioner later once we implement it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-04 13:27:48 +02:00
Avi Kivity	7434c81a29	utils: introduce multiprecision_int multiprecision_int is a wrapper around boost::multiprecision::cpp_int that adds no functionality. The intent is to allow forward declaration; cpp_int is so complicated that just finding out what its true type is is a difficult exercise, as it depends on many internal declarations. Because cpp_int uses expression templates, the implementation has to explicitly cast to the desired type in many places, otherwise the C++ compiler is presented with too many choices, especially in conjunction with data_value (which can convert from many different types too).	2020-03-04 12:42:57 +02:00
Avi Kivity	414ec8c68e	converting_mutation_partition_applier: move to .cc file converting_mutation_partition_applier is a heavyweight class that is not used in the hot path, so it can be safely out-of-lined. This moves some includes to boost/multiprecision out of header files, where they can infect a lot of code. mutation_partition_view.cc's includes were adjusted to recover missing dependencies.	2020-03-04 12:42:57 +02:00
Pavel Emelyanov	35b0e6dd7f	repair_writer: Use db from repair_meta (2nd try) The previous version erroneously used local db reference which was propagated into another shard. This time carry the sharded instance and use .local() as before. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303221729.31261-1-xemul@scylladb.com>	2020-03-04 11:31:52 +01:00
Tomasz Grabiec	477dadc062	Merge "cql_test_env: Drop a few shared_ptr<sharded<...>>" from Rafael I found that a few variables in cql_test_env were wrapping sharded in shared_ptr for no apparent reason. These patches convert them to plain sharded<...>.	2020-03-04 11:31:52 +01:00
Yaron Kaikov	de19496ff7	dist/docker: Add VERSION argument to Dockerfile (#5845) Currently, the Dockerfile installs the latest version of Scylla. Let's add a VERSION argument to Dockerfile, which explicitly specifies the version to ensure scripts, for example, always build the expected version. If no VERSION is specified for "docker build", use the default value of "666.development", which is the version number for latest nightly.	2020-03-04 12:20:24 +02:00
Pekka Enberg	e76b5bdf7b	Merge 'Cleanup test.py output' from Kostja "These two patches were made suspect of failing next promotion and excluded from the original series." * 'test.py.log' of https://github.com/kostja/scylla: test.py: remove log output on success unless -s is specified test.py: do not store entire log output in junit report.	2020-03-04 11:58:46 +02:00
Eliran Sinvani	99cedf737c	docker: rsyslog configuration fixes The introduction of rsyslog had two errors in it. Both errors are non fatal and the docker still works, however, the system is left in a wrong state in which supervisord marks rsyslogd service as failed (after several failed retry attempts). Another bug in the configuration causes rsyslog to output an error. 1) An inclusion command from a newer version was used in rsyslog's main configuration file. This caused rsyslog to complain during startup but it didn't do much damage since rsyslog converts every unrecognised command to a message command. 2) in the supervisord definition of the service, rsyslogd is run without the -n option which means it defaults to automatically switching to the background. Supervisord interprets this as an unexpected process termination and retries to start the process (unsuccessfully because rsyslog protects itself from having multiple processes of itself) and eventually marks it as down although it is fully up and running. This commit fixes both configuration problems. Tests: Build and run docker and validate the errors are gone. Fixes #5937	2020-03-04 11:56:30 +02:00
Pekka Enberg	325c3e13eb	build: Switch to SHA1 build IDs Currently, you have to build the relocatable package tarball with ./reloc/build_reloc.sh to be able to build an RPM out of it. You need to do this because RPMs require SHA1 build-ids, but the build system does not enforce that. To prepare for adding RPM target to the ninja build, let's switch to SHA1 build ID conditionally, because the performance difference between xxhash and SHA1 is negligible. Rafael Avila de Espindola writes: [...] the sha1 implementation in current lld is pretty fast. Linking release scylla the times I get are lld in fedora fast 2.83739 sha1 3.51990 current lld fast 2.6936 sha1 2.90250 And the sha1 implementation might get even faster: https://bugs.llvm.org/show_bug.cgi?id=44138. Message-Id: <20200303131806.22422-1-penberg@scylladb.com>	2020-03-04 11:00:43 +02:00
Tomasz Grabiec	82b76163e3	utils/small_vector: Add missing include Needed for std::uninitialized_move() et al Message-Id: <20200303191148.11716-1-tgrabiec@scylladb.com>	2020-03-03 21:23:40 +02:00
Tomasz Grabiec	5dfefc0a85	Revert "repair_writer: Use db from repair_meta" This reverts commit `c6ddd21c50`. Uses database& instance across shards, which causes repair writer to use the table object from the wrong shard. Fixes #5907	2020-03-03 19:50:53 +01:00
Avi Kivity	906784639d	Merge "Clean sstables from using global objects" from Pavel E " This set cleans sstable_writer_config and surrounding sstables code from using global storage_ and feature_ service-s and database by moving the configuration logic onto sstables_manager (that was supposed to do it since `eebc3701a5`). Most of the complexity is hidden around sstable_writer_config creation, this set makes the sstables_manager create this object with an explicit call. All the rest are consequences of this change. Tests: unit(debug), manual start-stop " * 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla: sstables: Move get_highest_supported_format sstables: Remove global get_config() helper sstables: Use manager's config() in .new_sstable_component_file() sstable_writer_config: Extend with more db::config stuff sstables_manager: Don't use global helper to generate writer config sstable_writer_config: Sanitize out some features fields initialization sstable_writer_config: Factor out some field initialization sstables: Generate writer config via manager only sstables: Keep reference on manager test: Re-use existing global sstables_manager table: Pass sstable_writer_config into write_memtable_to_sstable	2020-03-03 18:33:01 +02:00
Nadav Har'El	750fe9585a	alternator: change rjson::get() to take std::string_view Change rjson::get() to take std::string_view, instead of RapidJson's version of that type, "StringRef". We already did the same change for rjson::find() in a previous patch. Not only is std::string_view more convenient for potential callers in Scylla, this change also avoids a bug in FindMember() on StringRef where the length is ignored (and instead, null-termination of the string is assumed). This patch doesn't require any changes to callers, because we actually had just a handful of remaining callers (most call sites switched to rjson::find()), and all of them used string constants which could be implicitly converted to StringRef or std::string_view just the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303161019.1456-1-nyh@scylladb.com>	2020-03-03 17:13:40 +01:00
Nadav Har'El	91d9632909	alternator: add rjson::remove_member() convenience function This patch adds a rjson::remove_member() wrapper to the RemoveMember method, which takes a std::string_view. But beyond the convenience, this actually works around a subtle bug in RemoveMember where, if given a StringRef parameter, ignores its length (see upstream issue https://github.com/Tencent/rapidjson/issues/1649). In the one place we used RemoveMember, it forced us to copy the string because it wasn't null-terminated. The solution proposed here involves wrapping the string view in a GenericValue - which no longer needs to copy the string, but still works around the bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303143524.28300-1-nyh@scylladb.com>	2020-03-03 16:35:41 +01:00
Nadav Har'El	0fcb226412	alternator: switch rjson::find() to use std::string_view Our rjson::find() convenience function used RapidJson's "StringRef" type, which is almost exactly like std::string_view. If we switch to use string_view as we do in this patch, a lot of call sites become much simpler. Moreover, there was an even more important motivation for this patch: the RapidJson FindMember() function we used in rjson::find() has a bug when given a StringRef - although a StringRef contains a length, the FindMember() code ignores it and expects the string to be null-terminated (see: https://github.com/Tencent/rapidjson/issues/1649). In this patch, we wrap the pointer and length of a std::string_view in an rjson::value, a code path which bypasses the FindMember bug, and yet does not require copying the string. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200303141814.26929-1-nyh@scylladb.com>	2020-03-03 16:35:41 +01:00
Nadav Har'El	2ea0b9d226	Merge branch 'split-mutations' of github.com:haaawk/scylla into next Merged pull request https://github.com/scylladb/scylla/pull/5940 from Kamil Braun: Add a bunch of new structs describing a change made to a table, and an extract_changes function which takes a mutation and returns the set of changes contained in this mutation, separated by timestamp and ttl. Add a split function which uses extract_changes to split a mutation into separate mutations, each describing a single change. Static rows are put into separate changes now. The pre_image_select function was fixed to select pre_image data always when there is a static row/clustered row change, even if there were e.g. additional range tombstones. Fixes: #5719. Tests: unit(dev)	2020-03-03 17:27:21 +02:00
Botond Dénes	103bf50e18	storage_proxy: add timeouts to smp calls on the write path When a node is overloaded requests usually start to queue up. Timeouts are supposed to prevent queues from exploding and causing an OOM. One prominent queue that tends to explode is the smp queue as it didn't support timeouts and so requests would sit in the queue until the target shard would process them. If the target shard is heavily overloaded requests might accumulate faster than they are processed, surely leading to an OOM. To prevent this use the recently introduced timeout to `seastar::smp::submit_to()` and derived APIs to time out write requests sitting in the smp queue. We simply use the request's own timeout for this purpose. Fixes: #5055 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200303131658.741720-1-bdenes@scylladb.com>	2020-03-03 15:39:58 +02:00
Kamil Braun	5de9b5b566	cdc: add change splitting test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-03 13:31:19 +01:00
Kamil Braun	5c4a237c12	cdc: split the mutation before passing it into `transform` If the mutation contains separate logical changes (e.g. with different timestamps and/or ttls), it will be split into multiple mutations, each passed into transform.	2020-03-03 13:17:51 +01:00
Kamil Braun	9924e3aa34	cdc: reduce code duplication in augment_mutation_call Now there's only one call to `transform`.	2020-03-03 13:17:51 +01:00
Kamil Braun	24a32a13b5	cdc: retrieve preimage anytime there are static/clustered row updates Previously we wouldn't retrieve the preimage if the mutation contained something different than static/clustered row updates, e.g. if it contained a partition deletion. However, there are mutations created from batch statements which can contain both a partition deletion and a set of row updates with a later timestamp. We want to retrieve the preimage too in this case.	2020-03-03 13:17:51 +01:00
Kamil Braun	529d30ef66	cdc: add `split` function This function takes a mutation and returns a set of mutations, each representing a separate change with a single timestamp and ttl.	2020-03-03 13:17:51 +01:00
Kamil Braun	132ea89c32	cdc: add `extract_changes` function This commit introduces a bunch of new structs describing a change made to a table, and an `extract_changes` function which takes a mutation and returns the set of changes contained in this mutation, separated by timestamp and ttl.	2020-03-03 13:17:51 +01:00
Kamil Braun	b5c944370e	cdc: add `should_split` function The function checks if there are multiple timestamps and/or ttls inside a mutation, which means separate changes should be created for this mutation in CDC.	2020-03-03 13:17:50 +01:00
Konstantin Osipov	48f09b95d0	test.py: remove log output on success unless -s is specified Log output is saved by the build system and can take a lot of space. Remove it unless -s is specified.	2020-03-03 13:59:14 +03:00
Konstantin Osipov	ae2820a1c7	test.py: do not store entire log output in junit report. This makes report very heavy and is suspected to corrupt XML output.	2020-03-03 13:59:14 +03:00
Nadav Har'El	359b32fb63	merge: CDC: implement new column format and naming Merged pull request https://github.com/scylladb/scylla/pull/5910 by Calle Wilund: Rename metadata and data columns according to new spec Also use transformation methods for names in all code + tests to make switching again easier Break up data column tuple Data column is now pure frozen original type. If column is deleted (set to null), a metadata column cdc$deleted_<name> is set to true, to distinguish null column == not involved in row operation For non-atomic collections, a cdc$deleted_elements_<name> column is added, and when removing elements from collection this is where they are shown. For non-atomic assign, the "cdc$deleted_<name>" is true, and <name> is set to new value. column_op removed.	2020-03-03 12:36:16 +02:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace schema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Piotr Jastrzebski	f105f43008	commitlog: remove FIXME In segment_manager::on_timer() there's a FIXME to stop discarding future returned from sync() but sync() does not return any future so it's safe to remove the FIXME and stop casting to (void). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6d6d819cb2972e47e5f3fbe7b896499c64b09e53.1583230579.git.piotr@scylladb.com>	2020-03-03 12:21:56 +02:00
Calle Wilund	ed0d1c5fe2	cdc: Break up data column tuple According to "new" spec: Data column is now pure frozen original type. If column is deleted (set to null), a metadata column cdc$deleted_<name> is set to true, to distinguish null column == not involved in row operation For non-atomic collections, a cdc$deleted_elements_<name> column is added, and when removing elements from collection this is where they are shown. For non-atomic assign, the "cdc$deleted_<name>" is true, and <name> is set to new value. column_op removed.	2020-03-03 08:52:20 +00:00
Rafael Ávila de Espíndola	28e59566a8	cql_test_env: Don't use a shared_ptr for token_metadata Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:52:23 -08:00
Rafael Ávila de Espíndola	47f8a63279	cql_test_env: Don't use a shared_ptr for migration_notifier Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:51:45 -08:00
Rafael Ávila de Espíndola	ed0c4d2801	cql_test_env: Don't use a shared_ptr for view_update_generator Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:51:25 -08:00
Rafael Ávila de Espíndola	ff2edd15d4	cql_test_env: Don't use a shared_ptr for view_builder Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:48 -08:00
Rafael Ávila de Espíndola	9375478803	cql_test_env: Don't use a shared_ptr for feature_service Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:25 -08:00
Rafael Ávila de Espíndola	5e87562f33	cql_test_env: Don't use a shared_ptr for database Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:50:08 -08:00
Rafael Ávila de Espíndola	a4b7de4d5d	cql_test_env: Don't use a shared_ptr for auth::service Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-02 13:49:46 -08:00
Botond Dénes	8a1c8ce8a6	mutation_partition: make query_result_builder safely movable `query_result_builder` is movable but if you actually try to move it after it having consumed some fragments it will blow up in your face when you try to use it again. This is because its `mutation_querier` member received a reference to its `query::result::partition_writer`. Of course the reference to the latter was invalidated on move so the former accessed invalid memory. Since `query::result::partition_writer` wasn't actually used for anything other, just move it into the `mutation_querier`, making `query_result_builder` actually safe to move. Fixes: #3158 Message-Id: <20190830142601.51488-1-bdenes@scylladb.com>	2020-03-02 18:46:59 +01:00
Botond Dénes	4da0a1d397	docs/debugging.md: mention another method of helping gdb find sources Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225124701.80706-1-bdenes@scylladb.com>	2020-03-02 18:26:29 +01:00
Pavel Emelyanov	86ca4b83d0	Revert "Revert "features: Stop on shutdown"" This reverts commit `165913598b`.	2020-03-02 19:56:18 +03:00
Pavel Emelyanov	0a10e9787e	features: Remove future-based when_enabled() This API is considered to be error-prone, all users of it are reworked, so let's drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-02 19:55:52 +03:00
Pavel Emelyanov	e63f5187b2	system_keyspace: Rework migrate_truncation_records feature subscription The function in question uses future-based .when_enabled() subscription on cluster_supports_truncation_table feature. This method is considered to be unsafe, so here's the patch that changes it onto feature::listener. The completion of the migration is only awaited by a single test, so this waiting mechanism is also slightly simplified. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-02 19:55:28 +03:00
Tomasz Grabiec	e17db536fd	Merge "lwt: support LIKE operator in conditional expressions" from Alejo Support LIKE operator condition on column expressions. NOTE: following the existing code, the LIKE pattern value is converted to raw bytes and passed straight as bytes_view to like_matcher without type checking; it should be checked/sanitized by caller. Refs: #5777 Branch URL: https://github.com/alecco/scylla/tree/as_like_condition_2 Tests: unit ({dev}), unit ({debug}) NOTE: fail for unrelated test test_null_value_tuple_floating_types_and_uuids	2020-03-02 17:36:57 +01:00
Botond Dénes	6218153543	scylla-gdb.py: introduce collection_element() Extracting a certain element from a collection is a common task I have to do while debugging cores. For certain collections (c-array, std::array) this is trivial, for others it is easy enough (std::vector), but for some (std::list) this is a tiresome work-intensive process. This convenience function allows getting a reference to any element of the supported container types, returning them for further use in the interactive session. Currently only `std::list` and `std::vector` are supported.	2020-03-02 16:28:49 +01:00
Botond Dénes	94352b3426	scylla-gdb.py: generalize dereference_lw_shared_ptr() To be a generic convenience function for dereferencing all sorts of smart pointers. For now `std::unique_ptr`, `seastar::lw_shared_ptr` and `seastar::foreign_ptr` are supported.	2020-03-02 16:28:04 +01:00
Botond Dénes	b6f8a6fbd3	test/boost: sstable_datafile_test: sstable_scrub_test: stop table `table` is not registered with the database, and hence will not be waited on during shutdown. Stop it explicitly to prevent any asynchronous operation on it racing with shutdown. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200302142845.569638-1-bdenes@scylladb.com>	2020-03-02 16:20:00 +01:00
Calle Wilund	b6443e44b9	set: Make set_type_impl::serialize_partially_deserialized_form static Conform with map + does not require any instance info.	2020-03-02 14:43:34 +00:00
Pavel Solodovnikov	64451e5f51	cql3: minor cleanups regarding cql3::attributes::raw class * Mark cql3::attributes::raw class as final * Change every occurrence of ::shared_ptr<attributes::raw> to std::unique_ptr<...> * Mark all methods in cql3::attributes::raw as const * Remove redundant "_attrs" ptr copy in insert_json_statement, use one from raw::modification_statement * Fix odd indentation in cql3/statements/update_statement.cc Tests: unit-tests (dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200301223708.99883-1-pa.solodovnikov@scylladb.com>	2020-03-02 13:26:01 +01:00
Tomasz Grabiec	51cfd13f8c	gdb: Fix get_local_tasks() chunked_vector holds task* directly after seastar commit bcb5cf3a8dca19be0e577ee4e3bcd246f949dce6. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200227171722.7189-1-tgrabiec@scylladb.com>	2020-03-02 12:02:19 +02:00
Tomasz Grabiec	57a3f3e36b	gdb: Fix std_variant::get() when index is > 0 _get_next() was recursively calling itself with index - 1 if index was > 0. When we reached the desired element we always tried to use member_types[0] as the type, which is incorrect since member_types contains all types and doesn't change in get(). Fix by replacing recursion with iteration so that we keep the original index. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582900804-18681-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:59:19 +02:00
Tomasz Grabiec	4942c4c22b	gdb: Drop class keyword when constructing type name in seastar_lw_shared_ptr I encountered a case when template type name is not resolved when "class " is present. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582900998-19267-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:58:44 +02:00
Calle Wilund	1085860c62	cdc: Rename metadata and data columns according to new spec Also use transformation methods for names in all code + tests to make switching again easier	2020-03-02 09:34:51 +00:00
Piotr Sarna	c62863cf69	alternator: restore verbose parsing error messages When wrapping rapidjson routines with safer, yieldable code, parsing information was lost, because the JSON reader was not checked for parsing errors before further processing. That resulted in nearly all parsing errors being reduced to "Assertion failed: StackSize() != 1". After this patch, all various errors (missing quotations, colons, object names, etc.) are properly returned for the user. Message-Id: <968ce2f7539bf33d3eb829f0ab431b788d291602.1583134221.git.sarna@scylladb.com>	2020-03-02 11:29:09 +02:00
Tomasz Grabiec	4c0ddf3a2d	gdb: Introduce 'scylla features' command Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582901194-19903-1-git-send-email-tgrabiec@scylladb.com>	2020-03-02 11:28:13 +02:00
Nadav Har'El	ba536dbc95	alternator-test: don't warn about not verifying SSL certificates When running the Alternator tests, we don't care about verifying the pedigree of the SSL certificates - we actually know the ones we use in our test setups are fake, and not signed by any respectable certificate authority. We already use "verify=False" in many requests to avoid the certificate checking, but then we start getting scary-looking warning messages about an "Unverified HTTPS request is being made.". There's a way to disable these warnings, but we only did in some cases, and there were still some tests that show these warnings. Let's do it once, in a way that affects all tests. Message-Id: <20200301175607.8841-1-nyh@scylladb.com>	2020-03-01 22:59:20 +01:00
Juliusz Stasiewicz	cf24ae86f3	cdc: distinguishing update from insert When incoming mutation contains live row marker the `operation` is described as "insert", not as an "update". Also, I extended the test case "test_row_delete" with one insert, which is expected to log different value of `operation` than update or delete. Renamed the test case accordingly. Test cases that relied on "update" being the same as "insert" are updated accordingly (`test_pre_image_logging`, `test_cdc_across_shards`, `test_add_columns`). Fixes #5723	2020-03-01 17:50:08 +02:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logalloc: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Alejo Sanchez	c3b157a80b	lwt: support LIKE operator in conditional expressions Adds support of LIKE operator in conditional column expressions. Refs: #5777 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-01 14:22:10 +01:00
Piotr Sarna	2137017bc3	alternator: revert to ValidationException for JSON errors Both rapidjson library and DynamoDB induce enough corner cases for incorrect JSON, that the simplest way out is to simply conform back to ValidationException in all cases. This commit comes with an updated test, which is now aware of 3 possible outcomes for an incorrect JSON: a ValidationException, a SerializationException and HTTP 404. Message-Id: <5e39d2dc077f4ea5ce360035a4adcddaf3a342a0.1582876734.git.sarna@scylladb.com>	2020-03-01 14:35:20 +02:00
Avi Kivity	1ed06cdb7c	Revert "dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test" This reverts commit `65aadad9a6`. It causes crashes (due to the coredump test) during package install, since scylla_coredump_setup is called from rpm postinstall. The test should be done only from scylla_setup (and the user should be warned). Fixes #5916.	2020-03-01 14:32:31 +02:00
Avi Kivity	db544db5e2	Merge "Convert a few APIs to std::string_view" from Rafael " As part of avoiding static initialization order problems I want to switch a few global sstring to constexpr std::string_view. The advantage being that a constexpr variable doesn't need runtime initialization and therefore cannot be part of a static initialization order problem. In order to do the conversion I needed to convert a few APIs to use std::string_view instead of sstring and const sstring&. These patches are the simple cases that are also an improvement in their own right. " * 'espindola/string_view' of https://github.com/espindola/scylla: (22 commits) test: Pass a string_view to create_table's callback Pass string_view to the schema constructor cql3: Pass string_view to the column_specification constructor Pass string_view to keyspace_metadata::new_keyspace Pass string_view to the keyspace_metadata constructor utils: Use std::string as keys in nonstatic_class_registry utils: Pass a string_view to class_registry::to_qualified_class_name auth: Return a string_view from authorizer::qualified_java_name auth: Return a string_view from authenticator::qualified_java_name utils: Pass string_view to is_class_name_qualified test: Pass a string_view to create_keyspace Pass string_view to no_such_column_family's constructor perf_simple_query: Pass a string_view to make_counter_schema Pass string_view to the schema_builder constructor types: Add more data_value constructors transport: Pass a string_view to cql_server::connection::make_autheticate transport: Pass a string_view to cql_server::response::write_string cql3: Pass std::string_view to query_processor::compute_id cql3: Remove unused variable cql3: Pass a string_view to cf_statement::prepare_keyspace ...	2020-03-01 14:22:28 +02:00
Rafael Ávila de Espíndola	b3d396ea1f	utils: Use on_internal_error from seastar With this change abort_on_internal_error is enabled on every SEASTAR_TEST_CASE. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200227164823.21021-1-espindola@scylladb.com>	2020-02-29 19:28:57 +02:00
Pavel Emelyanov	3ab43eba01	validation: Cleanup validate_keyspace helpers One of them uses global storage_proxy instance, but since it is not used -- remove it not to encourage anybody to start calling one. Another call uses the db.find_keyspace to check if a keyspace exists, while there's a nicer db.has_keyspace helper (which doesn't throw exceptions) so use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200228123644.13931-1-xemul@scylladb.com>	2020-02-29 19:28:57 +02:00
Rafael Ávila de Espíndola	80bfe91a20	test: Pass a string_view to create_table's callback This gives more flexibility to the create_table implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	151f5e723f	Pass string_view to the schema constructor This moves string copies from the callers of the constructor to the implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	fba071163e	cql3: Pass string_view to the column_specification constructor This moves sstring copies from the callers to the constructor implementation. While at it, move the implementation out-of-line. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	ba453d832b	Pass string_view to keyspace_metadata::new_keyspace This avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	94d07fba07	Pass string_view to the keyspace_metadata constructor This avoids a few sstring copies when constructing keyspace_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	01fe766f1f	utils: Use std::string as keys in nonstatic_class_registry The sstring::compare function was never updated to work with std::string_view. We could fix that, but it seems better to just switch to std::string. With a working compare function we can avoid copying the argument passed to to_qualified_class_name when an entry is found in the map. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:08 -08:00
Rafael Ávila de Espíndola	31985d3c28	utils: Pass a string_view to class_registry::to_qualified_class_name This just moves a string copy from the caller to the implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 13:30:00 -08:00
Rafael Ávila de Espíndola	df4f1a3bc3	auth: Return a string_view from authorizer::qualified_java_name This gives more flexibility to the implementations as they now don't need to construct a sstring. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 11:45:22 -08:00
Rafael Ávila de Espíndola	c29f8caafc	auth: Return a string_view from authenticator::qualified_java_name This gives more flexibility to the implementations as they now don't need to construct a sstring. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 11:32:36 -08:00
Rafael Ávila de Espíndola	fae05e9268	utils: Pass string_view to is_class_name_qualified With this we don't need to construct a sstring just to call is_class_name_qualified. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	0b57bddb3e	test: Pass a string_view to create_keyspace With this we don't need to construct a sstring just to call create_keyspace. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2b96abcece	Pass string_view to no_such_column_family's constructor With this we don't have to construct a sstring to construct a no_such_column_family. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2679c0cc87	perf_simple_query: Pass a string_view to make_counter_schema With this we don't need to construct a sstring just to call make_counter_schema. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	9ab2346e7f	Pass string_view to the schema_builder constructor With this we don't need to construct a sstring just to construct a schema_builder. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	93de9597bf	types: Add more data_value constructors With this we can construct a data_value from any string type. This also avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	c51d81341b	transport: Pass a string_view to cql_server::connection::make_autheticate With this we don't need to construct a sstring just to call make_autheticate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	c2c44f4778	transport: Pass a string_view to cql_server::response::write_string With this we don't need to construct a sstring just to call write_string. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	4adefd9a76	cql3: Pass std::string_view to query_processor::compute_id With this we don't need to construct a sstring just to call compute_id. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	f44a5255da	cql3: Remove unused variable Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	9e00f1e23b	cql3: Pass a string_view to cf_statement::prepare_keyspace This avoids a copy in the callers. While at it, also make this function non-virtual since it is never overwritten. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	2fd3ec8d6f	cql3: Pass a string_view to keyspace_element_name::set_keyspace With this we don't need to construct a sstring just to call set_keyspace. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Rafael Ávila de Espíndola	35089447cd	cql3: Pass a string_view to keyspace_element_name::to_internal_name This moves the string copy from the callers to the implementation of to_internal_name. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Botond Dénes	5b0cfbb51f	test/boost/mutation_reader_test: test_multishard_streaming_reader: use caller's priority class Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200228073239.475778-1-bdenes@scylladb.com>	2020-02-28 16:39:30 +01:00
Avi Kivity	134d5a5f75	Merge "flat_mutation_reader: abort reverse reads when size of mutation exceeds limit" from Botond " Reverse queries work by reading an entire partition into memory, then start emitting its rows in reverse order. It is easy to see how this can lead to disasters combined with large partitions. In fact a handful of such reverse queries on large partitions is enough to bring a node down. To prevent this, abort reverse queries, when we find out that the size of the partition is larger than a limit. This might be annoying to users, but I'm sure it is not as annoying as their nodes going down. The limit is configurable via `max_memory_for_unlimited_query` configuration option, which is 1MB by default. This limit is propagated to each table, system tables having no limit. This limit is planned to be used by other queries capable of consuming unlimited amount of memory, like unpaged queries. Not in this series. The proper solution would be to read the data in reverse (#1413), but that is a major effort. In the meanwhile make sure the unsuspecting user won't bring their nodes down with an innocent looking ordering directive. Note that for calculating the memory footprint of the partition-in-question, only the clustering rows are used. This should be fine, the 1MB limit is conservative enough that an eventual overshoot caused by the omitted range tombstones and the static row would not make a big difference. Fixes: #5804 " * 'limit-reverse-query-memory-consumption/v3' of https://github.com/denesb/scylla: flat_mutation_reader: make_reversing_reader(): add memory limit db/config: add config memory limit of otherwise unlimited queries utils::updateable_value: add operator=(T) flat_mutation_reader: expose reverse reader as a standalone reader	2020-02-28 07:57:13 +02:00
Rafael Ávila de Espíndola	e670dfc0cd	auth: Fix static initialization order problem A static constructor was used to initialize update_row_query. That constructor would call meta::roles_table::qualified_name() which would access AUTH_KS which is also initialized by a static constructor in another file, so the construction order is not guaranteed. This change turns update_row_query into a function with a static local variable in it. The static local is initialized at first use, fixing the problem. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200227163916.19761-1-espindola@scylladb.com>	2020-02-28 07:57:13 +02:00
Nadav Har'El	7953f7c65f	merge "alternator: Make parsing yieldable" Merged patch series by Piotr Sarna: This series makes json parsing yieldable in order to prevent reactor stalls. It's done by: 1. Extracting the parsing stage out of alternator executor 2. Moving the parsing stage to a separate service, which uses a static seastar thread (parallelism: 1) 3. Wrapping rjson parsing routines with a yieldable parser, which takes advantage of running in a seastar thread and occasionally performs maybe_yield() Step 2 above is only used for JSON's big enough to potentially create stalls - small requests will be parsed immediately, without being redirected to a static thread. Handling a PutItem operation with large JSONs on my machine takes approximately: 1MB doc: ~30ms 3MB doc: ~90ms 12MB doc: ~350ms out of which parsing itself is around: 1MB doc: ~7ms 3MB doc: ~20ms 12MB doc: ~80ms (bonus: 400KiB doc: ~2ms) ; the document was a single object full of small items, which triggers many allocations during parsing. The above numbers were roughly the same before and after the series, but the 12MB document did not cause reactor stalls after the patch. Note: writing the JSON can still be a source of stalls, especially for large documents. Note2: DynamoDB limits single value size to 400KiB, but for batches it will be 16MiB total request size Note3: If parallelism ever proves to be an issue, it's easily increasable by spawning more static threads. 
Refs: #5742 Tests: alternator(local) manual Piotr Sarna (12): alternator: break lines in server callbacks alternator: allow moving the request from rmw operation alternator: move parsing in front of executor alternator: convert parse to std::string_view alternator: implement json parser inside the server alternator: remove rjson::parse_raw alternator: make rjson yieldable in thread context alternator: fix returning raw JSON errors alternator: change json errors class to SerializationException alternator-test: rename large requests test to 'manual requests' alternator-test: extract getting signed request helper alternator-test: add tests for incorrect JSON documents ...ge_requests.py => test_manual_requests.py} \| 53 +++-- alternator/executor.cc \| 203 ++++++++---------- alternator/executor.hh \| 33 +-- alternator/rjson.cc \| 47 +++- alternator/rjson.hh \| 7 +- alternator/rmw_operation.hh \| 1 + alternator/serialization.cc \| 9 +- alternator/server.cc \| 111 ++++++++-- alternator/server.hh \| 20 +- 9 files changed, 310 insertions(+), 174 deletions(-) rename alternator-test/{test_large_requests.py => test_manual_requests.py} (70%)	2020-02-28 07:57:13 +02:00
Benny Halevy	b31867eafa	types: tri_compare: turn marshal_exception to on_internal_error We see this exception on gemini testing with large number of pk, ck, columns, for example: 2020-02-19T17:52:54+00:00 gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-3 !ERR \| scylla: [shard 0] storage_proxy - Exception when communicating with 10.0.207.169: std::runtime_error (marshaling error: read_simple_exactly - size mismatch (expected 4, got 1) Backtrace: 0x2c4f08d#012 0x9fcd3e#012 0x444b28#012 0x4d8fe5#012 0xa78e8b#012 0xeab269#012 0xc27a67#012 0xc28239#012 0xc600e3#012 0xadebf3#012 0xae14c1#012 0x29ff291#012 0x29ff49f#012 0x2a3fc65#012 0x29a5d6f#012 0x29a6e9e#012 0x72a4e3#012 /opt/scylladb/libreloc/libc.so.6+0x271a2#012 0x77548d#012) Decoded backtrace: seastar::current_backtrace() at crtstuff.c:? seastar::internal::backtraced<marshal_exception>::backtraced<seastar::basic_sstring<char, unsigned int, 15u, true> >(seastar::basic_sstring<char, unsigned int, 15u, true>&&) at crtstuff.c:? void seastar::throw_with_backtrace<marshal_exception, seastar::basic_sstring<char, unsigned int, 15u, true> >(seastar::basic_sstring<char, unsigned int, 15u, true>&&) at crtstuff.c:? abstract_type::compare(std::basic_string_view<signed char, std::char_traits<signed char> >, std::basic_string_view<signed char, std::char_traits<signed char> >) const [clone .cold] at types.cc:? bound_view::tri_compare::operator()(clustering_key_prefix const&, int, clustering_key_prefix const&, int) const at crtstuff.c:? sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? mutation_reader_merger::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? 
combined_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? restricting_mutation_reader::fast_forward_to(position_range, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? cache::cache_flat_mutation_reader::do_fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:? This patch should help us get a core dump if this happens again. Ref #5856 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200227131939.388770-1-bhalevy@scylladb.com>	2020-02-28 07:57:13 +02:00
Piotr Sarna	b461750ae3	alternator-test: add tests for incorrect JSON documents The test case sends incorrectly formed JSON documents to alternator, expecting a serialization exception as a response.	2020-02-28 07:57:12 +02:00
Raphael S. Carvalho	40e75fb109	streaming/stream_transfer_task: avoid pointless iterations in has_relevant_range_on_this_shard() When has_relevant_range_on_this_shard() finds a relevant range, it unnecessarily keeps iterating through to the end. Verified manually that this could be thousands of pointless iterations when streaming data to a node just added. The relevant code could be simplified by de-futurizing it but I think it remains so to allow task scheduler to preempt it if necessary. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200220224048.28804-2-raphaelsc@scylladb.com>	2020-02-28 07:57:12 +02:00
Piotr Sarna	79b04aeba9	alternator-test: extract getting signed request helper A helper function for getting custom requests is extracted to top-level, in order to be used later by other test cases.	2020-02-28 07:57:12 +02:00
Raphael S. Carvalho	8a986bc23b	streaming/stream_transfer_task: avoid unnecessary copies of ranges Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200220224048.28804-1-raphaelsc@scylladb.com>	2020-02-28 07:57:12 +02:00
Piotr Sarna	ad48328407	alternator-test: rename large requests test to 'manual requests' This test suite can then be the parent of tests which use custom, potentially not validated input in order to test alternator against data not easy to push via boto3 or Python, due to their implementation details.	2020-02-28 07:57:12 +02:00
Piotr Sarna	ccdf519829	alternator: make alternator server sharded Previously, alternator server was not directly sharded - and instead kept a helper http server control class, which stored sharded http server inside. That design is confusing and makes it hard to expand alternator server with new sharded attributes, so from now on the alternator server is itself sharded<>. Tests: alternator-test(local, smp==1&smp==4) Fixes #5913 Message-Id: <b50e0e29610c0dfea61f3a1571f8ca3640356782.1582788575.git.sarna@scylladb.com>	2020-02-28 07:57:12 +02:00
Piotr Sarna	c370586189	alternator: change json errors class to SerializationException In order to be consistent with DynamoDB - a parsing error on incorrect JSON input is reported as SerializationException instead of ValidationException.	2020-02-28 07:57:12 +02:00
Piotr Sarna	6f8c70d54b	alternator: fix returning raw JSON errors A couple of places in executor code leaked raw JSON errors to the user instead of formulating a proper ValidationException message. These places are now fixed, and the next patch in this series will act as a regression checker, since all JSON errors will be returned as SerializationException, not ValidationException instances.	2020-02-28 07:57:12 +02:00
Piotr Sarna	1be1cfc5d8	alternator: make rjson yieldable in thread context In order to fight reactor stalls, rjson parsing and writing routines can now yield if they run in seastar thread context. In order to run a yieldable version of the parser which needs to be run in seastar thread context, use parse_yieldable() instead of parse().	2020-02-28 07:57:12 +02:00
Piotr Sarna	0af8516675	alternator: remove rjson::parse_raw With parse() being based on std::string_view, there's not much sense in keeping a separate parse_raw function, so it's deleted.	2020-02-28 07:57:12 +02:00
Piotr Sarna	aad6c01b98	alternator: implement json parser inside the server The json parser runs in a static thread which accepts and parses documents. Documents smaller than a parsing threshold (currently: 16KiB) will be parsed in place without yielding. The assumption is that most alternator requests are small and there's no need to parse them in a yieldable way, which also induces overhead. For reference, parsing a 128KiB document made of many small objects with rapidjson takes around 0.5 millisecond, and a 16KiB document is parsed in around 0.06ms - a value small enough not to disturb Seastar's current value of 0.5ms task quota too much.	2020-02-28 07:57:12 +02:00
Piotr Sarna	ffdbbc0ad0	alternator: convert parse to std::string_view The original implementation used const std::string&, which is less versatile.	2020-02-28 07:57:12 +02:00
Piotr Sarna	2402955d45	alternator: move parsing in front of executor Parsing a request string into JSON happens as a first thing in every request, so it can be performed before calling any executor callbacks. The most important thing however, is that making parsing a separate stage allows certain optimizations, e.g. running all parsing in a single seastar thread, which allows adding yields to rjson parsing later.	2020-02-28 07:57:12 +02:00
Piotr Sarna	c20432bcac	alternator: allow moving the request from rmw operation In order to elide copying the JSON value when rerouting the operation to another shard - a way to move the parsed request from the operation is added.	2020-02-28 07:57:12 +02:00
Piotr Sarna	c7a8549270	alternator: break lines in server callbacks The lines are about to get longer, so they are broken as a first step, to make the next commits more clear.	2020-02-28 07:57:12 +02:00
Botond Dénes	1073094f04	database: database::query(), database::apply(): remove default timeouts	2020-02-27 19:14:12 +02:00
Botond Dénes	2c1ee7b9cd	database: table::query(): remove default timeout	2020-02-27 19:14:09 +02:00
Botond Dénes	8da88e6cb9	mutation_query: data_query(): remove default timeout	2020-02-27 19:02:40 +02:00
Botond Dénes	fdb45d16de	mutation_query: mutation_query(): remove default timeout	2020-02-27 18:56:30 +02:00
Botond Dénes	72509911d9	multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout	2020-02-27 18:45:15 +02:00
Botond Dénes	f6013a39ec	reader_concurrency_semaphore: wait_admission(): remove default timeout	2020-02-27 18:43:12 +02:00
Botond Dénes	93039a085d	utils/logalloc: run_when_memory_available(): remove default timeout	2020-02-27 18:36:32 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Botond Dénes	75efa707ce	db/config: add config memory limit of otherwise unlimited queries We have a few kinds of queries whose memory consumption is not limited at all. One of these is reverse queries, which read entire partitions into memory before reversing them. These partitions can be larger than memory and thus such a query can single-handedly cause OOM. This patch introduces a configuration for a memory limit for such queries. This will serve as a hard limit, and queries which attempt to use more memory than this will be aborted. The limit is propagated to table objects, with the intention of keeping system tables unlimited. These tables are usually small and initiators of system queries are not prepared for failures.	2020-02-27 18:11:54 +02:00
Botond Dénes	d1194da98d	utils::updateable_value: add operator=(T) Allow assigning a const value.	2020-02-27 18:11:54 +02:00
Botond Dénes	091d80e8c3	flat_mutation_reader: expose reverse reader as a standalone reader Currently reverse reads just pass a flag to `flat_mutation_reader::consume()` to make the read happen in reverse. This is deceptively simple and streamlined -- while in fact behind the scenes a reversing reader is created to wrap the reader in question to reverse partitions, one-by-one. This patch makes this apparent by exposing the reversing reader via `make_reversing_reader()`. This now makes how reversing works more apparent. It also allows for more configuration to be passed to the reversing reader (in the next patches). This change is forward compatible, as in time we plan to add reversing support to the sstable layer, in which case the reversing reader will go.	2020-02-27 18:11:54 +02:00
Dejan Mircevski	0d7457946f	cql3: Allow repeated LIKE on same column No reason to disallow this. We still forbid mixing LIKE and non-LIKE relations on the same column. Fixes #5902. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-27 09:34:51 -05:00
Pekka Enberg	109bb1baa6	cql3: Switch from distributed<> to seastar::sharded<> Convert the last instance of "distributed<>" in cql3 to seastar::sharded<>. Message-Id: <20200227092804.27374-1-penberg@scylladb.com>	2020-02-27 12:09:59 +02:00
Pekka Enberg	123b50cdb9	configure.py: Disable package registry when building Seastar The CMake build system in seastar.git exports the package to CMake package registry. However, we don't use it when building from scylla.git (we link to seastar directly) and get the following warning when building with "dbuild" (that does not bind mount $HOME/.cmake): CMake Warning at CMakeLists.txt:1180 (export): Cannot create package registry file: /home/penberg/.cmake/packages/Seastar/3b6ede62290636bbf1ab4f0e4e6a9e0b No such file or directory Let's just disable the package registry for our builds by setting the CMAKE_EXPORT_NO_PACKAGE_REGISTRY CMake option as discussed here to make the warning go away: https://cmake.org/cmake/help/v3.4/variable/CMAKE_EXPORT_NO_PACKAGE_REGISTRY.html Message-Id: <20200227092743.27320-1-penberg@scylladb.com>	2020-02-27 12:09:59 +02:00
Takuya ASADA	01a03c4d69	install.sh: run post-install script just like .rpm/.deb package To install scylla using install.sh easily, we need to do the following: - add scylla user/group - configure scylla.yaml - run scylla_post_install.sh But we don't want to run them when we build the .rpm/.deb package, so we also need to add a --packaging option to skip them. Fixes #5830	2020-02-27 11:17:24 +02:00
Dejan Mircevski	acccab31f7	cql3: Forbid calling LIKE::values() We were incorrectly returning the LIKE pattern as if it were a column value. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-26 14:07:46 -05:00
Dejan Mircevski	fd583196ce	cql3: Move LIKE::_last_pattern to matcher Instead of keeping the LIKE pattern in a restriction object (as we currently do), keep it in like_matcher. Also move the pattern-idempotence check from the restriction to the matcher. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-26 14:00:04 -05:00
Avi Kivity	956b092012	Merge "Repair based node operation" from Asias " Here is a simple introduction to the node operations scylla supports and some of the issues. - Replace operation It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas which might not be the latest copy. - Rebuild operation It is used to get all the data this node owns from other nodes. It pulls data from only one of the replicas which might not be the latest copy. - Bootstrap operation It is used to add a new node into the cluster. The token ring changes. Does not suffer from the "not the latest replica" issue. New node pulls data from existing nodes that are losing the token range. Suffers from failed streaming. We split the ranges in 10 groups and we stream one group at a time. Restream the group if failed, causing unnecessary data transmission on wire. Bootstrap is not resumable. If it fails after 99.99% of the data is streamed and we restart the node, we need to stream all the data again even though the node already has 99.99% of the data. - Decommission operation It is used to remove a live node from the cluster. Token ring changes. Does not suffer from the "not the latest replica" issue. The leaving node pushes data to existing nodes. Like the bootstrap operation, it is not resumable. - Removenode operation It is used to remove a dead node out of the cluster. Existing nodes pull data from other existing nodes for the new ranges they own. It pulls from one of the replicas which might not be the latest copy. To solve all the issues above, we could use repair based node operations. The idea behind repair based node operations is simple: use repair to sync data between replicas instead of streaming. 
The benefits: - Latest copy is guaranteed - Resumable in nature - No extra data is streamed on wire E.g., rebuild twice, will not stream the same data twice - Unified code path for all the node operations - Free repair operation during bootstrap, replace operation and so on. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test " * 'repair_for_node_ops' of https://github.com/asias/scylla: docs: Add doc for repair_based_node_ops storage_service: Enable node repair based ops for bootstrap storage_service: Enable node repair based ops for decommission storage_service: Enable node repair based ops for replace storage_service: Enable node repair based ops for removenode storage_service: Enable node repair based ops for rebuild storage_service: Use the same tokens as previous bootstrap storage_service: Add is_repair_based_node_ops_enabled helper config: Add enable_repair_based_node_ops repair: Add replace_with_repair repair: Add rebuild_with_repair repair: Add do_rebuild_replace_with_repair repair: Add removenode_with_repair repair: Add decommission_with_repair repair: Add do_decommission_removenode_with_repair repair: Add bootstrap_with_repair repair: Introduce sync_data_using_repair repair: Propagate exception in tracker::run	2020-02-26 20:37:25 +02:00
Avi Kivity	35e5772b94	Update seastar submodule * seastar 7a3b4b4e4e...affc3a5107 (6): > Merge "Add the possibility to remove rules from routes" from Pavel > stall_detector: expose correct clock type to use > queue: add has_blocked_consumer() function > Merge "core: reduce memory use for idle connections" from Avi > testing: Enable abort_on_internal_error on tests > core: Add a on_internal_error helper	2020-02-26 19:21:24 +02:00
Rafael Ávila de Espíndola	17f12a8197	perf_simple_query: Call set_abort_on_internal_error(true) We should never ignore an internal error in a perf test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-2-espindola@scylladb.com>	2020-02-26 18:22:05 +02:00
Rafael Ávila de Espíndola	c6897dcbea	perf_simple_query: Simplify with seastar::thread There is no reason not to use a seastar::thread in setup code. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-1-espindola@scylladb.com>	2020-02-26 18:22:04 +02:00
Nadav Har'El	3e44356c9f	alternator-test: fix tests failing with HTTPS When we test Alternator on its HTTPS port (i.e., pytest --https), we don't want requests to verify the pedigree of the SSL certificate. Our "dynamodb" fixture (conftest.py) takes care of this for most of the tests, but a few tests create their own requests and need to pass the "verify=False" option on their own. In some tests, we forgot to do this, and this patch fixes three tests which failed with "pytest --https". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200226142330.27846-1-nyh@scylladb.com>	2020-02-26 15:29:24 +01:00
Nadav Har'El	cf8354f703	merge "cdc: Fix `operation` value for row deletes" Merged pull request https://github.com/scylladb/scylla/pull/5897 from Juliusz Stasiewicz: Column operation now contains operation::row_delete (== 2) after queries like delete from tbl where pk=x and ck=y;. Before this patch row deletes were treated as updates, which was incorrect because updates do not contain row tombstones (and row deletes do). Refs #5709	2020-02-26 16:26:34 +02:00
Juliusz Stasiewicz	f425f7d217	tests/cdc: added test for row delete <-> update differentiation	2020-02-26 12:32:16 +01:00
Juliusz Stasiewicz	836183b847	cdc: fix `operation` value for row deletes Column `operation` now contains `operation::row_delete` (== 2) after queries like `delete from tbl where pk=x AND ck=y;`. Before this patch row deletes were treated as updates, which was incorrect because updates do not contain row tombstones (and row deletes do). Refs #5709	2020-02-26 11:58:50 +01:00
Nadav Har'El	6da4d65f12	merge: Fix alternator decommission/shutdown Merged patch series from Piotr Sarna: Alternator shutdown routines were only registered in main.cc, but it's not enough - other operations, like decommission, also rely on shutting down client servers. In order to remedy the situation, a notion of client shutdown listeners is introduced to storage service. A shutdown listener implements a callback used by the storage service when client servers need to shut down, and at the same time it does not force storage service to keep a reference for the client service itself. NOTE: the interface can also be used later to provide proper shutdown routines for redis and any other future APIs. Fixes #5886 Tests: alternator-test(local, including a shutdown during the run) Piotr Sarna (4): storage_service: make shutdown_client_servers() thread-only storage_service: add client shutdown hook main: make alternator shutdown hook-based main: reduce scope of alternator services main.cc \| 18 +++++++++--------- service/storage_service.cc \| 22 +++++++++++++++++----- service/storage_service.hh \| 15 ++++++++++++++- 3 files changed, 40 insertions(+), 15 deletions(-)	2020-02-26 12:45:30 +02:00
Botond Dénes	a83cca93ff	scylla-gdb.py: introduce std_deque A python read-only container wrapper for std::deque. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225184951.125129-1-bdenes@scylladb.com>	2020-02-26 11:20:50 +01:00
Takuya ASADA	65aadad9a6	dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test On some environments systemd-coredump does not work with a symlinked directory; we can use a bind-mount instead. Also, it's better to check that systemd-coredump is working by generating a coredump. Fixes #5753	2020-02-26 11:21:48 +02:00
Takuya ASADA	8e901636fc	scylla_setup: fix --nic option on non-interactive mode scylla_setup should not show the NIC selection prompt in non-interactive mode. Fixes #5725 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2020-02-26 11:13:53 +02:00
Piotr Sarna	148456a741	main: reduce scope of alternator services With the new shutdown routines in place, alternator executor and server do not need to be declared outside of the `if` clause which conditionally sets up alternator.	2020-02-26 08:45:07 +01:00
Piotr Sarna	33ce8379ba	main: make alternator shutdown hook-based In order to properly handle not only shutdown, but also decommission, drain and similar operations, alternator shutdown is now registered as a client shutdown hook, which allows storage service to trigger its shutdown routines. Fixes #5886	2020-02-26 08:44:56 +01:00
Piotr Sarna	8d499603aa	storage_service: add client shutdown hook The shutdown hook interface can be used later by additional client interfaces (e.g. alternator, redis) to register shutdown routines for various operations: Scylla shutdown, node decommission, drain, etc. It also decouples the services themselves from being part of the storage service, since it's huge enough as it is.	2020-02-26 08:44:35 +01:00
Piotr Sarna	171bc9a3df	storage_service: make shutdown_client_servers() thread-only The function is only ever called in thread context, so it's moved from being future<>-based in order to ease future changes.	2020-02-26 08:18:42 +01:00
Nadav Har'El	0ab6c7fcef	alternator: stricter checks for user-supplied attribute values Until now, PutItem or UpdateItem could be used to insert almost any JSON as an attribute's value - even those that do not match DynamoDB's typed value specification. Among other things, the new validation allows us to reject empty sets, strings or byte arrays - which are (somewhat artificially) forbidden in DynamoDB. Also added tests for the empty sets, strings and byte arrays that should be rejected. Fixes #5896 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225150525.4926-1-nyh@scylladb.com>	2020-02-26 08:12:26 +01:00
Nadav Har'El	6339f419ac	alternator: removing all elements from a set should delete it DynamoDB does not support empty sets. Operations which remove elements from a set attribute should remove the attribute when the last item is removed - not leave an empty set as it incorrectly does now. Incidentally, the same patch fixes another bug - deleting elements from a non-existent set attribute should be allowed (and do nothing), not fail as it does now. This patch also includes tests for both bugs. Fixes #5895 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225125343.31629-1-nyh@scylladb.com>	2020-02-26 08:12:19 +01:00
Nadav Har'El	acb7f45ca7	alternator-test: add tests for UpdateItem's AttributeUpdates DELETE and ADD We have not yet implemented the DELETE-with-value and ADD operations in UpdateItem's old-style "AttributeUpdates" parameter - see issue #5864 and issue #5893, respectively. This patch includes comprehensive tests for both features. The new tests pass on DynamoDB, but currently xfail on Alternator - until these features are implemented. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200225105546.25651-1-nyh@scylladb.com>	2020-02-26 08:12:10 +01:00
Botond Dénes	ea08d7a0df	scylla-gdb.py: make get_text_range() more reliable Currently `get_text_range()` uses heuristics about which ELF section actually contains the text for the main executable. It appears that this fails from time to time and we have to adjust the heuristics. We don't really have to guess, however: a much better method of determining the section hosting text is to find a vtable pointer and locate the section it resides in. For this, we use the `reactor::_backend` as a canary. When this is not available, we fall back to the pre-existing heuristics. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200225164719.114500-1-bdenes@scylladb.com>	2020-02-25 19:02:26 +01:00
Calle Wilund	a3a764fd10	cdc: Handle non-atomic columns Fixes #5669 This implements non-atomic collection and UDT handling for both cdc preimage + delta. To be able to express deltas in a meaningful way (and reconstruct using it), non-atomic values are represented somewhat differently from regular values: * maps - stored as is (frozen) * sets - stored as is (frozen) * lists - stored as map<timeuuid, value> (frozen) this allows reconstructing the list, as otherwise things like list[0] = value cannot be represented in a meaningful way * udt - stored as tuple<tuple<field0>, tuple<field1>...> (frozen) UDTs are normally just tuples + metadata, but we need to distinguish the case of outer tuple element == null, meaning "no info/does not partake in mutation" from tuple element being a tuple(null) (i.e. empty tuple), meaning "set field to null"	2020-02-25 19:34:54 +02:00
Avi Kivity	d17ebde46b	Update seastar submodule * seastar 8b6bc659c7...7a3b4b4e4e (3): > Merge "Add custom stack size to seastar threads" from Piotr Ref #5742. > expiring_fifo: Optimize memory usage for single-element lists Ref #4235. > Close connection, when reach to max retransmits	2020-02-25 18:02:25 +02:00
Pavel Emelyanov	7363d56946	sstables: Move get_highest_supported_format The global get_highest_supported_format helper and its declaration are scattered all over the code, so clean this up and prepare the ground for moving _sstables_format from the storage_service onto the sstables_manager (not this set). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:45 +03:00
Pavel Emelyanov	792cec39df	sstables: Remove global get_config() helper Finally, the thing is not used by anyone and can be removed. This greatly relaxes the sstables -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Applauded-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-25 14:31:45 +03:00
Pavel Emelyanov	1af065296e	sstables: Use manager's config() in .new_sstable_component_file() This is the last place left that calls for global get_config(), switch it onto _sst_manager.config(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:43 +03:00
Pavel Emelyanov	5dea657991	sstable_writer_config: Extend with more db::config stuff The enable_sstable_key_validation and summary_bytes_cost are used in sstables writing code, keeping them on sstable_writer_config removes more calls to global get_config(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:34 +03:00
Pavel Emelyanov	85d9326d70	sstables_manager: Don't use global helper to generate writer config The main goal of this patch is to stop using the get_config() global when creating the sstable_writer_config instance. Other than being global, the existing get_config() is also confusing as it effectively generates 3 (three) sorts of configs -- one for scylla, when db config and features are ready, the other one for tests, when no storage service is at hand, and the third one for tests as well, when the storage service is created by test env (likely intentionally, but maybe by coincidence the resulting config is the same as for the no-storage-service case). With this patch it's now 100% clear which one is used when. Also this makes half the work of removing the get_config() helper. The db::config and feature_service used to initialize the managers are referenced by the database that creates and keeps the managers, so the references are safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	3a603729d4	sstable_writer_config: Sanitize out some features fields initialization Similar to previous patch -- initialize config fields from features in configurator, not in default initializers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	34302a3e1c	sstable_writer_config: Factor out some field initialization The promoted_index_block_size is taken from db config in two places. Factor this out and, at the same time, stop keeping it as std::optional. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	5adce3390c	sstables: Generate writer config via manager only The sstable_writer_config creation looks simple (just declare the struct instance) but behind the scenes references storage and feature services, messes with database config, etc. This patch teaches the sstables_manager to generate the writer config and makes the rest of the code use it. For future safety, creating the sstable_writer_config by hand is prohibited. The manager is referenced through table-s and sstable-s, but the two existing sstables_managers live on the database object, and table-s and sstable-s both live shorter than the database, so this reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Pavel Emelyanov	f289da1e3b	sstables: Keep reference on manager This is needed for further patching. The sstables_manager outlives all sstables objects, so it's safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:03 +03:00
Pavel Emelyanov	e73e923e95	test: Re-use existing global sstables_manager The sstables_manager in the scylla binary outlives the sstables objects created by it; this makes it possible to add an sstable->manager reference and use it. In unit tests there are cases when the sstables::test_env that keeps the manager in its _mgr field is destroyed right after sstable creation (e.g. in the boost/sstable_mutation_test.cc ka_sst() helper). Fix this by changing _mgr to be a reference to the manager and initializing it with the already existing global manager. The few exceptions from this rule that need to set their own large data handler will create a sstable_manager of their own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 13:54:41 +03:00
Pavel Emelyanov	961f1642c7	table: Pass sstable_writer_config into write_memtable_to_sstable The latter creates the config by hand, but the plan is to create it via sstables_manager. Callers of this helper are the final frontiers where the manager will be safely accessible. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 13:54:40 +03:00
Asias He	aaa1f3ce7b	docs: Add doc for repair_based_node_ops This patch adds a doc for the repair based node operations.	2020-02-25 08:54:35 +08:00
Asias He	ac90c1c184	storage_service: Enable node repair based ops for bootstrap - Bootstrap operation It is used to add a new node into the cluster. The token ring changes. Does not suffer from the "not the latest replica" issue. New node pulls data from existing nodes that are losing the token range. Suffers from failed streaming. We split the ranges in 10 groups and we stream one group at a time. Restream the group if failed, causing unnecessary data transmission on wire. Bootstrap is not resumable. If it fails after 99.99% of the data is streamed and we restart the node, we need to stream all the data again even though the node already has 99.99% of the data. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:54:33 +08:00
Asias He	62f056c022	storage_service: Enable node repair based ops for decommission - Decommission operation It is used to remove a live node from the cluster. Token ring changes. Does not suffer from the "not the latest replica" issue. The leaving node pushes data to existing nodes. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:53:37 +08:00
Asias He	a38916121c	storage_service: Enable node repair based ops for replace - Replace operation It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas which might not be the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-25 08:53:36 +08:00
Glauber Costa	628dd16519	compaction: deprecate DTCS. Step 1. This patch adds a warning of deprecation to DTCS. In a follow up step, we will start requiring a flag for it to be enabled to make sure users notice. For now we'll just be nice and add a warning for the log watchers. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200224164405.9656-1-glauber@scylladb.com>	2020-02-24 20:26:24 +02:00
Takuya ASADA	5a7beef6a0	dist/common/scripts/scylla_coredump_setup: don't create /etc/sysctl.d/99-scylla-coredump.conf on CentOS8 We don't need to create 99-scylla-coredump.conf on CentOS8, the file is only needed for CentOS7. Fixes #5818	2020-02-24 17:38:47 +02:00
Takuya ASADA	fa423e25d4	scylla_setup: show usage when --nic is not specified & eth0 is not available Since we set 'eth0' as the default NIC name, we get the following error when running scylla_setup in non-interactive mode without the --nic parameter: $ sudo scylla_setup --setup-nic-and-disks --no-raid-setup --no-verify-package --no-io-setup NIC eth0 doesn't exist. It looks strange since the user did not actually specify 'eth0'; they might have forgotten to specify --nic. I think we should show the usage when eth0 is not available on the system. Fixes #5828	2020-02-24 17:35:40 +02:00
Piotr Dulikowski	41d82e39ea	storage proxy: rename mutate_hint_from_scratch Changes the name of storage_proxy::mutate_hint_from_scratch function to another name, whose meaning is more clear: send_hint_to_all_replicas. Tests: unit(dev)	2020-02-24 17:30:22 +02:00
Takuya ASADA	29285b28e2	dist/debian: fix "unable to open node-exporter.service.dpkg-new" error It seems like .service is conflicting at install time because the file is installed twice, by both debian/.service and debian/scylla-server.install. We don't need to use *.install, so we can just drop the line. Fixes #5640	2020-02-24 17:28:14 +02:00
Juliusz Stasiewicz	127e258ade	cql3: Fix missing aggregate functions for counters Aggregate functions on counters do not exist. Until now counters could, at best, fall back to blob->blob overloads, e.g.: ``` cqlsh> select max(cnt) from ks.tbl; system.max(cnt) ---------------------- 0x000000000000000a (1 rows) cqlsh> select sum(entities) from ks.tbl; InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid call to function sum, none of its type signatures match [...] ``` Meanwhile, counters are compatible with bigints (aka. `long_type'), so bigint overloads can be used on them (e.g. sum(bigint)->bigint). This is achieved here by a special rule in overload resolution, which makes `selector' perceive counters as an `EXACT_MATCH' to counter's underlying type (`long_type', aka. bigint).	2020-02-24 17:14:44 +02:00
Juliusz Stasiewicz	0ea17216fe	atomic_cell: special rule for printing counter cells Until now, attempts to print a counter update cell would end up calling abort() because `atomic_cell_view::value()` has no specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`, i.e. the counter update IMR type. Such a visitor is not easy to write if we want to intercept counters only (and not all int64_t values). Anyway, a linearized byte representation of a counter cell would not be helpful without knowing if it consists of counter shards or a counter update (delta) - and this must be known upon `deserialize`. This commit introduces a simple approach: it determines the cell type at a high level (from `atomic_cell_view`) and prints the counter contents by `counter_cell_view` or `atomic_cell_view::counter_update_value()`. Fixes #5616	2020-02-24 17:11:34 +02:00
Benny Halevy	25a763a187	dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with the binary's build-id when stripping its debug info as it is passed the `--build-id-seed <version>.<release>` option. To prevent that we need to set the following macros as follows: unset `_unique_build_ids` set `_no_recompute_build_ids` to 1 Fixes #5881 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-24 11:50:20 +02:00
Nadav Har'El	4b7577e429	alternator-test: correct typo "existant" The official documentation language of Scylla is English, not French. So correct the word "existant", which appeared several times throughout Alternator's tests, to "existent". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-6-nyh@scylladb.com>	2020-02-24 10:40:53 +01:00
Nadav Har'El	e075eff915	alternator: complete implementation of ReturnValues parameter This patch completes the support for the ReturnValues parameter for the UpdateItem operation. This parameter has five settings - NONE, ALL_OLD, ALL_NEW, UPDATED_OLD and UPDATED_NEW. Before this patch we already supported NONE and ALL_OLD - and this patch completes the support for the three remaining modes: ALL_NEW, UPDATED_OLD and UPDATED_NEW. The patch also continues to improve test_returnvalues.py with additional corner cases discovered during the development. After this patch, only one xfailing test remains - testing updates to nested document paths, which we do not yet support (even without the ReturnValues parameter). After this patch, the support of ReturnValues is complete - for all operations (UpdateItem, PutItem and DeleteItem) and all of its possible settings. Fixes #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-5-nyh@scylladb.com>	2020-02-24 10:40:53 +01:00
Nadav Har'El	1e500a2a34	alternator: rjson: another variant of set_with_string_name() utility The rjson::set_with_string_name() utility function copies the given string into the JSON key. The existing implementation required that this input string be an std::string&, but a std::string_view would be fine too, and I want to use it in new code to avoid yet another unnecessary copy. Adding the overloads also exposes a few places where things were implicitly converted to std::string and now cause an ambiguity - and clearing up this ambiguity also allowed me to find places where this conversion was unnecessary. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-4-nyh@scylladb.com>	2020-02-24 10:38:54 +01:00
Nadav Har'El	fa5c2a4f58	alternator: UpdateItem only deleting attribute shouldn't create item UpdateItem operations usually need to add a row marker: * An empty UpdateItem is supposed to create a new empty item (row). Such an empty item needs to have a row marker. * An UpdateItem to add an attribute x and then later an UpdateItem to remove this attribute x should leave an empty item behind. This means the first UpdateItem needed to add a row marker, so it will be left behind after the second UpdateItem. So the existing code always added a row marker in UpdateItem. However, there is one case where we should NOT create the row marker: When the UpdateItem operation only has attribute deletions, and nothing else, and it is applied to a key with no pre-existing item, DynamoDB does not create this item. So neither should we. This patch includes a new test for this test_update_item_non_existent, which passes on DynamoDB, failed on Alternator before this patch, and passes after the patch. Fixes #5862. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-3-nyh@scylladb.com>	2020-02-24 10:38:10 +01:00
Nadav Har'El	3cde949980	alternator-test: test for BatchWriteItem same key in two tables In issue #5698 I raised a theory that we might have a bug when BatchWriteItem is given two writes to the same key but in two different tables. The test added here verifies that this theory was wrong, and this case already works correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200221224221.31237-2-nyh@scylladb.com>	2020-02-24 10:37:23 +01:00
Piotr Sarna	5e07c00eeb	Merge 'Delete table snapshot' from Amnon This series adds an option to the API that supports deleting a specific table from a snapshot. The implementation works in a similar way to the option to specify specific keyspaces when deleting a snapshot. The motivation is to allow reducing disk-space when using the snapshot for backup. A dtest PR is sent to the dtest repository. Fixes #5658 Original PR #5805 Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf) * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot	2020-02-24 09:38:57 +01:00
Pekka Enberg	263261fa15	README: Remove out-of-date package build instructions The package build instructions in README.md are out-of-date so let's remove them. Message-Id: <20200224064632.3285-1-penberg@scylladb.com>	2020-02-24 10:25:07 +02:00
Pekka Enberg	684e4602dc	redis: Fix DB index error message The error message (silently) changed to "DB index is out of range" the following commit: `c7a4e694ad` The new error message is part of Redis 4.0, released in 2017, so let's switch Scylla to use the new one. Message-Id: <20200211133946.746-1-penberg@scylladb.com>	2020-02-24 10:22:27 +02:00
Pavel Emelyanov	60bdf0685c	cql3: Clean cql3/ from remaining storage_service mentions These are several #include-s and a no-longer-valid comment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	d639d4ed5f	cql3: Parse cf name in drop_index_statement::validate The patch `759752947b` explains why the .column_family method of this statement implementation must be tuned to calculate the column_family in some cases. However, to do this the global storage_proxy is needed. The proposal is to calculate the column_family in the .validate method, like it's done e.g. for function_statement-s, which have a storage_proxy reference at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	a0a0d40267	cql3: Use proxy arg in batch_statement::verify_batch_size Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	bf7004326e	cql3: Use proxy arg in drop_index_statement::lookup_indexed_table Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	9bb67b5771	cql3: Don't get global storage_proxy Get rid of numerous calls to get_local_storage_proxy().get_db() and use the storage proxy argument that's already available in most of them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:47 +03:00
Pavel Emelyanov	6892dbdde7	cql3: Add storage_proxy argument to .check_access method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-24 11:17:19 +03:00
Asias He	f4b4192c91	storage_service: Enable node repair based ops for removenode - Removenode operation It is used to remove a dead node from the cluster. Existing nodes pull data from other existing nodes for the new ranges they own. Each pulls from one of the replicas, which might not hold the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-24 11:11:41 +08:00
Asias He	cf0601735e	storage_service: Enable node repair based ops for rebuild - Rebuild operation It is used to get all the data this node owns from other nodes. It pulls data from only one of the replicas, which might not hold the latest copy. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test	2020-02-24 11:11:41 +08:00
Asias He	3b64b4bb17	storage_service: Use the same tokens as previous bootstrap With repair based node operations, we can resume a previously failed bootstrap. In order to do that, the bootstrapping node needs to use the same tokens as the previous bootstrap. Currently, we always use new tokens when we bootstrap, because we need to stream all the ranges anyway, so it does not matter whether we use the same tokens or not.	2020-02-24 11:11:41 +08:00
Asias He	a4c614914a	storage_service: Add is_repair_based_node_ops_enabled helper It is used to check if repair based node operations are enabled or not.	2020-02-24 11:11:40 +08:00
Asias He	cb4045e11d	config: Add enable_repair_based_node_ops An option to enable the repair based node operations.	2020-02-24 11:11:40 +08:00
Asias He	1672f64add	repair: Add replace_with_repair It is used to replace a dead node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	960ce7ab54	repair: Add rebuild_with_repair It is used to rebuild a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	b488ab7d11	repair: Add do_rebuild_replace_with_repair The rebuild and replace operations are similar because the token ring does not change for both of them. Add a common helper to do rebuild and replace with repair. It will be used by rebuild and replace operation shortly.	2020-02-24 11:11:40 +08:00
Asias He	b18e078ca2	repair: Add removenode_with_repair It is used to remove a dead node from a cluster using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	e9a9fde1f7	repair: Add decommission_with_repair It is used to decommission a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	569c126a84	repair: Add do_decommission_removenode_with_repair It will be used by decommission and removenode operation shortly.	2020-02-24 11:11:40 +08:00
Asias He	9c67389cc8	repair: Add bootstrap_with_repair It is used to bootstrap a node using repair instead of using stream_plan.	2020-02-24 11:11:40 +08:00
Asias He	198cad6179	repair: Introduce sync_data_using_repair It is used to sync data for node operations like bootstrap, decommission and so on. Unlike plain repair operation, the user of sync_data_with_repair() can pass repair_neighbors object to specify the pre-calculated neighbors for a range. If a mandatory neighbor is not available, the repair will fail so that the upper layer can fail the node operation.	2020-02-24 11:11:40 +08:00
Asias He	1038e375af	repair: Propagate exception in tracker::run In sync_data_with_repair, we depend on the future returned by tracker::run to tell whether the repair was successful or not.	2020-02-24 11:11:40 +08:00
Piotr Sarna	14dfa3c0c3	alternator: change keyspace prefix to alternator_ The original idea of prefixing alternator keyspace names with 'a#' leveraged the fact that '#' is not a legal CQL character for keyspace names. The idea is flawed though, since '#' proved to confuse existing Scylla tools (e.g. nodetool). Thus, the prefix is changed to more orthodox 'alternator_'. It is possible to create such keyspaces with CQL as well, but then the alternator CreateTable request would simply fail, because the keyspace already exists, which is graceful enough. Hiding alternator keyspaces and tables from CQL is another issue, but there are other ways to distinguish them than a non-standard prefix, e.g. tags. Fixes #5883	2020-02-23 23:32:29 +02:00
Pavel Emelyanov	049b549fdc	api: Register /v2/config stuff after database is started The set_config registers lambdas that need db.local(), so these routes must be registered after database is started. Fixes: #5849 Tests: unit(dev), manual wget on API Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200219130654.24259-1-xemul@scylladb.com>	2020-02-23 17:09:03 +02:00
Takuya ASADA	3d1154272f	dist/debian: remove unused dependencies Since we moved to the relocatable package, almost all dependencies are no longer needed.	2020-02-23 15:36:13 +02:00
Takuya ASADA	98c182ec67	dist/redhat: align dependencies with debian On Debian, we don't add xfsprogs/mdadm as package dependencies; the scylla_raid_setup script installs them instead. Since xfsprogs/mdadm are only needed for constructing RAID, we can move these dependencies to scylla_raid_setup on Red Hat too.	2020-02-23 15:34:35 +02:00
Piotr Sarna	4ad577b40c	alternator: add content length limit to alternator servers This patch adds a 16MB content length limit to alternator HTTP(S) servers. It also comes with a test, which verifies that larger requests are refused. Fixes #5832 Tests: alternator-test(local,remote) Message-Id: <29d5708f4bf9f41883d33d21b9cca72b05170e6c.1582285070.git.sarna@scylladb.com>	2020-02-23 14:34:20 +02:00
Piotr Sarna	085cd857ab	alternator-test: limit the number of retries to 3 In order to decrease the developer's time spent on waiting for boto3 to retry the request many times, the retry count is configured to be 3. Two major benefits: - vastly decrease wait time when debugging a failing test - for requests which are expected to fail, but return results not compatible with boto3, execution time is decreased Tests: alternator-test(local,remote) Message-Id: <46a3a9344d9427df7ea55c855f32b8f0e39c9b79.1582285070.git.sarna@scylladb.com>	2020-02-23 14:19:38 +02:00
Pavel Emelyanov	f4e789a9c2	range_streamer: Fix off-by-size in stream progress log The nr_ranges_streamed denotes the number of ranges streamed so far, but by the time the sending lambda is called this counter is already incremented by the number of ranges to be streamed in this call. And the variable is not used for anything else but logging. Fix this by swapping logging with incrementing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101601.18779-1-xemul@scylladb.com>	2020-02-23 11:20:17 +02:00
Tomasz Grabiec	3e83d30daf	gdb: scylla sstables: Fix for older versions of GDB Some GDB versions complain about subscript being a gdb.Value Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1582308177-24893-1-git-send-email-tgrabiec@scylladb.com>	2020-02-23 11:17:20 +02:00
Tomasz Grabiec	e7dece7f1e	gdb: scylla sstables: Allow locating sstables attached to tables This patch adds an alternative way to locate sstables by looking at sstable sets in table objects: scylla sstables -t This may be useful for several things. One is to identify sstables which are not attached to tables. Another use case is to be able to use the command on older versions of scylla which don't have sstable tracking. Message-Id: <1582308099-24563-1-git-send-email-tgrabiec@scylladb.com>	2020-02-23 11:16:20 +02:00
Piotr Sarna	e1ecd0d637	doc: refer to dev build mode instead of release The paragraph about adding a `Tests:` footer implies that it's preferred to run tests in release mode, while dev is equally good and compiles faster. Message-Id: <9e1ad1a4e1529d30abb3adb1923b007c52ccf955.1582282066.git.sarna@scylladb.com>	2020-02-23 11:11:44 +02:00
Rafael Ávila de Espíndola	fc018a73bb	build: Add the --enable-stack-guards and --disable-stack-guards options If neither is used, we get the default behavior: only release is built without stack guards. With --disable-stack-guards all modes are built without stack guards. With --enable-stack-guards all modes are built with stack guards. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200222012732.992380-1-espindola@scylladb.com>	2020-02-23 11:05:13 +02:00
Avi Kivity	197adf4c0d	Update seastar submodule * seastar cdda3051e3...8b6bc659c7 (2): > core/file-types.hh: Fix missing header > cmake: Add a Seastar_STACK_GUARDS cmake option	2020-02-23 11:03:59 +02:00
Tomasz Grabiec	3a4597f8f3	Merge remote-tracking branch 'xemul/br-repair-remove-storage-service' into next	2020-02-23 10:29:34 +02:00
Pavel Emelyanov	897bbeabea	storage_service: Relax _is_bootstrap_mode The variable in question was used to check that the bootstrap mode finishes correctly, but it was removed, because this check was for self-evident code and thus useless (`dbca327b`) Later, the patch was reverted to keep track of the bootstrap mode for the API is_cleanup_allowed call (`a39c8d0e`) This patch is a reworked combination of both -- the variable is kept for the API's sake, but in a much simpler manner. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101813.18945-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	a364190700	storage_service: Remove if-0-ed-out Java code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221101704.18868-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	38143a76c7	main: Register stop_gossiping earlier The _scheduled_gossip_task timer needs token_metadata and thus should be stopped before it is freed. However, this is not always the case. The timer is armed in start_gossiping, which is called by storage_service init_server_without_the_messaging_service_part, and is canceled inside stop_gossiping, which in turn is called by drain_on_shutdown, which in turn is registered too late. If something fails between the internals of the init_server_... and the deferred registration of drain_on_shutdown (lots of reasons) the timer is not stopped and may run, thus accessing the freed token_metadata. Bandaid this by scheduling stop_gossiping right after the gossiper instances are created. This can be too early (before storage_service starts gossiping) or too late (after drain_on_shutdown stops it), but this function is re-entrant. Fixes #5844 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200221085226.16494-1-xemul@scylladb.com>	2020-02-23 10:26:50 +02:00
Pavel Emelyanov	72a6d38e6c	storage_service: Merge identical branches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200210185011.25244-1-xemul@scylladb.com>	2020-02-23 10:26:49 +02:00
Piotr Sarna	dae86849a2	Update seastar submodule * seastar 2b510220...cdda3051 (10): > core: discard unused variable / function > pollable_fd: use boost::intrusive_ptr rather than std::unique_ptr for lifecycle management > build: check for pthread_setname_np() > build: link against Threads::Threads > future: Avoid recursion in do_for_each > future: Expand description of parallel_for_each > merge: Add content length limit to httpd > tests/scheduling_group_test: verify current scheduling group is inherited as expected > net: return future<> instead of subscription<> > cmake: be more verbose when looking for libraries	2020-02-23 10:26:49 +02:00
guy9	a7586c6f7d	added training section to readme file	2020-02-21 11:36:18 +01:00
Nadav Har'El	e8cbbba653	alternator: partial implementation of ReturnValues parameter Before this patch, we only supported the ReturnValues=NONE setting of the PutItem, UpdateItem and DeleteItem operations. This patch also adds full support for the ReturnValues=ALL_OLD option in all three operations. This option directs Alternator to return the full old (i.e., pre-modification) contents of the item. We implement this as a RMW (read-modify-write) operation just as we do other RMW operations - i.e., by default we use LWT, to ensure that we really return the value of the item directly before the modification, the same value that would have been used in a conditional expression if there was one. NOTE: This implementation means one cannot use ReturnValues=ALL_OLD in forbid_rmw write isolation mode. One may theorize that if we only need the read-before-write for ReturnValues and not for a conditional expression, it should have been enough to use a separate read (as we do in unsafe_rmw isolation mode) before the write. But we don't have this "optimization" yet and I'm not sure it's a valid optimization at all - see discussion in a new issue #5851. This patch completes the ReturnValues support for the PutItem and DeleteItem operations. However, the third operation, UpdateItem, supports three more ReturnValues modes: UPDATED_OLD, ALL_NEW and UPDATED_NEW. We do not yet support those in this patch. If a user tries to use one of these three modes, an informative error message will be returned. The three tests for these three unimplemented settings continue to xfail, but the rest of the tests in test_returnvalues.py (except one test of nested attribute paths) now pass so their xfail flag is dropped. Refs #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219135658.7158-1-nyh@scylladb.com>	2020-02-21 08:32:47 +01:00
Tomasz Grabiec	d0b6be0820	Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data is still stored in the node's row cache, because cleanup didn't invalidate the cache. Fixes #4446. tests: - unit tests (dev mode) - dtests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test cleanup_test.py	2020-02-20 18:20:56 +01:00
Pavel Solodovnikov	8efb02146f	cql3: const cleanups and API de-pointerization * Pass raw::select_statement::parameters as lw_shared_ptr * Some more const cleanups here and there * lists,maps,sets::equals now accept const-ref to _type_impl instead of shared_ptr Remove unused `get_column_for_condition` from modification_statement.hh * More methods now accept const-refs instead of shared_ptr Every call site where a shared_ptr was required as an argument has been inspected to be sure that no dangling references are possible. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>	2020-02-20 18:14:49 +02:00
Gleb Natapov	df2f67626b	commitlog: fix size of a write used to zero a segment Due to a bug the entire segment is written in one huge write of 32Mb. The idea was to split it to writes of 128K, so fix it. Fixes #5857 Message-Id: <20200220102939.30769-1-gleb@scylladb.com>	2020-02-20 17:22:21 +02:00
Gleb Natapov	6a78cc9e31	commitlog: use commitlog IO scheduling class for segment zeroing There may be other commitlog writes waiting for zeroing to complete, so not using proper scheduling class causes priority inversion. Fixes #5858. Message-Id: <20200220102939.30769-2-gleb@scylladb.com>	2020-02-20 17:15:13 +02:00
Raphael S. Carvalho	f93912f344	Revert "Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"" With #4446 fixed, this commit can be reverted. This reverts commit `454e7e0109`.	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	fb81f2aa7c	table: Fix stale data being returned due to lack of cache invalidation Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes data from sstables which doesn't belong to the node anymore, which means cache must be invalidated on cleanup. Currently, stale data can be returned when a node re-owns ranges whose data is still stored in the node's row cache, because cleanup didn't invalidate the cache. To prevent data that belongs to the node from being purged from the row cache, cleanup will only invalidate the cache with a set of token ranges that do not overlap with any of the ranges owned by the node. update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test now passes. Fixes #4446. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:55:50 -03:00
Raphael S. Carvalho	e81076b01c	compaction: Implement ranges for cache invalidation on behalf of cleanup This procedure will calculate ranges for cache invalidation by subtracting all owned ranges from the sstables' partition ranges. That's done so as to reduce the size of invalidated ranges. Refs #4446. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:55:49 -03:00
Raphael S. Carvalho	56f66cff9f	dht: Extract to_partition_ranges() from streaming to allow reuse Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-20 10:53:01 -03:00
Piotr Sarna	cbe6f260ef	alternator: add guarding stack height for JSON parsing In order to avoid stack overflow issues represented by the attached test case, rapidjson's parser now has a limit of nested level. Previous iterations of this patch used iterative parsing provided by rapidjson, but that solution has two main flaws: 1. While parsing can be done iteratively, printing the document is based on a recursive algorithm, which makes the iteratively parsed JSON still prone to stack overflow on reads. Documents with depth 35k were already prone to that. 2. Even if reading the document would have been performed iteratively, its destruction is stack-based as well - the chain of C++ destructors is called. This error is sneaky, because it only shows with depths around 100k with my local configuration, but it's just as dangerous. Long story short, capping the depth of the object to an arguably large value (39) was introduced to prevent stack overflows. Real life objects are expected to rarely have depth of 10, so 39 sounds like a safe value both for the clients and for the stack. DynamoDB has a nesting limit of 32. Fixes #5842 Tests: alternator-test(local,remote) Message-Id: <b083bacf9df091cc97e4a9569aad415cf6560daa.1582194420.git.sarna@scylladb.com>	2020-02-20 13:05:58 +02:00
Piotr Dulikowski	82a2bdf39f	cdc: distinguish open and closed ranges for range delete This patch causes inclusive and exclusive range deletes to be distinguished in cdc log. Previously, operations `range_delete_start` and `range_delete_end` were used for both inclusive and exclusive bounds in range deletes. Now, old operations were renamed to `range_delete_*_inclusive`, and for exclusive deletes, new operations `range_delete_*_exclusive` are used. Tests: unit(dev)	2020-02-20 11:39:06 +01:00
Asias He	62774ff882	gossiper: Always use the new generation number User reported an issue that after a node restart, the restarted node is marked as DOWN by other nodes in the cluster while the node is up and running normally. Consider the following: - n1, n2, n3 in the cluster - n3 shuts itself down - n3 sends shutdown verb to n1 and n2 - n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to INT_MAX - n3 restarts - n3 sends gossip shadow rounds to n1 and n2, in storage_service::prepare_to_join, - n3 receives response from n1, in gossiper::handle_ack_msg, since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 1, fiber 1 finishes faster than fiber 2, it sets _in_shadow_round = false - n3 receives response from n2, in gossiper::handle_ack_msg, since _enabled = false and _in_shadow_round == false, n3 will apply the application state in fiber 2, fiber 2 yields - n3 finishes the shadow round and continues - n3 resets gossip endpoint_state_map with gossiper.reset_endpoint_state_map() - n3 resumes fiber 2, applies application state about n3 into endpoint_state_map, at this point endpoint_state_map contains information including n3 itself from n2. - n3 calls gossiper.start_gossiping(generation_number, app_states, ...) with new generation number generated correctly in storage_service::prepare_to_join, but in maybe_initialize_local_state(generation_nbr), it will not set new generation and heartbeat if the endpoint_state_map contains itself - n3 continues with the old generation and heartbeat learned in fiber 2 - n3 continues the gossip loop, in gossiper::run, hbs.update_heart_beat() the heartbeat is set to the number starting from 0. - n1 and n2 will not get update from n3 because they use the same generation number but n1 and n2 have a larger heartbeat version - n1 and n2 will mark n3 as down even if n3 is alive. To fix, always use the new generation number. Fixes: #5800 Backports: 3.0 3.1 3.2	2020-02-20 11:20:20 +01:00
Dejan Mircevski	8393ee2e54	cql3: Permit views sync when a table is modified Previously we required MODIFY permissions on all materialized views in order to modify a table. This is wrong, because the views should be synced to the table unconditionally. For the same reason, users shouldn't be granted MODIFY on views, to prevent them manually changing (and breaking) a view. This patch removes an explicit permissions check in modification_statement introduced by `65535b3`. It also tests that a user can indeed modify a table they are allowed to modify, regardless of lacking permissions on the table's views and indices. Fixes #5205. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-02-20 10:43:41 +01:00
Avi Kivity	4cc7f7e2af	Merge "Log CQL queries under "trace" level" from Kostja " This series ensures the server more often than not initializes raw_cql_statement, a variable responsible for holding the original CQL query, and adds logging events to all places executing CQL, and logs CQL text in them. A prepared statement object is the third incarnation of parser output in Scylla: - first, we create a parsed_statement descendant. This has ~20 call sites inside Cql.g - then, we create a cql_statement descendant, at ~another 20 call sites - finally, in ~5 call sites we create a prepared statement object, wrapping cql_statement. Sometimes we use cql_statement object without a prepared statement object (e.g. BATCHes). Ideally we'd want to capture the CQL text right in the parser, but due to complicated transformations above that would require patching dozens of call sites. This series moves raw_cql_statement from class prepared_statement to its nested object, cql_statement, batches, and initializes this variable in all major call sites. View prepared statements and some internal DDL statements still skip setting it. " * 'query_processor_trace_cql_v2' of https://github.com/kostja/scylla: query_processor: add CQL logging to all major execute call sites. query_processor: move raw_cql_statement to cql_statement query_processor: set raw_cql_statement consistently	2020-02-20 11:07:52 +02:00
Nadav Har'El	7d545078ca	docs/alternator: remove incorrect comment on BatchWriteItem In the state of Alternator in docs/alternator/alternator.md, we said that BatchWriteItem doesn't check for duplicate entries. That is not true - we do - and we even have tests (test_batch_write_duplicate*) to verify that. So drop that comment. Refs #5698. (there is still a small bug in the duplicate checking, so still leaving that issue open). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219164107.14716-1-nyh@scylladb.com>	2020-02-20 08:11:31 +01:00
Nadav Har'El	b8aed18a24	alternator: unzero "scylla_alternator_total_operations" metric In commit `388b492040`, which was only supposed to move around code, we accidentally lost the line which does _executor.local()._stats.total_operations++; So after this commit this counter was always zero... This patch returns the line incrementing this counter. Arguably, this counter is not very important - a user can also calculate this number by summing up all the counters in the scylla_alternator_operation array (these are counters for individual types of operations). Nevertheless, as long as we do export a "scylla_alternator_total_operations" metric, we need to correctly calculate it and can't leave it zero :-) Fixes #5836 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219162820.14205-1-nyh@scylladb.com>	2020-02-20 08:11:15 +01:00
Raphael S. Carvalho	db4c3230f7	compaction: Add ranges for cache invalidation to compaction_completion_desc It will store the ranges to be invalidated in row cache on compaction completion. Intended to be used by cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:35 -03:00
Raphael S. Carvalho	51532b84f8	compaction: Make it possible for a compaction type to customize compaction_completion_desc compaction_completion_desc will eventually store more information that can be customized by the compaction type. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:35 -03:00
Raphael S. Carvalho	fa16845353	database: Fix on_compaction_completion doc Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:30:34 -03:00
Raphael S. Carvalho	65b4fc8bcd	sstables/compaction: Introduce compaction_completion_desc This descriptor contains all information needed for a table to be properly updated on compaction completion. A new member will be added to it soon, which will store ranges to be invalidated in row cache on behalf of cleanup compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-02-19 19:29:32 -03:00
Piotr Sarna	4e95b67501	Merge 'cql3: do_execute_base_query: fix null deref ... ... when clustering key is unavailable' from Benny This series fixes null pointer dereference seen in #5794 `efd7efe` cql3: generate_base_key_from_index_pk; support optional index_ck `7af1f9e` cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable `7fe1a9e` cql3: do_execute_base_query: fixup indentation Fixes #5794 Branches: 3.3 Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug) * bhalevy/fix-5794-generate_base_key_from_index_pk: cql3: do_execute_base_query: fixup indentation cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable cql3: generate_base_key_from_index_pk; support optional index_ck	2020-02-19 13:30:30 +01:00
Tomasz Grabiec	884d5e2bcb	Merge "Fix use-after-frees in migration_manager and feature_service" from Pavel Several problems with stopping the migration manager and features have been discussed recently. The first issue is with the migration manager's schema pull sleeping and potentially using freed migration manager instances. The two others are with freeing the database and migration manager before the features they wait for are enabled.	2020-02-19 13:02:35 +01:00
Piotr Sarna	3315220aea	alternator: fix server when no authorization header is found A typo caused the code to check for wrong header and assume that Authorization header exists, even if it was not the case. The fix comes with a regression test. Message-Id: <58070abddae6359212aa399688e3e2704d52f419.1582108625.git.sarna@scylladb.com>	2020-02-19 13:39:50 +02:00
Benny Halevy	7fe1a9ec4a	cql3: do_execute_base_query: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 13:31:18 +02:00
Benny Halevy	7af1f9e26a	cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable 1. Only call base_ck = generate_base_key_from_index_pk<... if the base schema has a clustering key. 2. Only call command->slice.set_range(*_schema, base_pk, ... if the base schema has a clustering key, otherwise just create an open ended range. Proposed-by: Piotr Sarna <sarna@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 13:30:37 +02:00
Piotr Sarna	5f0d77b9a4	Merge 'mv: drop materialized views before its table' from Eliran When dropping a table, the table and its views are dropped in parallel. This is not a problem in itself, but we have a mechanism to snapshot a deleted table before the actual delete. When a secondary index is removed, the snapshot process looks for its schema to create the schema part of the snapshot, but if the main table is already gone it will not find it. This commit serializes views and main table removals and removes the views prior to the tables. See discussion on #5713 Tests: Unit tests (dev) dtest - A test that failed on "can't find schema" error Fixes #5614 * eliran/serialize_table_views_deletion: Materialized Views: serialize tables and views creation Materialized Views: drop materialized views before tables	2020-02-19 12:20:20 +01:00
Pavel Emelyanov	8435e93549	db: Move unbounded_range_tombstones listening from storage_service Now the database keeps reference on feature service, so we can listen on the feature in it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Pavel Emelyanov	7aa7e4f550	migration_manager: Abort and wait cluster upgrade waiters The maybe_schedule_schema_pull waits for schema_tables_v3 to become available. This is unsafe in case migration manager goes away before the feature is enabled. Fix this by subscribing on feature with feature::listener and waiting for condition variable in maybe_schedule_schema_pull. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Nadav Har'El	405115fa5f	alternator: cleanup of get_string_attribute() function The get_string_attribute() function used attribute_value->GetString() to return an std::string. But this function does not actually return a std::string - it returns a char*, which gets implicitly converted to an std::string by looking for the first null character. This lookup is unnecessary, because rjson already knows the length of the string, and we can use it. This patch is just a cleanup and a very small performance improvement - I do not expect it fixes any bugs or changes anything functional, because JSON strings anyway cannot contain verbatim nulls. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200219101159.26717-1-nyh@scylladb.com>	2020-02-19 11:59:54 +01:00
Benny Halevy	efd7efe41e	cql3: generate_base_key_from_index_pk; support optional index_ck When called from indexed_table_select_statement::do_execute_base_query, old_paging_state->get_clustering_key() may return un-engaged optional<clustering_key>. Dereferencing it unconditionally crashes scylla as seen in https://github.com/scylladb/scylla/issues/5794 Fixes #5794 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-02-19 12:13:08 +02:00
Pavel Emelyanov	08363e5034	migration_manager: Abort and wait delayed schema pulls The sleep is interrupted with the abort source, the "wait" part is done with the existing _background_tasks gate. Also we need to make sure the gate stays alive till the end of the function, so make use of the async_sharded_service (migration manager is already such). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 11:55:27 +03:00
Eliran Sinvani	95724e1a66	Materialized Views: serialize tables and views creation This change serializes tables and views creation. The purpose of the change is to avoid possible future races due to a view searching for its base table information while the latter hasn't been created yet.	2020-02-19 10:51:49 +02:00
Eliran Sinvani	923a46030b	Materialized Views: drop materialized views before tables When dropping a table, the table and its views are dropped in parallel, this is not a problem in itself but we have a mechanism to snapshot a deleted table before the actual delete. When a secondary index is removed, in the snapshot process it looks for its schema for creating the schema part of the snapshot but if the main table is already gone it will not find it. This commit serializes views and main table removals and removes the views prior to the tables. See discussion on https://github.com/scylladb/scylla/pull/5713 Tests: Unit tests (dev) dtest - A test that failed on "can't find schema" error Fixes #5614	2020-02-19 10:48:11 +02:00
Pavel Solodovnikov	a46f235092	cql3: prefer passing schema as const ref instead of shared_ptr De-pointerize cql3 code APIs further: change some call sites to pass `schema` as const-ref instead of `shared_ptr`. Affected functions known to be expecting always non-null pointer to schema and don't store or pass the pointer somewhere else, assuming it's safe to give them just a reference. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200218142338.69824-1-pa.solodovnikov@scylladb.com>	2020-02-18 20:13:10 +02:00
Piotr Dulikowski	4343471954	hh: handle counter update hints correctly This patch fixes a bug that appears because of an incorrect interaction between counters and hinted handoff. When a counter is updated on the leader, it sends mutations to other replicas that contain all counter shards from the leader. If consistency level is achieved but some replicas are unavailable, a hint with mutation containing counter shards is stored. When a hint's destination node is no longer its replica, it is attempted to be sent to all its current replicas. Previously, if the cluster did not have the feature HINTED_HANDOFF_SEPARATE_CONNECTION enabled, storage_proxy::mutate function would be used for the purpose of sending the hint. It was incorrect because that function treats mutations for counter tables as mutations containing only a delta (by how much to increase/decrease the counter). These two types of mutations have different serialization format, so in this case a "shards" mutation is reinterpreted as "delta" mutation, which can cause data corruption to occur. This patch fixes the case when HINTED_HANDOFF_SEPARATE_CONNECTION is disabled, and uses storage_proxy::mutate_internal, which treats "shards" mutation as regular mutations - which is the correct behavior. Refs #5833. Tests: unit(dev)	2020-02-18 20:13:10 +02:00
Avi Kivity	454e7e0109	Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations" This reverts commit `5e9925b9f0`. It causes data resurrection in simple_decommission_node_2_test. Fixes #5838.	2020-02-18 20:13:10 +02:00
Calle Wilund	d7a9fc3611	db::config: Adjust truncation timeout to match value in yaml example Refs #817 Truncation is potentially long. It has its own timeout in storage proxy/rpc. This value should probably also be higher than default timeout. Message-Id: <20200218135926.26522-1-calle@scylladb.com>	2020-02-18 20:13:10 +02:00
Amnon Heiman	30a7587963	test/boost/database_test: adopt new clear_snapshot signature The clear_snapshot method signature was modified to accept a table name parameter. This patch adds an empty table name to the clear_snapshot test so it would compile and pass. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:50:58 +02:00
Amnon Heiman	6b020e67ce	api/storage_service: Support specifying a table when deleting a snapshot This patch adds an optional parameter to DELETE /storage_service/snapshots After this patch the following will be supported: If a keyspace called keyspace1 and a table called standard1 exists. curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1' curl -X DELETE --header 'Accept: application/json' 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1&cf=standard1' Fixes #5658 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Amnon Heiman	c3260bad25	storage_service: Add optional table name to clear snapshot There are cases when it is useful to delete a specific table from a snapshot. An example is when a snapshot is used for backup. Backup can take a long period of time; during that time, each of the tables can be deleted once it has been backed up, without waiting for the entire backup process to complete. This patch adds such an option to the database and to the storage_service wrapping method that calls it. If a table is specified, a filter function is created that filters only the column family with that given name. This is similar to the filtering at the keyspace level. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Nadav Har'El	e50e8a8432	alternator-test: improve ReturnValues tests This patch adds additional tests for the ReturnValues feature to make the test even more comprehensive. As this feature is not yet implemented in Alternator (see issue #5053), all tests XFAIL on Alternator - except two tests for the trivial "NONE" mode which is already supported. As usual all tests pass on DynamoDB. This patch also splits the tests for the ReturnValues parameter in the UpdateItem operation into multiple tests, each testing one of the different modes which DynamoDB supports - NONE, ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW. The separate tests will be useful if we implement this feature incrementally - so the separate modes can be tested separately. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200218085618.5584-1-nyh@scylladb.com>	2020-02-18 16:16:20 +02:00
Alejo Sanchez	45a6cc5d53	cql3: single metric for range scan and full scan Combining both range and full table scans in a single metric as "partition range scans are used to implement full scans in scylla deployments." Requested by @bdenes and @avi Refs: #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200211101221.690031-2-alejo.sanchez@scylladb.com>	2020-02-18 16:16:20 +02:00
Nadav Har'El	c8348bccc9	docs: new document about protocols and ports in Scylla This patch adds a new document, docs/protocols.md, about all the different protocols which Scylla supports - and the different ports which they use. This includes Scylla's internal protocol, user-facing protocols (CQL, Thrift, DynamoDB, Redis, JMX) and things in between (REST API, Prometheus). I wrote this document after being frustrated that when I see a port number (e.g., "7000") or a port option name (e.g., "storage_port") it's hard to figure out what they actually are - or why they are given such strange names. The intention is that this file can easily be searched for option names, for more familiar names (e.g., "CQL"), and a reader can get the whole story - including some pointers to relevant parts of the code (this part of the document can be improved further - in this version this only exists for the internal protocol). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200217172049.25510-1-nyh@scylladb.com>	2020-02-18 16:16:20 +02:00
Avi Kivity	fe71ed5f82	Update seastar submodule * seastar c7c249f67d...2b51022073 (8): > dns_test: Test with seastar.io instead of www.google.com > sharded: fix move constructor for peering_sharded_service Fixes #5814. > tests: Delete Seastar.dist > reactor: distinguish structs from classes when befriending > util/tuple_utils.hh: avoid redundant move > io_request: do not include fmt/format.h > reactor: cleanup write_some leftover > posix: change the signature of accept/try_accept	2020-02-18 16:16:19 +02:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Avi Kivity	06c16108df	Merge "cql3: minor cleanups (de-pointerize APIs)" from Pavel " This change set is comprised of several unrelated patches regarding some cleanups in cql3 layer code. Most of the changes are aimed at eliminating superfluous `shared_ptr` usages. In places where it can be safely assumed that objects passed to the function are considered non-null and constant, these places were adjusted to use passing as const ref instead. Other changes include eliminating unused arguments at some functions and replacing usages of `shared_ptr<service::pager::paging_state>` to use `lw_shared_ptr` instead, since `pager::paging_state` is final. Tests: unit(dev, debug) " * 'feature/cql_cleanups_4' of https://github.com/ManManson/scylla: cql3: minor sweeps through the cql layer code to reduce shared_ptrs count cql3: change some function signatures to accept const references cql3: change signatures of several functions to return crefs instead of pointers cql3: remove unused argument at functions::castas_functions::get paging_state: switch from shared_ptr to lw_shared_ptr	2020-02-17 17:50:30 +02:00
Piotr Dulikowski	01084a79b8	hh: send orphaned hints on HINT_MUTATION verb When replaying a hint with a destination node that is no longer in the cluster, it will be sent with cl=ALL to all its new replicas. Before this patch, the MUTATION verb was used, which causes such hints to be handled on the same connection and with the same priority as regular writes. This can cause problems when a large number of hints is orphaned and they are scheduled to be sent at once. Such situation may happen when replacing a dead node - all nodes that accumulated hints for the dead node will now send them with cl=ALL to their new replicas. This patch changes the verb used to send such hints to HINT_MUTATION. This verb is handled on a separate connection and with streaming scheduling group, which gives them similar priority to non-orphaned hints. Refs: #4712 Tests: unit(dev)	2020-02-17 14:45:22 +01:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially that we already have a mutation stream validator that we can use to validate the data. However currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate down the flag. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Piotr Jastrzebski	f0f6e220ea	cdc: stop using partitioners CDC can get all it needs from a config and does not need partitioner. For base table specific operations CDC is using partitioner from that table (obtained with schema::get_partitioner). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	c0873f9b10	partitioner_test: stop calling set_global_partitioner All the places that use partitioner have been switched to not use global partitioner any more and we can stop setting it in this test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	499e330ff9	storage_service: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	81cfc63ba6	mutation_writer_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	406f42e012	schema: reduce number of global_partitioner() calls Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	8a9dc8b394	test_services: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	510245f3c3	sstable_utils: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	65f8fc5a06	sstable_resharding_test: stop depending on global partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	a65f3d1f7b	sstable_mutation_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	aae6240273	sstable_data_file_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	a18c791f6f	random_schema: stop taking partitioner in constructor random_schema already has a _schema field which in turn has a get_partitioner() function. Storing the partitioner in random_schema is redundant. At the moment all uses of random_schema are based on default partitioner so it is not necessary to set it explicitly. If in the future we need random_schema to work with other partitioners we will add the constructor back and fix the creation of _schema to contain it. It's not needed now though. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	aeb9ea87df	mutation_reader_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	4df60c7998	multishard_mutation_query_test: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ef9acd9ee5	row_level repair: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	9494da2102	distribute_reader_and_consume_on_shards: don't take partitioner This function already takes schema so it can get partitioner using schema::get_partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	7c6f415647	thrift: reduce global_partitioner() calls Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	56e3cb8c3a	binary_search: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	1db437ee91	index_entry: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	1f866d7001	mc writer: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	6fe0dcbac4	sstable: stop calling global_partitioner() parse functions now take const schema& which allows them to reach a partitioner. It's safe to take schema by const& because the only caller takes the schema from an sstable object. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	0677bafd16	multishard_mutation_query: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	76d154dbac	view: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	6e424a3645	select_statement: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	ca4a89d239	dht: add dht::decorate_key and replace all dht::global_partitioner().decorate_key with dht::decorate_key It is an improvement because dht::decorate_key takes schema and uses it to obtain partitioner instead of using global partitioner as it was before. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:06 +01:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Piotr Jastrzebski	5234350df2	split_range_to_single_shard: stop calling global_partitioner() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	24b721c21b	ring_position_exponential_sharder: stop calling global_partitioner() ring_position_exponential_sharder calls global_partitioner in one constructor. Luckily the constructor is never used so we can remove that constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	db19a76b1f	selective_token_range_sharder: stop calling global_partitioner() This requires a change in a repair that uses selective_token_range_sharder. Repair performs operation on a set of tables. We will have to make sure that all of those tables use the same partitioner. This is achieved by adding a check to a repair_info constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	75785ef13e	i_partitioner: add operator<< Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	065885300d	i_partitioner: add == and != operators Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	57e4b7f215	ring_position_range_sharder: stop calling global_partitioner Remove ring_position_range_sharder(nonwrapping_range<ring_position>) which calls another constructor with partitioner obtained with dht::global_partitioner(). Fix all the places the removed constructor was used and obtain partitioner from schema instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:15 +01:00
Piotr Jastrzebski	dd1120454b	dht: move sharders to a separate header i_partitioner.hh is widely included while sharders are used only in 6 places so there's no need to include them in the whole codebase. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:19:02 +01:00
Piotr Jastrzebski	a5b6374398	dht: remove unused ring_position_exponential_vector_sharder The next patch is moving sharders to a separate header. ring_position_exponential_vector_sharder is not used anywhere so instead of just silently removing it with the move, this commit is separated to make it clear the class is removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:04:41 +01:00
Piotr Jastrzebski	9b95153136	schema: add get_partitioner() The plan is to remove dht::global_partitioner() and use schema::get_partitioner() instead. This will allow a usage of per schema/table partitioner instead of a single global partitioner everywhere. Initially schema::get_partitioner will call dht::global_partitioner. After all the calls to dht::global_partitioner are switched to schema::get_partitioner, the ability to set per schema partitioner will be implemented. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:04:41 +01:00
Takuya ASADA	9a84164c95	dist: drop old distribution code Since we dropped support of Ubuntu 14.04 and Debian 8, we can remove the code for these distributions.	2020-02-17 10:18:35 +02:00
Avi Kivity	6728b96df7	clustering_interval_set: split to own header file clustering_interval_set is a rarely used class, but one that requires boost/icl, which is quite heavyweight. To speed up compilation, move it to its own header and sprinkle #includes where needed. Tests: unit (dev) Message-Id: <20200214190507.1137532-1-avi@scylladb.com>	2020-02-16 17:40:47 +02:00
Nadav Har'El	51f3e7eaff	merge: token_metadata: pimplify Merged patch series from Avi Kivity: token_metadata is a heavyweight class with heavyweight includes (boost/icl) it is a good candidate for the pimpl pattern, which this series implements. Tests: unit (dev) https://github.com/avikivity/scylla token_metadata-pimplification/v1 Avi Kivity (6): locator: token_metadata: use non-deduced return type for ring_range() locator: token_metadata: pimplify locator: token_metadata: make token_metadata_impl::tokens_iterator a non-nested class locator: token_metadata: pimplify tokens_iterator locator: token_metadata: move implementation classes to .cc locator: token_metadata: remove unused include "query-request.hh" locator/token_metadata.hh \| 783 +--------------- locator/token_metadata.cc \| 1338 ++++++++++++++++++++++++++- test/boost/sstable_datafile_test.cc \| 1 + 3 files changed, 1332 insertions(+), 790 deletions(-) Message-Id: <20200214184954.1130194-1-avi@scylladb.com>	2020-02-16 17:15:26 +02:00
Piotr Sarna	70c9889ef7	storage_proxy: remove dead metrics code This patch removes an implementation of register_split_metrics_for, which is not used anywhere in the codebase. Message-Id: <e83f3e9d109113fe0553919032f005d4ab3a3023.1581851904.git.sarna@scylladb.com>	2020-02-16 17:00:45 +02:00
Nadav Har'El	e18a302c54	merge: Implement stopping alternator server Merged patch series from Piotr Sarna: This miniseries implements graceful shutdown for alternator by introducing two mechanisms: - refusing to accept new requests during shutdown by stopping the HTTP/HTTPS server(s) - guarding pending requests with a gate, so that when alternator server is stopped, no in-flight alternator requests are being processed Fixes #5781 Tests: manual(stopping Scylla in the middle of alternator-test multiple times, used to crash every time with local_is_initialized() assertion) Piotr Sarna (3): alternator: implement stopping alternator server alternator: guard pending alternator requests with a gate alternator: guard alternator-specific handlers with a gate alternator/server.cc \| 64 +++++++++++++++++++++++++++++++++++--------- alternator/server.hh \| 4 +++ main.cc \| 11 ++++++-- 3 files changed, 64 insertions(+), 15 deletions(-)	2020-02-16 16:35:14 +02:00
Pavel Solodovnikov	abb3a7e218	cql3: minor sweeps through the cql layer code to reduce shared_ptrs count Convert some more helper functions to accept const reference to column_specification and column_identifier instead of shared_ptr. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:24:26 +03:00
Pavel Solodovnikov	5b6e2d7178	cql3: change some function signatures to accept const references This patch continues the effort of reducing shared_ptr's count in the different APIs throughout the cql3 code tree. These functions now pass cref to column_specification instead of shared_ptr: * multiple variants of `validate_assignable_to` * sets::value_spec_of * lists::value_spec_of * lists::index_spec_of * lists::uuid_index_spec_of * tuples::component_spec_of * user_types::field_spec_of These functions don't pass the shared_ptr around down the call hierarchy, also obviously assuming that the column_specification passed is always non-null. So it's safe to assume that they don't borrow the ownership of the pointer or knowingly prolong the lifetime of the object pointed to. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:24:14 +03:00
Pavel Solodovnikov	49bf936403	cql3: change signatures of several functions to return crefs instead of pointers The following functions now accept const reference to column_specification instead of shared_ptr: * lists::index_spec_of * lists::value_spec_of * lists::uuid_index_spec_of * sets::value_spec_of Changed maps::value_spec_of and maps::key_spec_of signatures to accept const ref instead of non-const ref to column_specification. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:56 +03:00
Pavel Solodovnikov	7c05100c87	cql3: remove unused argument at functions::castas_functions::get Remove unused `schema_ptr` argument at `functions::castas_functions::get` function. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:46 +03:00
Pavel Solodovnikov	d64fd52ae5	paging_state: switch from shared_ptr to lw_shared_ptr Change the way `service::pager::paging_state` is passed around from `shared_ptr` to `lw_shared_ptr`. It's safe since `paging_state` is final. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-02-16 17:23:36 +03:00
Piotr Sarna	626ec730c4	storage_proxy: make register_metrics_for function reentrant Helper function for registering metrics for an endpoint, register_metrics_for(ep) depends on an external state to be updated. It checks if given metrics are added to a map, and if not, the metrics are registered, but the mentioned map is expected to be updated by the caller (e.g. get_ep_stat). This behaviour is error-prone, because calling this function twice will result in an exception, since registering metrics twice is not allowed. Refs #5697 Message-Id: <5a9ddccf52861749dbda4204b5d098cc77bc51eb.1581855769.git.sarna@scylladb.com>	2020-02-16 15:43:07 +02:00
Piotr Sarna	bd888a2695	alternator: guard alternator-specific handlers with a gate Alternator is able to serve more requests than its database operations, e.g. a health check and returning the list of its nodes. These operations, for safety, are now also guarded by the pending requests gate.	2020-02-16 14:15:29 +01:00
Piotr Sarna	acfed880cc	alternator: guard pending alternator requests with a gate In order to make sure that pending alternator requests are processed during shutdown, a gate for each shard is introduced. On shutdown, each gate will be closed and all in-progress operations will be waited upon. Fixes #5781	2020-02-16 13:48:45 +01:00
Piotr Sarna	c8ab9b3ae4	alternator: implement stopping alternator server Stopping Scylla with alternator enabled is not clean, because the server does not stop accepting requests on shutdown, which leads to use-after-free events. The first step towards a cleaner solution is to implement alternator_server::stop(), which stops the HTTP/HTTPS servers. Refs #5781	2020-02-16 13:34:21 +01:00
Nadav Har'El	70d914ad5b	alternator: update docker instructions in docs/alternator/getting-started.md The instructions in docs/alternator/getting-started.md on how to run Alternator with docker are outdated and confusing, so this patch updates them. First, the instructions recommended the scylladb/scylla-nightly:alternator tag, but we only ever created this tag once, and never updated it. Since then, Alternator has been constantly improving, and we've caught up on a lot of features, and people who want to test or evaluate Alternator will most likely want to run the latest nightly build, with all the latest Alternator features. So we update the instructions to request the latest nightly build - and mention the need to explicitly do "docker pull" (without this step, you can find yourself running an antique nightly build, which you downloaded months ago!). This instruction can be revisited once Alternator is GAed and not improving quickly and we can then recommend to run the latest stable Scylla - but I think we're not there yet. Second, in recent builds, Alternator requires that the LWT feature is enabled, and since LWT is still experimental, this means that one needs to add "--experimental 1" to the "docker run" command. Without it, the command line in getting-started.md will refuse to boot, complaining that Alternator was enabled but LWT wasn't. So this patch adds the "--experimental 1" in the relevant places in the text. Again, this instruction can and should be revisited once LWT goes out of experimental mode. Fixes #5813 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200216113601.9535-1-nyh@scylladb.com>	2020-02-16 12:42:37 +01:00
Nadav Har'El	b01b11c1f3	alternator: implement KeyConditionExpression This patch adds to Alternator's Query operation full support for the KeyConditionExpression parameter - a newer syntax for specifying which partition and which sort-key range are to be queried. The older syntax for the same thing, "KeyConditions", was already supported by Alternator. The patch also includes additional test cases for more corner cases discovered during the development. After this patch, all 47 test cases in test_key_condition_expression.py pass on Alternator (and, of course, also on DynamoDB). One interesting thing to note about this patch is that it does not include a new parser for the KeyConditionExpression syntax. It turns out that we need - to be fully compatible with DynamoDB - to use the already existing parser for ConditionExpression syntax, and then forbid certain things not allowed in KeyConditionExpression (you can see a lot of examples in code comments and in the tests included in this patch). Most importantly, allowing the full ConditionExpression syntax also means we allow completely useless parentheses on key conditions, e.g., '((p=:p) AND (c=:c))'. While the KeyConditionExpression documentation doesn't mention allowing these parentheses, DynamoDB does support them - and it turns out that boto3 uses them when you use its condition builders, as we do in one test case (test_query_key_condition_expression). Fixes #5037. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-4-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	15515b2cc1	alternator: more useful get_key_from_typed_value() utility function We had a get_key_from_typed_value() utility function to decode a JSON-encoded value with a known type (the JSON encoding is a map whose key is the type, the value always a string because all possible key types - string, bytes and number, are encoded as strings). However, the function was less useful than it could have been - it was missing one check for a malformed object (a check which only appeared in one of its callers), it unnecessarily received the column's expected type (all the callers passed it the given key column's type). The cleaned up function will be more useful for the following patch to support KeyConditionExpression, which wants to reuse it. While at it, this patch also uses rjson::to_string_view(it->value) instead of the less correct it->value.GetString() (the latter relies on null-termination, which is actually true for JSON strings, but there is no reason to rely on it). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-3-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	1fd44a0049	alternator: extract useful function to_string_view() conditions.cc contains a useful utility function for extracting (without copying) a string_view from a rjson::value which is known to contain a string. This function will be useful in more Alternator code, so let's extract it to rjson.hh, with the name rjson::to_string_view() Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200213192509.32685-2-nyh@scylladb.com>	2020-02-16 11:22:30 +02:00
Asias He	5e9925b9f0	streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations The table::flush_streaming_mutations was used in the days when streaming data went to memtable. After switching to the new streaming, data goes to sstables directly in streaming, so the sstables generated in table::flush_streaming_mutations will be empty. It is unnecessary to invalidate the cache if no sstables are added. To avoid unnecessary cache invalidation, which pokes holes in the cache, skip calling _cache.invalidate() if no sstables were added. The steps are: - STREAM_MUTATION_DONE verb is sent when streaming is done with old or new streaming - table::flush_streaming_mutations is called in the verb handler - cache is invalidated for the streaming ranges In summary, this patch will avoid a lot of cache invalidation for streaming. Backports: 3.0 3.1 3.2 Fixes: #5769	2020-02-16 11:22:30 +02:00
Avi Kivity	82df5dfb76	Update seastar submodule * seastar 6d2ed8cdc...c7c249f67 (3): > reactor: fix issue with hrtimer completions being lost > Merge "refactor network and storage I/O handling in backend code" from Glauber > reactor: don't call set_heap_profiling_enable() if not needed	2020-02-16 11:22:30 +02:00
Piotr Sarna	84be1eb6f2	test,cdc: skip across-shard test when run with one shard Running cdc_test binary fails with a segmentation fault when run with --smp 1, because test_cdc_across_shards assumes shard count to be >=2. This patch skips the test case when run with a single shard and produces a log warning. Message-Id: <9b00537db9419d8b7c545ce0c3b05b8285351e7d.1581600854.git.sarna@scylladb.com>	2020-02-16 11:22:30 +02:00
Gleb Natapov	ed3e423922	lwt: add counter for a case where timeout is sent prematurely There is a case in the current PAXOS implementation where a timeout is returned because the code cannot guarantee whether the value is accepted or not in case of contention. The counter will help to correlate this condition with failed requests. Message-Id: <20200211160653.30317-2-gleb@scylladb.com>	2020-02-16 11:22:30 +02:00
Gleb Natapov	7694f164c4	lwt: add more tracing to paxos stages Message-Id: <20200211160653.30317-1-gleb@scylladb.com>	2020-02-16 11:22:30 +02:00
Pavel Solodovnikov	bf95bd0916	cql3: more functions marked as const The following functions are now "const": * `term::collect_marker_specification` * `relation::to_term` * `multi_item_terminal::get_elements` * `raw_update::is_compatible_with` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200213142445.35312-1-pa.solodovnikov@scylladb.com>	2020-02-16 11:22:30 +02:00
Nadav Har'El	65d0a776c2	merge: alternator: Add keyspace per table This series implements keyspace-per-table approach for Alternator. The changes are as follows: - when a table is created, its keyspace is created first - after table deletion, its keyspace is deleted as well; works with views too, since these must be deleted before the base table is dropped - instead of SimpleStrategy, network topology is used Keyspaces are created with a prefix not legal from CQL - 'a#'. I validated that even though not reachable via CQL, keyspaces created with # character work well and produce correct directories, restarts work flawlessly too. Fixes #5611 Refs #5596 Tests: alternator(local, remote) Piotr Sarna (3): alternator: switch to keyspace-per-table approach alternator: move to NetworkTopologyStrategy alternator-test: add test for recreating a table	2020-02-16 11:22:30 +02:00
Piotr Sarna	e620181832	Merge 'cdc: TTLs on CDC log cells' from Juliusz Cells in CDC logs used to be created while completely neglecting TTLs (the TTLs from cdc = {...'ttl':600}). This patch adds TTLs to all cells; there are no row markers, so we need not set TTL there. Fixes #5688 * jul-stas/5688-set-ttl-in-cdc-log-table: tests/cdc: added test for TTL on log table cells cdc: set TTLs on CDC log cells	2020-02-16 11:22:30 +02:00
Nadav Har'El	cb8315ace8	merge: alternator: Make write isolation config less terse Merged patch series from Piotr Sarna: This series addresses and fixes #5758 by providing less terse configuration for write isolation. Before the patch, the suggested values for alternator write isolation policies were 'f', 'a', 'o', 'u', which are not really descriptive. The code actually checks only the first character from the tag value, but now the input is validated to allow only specific, expressive values: * 'a', 'always', 'always_use_lwt' - always use LWT * 'o', 'only_rmw_uses_lwt' - use LWT only for requests that require read-before-write * 'f', 'forbid', 'forbid_rmw' - forbid statements that need read-before-write. Using such statements (e.g. UpdateItem with ConditionExpression) will result in an error * 'u', 'unsafe', 'unsafe_rmw' - (unsafe) perform read-modify-write without any consistency guarantees Using other values will result in an error. This series comes with tests and docs updates. Fixes #5758 Tests: alternator-test(local,remote) Piotr Sarna (5): alternator: move rmw_operation to a header alternator: add validating write_isolation tag alternator-test: add test for write isolation tag alternator-test: mark write isolation tests scylla_only docs: update write isolation documentation alternator-test/test_condition_expression.py \| 10 +- alternator-test/test_tag.py \| 9 + alternator/executor.cc \| 163 +++++++------------ alternator/rmw_operation.hh \| 99 +++++++++++ docs/alternator/alternator.md \| 8 +- 5 files changed, 173 insertions(+), 116 deletions(-)	2020-02-16 11:22:30 +02:00
Pavel Solodovnikov	76a0652deb	types: fix serialization and validation of empty values Empty values (zero-sized string in serialized form) were not handled properly in serialize routines for floating types and uuids, which led to runtime exceptions and failing tests as described in https://github.com/scylladb/scylla/issues/5782. Also fix validation visitor to handle empty values properly. There already was the code in place that took into consideration zero-sized values. But it was trying to read some bytes regardless of that (e.g. for timeuuid values), even if there is none to read. Tests: unit(dev, debug) Fixes: #5782 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200213130021.31598-1-pa.solodovnikov@scylladb.com>	2020-02-16 11:22:30 +02:00
Pavel Emelyanov	b11cf6e950	cql3/query_processor.hh: Debloat from other headers This gives ~30% less (251 jobs -> 181 jobs) recompile when touching it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200212225828.3374-1-xemul@scylladb.com>	2020-02-16 11:22:30 +02:00
Alejo Sanchez	a5516767d5	tests: enforce SERIAL consistency on all prepared statements Add SERIAL consistency level query option to boost tests. This is required for LWT testing. Refs: #5777 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200212102921.27139-2-alejo.sanchez@scylladb.com>	2020-02-16 11:22:29 +02:00
Konstantin Osipov	7b7462b49f	test.py: fix a bug with an incorrect glob pattern On start, test.py cleans up testlog directory. The cleanup file search pattern was shell style, not python glob style, which led to .log files being left around between runs. Message-Id: <20200212204047.22398-9-kostja@scylladb.com>	2020-02-16 11:22:29 +02:00
Konstantin Osipov	70fcbd8e32	test.py: print test invocation failure to test log Capture test invocation failure in the test log. Remove dead code lingering from introduction of log output. Message-Id: <20200212204047.22398-6-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	851b2d652e	test.py: start run_test() by opening test log file Always open the log file first, this will be necessary to append output to it in case the test timed out or didn't start. Message-Id: <20200212204047.22398-5-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	22a050250e	test.py: if a test fails, print it on its own line, even in compact mode To be able to easily see what tests have failed as they run, print failed tests on their own line even if --verbose switch is off. Message-Id: <20200212204047.22398-4-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Konstantin Osipov	8eb127279e	test.py: convert cookie to TabularConsoleOutput class test.py used a functional programming cookie pattern to carry tabular console output state, convert this cookie to an object. In order to make console output more pretty we'll need to add more state to it, and keeping this state in a tuple would be too messy. Message-Id: <20200212204047.22398-3-kostja@scylladb.com>	2020-02-15 17:19:28 +02:00
Avi Kivity	91c4409376	locator: token_metadata: remove unused include "query-request.hh" sstable_datafile_test.cc lost access to interval_map (via position_in_partition.hh), so it now includes that directly.	2020-02-14 20:46:25 +02:00
Avi Kivity	bee1cc42fe	locator: token_metadata: move implementation classes to .cc With pimplification complete, move the implementation classes to .cc and remove boost/icl includes.	2020-02-14 20:34:44 +02:00
Avi Kivity	ef41b45142	locator: token_metadata: pimplify tokens_iterator Because tokens_iterator refers to token_metadata_impl, the latter cannot be moved out-of-line. So this patch pimplifies tokens_iterator as well.	2020-02-14 20:29:14 +02:00
Avi Kivity	9425e9c13d	locator: token_metadata: make token_metadata_impl::tokens_iterator a non-nested class In order to pimplify token_metadata_impl::tokens_iterator, we must make it a non-nested class, since eventually token_metadata_impl will be an incomplete class for users and nested classes cannot be forward declared. So this patch makes it a non-nested class. Two inline functions that referred to it were moved out of class scope so they can see the definition. No functional changes.	2020-02-14 20:29:13 +02:00
Avi Kivity	6d53f240d1	locator: token_metadata: pimplify token_metadata is a heavyweight class, with heavyweight include dependencies (icl, which has tens of thousands of lines in headers), heavyweight methods, but it is rarely used. So it is a classic candidate for pimplification. This patch splits off the implementation into token_metadata_impl and leaves token_metadata as a forwarding class. Actual movement of the code is left to a later patch to ease review. Notes: - some constructors were made public due to limitations of std::make_unique - a few token_metadata methods pass *this along to external functions, so we now pass the holder object as "unpimplified_this" to support this.	2020-02-14 20:29:12 +02:00
Avi Kivity	90a3670952	locator: token_metadata: use non-deduced return type for ring_range() Deduced return types are user hostile as the user has to look at the implementation in order to understand what the return type is.	2020-02-14 15:44:46 +02:00
Konstantin Osipov	8b2ce03ce4	query_processor: add CQL logging to all major execute call sites. Add missing CQL query logging to statement prepare, internal execute, batch execute. The logging is done under log level "trace".	2020-02-13 21:53:58 +03:00
Botond Dénes	78624b5069	test: sstable_datafile_test: add scrub unit test	2020-02-13 15:02:37 +02:00
Botond Dénes	26d4c8be95	compaction_manager: scrub: don't piggy-back on upgrade_sstables() Now that we have the necessary infrastructure to do actual scrubbing, don't rely on `upgrade_sstables()` anymore behind the scenes, instead do an actual scrub. Also, use the skip-corrupted flag.	2020-02-13 15:02:37 +02:00
Botond Dénes	33c126e8c0	compaction: introduce scrub_compaction A specialized compaction subclass for executing a scrub compaction. `scrub_compaction` supplies a specialized reader which will validate its input and stop on the first error. If it is configured with `skip_corrupted`, it will instead skip bad data, logging it.	2020-02-13 15:02:37 +02:00
Botond Dénes	1b7725af4b	mutation_fragment_stream_validator: split into low-level and high-level API The low-level validator allows fine-grained validation of different aspects of monotonicity of a fragment stream. It doesn't do any error handling. Since different aspects can be validated with different functions, this allows callers to understand what exactly is invalid. The high-level API is the previous fragment filter one. This is now built on the low-level API. This division allows for advanced use cases where the user of the validator wants to do all error handling and wants to decide exactly what monotonicity to validate. The motivating use-case is scrubbing compaction, added in the next patches.	2020-02-13 15:02:32 +02:00
Juliusz Stasiewicz	c13e935eae	tests/cdc: added test for TTL on log table cells	2020-02-13 14:00:53 +01:00
Piotr Sarna	f4d03d6063	docs: update write isolation documentation The documentation now mentions all acceptable variants of write isolation configuration values.	2020-02-13 13:51:31 +01:00
Piotr Sarna	8795323678	alternator-test: mark write isolation tests scylla_only With scylla_only fixture already available, manual checks for dynamodb no longer need to be performed.	2020-02-13 13:51:31 +01:00
Piotr Sarna	fba756858e	alternator-test: add test for write isolation tag Write isolation tags now accept only a small set of valid values. The test case ensures that all valid values are accepted and that invalid values return an error.	2020-02-13 13:51:31 +01:00
Piotr Sarna	fa4ddd2947	alternator: add validating write_isolation tag In order to prevent users from using incorrect write isolation configuration, a set of allowed values is introduced. When tagging a resource (which is considered rare), a tag will only be allowed if it belongs to the allowed set.	2020-02-13 13:51:31 +01:00
Piotr Sarna	7e6c9cad9a	alternator: move rmw_operation to a header rmw_operation is a class with a public interface, including a write_isolation enum and a fixed tag name for its configuration. For convenience, it's moved to a header file, so that code from executor.cc can use the definitions regardless of their position in the source file - it prevents reordering functions just to make sure that rmw_operation is defined before a function that uses its attributes.	2020-02-13 13:51:31 +01:00
Konstantin Osipov	ced778ba0b	query_procesor: move raw_cql_statement to cql_statement We'd like to log CQL statements inside batches, and they don't have prepared_statement object created for them.	2020-02-13 13:35:37 +03:00
Piotr Sarna	f4a05e1d23	alternator-test: add test for recreating a table The first iteration of keyspace-per-table approach for alternator revealed an issue with recreating a table after deleting it. This test case was used as a regression check.	2020-02-13 09:54:12 +01:00
Piotr Sarna	dca6c2c81d	alternator: move to NetworkTopologyStrategy Instead of SimpleStrategy, NetworkTopologyStrategy is used for setting up the replication configuration for alternator tables. Replication factor 3 is used along with a local datacenter, unless alternator discovers that it's running on a test cluster with less than 3 nodes - then RF is reduced accordingly and a warning is emitted, which was also the case for SimpleStrategy.	2020-02-13 09:46:46 +01:00
Piotr Sarna	3eb6da224b	alternator: switch to keyspace-per-table approach Instead of a monolith alternator keyspace, each table creates its own keyspace, named in the following pattern: `a#TABLE_NAME`. The `a#` prefix contains an illegal CQL character in order to ensure that these keyspaces are never created via CQL.	2020-02-13 09:46:19 +01:00
Konstantin Osipov	b531a6fe82	query_processor: set raw_cql_statement consistently raw_cql_statement is a member of prepared_statement which is not set in its constructor because prepared_statement constructor has too many call sites inside cql_statement hierarchy. cql_statement and prepared_statement dependency form a cycle and long term it obviously should be fixed. As a quick fix to query processor tracing, consistently assign raw_cql_statement in all prepared_statement usage sites.	2020-02-13 11:18:32 +03:00
Piotr Sarna	dcf54331ea	alternator: allow custom names for keyspaces The maybe_create_keyspace utility now accepts a parameter - the desired name for a newly created keyspace.	2020-02-13 09:16:37 +01:00
Piotr Sarna	e93c54e837	db,view: fix generating view updates for partition tombstones The update generation path must track and apply all tombstones, both from the existing base row (if read-before-write was needed) and for the new row. One such path contained an error, because it assumed that if the existing row is empty, then the update can be simply generated from the new row. However, lack of the existing row can also be the result of a partition/range tombstone. If that's the case, it needs to be applied, because it's entirely possible that this partition row also hides the new row. Without taking the partition tombstone into account, creating a future tombstone and inserting an out-of-order write before it in the base table can result in ghost rows in the view table. This patch comes with a test which was proven to fail before the changes. Branches 3.1,3.2,3.3 Fixes #5793 Tests: unit(dev) Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>	2020-02-12 23:16:30 +02:00
Tomasz Grabiec	3252068588	Merge "Multiple cleanups in cql3" from Kostja This series was born when working on debugging (missing) query processor trace-level logging, and trying to identify all entry points into parsed_statement::prepare(). Unfortunately I was unable to easily merge prepared_statement and cql_statement objects. Rationale for individual patches is given in commit comments.	2020-02-12 17:33:39 +01:00
Nadav Har'El	b93204d6bf	Alternator: allow CreateTable with streams explicitly turned off While Alternator doesn't yet support creating a table with streams (i.e., CDC) turned on, we should only fail the creation if streams were really turned on. If the StreamSpecification option exists, but does not ask to turn on streams, we should not fail the creation - and this patch fixes this. This patch also adds two tests - one where StreamSpecification is passed but does not ask to turn on streams (so table creation should succeed), and another test which explicitly requests to turn on streams. The second test still xfails on Alternator, and should continue to do so until we implement streams (we do not want to silently ignore a request to turn on streams). Fixes #5796 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200212100546.16337-1-nyh@scylladb.com>	2020-02-12 17:29:02 +01:00
Avi Kivity	48b694df55	cql3: like_matcher: pimplify to reduce inclusions of boost/regex boost/regex has huge header dependencies amounting to tens of thousands of lines. These are now replicated in 167 translation units. This patch converts like_matcher to use the pointer-to-implementation idiom, which reduces the number of translation units including boost/regex to 28. Since regular expressions are relatively expensive, and like_matcher is relatively rare, the extra memory usage and run time will be negligible. Message-Id: <20200211170152.809554-1-avi@scylladb.com>	2020-02-12 17:04:12 +02:00
Konstantin Osipov	d4866c1a28	cql3: remove prepared alias for prepared_statement cql3 has cql_statement, parsed_statement and prepared_statement classes, which, largely, stand for the same thing. prepared was an alias for prepared_statement which only required an extra tag jump in IDE and carried no meaning.	2020-02-12 16:44:43 +03:00
Konstantin Osipov	cfdef844d8	cql3: remove unused include from parsed_statement.hh	2020-02-12 16:44:43 +03:00
Konstantin Osipov	bcb094c87a	query_processor: move parsed_statement definition to raw/ This is where parsed_statement declaration resides, put the definition next to declaration as is conventional for the rest of the classes.	2020-02-12 16:44:43 +03:00
Konstantin Osipov	93db4d748c	query_processor: fold one execute_internal() into another. All internal execution always uses query text as a key in the cache of internal prepared statements. There is no need to publish API for executing an internal prepared statement object. The folded execute_internal() calls an internal prepare() and then internal execute(). execute_internal(cache=true) does exactly that.	2020-02-12 16:44:12 +03:00
Konstantin Osipov	2e07c76153	query_processor: rename process_statement_prepared Rename process_statement_prepared to execute_prepared for consistency with the rest of query_processor API.	2020-02-12 16:37:08 +03:00
Konstantin Osipov	1a53458239	query_processor: rename one overload of process() Rename an overloaded function process() to execute_direct(). Execute direct is a common term for executing a statement that was not previously prepared. See, for example SQLExecuteDirect in ODBC/SQL CLI specification, mysql_stmt_execute_direct() in MySQL C API or EXECUTE DIRECT in Postgres XC.	2020-02-12 16:36:56 +03:00
Konstantin Osipov	170d41acf4	query_processor: fold process_statement_unprepared into process() process_statement_unprepared() is used in ::process() only and can be inlined. This will simplify understanding CQL log output.	2020-02-12 16:22:15 +03:00
Piotr Sarna	f4e51a96ca	alternator: replace overloaded with overloaded_functor Turns out we already have a utility header for a visitor with overloaded lambdas. This patch purges the explicit reimplementation of the same trick and uses the existing class instead. Message-Id: <60c0b9a978f8208b188ef6ddc0564cb133bed707.1581496049.git.sarna@scylladb.com>	2020-02-12 14:21:42 +02:00
Amnon Heiman	8581617e78	api/storage_service: protect the objects during function call The list_snapshot API uses an http stream to stream the result to the caller. It needs to keep all objects and the stream alive until the stream is closed. This patch adds do_with to hold these objects during the lifetime of the function. Fixes #5752	2020-02-12 13:08:34 +02:00
Calle Wilund	5e46079e89	exceptions: Set correct error code in truncate_exception Refs #4924 truncate_exception should, like its origin counterpart, set error code to TRUNCATE_ERROR, not PROTOCOL_ERROR. tests: unit + partial dtest Message-Id: <20200212100920.14478-1-calle@scylladb.com>	2020-02-12 11:17:16 +01:00
Avi Kivity	da00530464	Update seastar submodule * seastar 1c7bccc500...6d2ed8cdc6 (11): > connect_test: keep socket alive until the end. > Merge "Add timeout to smp::submit_to() and friends" from Botond > reactor: use reference to addrlen in accept > tests: stall_detector_test: use same clock as in test as in the detector > reactor: fallback to epoll backend when fs.aio-max-nr is too small > util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as() > core/resources: fix cpuset error > distributed_tests: increase sleep time further > core: thread: Fix compilation error in comment > reactor: specialize the pollable_fd_state > build: Use with -fstack-clash-protection when using guard pages	2020-02-12 12:07:00 +02:00
Avi Kivity	a8a4e584ec	Merge "Move token_metadata from storage_service" from Pavel " Lots of code needs storage_service just to get token_metadata from. This creates unwanted dependency loops and increases the use of global storage_service instance. This set keeps the sharded<locator::token_metadata> on main's stack and carries the references where needed. This removes the dependency on storage_service from: - storage_proxy - gossiper - redis - batchlog manager and makes the database only need it for sstables_format (will fix in one of the next sets). Also, this set is the prerequisite for controlling the copying of token_metadata instances (spotted two occurrences in bootstrap code). Tests: unit(dev), manual start-stop " * 'br-token-metadata-standalone-2' of https://github.com/xemul/scylla: api: Keep and use reference on token_metadata redis: Use proxy token_metadata gossiper: Keep needed for failure_detection values on board database: Use own token_metadata batchlog: Use token_metadata from proxy proxy: Use own token_metadata gossiper: Use own token_metadata tokens: Switch into standalone sharded instance batchlog: Use in-config ring-delay database: Have it in size_estimate_virtual_reader storage_proxy: Pass token_metadata in some static helpers storage_service: Move get_local_tokens wrapper size_estimates_virtual_reader: Make get_local_ranges static migration_manager: Refactor validation of new/updating ksm storage_service: Tiny cleanup of excessive self-reference	2020-02-11 19:15:22 +02:00
Botond Dénes	7d3bce403d	sstables: compaction_stop_exception: add retry flag Allow the thrower to communicate that it doesn't want the compaction to be retried later. I know, using exceptions for control flow is very bad, but this is the existing mechanism to stop a compaction and I don't want to invent a new one for this. Also massage the error messages a bit to take the value of this flag into consideration.	2020-02-11 18:38:35 +02:00
Avi Kivity	ba30a4074d	Merge "stop passing tracing state pointer in client_state" from Gleb " client_state is used simultaneously by many requests running in parallel while tracing state pointer is per request. Those two facts do not sit well together and as a result sometimes tracing state is overwritten while still being used by an active request, which may cause an incorrect trace or even a crash. " Fixes #5700. * 'gleb/tracing_fix_v1' of github.com:scylladb/seastar-dev: client_state: drop the pointer to a tracing state from client_state transport: pass tracing state explicitly instead of relying on it been in the client_state alternator: pass tracing state explicitly instead of relying on it been in the client_state	2020-02-11 17:59:20 +02:00
Botond Dénes	8014c7124d	compaction_manager: collect all cleanup related logic in perform_cleanup() Currently the call chain for a cleanup collection looks like this: compaction_manager::perform_cleanup() compaction_manager::rewrite_sstables() table::cleanup_sstables() ... `perform_cleanup()` is essentially empty, immediately deferring to `rewrite_sstables()`. Cleanup related logic is scattered between the latter two methods on the call chain. These methods however recently started serving as generic methods for compactions that want to rewrite each sstable one-by-one, collecting cleanup related ifs in various places. The reason is historic, we first had cleanup, then bolted others on top, trying to share the underlying code as much as possible. It is time this is cleaned up (pun intended). Make `perform_cleanup()` the place where all cleanup related logic is, with the rest of the stack made truly generic.	2020-02-11 17:47:44 +02:00
Botond Dénes	b2dc5d4895	compaction: compaction_descriptor: use compaction options instead of cleanup flag Instead of the restrictive `cleanup` boolean flag, which allows for choosing between only two compaction types, use `compaction_options`, which in addition to allowing any number of compaction types to be selected, also allows seamlessly passing specific options to them.	2020-02-11 17:47:44 +02:00
Botond Dénes	8579bef076	compaction: introduce compaction_options Currently the compaction API is quite restrictive. It offers generic `compact_sstables()` and `reshard_sstables()` methods. The former is the one used by all but resharding, however it only really supports two modes: regular and cleanup. The latter is supported by a semi-hidden `cleanup` flag in `compaction_description`. Actually there are two more compaction types already which are piggy-backed on cleanup: upgrade and scrub. The upper layers distinguish between actual cleanup and "fake" cleanup by a `is_actual_cleanup` flag. The latter two "fake" cleanup compactions cannot be distinguished even by the upper layers. This is terribly confusing and hard to follow, in addition to being restrictive. This worked so far, because upgrade is served quite well by the cleanup compaction type, turning off certain preparations by the above mentioned `is_actual_cleanup` flag. Scrub is barely implemented and just an upgrade behind the scenes. This situation is however preventing really specializing each compaction. Enter `compaction_options`. This variant in disguise is designed to allow passing specific options to each compaction type, and doubles as an enum allowing more than two low level compaction types. This patch only adds the option class itself, propagating and handling it will be done by the next patches.	2020-02-11 17:47:44 +02:00
Botond Dénes	6bc3b41c20	compaction: compaction_type: add Upgrade Although we currently do support upgrade compaction, it is piggy-backed on top of cleanup compaction. This is soon going to change, so in preparation to that, add an `Upgrade` member to the `compaction_type` enum.	2020-02-11 17:47:44 +02:00
Botond Dénes	0b53ccaecd	table: cleanup_sstables(): only short-circuit on actual cleanup Currently the cleanup call is short circuited if it is determined that cleanup is not needed for the sstable to-be-cleaned-up. This is undesired because not just cleanup uses this routine to rewrite sstables: sstable-upgrade and sstable-scrub also use it, and they want to go on with the cleanup compaction of the sstable even if all data in it belongs to the current node. Fix: #5699	2020-02-11 17:47:44 +02:00
Nadav Har'El	9fad494572	merge: Reduce #include bloat around cql3 internals from non-cql3 users Merged pull request https://github.com/scylladb/scylla/pull/5755 from Avi Kivity: This series removes some #include dependencies around cql3. It results in 30k line (6.6%) reduction in the preprocessed size of database.i, mainly due to elimination of boost::regex (which was brought in in turn by like_matcher). This should result in fewer and faster recompiles. commits: tracing: remove #include of modification_statement.hh from table_helper cql3: selection: remove now-unneeded include of statement_restrictions.hh cql3: deinline result_set_builder::restrictions_filter constructor view_info: remove include of select_statement.hh cql3: selection: remove unnecessary include of selector_factories cql3: query_processor: reduce #includes	2020-02-11 15:58:29 +02:00
Juliusz Stasiewicz	67b92c584f	cdc: set TTLs on CDC log cells Cells in CDC logs used to be created while completely neglecting TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs to all cells; there are no row markers, so we need not set TTL there. Fixes #5688	2020-02-11 12:56:41 +01:00
Eliran Sinvani	9eb6ac7162	docker: add rsyslog for syslog support One of the logging options for Scylla is syslog. Until today, this method wasn't supported in the docker images that are created with the Dockerfile in the repo. This commit adds rsyslog installation, configuration and setup for Docker. Tests: built and ran the docker and validated the existence of the /dev/log socket. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200210112448.210169-1-eliransin@scylladb.com>	2020-02-11 13:30:59 +02:00
Tomasz Grabiec	165913598b	Revert "features: Stop on shutdown" This reverts commit `ca55c6c15f`. Triggers the broken promise exception on aborted stop. If the feature service is stopped without enabling some features, the latter may end up with "broken promise" exception on futures attached to the _pr promise.	2020-02-11 11:57:22 +01:00
Botond Dénes	3164456108	row: append(): downgrade assert to on_internal_error() This assert, added by `060e3f8` is supposed to make sure the invariant of the append() is respected, in order to prevent building an invalid row. The assert however proved to be too harsh, as it converts any bug causing out-of-order clustering rows into cluster unavailability. Downgrade it to on_internal_error(). This will still prevent corrupt data from spreading in the cluster, without the unavailability caused by the assert. Fixes: #5786 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>	2020-02-11 11:07:42 +02:00
Piotr Sarna	b977aa034b	Merge 'cdc: disallow negative TTL values in CDC options' from Juliusz Setting TTL = -1 in cdc_options prevents any writes to CDC log. But enabling CDC and having unwritable log table makes no sense. Notably, normal writes USING TTL -1 are forbidden. This patch does the same to TTLs in CDC options. Fixes #5747 * jul-stas/5747-cdc-disallow-negative-ttl: tests/cdc: added test for exception when TTL < 0 cdc: disallow negative TTL values in CDC	2020-02-11 09:23:56 +01:00
Pavel Emelyanov	ac998e9576	repair: Do not explicitly switch sched group When registering callbacks for row-level repair verbs the sched group is assigned automatically with the help of messaging_service::scheduling_group_for_verb. Thus the lambda will be called in the needed sched group, no need for manual switch. This removes the last occurrence of global storage_service usage from row-level repair. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:15:44 +03:00
Pavel Emelyanov	ccc102affa	repair: Use db from callee The do_repair_start() emulates db.invoke_on_all and can re-use the db.local() inside without the need to call for global storage_service instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:13:03 +03:00
Pavel Emelyanov	c6ddd21c50	repair_writer: Use db from repair_meta The caller of repair_writer.create_writer already has the needed reference on database, no need to get it from global storage_service instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 22:10:42 +03:00
Juliusz Stasiewicz	c0edc2bf53	tests/cdc: added test for exception when TTL < 0	2020-02-10 19:13:59 +01:00
Pavel Emelyanov	5434e412e4	api: Keep and use reference on token_metadata	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	4b2307c8b6	redis: Use proxy token_metadata This removes dependency between redis and storage_service	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	eb827c9f5d	gossiper: Keep needed for failure_detection values on board And drop the gossiper -> storage_service link Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	1a3f78a57d	database: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	7cdfd94207	batchlog: Use token_metadata from proxy This kills the second global reference on storage_service from batchlog code and breaks the dependency loop between these two. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	fecea1de7e	proxy: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	2f3490dc8d	gossiper: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	c5997b573c	tokens: Switch into standalone sharded instance Way too many places in code need storage_service just for token_metadata. These references increase the amount of get(_local)?_storage_service() calls and create loops in components dependencies. Keep the token_metadata separately from storage_service and pass instances' references where needed (for now -- only into the storage_service itself). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	b4e66ddf1d	batchlog: Use in-config ring-delay This kills the first (out of two) global reference on storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	9257346c18	database: Have it in size_estimate_virtual_reader This is to remove the last global reference on storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	bf5be0e971	storage_proxy: Pass token_metadata in some static helpers Soon there will be token_metadata on storage_proxy, so prepare for that in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	6050c559a3	storage_service: Move get_local_tokens wrapper This wrapper just makes sure the system_keyspace::get_saved_tokens reports non empty result. Move them close together. As a side effect -- get rid of penultimate global storage_service reference from size_estimates_virtual_reader (the last one will be removed soon). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:31 +03:00
Piotr Sarna	bfd7d74b0f	Merge 'Protect CDC-related tables from being modified by the user' from Piotr This patch introduces following modifications: Disallows enabling cdc for table X when X_scylla_cdc_log already exists, Restricts DROP permissions for X_scylla_cdc_log tables, Restricts ALTER and DROP permissions for cdc_description and cdc_topology_description, Disallows cdc option when creating materialized views. Refs #4991. Tests: unit(dev). * piodul/4991-permissions-for-cdc-tables: cdc: disallow CDC options for materialized views cdc: restrict permissions on cdc_(topology_)description cdc: restrict permissions on _scylla_cdc_log tables cdc: refuse to enable cdc when table _scylla_cdc_log exists	2020-02-10 18:02:43 +01:00
Raphael S. Carvalho	140520ff87	sstables/compaction_manager: add metric for pending compaction tasks we have compaction_manager.compactions metric for the number of active tasks, but they don't account for tasks blocked waiting for an opportunity to run, and they're the problematic ones. Fixes #5254. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200210131929.30981-1-raphaelsc@scylladb.com>	2020-02-10 17:55:02 +01:00
Pavel Emelyanov	17db6df15c	size_estimates_virtual_reader: Make get_local_ranges static There's the call of the same name in storage_service, so make this one explicitly static for better readability. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:39 +03:00
Pavel Emelyanov	de1dc59548	migration_manager: Refactor validation of new/updating ksm The goal is to have token_metadata reference inside the keyspace_metadata.validate method. This can be achieved by doing the validation through the database reference which is "at hand" in migration_manager. While at it, merge the validation with exists/not-exists checks done in the same places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Pavel Emelyanov	01a28867d6	storage_service: Tiny cleanup of excessive self-reference Do not use get_local_storage_service inside storage_service method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Piotr Dulikowski	949642b866	cdc: disallow CDC options for materialized views While it didn't have any effect, it was possible to supply cdc options for a materialized view. This change disallows it.	2020-02-10 15:51:11 +01:00
Piotr Dulikowski	81fa59e178	cdc: restrict permissions on cdc_(topology_)description Following permissions are disallowed on cdc_description and cdc_topology_description: ALTER, DROP.	2020-02-10 15:40:48 +01:00
Piotr Dulikowski	6fe4f9ded8	cdc: restrict permissions on _scylla_cdc_log tables Disallows DROP permission on CDC log tables.	2020-02-10 15:40:48 +01:00
Piotr Dulikowski	0c18742997	cdc: refuse to enable cdc when table _scylla_cdc_log exists	2020-02-10 15:40:48 +01:00
Gleb Natapov	31cf2434d6	client_state: drop the pointer to a tracing state from client_state client_state is shared between requests and tracing state is per request. It is not safe to use the former as a container for the latter since a state can be overwritten prematurely by subsequent requests.	2020-02-10 14:59:22 +02:00
Takuya ASADA	43097854a5	dist/debian: keep /etc/systemd .conf files on 'remove' Since dpkg does not re-install conffiles when they are removed by the user, currently we are missing dependencies.conf and sysconfdir.conf on rollback. To prevent this, we need to stop running 'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'. Fixes #5734	2020-02-10 14:54:25 +02:00
Gleb Natapov	9f1f60fc38	transport: pass tracing state explicitly instead of relying on it being in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state which is per request. Currently the next request may overwrite the tracing state of the previous one, causing, in the best case, a wrong trace to be taken, or a crash if the overwritten pointer is freed prematurely. Fixes #5700	2020-02-10 14:54:15 +02:00
Gleb Natapov	38fcab3db4	alternator: pass tracing state explicitly instead of relying on it being in the client_state Multiple requests can use the same client_state simultaneously, so it is not safe to use it as a container for a tracing state which is per request. This is not yet an issue for the alternator since it creates a new client_state object for each request, but first of all it should not, and second, the trace state will be dropped from the client_state by a later patch.	2020-02-10 14:50:55 +02:00
Juliusz Stasiewicz	133156ddcf	cdc: disallow negative TTL values in CDC	2020-02-10 13:50:00 +01:00
Kamil Braun	6c4f2b9717	storage_service: check for CDC flag in start_gossiping This is a bug: we tried to retrieve the CDC streams timestamp even if CDC flag was not enabled in storage_service::start_gossiping.	2020-02-10 14:30:35 +02:00
Takuya ASADA	b6988112b4	scylla_post_install.sh: fix operator precedence issue with multiple statements In bash, 'A \|\| B && C' is a problem because when A is true, C is still evaluated, since && and \|\| have the same precedence. To avoid the issue we need to make B && C one statement. Fixes #5764	2020-02-10 14:29:40 +02:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager one and database <-> storage_service. Also features sit on storage_service, but register themselves on the feature_service and the former subscribes on them back which also looks strange. I propose to keep all the features on feature_service, this keeps the latter independent from other components, makes it possible to break one of the mentioned circular dependencies and heavily relax the other. Also the set helps us fighting the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00
Calle Wilund	af963e76c7	keyspace/distributed_loader: Add wait for (user) keyspace population to finish Allows caller to check/wait for a given user keyspace to finish populating on boot. Can be called at any time, though if called before population starts, it will wait until it either starts and we can determine that the keyspace does not need populating, or population finishes. tests: unit Message-Id: <20200203151712.10003-1-calle@scylladb.com>	2020-02-09 18:56:22 +02:00
Pavel Emelyanov	d1775dd701	utils: Move disk-error-handler into it The disk-error-handler is purely auxiliary thing that helps propagating IO errors to the rest of the code. It well deserves not sitting in the root namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112443.18475-1-xemul@scylladb.com>	2020-02-09 17:26:52 +02:00
Pavel Solodovnikov	bcc4647552	lwt: fix handling of nulls in parameter markers for LWT queries This patch affects the LWT queries with IF conditions of the following form: `IF col in :value`, i.e. if the parameter marker is used. When executing a prepared query with a bound value of `(None,)` (tuple with null, example for Python driver), it is serialized not as NULL but as "empty" value (serialization format differs in each case). Therefore, Scylla deserializes the parameters in the request as empty `data_value` instances, which are, in turn, translated to non-empty `bytes_opt` with empty byte-string value later. Account for this case too in the CAS condition evaluation code. Example of a problem this patch aims to fix: Suppose we have a table `tbl` with a boolean field `test` and INSERT a row with NULL value for the `test` column. Then the following update query fails to apply due to the error in IF condition evaluation code (assume `v=(null)`): `UPDATE tbl SET test=false WHERE key=0 IF test IN :v` returns false in `[applied]` column, but is expected to succeed. Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286) Fixes: #5710 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>	2020-02-09 16:50:42 +02:00
Avi Kivity	b26ded8ec5	tracing: remove #include of modification_statement.hh from table_helper Replace with a forward declaration to reduce #include bloat and dependencies.	2020-02-09 13:04:13 +02:00
Avi Kivity	f8e85e5c2a	cql3: selection: remove now-unneeded include of statement_restrictions.hh Actual users gain #includes of statement_restrictions and query_options that they previously got through selection.hh.	2020-02-09 13:01:32 +02:00
Avi Kivity	710e4ec99d	cql3: deinline result_set_builder::restrictions_filter constructor It stands in the way of #include removal, so it must go. It should have no performance impact as it is too large to be inlined.	2020-02-09 13:00:17 +02:00
Avi Kivity	c6118d96d2	view_info: remove include of select_statement.hh It is not needed by users of view_info.	2020-02-09 12:43:33 +02:00
Avi Kivity	7474db4075	cql3: selection: remove unnecessary include of selector_factories It is only mentioned in the header file, so the forward declaration can be used and the include moved to the real users.	2020-02-09 12:37:36 +02:00
Avi Kivity	dcab666d52	cql3: query_processor: reduce #includes query_processor is a central class, so reducing its includes can reduce dependencies treewide. This patch removes includes for parsed_statement, cf_statement, and untyped_result_set and fixes up the rest of the tree to include what it lacks as a result of these removals.	2020-02-09 12:24:24 +02:00
Nadav Har'El	576f80be74	alternator-test: add comprehensive tests for KeyConditionExpression This patch adds comprehensive tests for KeyConditionExpression, the newer DynamoDB API syntax for specifying the item range which is requested from a Query (this syntax replaced the older KeyConditions syntax, which Alternator already supports). Before this patch, we had only a small test for KeyConditionExpression in test_query.py. This patch replaces it by a large number of small tests, testing the many sub-features of KeyConditionExpression - its different operators, sort-key types, different failure modes, etc. As usual, because we haven't yet implemented this feature in Alternator (see issue #5037), all these tests pass on AWS, but xfail on Alternator. Despite the new test file containing about 40 small tests, it finishes very quickly because we use pytest's fixture feature to allow small read-only tests to perform a query to a partition that is only written once for many tests. So these small tests become extremely fast, and there is no downside to having many small tests instead of lumping them into fewer large tests checking many things. Refs #5037. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200207134159.3283-1-nyh@scylladb.com>	2020-02-08 11:10:09 +02:00
Piotr Dulikowski	534e9ba27d	cdc: store information on ttl in "ttl" column, not in tuples This patch changes the way TTL is stored in the CDC log table. Instead of including TTL of cell `X` in the third element of the tuple in column `_X`, TTL is written to the previously unused column `ttl`. This is done for cosmetic purposes. This implementation works under assumption that there will be only one TTL included in a mutation coming from a CQL write. This might not be the case when writing a batch that modifies the same row twice, e.g.: ``` BATCH INSERT INTO ks.t (pk, ck, v1) VALUES (1,2,3) USING TTL 10; INSERT INTO ks.t (pk, ck, v2) VALUES (1,2,3) USING TTL 20; END BATCH ``` In this case, this implementation will choose only one TTL value to be written in the CDC log: ``` ... \| batch_seq_no \| _ck \| _pk \| _v1 \| _v2 \| operation \| ttl ...-+--------------+-----+-----+--------+--------+-----------+----- ... \| 0 \| 2 \| 1 \| (0, 3) \| (0, 3) \| 1 \| 20 ``` This behavior might be changed as a part of issue #5719, which considers splitting a batch write mutation when it contains multiple writes to the same row. Refs #5689 Tests: unit(dev)	2020-02-08 11:10:09 +02:00
Pavel Emelyanov	e2ec5eecf6	view_update: Do not need storage_proxy The view_update_generator accepts (and keeps) database and storage_proxy, the latter is only needed to initialize the view_updating_consumer which, in turn, only needs it to get database from (to find column family). This can be relaxed by providing the database from _generator to _consumer directly, without using the storage_proxy in between. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112427.18419-1-xemul@scylladb.com>	2020-02-07 13:30:01 +02:00
Pavel Emelyanov	00746d6a16	dht: Use const reference for token_metadata arg Two places in dht code have token_metadata _value_ arguments, but only read tokens from them. Optimize it a bit by turning values into const references. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112408.18352-1-xemul@scylladb.com>	2020-02-07 13:30:00 +02:00
Avi Kivity	5950a9e37f	.dockerignore: add testlog testlog files are not used when preparing the frozen toolchain, and can be very large, so ignore them in order to speed up the docker build.	2020-02-07 08:59:39 +01:00
Gleb Natapov	ff88ff880b	lwt: use cached truncation record instead of querying the database Message-Id: <20200206163838.5220-3-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
Gleb Natapov	20bf3800f3	database: cache truncation time in table objects Truncation time is used on each LWT request now, so reading it from the table is too heavy an operation to be on a fast path. It also requires jumping to a shard that contains corresponding data. This patch caches the data on the table object of each shard for easy access. The cache is initialized during boot from system.truncated table and updated on each truncation operation. Message-Id: <20200206163838.5220-2-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
Takuya ASADA	5d82fcf944	dist/ami: use prebuilt rpms on --localrpm We made the --localrpm option automatically build rpms from source code, but we actually use the option to produce an AMI using prebuilt rpms on our CI. To simplify the script, and to prevent accidentally starting an rpm build in the script, drop the rpm build part.	2020-02-06 18:41:52 +02:00
Amnon Heiman	687e554737	api/storage_service: use stream in get_snapshots get_snapshot should use http stream to reduce memory allocation and stalls. This patch changes the implementation so it streams each of the snapshot objects instead of creating a single response and returning it. Fixes #5468 Depends on scylladb/seastar#723	2020-02-06 18:40:37 +02:00
Takuya ASADA	c44f347886	SCYLLA-VERSION-GEN: skip updating version files when git hash unchanged On our build system we try to build the relocatable package multiple times on the same revision of the repository, and it executes ./SCYLLA-VERSION-GEN each time. When the build job is invoked at midnight and it does not finish until 12:00AM, the first build and the last build have different SCYLLA-RELEASE-FILE, since it contains the current date. To prevent it, skip updating SCYLLA-*-FILE when git hash unchanged. Fixes scylladb/scylla-pkg#826	2020-02-06 18:36:46 +02:00
Botond Dénes	05116ba963	reader_concurrency_semaphore: make signal() noexcept Currently reader_concurrency_semaphore::signal() can fail. This is dangerous in two ways: * It is called from constructors, so the exception can bring down the node. This will convert an `std::bad_alloc` to a crash. * Reads in the queue will be blocked until they either time-out, or another `signal()` succeeds. To solve this, wrap the `reader_permit` constructor, the only code that can throw, with try-catch and forward the exception to the reader admission promise. In practice this will result in the flushing of the reader queue, when we fail to admit a read. Fixes #5741 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206154238.707031-1-bdenes@scylladb.com>	2020-02-06 17:51:03 +02:00
Botond Dénes	434d32befe	reader_permit: tidy up reader_permit::memory_units This patch is a bag of fixes/cleanups that were omitted from the reader memory tracking series due to contributor error. It contains the following changes: * Get rid of unused `increase()` and `decrease()` methods. * Make all constructors and assignment operators `noexcept`. * Make move assignment operator safe w.r.t. self assignment. * `reset()`: consume the new amount before releasing the old amount, to prevent a transient window where new readers might be admitted. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206143007.633069-1-bdenes@scylladb.com>	2020-02-06 16:35:07 +02:00
Piotr Sarna	757c1cf91e	Merge ' Remove unnecessary schema copies' from Piotr Most of the time schema does not have to be copied and sometimes it's not even used. tests: unit(dev) Closes #5739 * hawk/remove_schema_copies: multishard_mutation_query_test: stop capturing unused schema index_reader: avoid copying schema to lambda	2020-02-06 15:20:24 +01:00
Piotr Jastrzebski	d1fe75edbc	multishard_mutation_query_test: stop capturing unused schema Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 14:18:50 +01:00
Piotr Jastrzebski	8813a6ca2a	index_reader: avoid copying schema to lambda Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 14:10:58 +01:00
Nadav Har'El	abdbb70ad9	Allow configuring alternator write isolation Merged patch series from Piotr Sarna: This series adds a way to configure alternator write isolation policy per-table with the use of tags. Instead of hardcoded LWT_ALWAYS policy, it can now be set by tagging a table with a tag of the following form: { 'Key': 'system:write_isolation', 'Value': X }, where X is one of the following implemented levels: * 'f' - forbid RMW * 'a' - always enforce RMW * 'o' - only RMW writes will go through LWT * 'u' - unsafe RMW (to be deprecated/eradicated) By default, if no tag is found, alternator falls back to always applying LWT to writes. This series also contains fixes to the tagging interface - some minor issues came up while implementing write isolation config on top of tags. test: alternator-test(local,remote) Piotr Sarna (6): alternator: return tags for a table via const reference alternator: fix overwriting tags alternator: make _write_isolation a protected attribute alternator: add configuring write isolation policy via tags alternator-test: add testing different write isolation policies docs: update alternator on write isolation alternator-test/test_condition_expression.py \| 63 ++++++++++++++ alternator-test/test_tag.py \| 25 ++++++ alternator/executor.cc \| 89 +++++++++++++------- docs/alternator/alternator.md \| 21 +++-- 4 files changed, 162 insertions(+), 36 deletions(-)	2020-02-06 12:37:19 +02:00
Nadav Har'El	8b6925790f	Reduce usage of global_partitioner() Merged pull request https://github.com/scylladb/scylla/pull/5733 from Piotr Jastrzębski: In many places we use global_partitioner() to obtain parameters that are available in config. This PR replaces a number of global_partitioner() calls with equivalent non-global ways. tests: unit(dev) * 'reduce_global_usage' of github.com:haaawk/scylla: storage_service: reduce number of global_partitioner calls cdc: remove partitioner from db_context gossiper: stop calling global_partitioner() system_keyspace: stop calling global_partitioner() transport/server: stop calling global_partitioner() thrift: stop calling global_partitioner() partitioner: move cpu_sharding_algorithm_name to token-sharding.hh	2020-02-06 12:10:38 +02:00
Piotr Sarna	9ac35b9367	docs: update alternator on write isolation Docs are appended with information on write isolation - which levels are implemented in alternator and how to configure them properly.	2020-02-06 10:26:26 +01:00
Piotr Sarna	4d3b8e3b5a	alternator-test: add testing different write isolation policies Additional testing is done via: 1. Checking that permissive isolation levels ('a', 'o', 'u') allow conditional writes 2. Checking that 'f' isolation level (forbid rmw) works as expected: - read-modify-write requests are forbidden - non-rmw writes are allowed	2020-02-06 10:26:26 +01:00
Piotr Sarna	4a9536b7c1	alternator: add configuring write isolation policy via tags Until now, write isolation policy was hardcoded to always enforcing LWT. From now on, setting a tag via UpdateTags request or during table creation will associate a policy with given table. The tag key is 'system:write_isolation' and its value can be one of: * 'f' - forbid RMW * 'a' - always enforce RMW * 'o' - only RMW writes will go through LWT * 'u' - unsafe RMW (to be deprecated/eradicated)	2020-02-06 10:26:26 +01:00
Piotr Sarna	0479a1bf67	alternator: make _write_isolation a protected attribute No useful semantic changes yet, but it will help produce better diffs for future patches.	2020-02-06 10:04:34 +01:00
Piotr Sarna	51c14cb1ce	alternator: fix overwriting tags Tagging a resource with a tag key that already exists should result in overwriting the old value. It wasn't the case, so it's now fixed and an appropriate test is added.	2020-02-06 10:04:34 +01:00
Piotr Sarna	ed940f000d	alternator: return tags for a table via const reference The signature of the helper function is changed, so that it's possible to acquire a const reference of the tags, instead of being forced to get a copy of the whole map (potentially large).	2020-02-06 10:04:34 +01:00
Piotr Sarna	f4b6f0956b	Merge "Pending Alternator patches" from Nadav Here is a rebase of some of my already-reviewed Alternator patches - the final piece of the fix to LWT timestamps (in BatchWriteItems), The "/localnodes" request, and a couple of patches reducing the number of times that the global storage_proxy is needed. Also available in a github branch, git@github.com:nyh/scylla.git series1 * nyh/series1: redis: remove redundant code storage_proxy: make it into a peering sharded service alternator: use simpler API for registering Alternator's HTTP URLs alternator-test: test "/localnodes" feature alternator: add public API for list of nodes in current DC alternator: use LWT timestamp - in BatchWriteItems too	2020-02-06 09:48:10 +01:00
Juliusz Stasiewicz	20f7b1b0ad	tests: add test for CDC schema extension Test for functionality added in #5720. Refs #5589	2020-02-06 09:26:13 +01:00
Piotr Jastrzebski	9bfd3dc311	storage_service: reduce number of global_partitioner calls Replace global_partitioner().sharding_ignore_msb() call with config::murmur3_partitioner_ignore_msb_bits() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 08:00:34 +01:00
Piotr Jastrzebski	97262bec82	cdc: remove partitioner from db_context partitioner from cdc::db_context is no longer used so it can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 08:00:01 +01:00
Piotr Jastrzebski	61d8308848	gossiper: stop calling global_partitioner() Obtain name of the default partitioner from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:59:07 +01:00
Piotr Jastrzebski	8b4ec5b1d2	system_keyspace: stop calling global_partitioner() Obtain name of default partitioner from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:58:07 +01:00
Piotr Jastrzebski	d3d6547889	transport/server: stop calling global_partitioner() Obtain SCYLLA_SHARDING_IGNORE_MSB and SCYLLA_PARTITIONER from config instead of a global. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:57:06 +01:00
Piotr Jastrzebski	dde8c7df00	thrift: stop calling global_partitioner() Replace global_partitioner().name() call with config::partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:55:54 +01:00
Piotr Jastrzebski	8817a62499	partitioner: move cpu_sharding_algorithm_name to token-sharding.hh Sharding logic has been moved to token-sharding.hh some time ago. This logic does not depend on partitioner any more so cpu_sharding_algorithm_name can be safely moved to the header where rest of sharding logic lives. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-06 07:53:45 +01:00
Nadav Har'El	3f27b070e7	redis: remove redundant code In one place, we already had a "proxy" object, but still asked for it again. Remove the redundant line. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	9fd9ec14c2	storage_proxy: make it into a peering sharded service We consider globals like service::get_storage_proxy() a bad idea, and would like to reduce their use as much as possible - and eventually, eliminate it completely. One easy case to fix is when we already have a shard-local proxy, but now we need the sharded object, to invoke_on() something on it. In this patch, we turn storage_proxy into a peering_sharded_service. This means that if you already have a storage_proxy, you can call its container() function to get the sharded<storage_proxy>, without needing to call the global service::get_storage_proxy(). We found a few such cases in storage_proxy itself, and in Alternator, and fixed them to use container() instead of the global function. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	b262eb5031	alternator: use simpler API for registering Alternator's HTTP URLs We used the Seastar HTTP server's add() method to register URLs to serve (so-called "routes"), but as suggested by Amnon, when we have fixed URLs without parameters being path components, it's simpler to use the put() method to do the same thing - and also results in slightly less work at run-time to look up these routes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	9de26b73a4	alternator-test: test "/localnodes" feature This is a partial test for the "/localnodes" request, which is supposed to return the list of live nodes in this DC. Because of the limitations of our current alternator-test framework (which should work on any pre-existing cluster), we don't know what to expect as a reply, but we just verify the minimum: The request is understood, returns a JSON list, which contains at least one item. As "/localnodes" is a Scylla-only feature, this test is skipped when running with "--aws". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	3fecf6f641	alternator: add public API for list of nodes in current DC If we want to balance the Alternator request load among the different nodes (Refs #5030), the load balancer - whether it uses HTTP load balancing or DNS - needs to be able to get an up-to-date list of live nodes to which it can direct Alternator traffic. This list should include only the live nodes in the same data center (geographical region) - it is expected that a separate load balancer will be installed in each data center, and clients from within this data center will reach this data center's load balancer. There are multiple APIs in current Scylla to do something similar to what we need, but as far as I know, none of them is exactly what we need or convenient for Alternator installations: We don't want the load balancer to use CQL, and the REST API http://localhost:10000/gossiper/endpoint/live/ doesn't do what we need (it doesn't restrict the list to one data center) plus it's not open to connections outside the machine. So in this patch, we implement a new HTTP request on the Alternator port - "/localnodes", returning a JSON-formatted list of all live nodes in the contacted node's data center: $ curl http://localhost:8000/localnodes ["127.0.0.2","127.0.0.1","127.0.0.3"] Like the existing health check HTTP request, this operation is public and unauthenticated. We consider the security risk low - it allows an attacker to enquire the list of Scylla nodes in this DC, but an attacker can achieve the same thing by just scanning the addresses in this subnet using the health check request (or even with ordinary DynamoDB API requests). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	95351016fd	alternator: use LWT timestamp - in BatchWriteItems too A previous patch fixed Alternator's writes to use the timestamp provided by LWT instead of the current timestamp. That patch fixed the PutItem, DeleteItem and UpdateItem operations - and this patch fixes the remaining write operation: BatchWriteItems. So, Fixes #5653. Unfortunately, the requirements of both BatchWriteItems and LWT make the resulting code - and this patch - somewhat inelegant. BatchWriteItems requires that we prepare all the operations first - failing if any of them has an error. Before this patch, the result of this preparation was an array of mutations, which in a second step we wrote to the database. But we can no longer use mutations for the result of the first step, because creating a mutation requires knowing the timestamp, which we don't know during the preparation phase - we will only know it during the later LWT operation. So now we need to invent a new intermediate format between the request and the mutation. This intermediate format is further complicated by the need to send it between shards (for LWT's shard forwarding) so it cannot, for example, contain a reference to a schema. The fact that different sub-operations need to be sent to different shards, and that different sub-operations may write to different tables, further complicates the book-keeping and gives us a bunch of funky-typed maps. But eventually it all fits together. After this patch, as before this patch, the same code (now called put_or_delete_item), is used to implement both the PutItem and DeleteItem stand-alone operations, and the BatchWriteItems operation which includes a whole list of these PutItem and DeleteItem operations. This patch also includes two more tests in test_batch.py, which test two more corner cases we haven't tested before: One tests the capability of BatchWriteItems to write to more than one table. 
The other tests that BatchWriteItems can write an empty item (it is not surprising that it does, but we do have special code for this case, so we should test it). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Avi Kivity	27b36beb4a	Update seastar submodule * seastar 30185fd901...1c7bccc500 (8): > reactor: rename kernel_completion::set_value to complete_with > net: Remove unused member variables > net: Fix global buffer overflow around rss_key_type > reactor: remove kernel_completion::set_promise > Merge "generalize the io_desc (now kernel_completion)" from Glauber > everywhere: Disable -Wmisleading-indentation around ragel generated code > core: Make when_all_state_component final > io_tester: Remove unused lambda capture	2020-02-05 20:20:43 +02:00
Gleb Natapov	ff696682ed	add missing include to timestamp.hh The file uses std::string but does not include the <string> header. My compiler complains. Message-Id: <20200205085739.GN26048@scylladb.com>	2020-02-05 19:42:18 +02:00
Avi Kivity	e719ea1bba	Merge "Fix assert on initialization error" (in large_data_handler) from Rafael " This series fixes an assertion when initialization fails after creating a database. I don't know of a case where that currently happens, but it is easy to cause that when writing a patch and the produced assert is just confusing. " * 'espindola/dont-assert-on-init-error' of https://github.com/espindola/scylla: db: Replace large_data_handler::_stopped with _running db: Move nop_large_data_handler constructor out-of-line db: Move large_data_handler::stop out-of-line	2020-02-05 18:49:11 +02:00
Juliusz Stasiewicz	5127568cc4	cdc: cdc per-table options put into schema extensions With this patch, client tools (in particular cqlsh) get the access to cdc options and will be able to print them with `DESC TABLE`. Fixes #5589	2020-02-05 13:44:39 +01:00
Piotr Sarna	ee244a6d22	Merge 'Make it clear that memory_footprint_test has to be run with -c1' from Piotr This test fails when run on more than 1 core. Tests: unit(dev) * hawk/fix_memory_footprint: memory_footprint_test: Make it clear it has to run with -c1 tests: move memory_footprint_test to perf/	2020-02-05 12:09:50 +01:00
Avi Kivity	31593e1451	Merge "Change token representation to int64_t" from Piotr " After deprecating partitioners other than Murmur3 we can change the representation of tokens to int64_t. This will allow setting custom partitioner on each table. With this change partitioners become just converters from partition keys to tokens (int64_t). Following operations are no longer dependant on partitioner implementation: - Tokens comparison - Tokens serialization/deserialization to strings - Tokens serialization/deserialization to bytes - Sharding logic - Random token generation This change will be followed by a PR that enables per table partitioner and then another PR that introduces a special partitioner for CDC tables. Tests: unit(dev) Results of memory footprint test: Differences: in cache: 992 vs 984 in memtable: 750 vs 742 sizeof(cache_entry) = 112 vs 104 -- sizeof(decorated_key) = 36 vs 32 MASTER: mutation footprint: in cache: 992 in memtable: 750 in sstable: 351 frozen: 540 canonical: 827 query result: 342 sizeof(cache_entry) = 112 -- sizeof(decorated_key) = 36 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 THIS PATCHSET: mutation footprint: in cache: 984 in memtable: 742 in sstable: 351 frozen: 540 canonical: 827 query result: 342 sizeof(cache_entry) = 104 -- sizeof(decorated_key) = 32 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 " * 'fixed_token_representation' of https://github.com/haaawk/scylla: (21 commits) token: cast to int64_t not long in long_token murmur3: move sharding 
logic to token and i_partitioner partitioner: move shard_of_minimum_token to token partitioner: remove token_to_bytes partitioner: move get_token_validator to token partitioner: merge tri_compare into dht::tri_compare partitioner: move describe_ownership to token partitioner: move from_bytes to token partitioner: move from_string to token partitioner: move to_sstring to token partitioner: move get_random_token to token partitioner: move midpoint function to token token: remove token_view sstables: use copy constructor for tokens token: change _data to int64_t partitioner: remove hash_large_token token: change data to array<uint8_t, 8> partitioner: Extract token to separate .hh and .cc files partitioner: remove unused functions Revert "dht/murmur3_partitioner: take private methods out of the class" ...	2020-02-05 12:21:02 +02:00
Piotr Jastrzebski	edd7398a0c	memory_footprint_test: Make it clear it has to run with -c1 The test fails when run on a number of cores other than 1. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:22:32 +01:00
Piotr Jastrzebski	1a8fe4befd	tests: move memory_footprint_test to perf/ Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:18:28 +01:00
Piotr Jastrzebski	6d24f26ff7	token: cast to int64_t not long in long_token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	50cfe81331	murmur3: move sharding logic to token and i_partitioner Since token representation is fixed now, all the partitioners will share the sharding logic. It makes sense now to keep the logic in common super class and separate header that's included only in i_partitioner.cc. shard_of and token_for_next_shard are now implemented in i_partitioner. They would be non-virtual but we have to keep them virtual because one test is overriding them to enforce some specific sharding. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	7eab3024bd	partitioner: move shard_of_minimum_token to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	9c55e5be13	partitioner: remove token_to_bytes i_partitioner::token_to_bytes is just a call to token::data and does not depend on partitioner at all. It is possible to convert token to bytes without having access to partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	d4d55160f0	partitioner: move get_token_validator to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	2c630c5820	partitioner: merge tri_compare into dht::tri_compare Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	d0d8bfaf8c	partitioner: move describe_ownership to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	f845220445	partitioner: move from_bytes to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	8107d99e3d	partitioner: move from_string to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	03bdce2d68	partitioner: move to_sstring to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	9c202b52da	partitioner: move get_random_token to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	f42b1ee819	partitioner: move midpoint function to token Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	1d1ac476c3	token: remove token_view Now that both token and token_view contain int64_t it makes no sense to keep the view. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	06dfd16aad	sstables: use copy constructor for tokens instead of manually creating new token from another token internals. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	05e0451b27	token: change _data to int64_t Previously _data was stored as array of 8 bytes in network byte order. After this change it stores the same value in int64_t in host byte order. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	fea0187f55	partitioner: remove hash_large_token Now that token representation is always array<uint8_t, 8>, hash<dht::token> will always pick read_le<size_t>(reinterpret_cast<const char*>(b.data())) and never call hash_large_token because the check is always true b.size() == sizeof(size_t). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:31:32 +01:00
Piotr Jastrzebski	b569d127a0	token: change data to array<uint8_t, 8> It is safe to make such a change because we support only Murmur3Partitioner, which uses only tokens that are 8 bytes long. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:30:46 +01:00
Piotr Jastrzebski	0da21c28ab	partitioner: Extract token to separate .hh and .cc files Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	8bd9d3a69e	partitioner: remove unused functions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	d86548c06e	Revert "dht/murmur3_partitioner: take private methods out of the class" This patch conflicts with the following patches. The final effect is equivalent and it's easier to revert this patch and cleanly apply already reviewed patches. This reverts commit `f4f8593bac`.	2020-02-05 09:18:24 +01:00
Piotr Jastrzebski	08036fc511	murmur3_partitioner: get rid of static shard_of This will enable revert of a commit that creates conflicts with following patches. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 09:18:24 +01:00
Rafael Ávila de Espíndola	5d4671526c	db: Replace large_data_handler::_stopped with _running This is not just a direct flip to a variable with the negated Boolean value. When created, a large_data_handler is not considered to be running, the user has to call start() before it can be used. The advantage of doing this is that if initialization fails and a database is destructed before the large_data_handler is started, the assert database::stop() { assert(!_large_data_handler->running()); is not triggered. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:15:44 -08:00
Rafael Ávila de Espíndola	33dfe34f78	db: Move nop_large_data_handler constructor out-of-line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:12:01 -08:00
Rafael Ávila de Espíndola	e99a225f25	db: Move large_data_handler::stop out-of-line Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:11:49 -08:00
Rafael Ávila de Espíndola	9eae0b57a3	test: Enable all experimental features in the cql_repl The cql repl will hopefully be used to write most new tests, so it should have all experimental features enabled. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200204173448.95892-1-espindola@scylladb.com>	2020-02-04 19:36:37 +02:00
Avi Kivity	7d70bfe20c	Merge "Lua: Fix handling of list<varint> and list<decimal>" from Rafael " This patch series fixes #5711, enables UDF support in CQL tests and includes a few extra cleanups. " * 'espindola/lua-fixes' of https://github.com/espindola/scylla: lua: Use a negative index for consistency lua: Fix returning list<decimal> lua: Fix returning list<varint> lua: Use a lua_slice_state instead of a from_lua_visitor test: Enable UDF in the cql repl	2020-02-04 18:51:54 +02:00
Nadav Har'El	acafcbfdf4	alternator: use LWT timestamp, not current timestamp The DynamoDB API doesn't have the notion of client-supplied timestamps, so the server is supposed to use its own current timestamp for write operations. However, for LWT writes, we should not use this node's current time: Different nodes may slightly differ in their clocks, and LWT needs a monotonically-increasing notion of time for the consistent operations. LWT provides to the operation's apply() method the specific timestamp that it should use in its returned mutation - and we should use this timestamp, not the current timestamp. In the optional write modes where LWT is not used, we continue to use the current timestamp (api::new_timestamp()) as before. This patch fixes the PutItem, UpdateItem and DeleteItem operations. The BatchWriteItem operation is not yet fixed by this patch - fixing it will require more elaborate code changes so will be done in a separate patch. Refs #5653. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200130122853.7658-1-nyh@scylladb.com>	2020-02-04 10:18:49 +01:00
Nadav Har'El	0a23471eae	alternator: switch BatchWriteItems to use LWT too Today, we use LWT for all PutItem, UpdateItem and GetItem operations. We do this even for pure writes (writes which do not involve a read before the write). But BatchWriteItem also does pure writes - and it doesn't use LWT yet. So this patch changes it so it does. As before we keep in the code - not yet configurable by a user - also the option to do these unconditional writes without LWT. A BatchWriteItem may change multiple partitions (but a fairly low number - DynamoDB allows each BatchWriteItem to only do 25 updates) and we start the different LWT operations in parallel. This patch collects multiple mutations to the same partition together to be done with a single LWT operation, so we also add a test for this case, where we have a batch of writes involving several items in each of several partitions. Fixes #5637 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200128160538.11775-1-nyh@scylladb.com>	2020-02-04 10:08:18 +01:00
Rafael Ávila de Espíndola	6764316576	cql3: Simplify maybe_quote This produces code that is just as fast as the previous implementation and is quite a bit easier to read IMHO. I benchmarked it by temporarily adding: BOOST_AUTO_TEST_CASE(bench_maybe_quote) { std::string val(1 << 20, 'x'); using clk = std::chrono::steady_clock; cql3::util::maybe_quote(val); auto start = clk::now(); for (int i = 0; i < 1000; ++i) { cql3::util::maybe_quote(val); } auto end = clk::now(); std::chrono::duration<double> duration = end - start; std::cout << "delta = " << duration.count() << '\n'; } Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200203225140.180262-1-espindola@scylladb.com>	2020-02-04 10:52:04 +02:00
Avi Kivity	cdecb21b78	Update seastar submodule * seastar 65980a9b30...30185fd901 (12): > sstring: resize: NulTerminate when downsizing > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > json/json_elements: Use double quotes around element name > Revert "reactor: make open_flags::dsync respect --unsafe-bypass-fsync" > Merge "smp: reduce allocations in work_item::process" from Avi > task: optimize destruction by making destructor non-virtual > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > Revert "sstring: resize: NulTerminate when downsizing" > sstring: resize: NulTerminate when downsizing > tests: Rename unix domain socket test for consistency > resource: downgrade cgroupsv2 message. > Merge "Simplify the stream/subscription implementation" from Rafael	2020-02-04 10:20:29 +02:00
Nadav Har'El	3de09042bb	CDC topology change support Merged pull request https://github.com/scylladb/scylla/pull/5485 by Kamil Braun: This series introduces the notion of CDC generations: sets of CDC streams used by the cluster to choose partition keys for CDC log writes. Each CDC generation begins operating at a specific time point, called the generation's timestamp (cdc_streams_timestamp in the code). It continues being used by all nodes in the cluster to generate log writes until superseded by a new generation. Generations are chosen so that CDC log writes are colocated with their corresponding base table writes, i.e. their partition keys (which are CDC stream identifiers picked from the generation operating at time of making the write) fall into the same vnode and shard as the corresponding base table write partition keys. Currently this is probabilistic and not 100% of log writes will be colocated - this will change in future commits, after per-table partitioners are implemented. CDC generations are a global property of the cluster -- they don't depend on any particular table's configuration. Therefore the old "CDC stream description tables", which were specific to each CDC-enabled table, were removed and replaced by a new, global description table inside the system_distributed keyspace. A new generation is introduced and supersedes the previous one whenever we insert new tokens into the token ring, which breaks the colocation property of the previous generation. The new generation is chosen to account for the new tokens and restore colocation. This happens when a new node joins the cluster. The joining node is responsible for creating and informing other nodes about the new CDC generation. It does that by serializing it and inserting into an internal distributed table ("CDC topology description table"). If it fails the insert, it fails the joining process. 
It then announces the generation to other nodes through gossip using the generation's timestamp, which is the partition key of the inserted distributed table entry. Nodes that learn about the new generation through gossip attempt to retrieve it from the distributed table. This might fail - for example, if the node is partitioned away from all replicas that hold this generation's table entry. In that case the node might stop accepting writes, since it knows that it should send log entries to a new generation of streams, but it doesn't know what the generation is. The node will keep trying to retrieve the data in the background until it succeeds or sees that it is no longer necessary (e.g., because yet another generation superseded this one). So we give up some availability to achieve safety. However, this solution is not completely safe (might break consistency properties): if a node learns about a new generation too late (if gossip doesn't reach this node in time), the node might send writes to the wrong (old) generation. In the future we will introduce a transaction-based approach where we will always make sure that all nodes receive the new generation before any of them starts using it (and if it's impossible e.g. due to a network partition, we will fail the bootstrap attempt). In practice, if the admin makes sure that the cluster works correctly before bootstrapping a new node, and a network partition doesn't start in the few seconds window where a new generation is announced, everything will work as it should. After the learning node retrieves the generation, it inserts it into an in-memory data structure called "CDC metadata". This structure is then used when performing writes to the CDC log -- given the timestamp of the written mutation, the data structure will return the CDC generation operating at this time point. 
CDC metadata might reject the query for two reasons: if the timestamp belongs to an earlier generation, which most probably doesn't have the colocation property anymore, or if it is picked too far away into the future, where we don't know if the current generation won't be superseded by a different one (so we don't yet know the set of streams that this log write should be sent to). If the client uses server-generated timestamps, the query will never be rejected. Clients can also use client-generated timestamps, but they must make sure that their clocks are not too desynchronized with the database -- otherwise some or all of their writes to CDC-enabled tables will be rejected. In the case of rolling upgrade, where we restart nodes that were previously running without CDC, we act a bit differently - there is no naturally selected joining node which must propose a new generation. We have to select such a node using other means. For this we use a bully approach: every node compares its host id with host ids of other nodes and if it finds that it has the greatest host id, it becomes responsible for creating the first generation. This change also fixes the way of choosing values of the "time" column of CDC log writes: the timeuuid is chosen in a way which preserves ordering of corresponding base table mutations (the timestamp of this timeuuid is equal to the base table mutation timestamp). Warning: if you were running a previous CDC version (without topology change support), make sure to disable CDC on all tables before performing the upgrade. This will drop the log data -- backup it if needed. TODO in future patchset: expire CDC generations. Currently, each inserted CDC generation will stay in the distributed tables forever (until manually removed by the administrator). When a generation is superseded, it should become "expired", and 24 hours after expiration, it should be removed. 
The distributed tables (cdc_topology_description and cdc_description) both have an "expired" column which can be used for this purpose. Unit tests: dev, debug, release dtests (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/907/	2020-02-04 10:20:29 +02:00
Gleb Natapov	2876482373	lwt: account for cases where LWT request were moved to another shard in statistics Now that we bounce lwt requests to the correct shard before calling into storage_proxy the cross shard op accounting does not account for bounced lwt statement. Fix that by increasing corresponding counter when returning a "bounce" reply. Message-Id: <20200203122011.GH26048@scylladb.com>	2020-02-04 10:20:28 +02:00
Nadav Har'El	37f2f6112e	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling Merged patch series from Benny Halevy: The function was reimplemented to solve the following issues. The custom implementation also improved its performance by close to 19%. Using regex_match("[a-z][a-z0-9_]") may cause stack overflow on long input strings as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in-place so no doubling of quotes was actually done. Add unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with stack overflow. Fixes #5671 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling * cql3::util::maybe_quote: further optimize quote doubling	2020-02-04 10:20:28 +02:00
Nadav Har'El	6e91f159fe	LWT: handle bounce_to_shard result for batch statements Merged patch series from Gleb Natapov: Batch statement can also execute LWT and hence need to handle bounce_to_shard result. * transport: handle bounce_to_shard for batch statement * transport: consolidate bounce_to_shard handling between all three verbs that handle it	2020-02-04 10:20:28 +02:00
Takuya ASADA	1446fe930b	dist/redhat: install specified version of scylla-conf on meta package (#5599 ) To install specified version of scylla-conf package, we need to add it on Requires. Fixes #5639	2020-02-04 10:20:28 +02:00
Benny Halevy	f45fabab73	gossiper: do_stop_gossiping: copy live endpoints vector It can be resized asynchronously by mark_dead. Fixes #5701 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>	2020-02-04 10:20:28 +02:00
Avi Kivity	501b24cad3	test.py: use command line option in preference to environment variable when calling a test Command line options are printed out, so if a user cuts-and-pastes a command line they will get a run that is more similar to the one that the test executed. Message-Id: <20200202133209.209608-1-avi@scylladb.com>	2020-02-04 10:20:28 +02:00
Rafael Ávila de Espíndola	1294770970	lua: Use a negative index for consistency In this case we know the size of the stack and both indexes refer to the same position. Using a negative index is just more consistent with the rest of the file and hopefully a bit less brittle to future changes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:23:09 -08:00
Rafael Ávila de Espíndola	a4d668e8ed	lua: Fix returning list<decimal> We were accessing the wrong stack location if a decimal was not at top of the stack. Fixes: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:10:04 -08:00
Rafael Ávila de Espíndola	39e637f6bf	lua: Fix returning list<varint> We were accessing the wrong stack location if a varint was not at the top of the stack. Refs: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:09:59 -08:00
Rafael Ávila de Espíndola	530779efb6	lua: Use a lua_slice_state instead of a from_lua_visitor A few places were using a from_lua_visitor only to access the lua_slice_state member variable. This is just a simplification. No functionality changed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:04:36 -08:00
Rafael Ávila de Espíndola	35023c831c	test: Enable UDF in the cql repl A followup commit will use this to write cql tests for UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 17:58:27 -08:00
Gleb Natapov	9c75a25e9f	transport: consolidate bounce_to_shard handling between all three verbs that handle it All three verbs that need to handle bounce_to_shard have almost identical process_() and process__on_shard() functions. Consolidate them into one to reuse the code.	2020-02-03 14:27:50 +02:00
Gleb Natapov	dd793098fa	transport: handle bounce_to_shard for batch statement Batch statement can also execute LWT and hence need to handle bounce_to_shard result. Fixes: #5644	2020-02-03 14:27:30 +02:00
Pavel Emelyanov	8a7f13420f	gossiper: Avoid string merge-split for nothing The caller of check_knows_remote_features merges a set of features into a string, but the method in question ... splits them back into the set. Avoid this unneeded step and clean the respective storage service helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	ca55c6c15f	features: Stop on shutdown The service in question doesn't depend on anything, so it's started first and stopped last. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	f6f76ef8c1	storage_service: Remove helpers The storage_service no longer works as a features provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	0e62d615ae	storage_service: Prepare to switch from on-board feature helpers There are some places that get global storage_service instance for individual features. In the next patch all these helpers will be removed, so here's the preparation for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	0abddc4557	cql3: Check feature in .validate There's no local variable to get features from in the create_view_statement constructor, but since the .validate is always called after it, it looks safe to check for needed feature in it (we have storage_proxy there). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	abe588888d	database: Use feature service Keep local feature_service reference on database. This relaxes the circular storage_service <-> database reference, but does not remove it completely. This needs some args tossing in apply_to_builder, but it's rather straightforward, so comes in the same patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	12c1378be0	storage_proxy: Use feature service Keep reference on local feature service from storage_proxy and use it in places that have (local) storage_proxy at hands. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	4f5b70dcb1	migration_manager: Use feature service This unties migration_manager from storage_service thus breaking the circular dependency between these two. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	74fd3466b5	start: Pass needed feature as argument into migrate_truncation_records As a nice side-effect this stops using global storage service instance by this function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	aa6b1efc35	features: Unfriend storage_service The storage service no longer needs to mess with feature config. It only needs two features to register itself in, but this can be solved by respective cluster_supports_foo helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	9b67226715	features: Simplify feature registration Now features are registered into a map of vectors, but it looks like the vector is always 1-item long and is used to keep pointer on feature, instead of the feature itself. Switch it into map of reference_wrapper-s. Before this patch we could register more than one feature under the same name, now we can't. But this seems to be OK, as we don't actually do this. To catch violations of this restriction there's an assert() in the feature_service::register_feature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	da6af8bde7	features: Introduce known_feature_set There are two masks -- supported and known. They differ in unbounded_range_tombstones one which is set depending on the sstables format in use. Since the feature_service doesn't know anything about sstables format, the logic is reverted -- the feature service reports back the known mask (all features) and storage_service clears the unbounded_range_tombstones if the sst format is low -- but is (hopefully) left intact. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	4a01f468dd	features: Move disabled features set from storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	a5b1998247	features: Move schema_features helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	b0638606e5	features: Move all features from storage_service to feature_service And leave some temporary storage_service->feature links. The plan is to make every subsystem that needs storage_service for features stop doing so and switch on the feature_service. The feature_service is the service w/o any dependencies, it will be freed last, thus making the service dependency tree be a tree, not a graph with loops. While at it -- make all const-s not have _FEATURE suffix (now there are both options) and const-qualify cluster_supports_lwt(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	49de3b4ad8	storage_service: Use feature_config from _feature_service This makes the testing/prod config logic much simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	052259f8ef	features: Add feature_config Some features take db::config to find out whether to be enabled or disabled. This creates unwanted dependency between database and features, so split the features configuration explicitly. Also this will make the "this is for testing env only" logic cleaner and simpler to understand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	d38f8ca52a	storage_service: Kill set_disabled_features The _disabled_features is configured by tests via storage_service constructor, so the helper in question is effectively useless. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Pavel Emelyanov	76a7fd4186	gms: Move features stuff into own .cc file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:21 +03:00
Kamil Braun	4b3754ff94	docs: add documentation about CDC generations	2020-02-03 10:57:31 +01:00
Kamil Braun	b130b76274	test: disable CDC flag by default When CDC flag is on, the node startup procedure takes a few seconds longer (we have to generate CDC streams). This is not necessary in non-CDC tests.	2020-02-03 10:57:31 +01:00
Kamil Braun	0d41e2c1fe	test: add cdc::generate_timeuuid tests	2020-02-03 10:57:31 +01:00
Kamil Braun	5fb5925fb4	test: add cdc::find_timestamp tests	2020-02-03 10:57:31 +01:00
Kamil Braun	7cb6ac33f5	storage_service: check if we know other nodes' tokens when joining ring If we are a seed node (but not the only one) or we set auto_bootstrap=off, it might happen due to misconfiguration or a network partition that we don't know other nodes' tokens at the end of the join_token_ring function, when we go into the NORMAL status, finishing the joining process. CDC however requires that we know other nodes' tokens at this point: we need them to correctly create a new CDC generation. This commit adds a check which prevents the node from starting if that's not the case. If the check fails, the node first tries waiting a bit until it learns about the tokens or times out.	2020-02-03 10:57:28 +01:00
Pavel Emelyanov	7a2123c8dc	migration_manager: Move some fns into class These methods will need to have this-> in one of the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 12:29:54 +03:00
Avi Kivity	2816404f57	test.py: documented exit code value Document our chosen exit failure code value and its relationship to git bisect. Message-Id: <20200202134223.210578-1-avi@scylladb.com>	2020-02-03 00:58:58 +02:00
Avi Kivity	541893e69a	Merge "Fix conversion of lua nil to cql null" from Rafael " The fix itself is fairly simple, but looking at the code I found that our code base was not cleanly distinguishing null and empty values and was treating null and missing values differently, but that distinction was dead since a null is represented as a dead cell. " * 'espindola/lua-fix-null-v6' of https://github.com/espindola/scylla: lua: Handle nil returns correctly types: Return bytes_opt from data_value::serialize query-result-set: Assert that we don't have null values types: Fix comparison of empty and null data_values Revert "tests: Handle null and not present values differently" query-result-set: Avoid a copy during construction types: Move operator== for data_value out-of-line	2020-02-02 15:43:24 +02:00
Avi Kivity	c8890eb124	Merge "Simplify usage of stream subscriptions" from Rafael " In a few places, the only use we had for a subscription was calling done(). With this series we now call done() early and store the future<> instead. " * 'espindola/stream-cleanup' of https://github.com/espindola/scylla: sstable_test: Store a future<> instead of a subscription commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper lister: Store a future<> instead of a subscription	2020-02-02 14:49:00 +02:00
Rafael Ávila de Espíndola	5dfb658e77	build: Add two missing dependencies With this change we always rebuild seastar/libseastar_testing.a for the same reason we always rebuild seastar/libseastar.a: We have no idea what its dependencies are, we have to recurse to seastar to find out. The other missing dependency is that we have to rebuild build.ninja when seastar/CMakeLists.txt changes. A change in seastar/CMakeLists.txt can cause seastar.pc to change which can change the command lines used. That is incomplete, as changes to other seastar files can have the same impact, but it is better than nothing. It is not sufficient to put a dependency in the seastar.pc file as that file will be modified when cmake is run and the scylla ninja process doesn't see the CMakeLists.txt to seastar.pc edge. Fixes: #5687 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200201001126.458992-1-espindola@scylladb.com>	2020-02-01 21:08:26 +02:00
Pavel Emelyanov	4839ca8491	storage_service: Unregister from gossiper notifications ... at all This unregistration doesn't happen currently, but doesn't seem to cause any problems in general, as on stop gossiper is stopped and nothing from it hits the storage_service. However (!) if an exception pops up between the moment storage_service subscribes to gossiper and the moment the drain_on_shutdown defer action is set up then we _may_ get into the following situation: - main's stuff gets unrolled back - gossiper is not stopped (drain_on_shutdown defer is not set up) - migration manager is stopped (with deferred action in main) - a notification comes from gossiper -> storage_service::on_change might want to pull schema with the help of local migration manager -> assert(local_is_initialized) strikes Fix this by registering storage_service to gossiper a bit earlier (both are already initialized by that time) and setting up unregister defer right afterwards. Test: unit(dev), manual start-stop Bug: #5628 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200130190343.25656-1-xemul@scylladb.com>	2020-01-31 14:02:18 +01:00
Avi Kivity	ec5b721db7	test: make eventually() more patient We use eventually() in tests to wait for eventually consistent data to become consistent. However, we see spurious failures indicating that we wait too little. Increasing the timeout has a negative side effect in that tests that fail will now take longer to do so. However, this negative side effect is negligible to false-positive failures, since they throw away large test efforts and sometimes require a person to investigate the problem, only to conclude it is a false positive. This patch therefore makes eventually() more patient, by a factor of 32. Fixes #4707. Message-Id: <20200130162745.45569-1-avi@scylladb.com>	2020-01-31 14:02:18 +01:00
Dejan Mircevski	6661ed7de4	cql3: Drop restrictions::values() method No-one seems to invoke this method. Instead, clients invoke restriction::values (note singular "restriction"). Most subclasses of restrictions also inherit from restriction, so values() still exists in their public interface. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-31 13:05:51 +01:00
Avi Kivity	985e00efa6	Merge "Fix the serialization of negative varint values" from Rafael " Benny pointed out that we could avoid a branch inside a loop in the old serialization code. That got me looking at the logic and I found that it would also produce an unnecessary 0xff prefix for some negative numbers. This patch series fixes the serialization and optimizes it. It now does no extra copies for positive numbers and only one extra copy for negative numbers, which I think is optimal since cpp_int uses sign magnitude and we want the two's complement representation. " * 'espindola/serialize_varint-improvements-v2' of https://github.com/espindola/scylla: types: Use a fancy iterator to avoid a temporary buffer types: Use export_bits to serialize cpp_int types: Avoid a branch in a loop types: Fix encoding of negative varint types: Replace "num.sign() < 0" with "num < 0"	2020-01-30 20:35:54 +02:00
Rafael Ávila de Espíndola	cc81ba3432	types: Use a fancy iterator to avoid a temporary buffer By using a fancy iterator we can avoid calling export_bits with a temporary buffer before copying the result to the output. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	7e67ce0bdb	types: Use export_bits to serialize cpp_int This avoids a copy when serializing positive numbers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	27a67f1a2c	types: Avoid a branch in a loop Thanks to Benny for the suggestion. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	c89c90d07f	types: Fix encoding of negative varint We would sometimes produce an unnecessary extra 0xff prefix byte. The new encoding matches what cassandra does. This was both an efficiency and a correctness issue, as using varint in a key could produce different tokens. Fixes #5656 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:25:09 -08:00
Rafael Ávila de Espíndola	ed747122aa	types: Replace "num.sign() < 0" with "num < 0" Surprisingly, this produces better code with cpp_int. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:24:03 -08:00
Rafael Ávila de Espíndola	cc9495d4d3	sstable_test: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	da984f1f33	commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	b88f6edee0	lister: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Gleb Natapov	b08679e1d3	db/system_keyspace: use user memory limits for local.paxos table Treat writes to local.paxos as user memory, as the number of writes is dependent on the amount of user data written with LWT. Fixes #5682 Message-Id: <20200130150048.GW26048@scylladb.com>	2020-01-30 17:07:27 +02:00
Piotr Sarna	b783d40aaf	Merge 'Add per scheduling groups statistics' from Eliran This set implements support for per scheduling group statistics in storage proxy and tables view statistics (although tables view per scheduling group stats are not actively applied in this series). Having those statistics per scheduling group can help in finding operations that are performed outside their context. Another advantage is that it lays the groundwork for supporting per service level statistics for the workload prioritization enterprise feature. At some point there was a thought to add those stats per role, but for now it is not feasible: 1. The number of roles/users is unbounded so it is dangerous to hold stats (in memory) for all of them. 2. We will need a proper design of how to deal with the hierarchical nature of roles in the stats. Besides these reasons and regardless, it is beneficial to look at resource related stats per scheduling group. Looking at resources per user or role will not necessarily give insights since resources are divided per sg and not role, so it can lead to false conclusions if more than one role is attached to the same service level. Tests: unit tests (Dev, Debug) validating the stats with monitor * es/per_sg_stats/v6: storage proxy: migrate to per scheduling group statistics internalize storage proxy statistics metric registration	2020-01-30 15:02:33 +01:00
Eliran Sinvani	971711a546	storage proxy: migrate to per scheduling group statistics This commit builds on top of the introduced per scheduling group statistics template and employs it to achieve per scheduling group statistics in storage_proxy. Some of the statistics also had meaning as a global - per shard one. Those are the ones for determining whether to throttle the write request. This was handled by creating a global stats struct that will hold those stats and by changing the stat update to also include the global one. One point that complicated it is an already existing aggregation over the per shard stats that now became a per scheduling group per shard stats, converting the aggregation to a two-dimensional aggregation. One thing this commit doesn't handle is validating that an individual statistic didn't "cross a scheduling group boundary"; such validation is possible and can easily be added in the future. There is a subtlety to doing so, since if the operation did cross to another scheduling group, two connected statistics can lose balance, for example written bytes and completed write transactions. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:44 +01:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups, instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Gleb Natapov	c138dfd33e	lwt: introduce LWT gossiper feature Do not allow lwt operation if LWT is not enabled by entire cluster. Message-Id: <20200130120912.GV26048@scylladb.com>	2020-01-30 15:12:56 +02:00
Benny Halevy	606db0d412	cql3::util::maybe_quote: further optimize quote doubling Avoid string copies when doubling quotes in the string by counting them when scanning the input string and reserving the required space when making the result std::string. This showed a performance improvement of ~1.8% when running the maybe_quote unit test in tight loop (w/ the shorter strings only) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-30 14:55:51 +02:00
Rafael Ávila de Espíndola	a16cb00719	configure: Don't use -Wno-error when building seastar This depends on the recent patches to avoid warnings in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200127210833.200410-1-espindola@scylladb.com>	2020-01-30 14:10:18 +02:00
Avi Kivity	09e2556541	Update seastar submodule * seastar 44cf127ee9...65980a9b30 (2): > io_tester: fix the fix for lack of file closing > cmake: Disable broken gcc warning -Warray-bounds	2020-01-30 14:10:18 +02:00
Avi Kivity	b01f0cab60	utils: add missing include for ssize_t gcc 10 tightened its C++ includes to no longer provide ssize_t, so we must get it from a C header instead. Message-Id: <20200129205912.21139-1-avi@scylladb.com>	2020-01-30 14:10:18 +02:00
Avi Kivity	adb64dc72f	treewide: tighten concepts syntax gcc 10 requires a semicolon after every compound requirement, as per the standard. Add missing semicolons where necessary. Message-Id: <20200129205805.20928-1-avi@scylladb.com>	2020-01-30 14:10:18 +02:00
Rafael Ávila de Espíndola	4b4efcf302	types: Remove collection_type_impl::serialize The rest of the serialize api has been devirtualized some time ago, but this auxiliary function stayed virtual. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129203916.20460-1-espindola@scylladb.com>	2020-01-30 14:10:18 +02:00
Kamil Braun	bd42b10df1	cdc: rename cdc/cdc.{hh,cc} to cdc/log.{hh,cc} To increase modularity, making it easier to find what is where and maintain. The 'log' module (cdc/log.{hh,cc}) is responsible for updating CDC log tables when base table writes are performed. The 'generation' module (cdc/generation.{hh,cc}) handles stream generation changes in response to topology change events. cdc/metadata.{hh,cc} contains a helper class which holds the currently used generation of streams. It is used by both aforementioned modules: 'log' queries it, while 'generation' updates it.	2020-01-30 11:10:39 +01:00
Kamil Braun	1a56310687	locator: remove get_shard_count and get_ignore_msb_bits from snitch Snitch forms a class hierarchy which get_shard_count and get_ignore_msb_bits ignore (their returned values only depend on the gossiper's state). Besides, these functions just don't belong there. Snitch has nothing to do with shard_count or ignore_msb_bits.	2020-01-30 11:10:08 +01:00
Kamil Braun	e91af78cf5	cdc: update streams description table Inform CDC users about newly generated streams.	2020-01-30 11:10:08 +01:00
Kamil Braun	cbe510d1b8	cdc: use stream generations Change the CDC code to use the global CDC stream generations. The per-base-table CDC description table was removed. The code instead uses cdc::metadata which is updated on gossip events. The per-table description tables were replaced by a global description table to be used by clients when searching for streams.	2020-01-30 11:10:08 +01:00
Kamil Braun	8f4a2ba0b9	storage_service: learn about CDC stream generations. When a node learns that another node joins the cluster (or begins the joining process, i.e. bootstrap), it will read the CDC generation timestamp proposed by that node, use it to retrieve the generation from the distributed generations table, and save it in its local generation queue to be used for writing to the CDC log when its local clock crosses the generation's timestamp. The CDC generation is saved in the queue before tokens are saved in token_metadata. This is important so that when the node becomes a coordinator of a write, it will already have all the necessary information required to generate a corresponding CDC log mutation. After joining, nodes should keep gossiping their proposed stream generation timestamps forever, until they learn about a newer timestamp, in which case they'll start gossiping the new timestamp. There is one case where a node won't gossip any such generation timestamp: if it's upgrading from a non-CDC version. In this situation we make one of the nodes begin the first generation.	2020-01-30 11:10:08 +01:00
Kamil Braun	834c2ca997	cdc: add cdc::metadata class The class stores a queue of CDC generations to be used for choosing streams when writing to the CDC log. This data structure will be updated on some gossip events (when a new node joins the cluster and proposes a new generation of CDC streams).	2020-01-30 11:10:08 +01:00
Kamil Braun	86af2a63ec	clocks: add printing functions For debugging and logging.	2020-01-30 11:10:08 +01:00
Kamil Braun	34e4ce275d	storage_service: restore CDC streams timestamp when replacing a node When a node is replacing another node it will keep gossiping its CDC streams generation timestamp.	2020-01-30 11:10:08 +01:00
Kamil Braun	a6e62dba95	cdc: add get_streams_timestamp_for(endpoint) method In future commits this will be used by nodes learning about other nodes entering NORMAL status. The joining node proposes a new generation of streams, whose timestamp is gossiped by the node.	2020-01-30 11:10:08 +01:00
Kamil Braun	37ae37db38	storage_service: move get_application_state_value method to gossiper	2020-01-30 11:10:08 +01:00
Kamil Braun	b44c63a127	storage_service: small refactors in prepare_replacement_info	2020-01-30 11:10:08 +01:00
Kamil Braun	32f4489a18	storage_service: generate CDC streams generation and gossip its timestamp. Generate a new generation of streams during bootstrap, insert it into an internal distributed table for other nodes to read and save its timestamp in the system.local table. When restarting, read the generation timestamp from the system.local table. Gossip the generation timestamp.	2020-01-30 11:10:08 +01:00
Kamil Braun	19f23c6de1	cdc: add cdc-related node startup functions	2020-01-30 11:10:08 +01:00
Kamil Braun	96e5d6c924	token_metadata: add count_normal_token_owners method	2020-01-30 11:10:08 +01:00
Kamil Braun	52d71832f8	gossiper: make some methods const	2020-01-30 11:10:08 +01:00
Kamil Braun	3ae7b6cbc4	versioned_value: add cdc_streams_timestamp This will be used to inform other nodes that a new CDC streams generation has been created.	2020-01-30 11:10:08 +01:00
Kamil Braun	7fa30f6f34	db: add a system.cdc_local table with CDC generation timestamp This will be used to persist CDC streams generation timestamp proposed by a joining node in case the node crashes or restarts, similarly to the way tokens are persisted. The get_saved_cdc_streams_timestamp method retrieves the generation timestamp from the system table. It will be used by a restarting node. The update_cdc_streams_timestamp method saves CDC stream generation timestamp of the calling node in the system table. A joining node will persist the timestamp before it proposes it to other nodes.	2020-01-30 11:10:08 +01:00
Piotr Jastrzebski	04fe18de0f	system_distributed_keyspace: add cdc-related tables The cdc_topology_description table will be used internally by nodes to send new CDC stream generations to other nodes. The cdc_description table is a user-facing table, used to inform users about new sets of CDC streams. Regenerate sstables and digests for schema_change_test. We don't need to protect this change by a schema feature: when a node creates these tables, it announces them to all other nodes. If schema agreement happens before this migration, all nodes will use a digest calculated without these tables. If it happens after, then all nodes will eventually know about these tables and use a digest calculated with these tables.	2020-01-30 11:10:08 +01:00
Piotr Jastrzebski	9fa18c03c1	cdc: add generate_topology_description cdc::topology_description describes a mapping of tokens to CDC streams. The cdc::generate_topology_description function is given: 1. a set of tokens which split the token ring into token ranges (vnodes), 2. information on how each token range is distributed among its owning node's shards and tries to generate a set of CDC stream identifiers such that for each shard and vnode pair there exists a stream whose token falls into this vnode and is owned by this shard. It then builds a cdc::topology_description which maps tokens to these found stream identifiers, such that if token T is owned by shard S in vnode V, it gets mapped to the stream identifier generated for (S, V).	2020-01-30 11:10:07 +01:00
Piotr Jastrzebski	a3748f942e	cdc: add topology_description class This is a class that will be used for storing information required to perform CDC operations, i.e. assignment of token ranges to CDC streams. It is serializable to bytes and will be stored in such a form in a distributed table accessible by all nodes.	2020-01-30 11:10:07 +01:00
Kamil Braun	36ee36618a	dht: add i_partitioner::shard_of(token, shard_count, ignore_msb) method Allows calculating the shard of the given token using custom values of shard_count and sharding_ignore_msb (instead of the ones used by the particular partitioner instance).	2020-01-30 11:10:07 +01:00
Kamil Braun	f4f8593bac	dht/murmur3_partitioner: take private methods out of the class The methods were made static functions of the murmur3_partitioner module.	2020-01-30 11:09:48 +01:00
Benny Halevy	0329fe1fd1	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling The function was reimplemented to solve the following issues. The custom implementation also improved its performance by close to 19%. Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in-place so no doubling of quotes was actually done. Add unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with stack overflow. Fixes #5671 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-30 12:00:30 +02:00
Rafael Ávila de Espíndola	e4b8f52237	commitlog: Simplify the return of read_log_file This function really just wants to signal it is done, so return a future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200128172847.31513-1-espindola@scylladb.com>	2020-01-30 12:00:29 +02:00
Gleb Natapov	67deab0661	test: fix cql_repl to be able to run lwt tests on smp Handle bounce_to_shard result properly in cql_repl. Message-Id: <20200129122547.GO26048@scylladb.com>	2020-01-30 11:37:27 +02:00
Konstantin Osipov	4d3423b983	test.py: add a help file Message-Id: <20200128210426.24509-2-kostja@scylladb.com>	2020-01-30 11:05:02 +02:00
Avi Kivity	5842833d62	test.py: change test failure exit code to be more friendly to git bisect test.py returns -1 on failure; exit() translates that to 255, which git bisect interprets as a special exit code requiring manual intervention. Change to return the more traditional 1 on failure, which git bisect can interpret as a normal failure condition. Message-Id: <20200130084950.4186598-1-avi@scylladb.com>	2020-01-30 11:02:22 +02:00
Rafael Ávila de Espíndola	090164791c	logalloc: Store unused ids in a std::vector There doesn't seem to be any requirement for how unused ids are reused, so we may as well use the simpler type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129211154.47907-1-espindola@scylladb.com>	2020-01-30 10:31:16 +02:00
Rafael Ávila de Espíndola	bd7593eab3	lua: Handle nil returns correctly With this patch lua nil values are mapped to CQL null values instead of producing an error. Fixes #5667 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 14:05:01 -08:00
Rafael Ávila de Espíndola	bd93a0af52	types: Return bytes_opt from data_value::serialize Since a data_value can contain a null value, returning bytes from serialize() was losing information as it was mapping null to empty. This also introduces a serialize_nonnull that still returns bytes, but results in an internal error if called with a null value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 14:04:59 -08:00
Avi Kivity	5137b596f8	build_id: add missing include for assert() build_id.cc uses assert() but doesn't include the header. Reviewed-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129205515.20406-1-avi@scylladb.com>	2020-01-29 23:44:50 +02:00
Rafael Ávila de Espíndola	2b45edd97e	query-result-set: Assert that we don't have null values Null values are represented with dead cells and never included in a result_set. To enforce that, this adds a non_null_data_value that wraps a data_value and whose constructor calls on_internal_error if a null data_value is passed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	3abac35d9f	types: Fix comparison of empty and null data_values Before this patch a null data_value would compare equal to any data_value that serialized to an empty byte sequence. With this patch null only compares equal to null. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	9031294ea9	Revert "tests: Handle null and not present values differently" This reverts commit `2ebd1463b2`. The test introduced by that commit was wrong, and in fact depended on a bug in operator== for data_value. A followup patch fixes operator==, so this reverts the broken commit first. The reason it was broken was that it created a live cell with a null data_value. In reality, null values are represented with dead cells. For example, the sstable produced by CREATE TABLE my_table (key int PRIMARY KEY, v1 int, v2 int) with compression = {'sstable_compression': ''}; INSERT INTO my_table (key, v1, v2) VALUES (1, 42, null); Is 00 04 key_length 00 00 00 01 key 7f ff ff ff local_deletion_time 80 00 00 00 00 00 00 00 marked_for_delete_at 24 HAS_ALL_COLUMNS \| HAS_TIMESTAMP 09 row_body_size 12 prev_unfiltered_size 00 delta_timestamp 08 USE_ROW_TIMESTAMP_MASK 00 00 00 2a value 0d USE_ROW_TIMESTAMP_MASK \| HAS_EMPTY_VALUE_MASK \| IS_DELETED_MASK 00 deletion time 01 END_OF_PARTITION Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	66290c3bb9	query-result-set: Avoid a copy during construction No functionality change. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Rafael Ávila de Espíndola	02e8e8d6b3	types: Move operator== for data_value out-of-line Most of the work is done by decompose and compare which are out-of-line anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-29 13:24:10 -08:00
Piotr Sarna	d13492485f	alternator: restore Python2 compatibility for test_tag ... by explicitly declaring utf-8 encoding. Message-Id: <e99789876176cf722ccfc297621338dc93843588.1580301449.git.sarna@scylladb.com>	2020-01-29 18:11:47 +02:00
Nadav Har'El	ce0c9c1044	merge: add tagging to alternator Merged patch series from Piotr Sarna: This series adds the following to alternator: - TagResource request - UntagResource request - ListTagsOfResource request - Honoring "Tags" parameter in CreateTable It also provides more tests for above features and extended docs. Tagging is backed by a schema extension, which is in turn backed by entries in system_schema.tables.extensions map. Tags are considered part of the schema, and in particular they are updated via an equivalent of: ALTER TABLE table WITH scylla_tags = {'key1':'v1', 'key2':'v2'} Each tag change is therefore a schema change, which also means that editing tags for the same table on different nodes may be subject to races, until the schema agreement issues are resolved in Scylla. Fixes #5066 Tests: alternator-test(local, remote) Piotr Sarna (6): alternator,main: add tags schema extension alternator: add creating values from string views alternator: implement tagging alternator: allow tagging on table creation docs: add entries for alternator tags and arn alternator-test: make test tables case sensitive alternator-test/test_tag.py \| 63 ++++++++++- alternator-test/util.py \| 2 +- alternator/executor.cc \| 191 ++++++++++++++++++++++++++++++++-- alternator/executor.hh \| 3 + alternator/rjson.cc \| 4 + alternator/rjson.hh \| 1 + alternator/server.cc \| 3 + alternator/tags_extension.hh \| 52 +++++++++ docs/alternator/alternator.md \| 14 ++- main.cc \| 5 + 10 files changed, 325 insertions(+), 13 deletions(-) create mode 100644 alternator/tags_extension.hh	2020-01-29 18:11:47 +02:00
Botond Dénes	69f606baa0	database: check timout before applying writes Attempting to apply timed-out writes is a wasted effort. The coordinator has already given up on the write and reported it as failed to the client. Any cycles spent on this write are wasted at this point. We currently only check the timeout if the write is blocked on memory; otherwise, if the system is not under pressure, we will happily apply timed-out writes. If the system is under pressure we will make it worse by wasting cycles on processing a timed-out write. Prevent this by checking the timeout as early as possible in `database::apply()` and `database::apply_counter_update()`. This patch doesn't solve all our problems related to timed-out writes. They can still sit and accumulate in various queues without expiring, a prominent example being the smp queues. It is however a good first step towards reducing wasted effort spent on them. Refs: #5055 Ref #5251 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200129093007.550250-1-bdenes@scylladb.com>	2020-01-29 13:08:43 +02:00
Gleb Natapov	c654ffe34b	commitlog: fix flushing an entry marked as "sync" in periodic mode After `546556b71b` we can have mixed writes into the commitlog: some flush immediately, some do not. If a non-flushing write races with a flushing one and becomes responsible for writing back its buffer into a file, the flush will be skipped, which will cause the assert in batch_cycle() to trigger since the flush position will not be advanced. Fix that by checking whether the flush was skipped and, in that case, explicitly flushing our file position. Fixes #5670 Message-Id: <20200128145103.GI26048@scylladb.com>	2020-01-29 12:58:25 +02:00
Piotr Sarna	93d8612a49	alternator-test: make test tables case sensitive In order to test case sensitivity, test table names now contain a capital letter.	2020-01-29 10:21:35 +01:00
Piotr Sarna	f8c1c82149	docs: add entries for alternator tags and arn Support for tagging and arn was added already, so the documentation is properly extended.	2020-01-29 10:20:05 +01:00
Piotr Sarna	668e15643d	alternator: allow tagging on table creation During table creation, it's now possible to provide a 'Tags' parameter, which will add tags to a newly created table. Note that creating a table and tagging it is not atomic, so in case of failure it's possible to end up with a created table, but without appropriate tags. This commit comes with a test. Message-Id: <00c2e202e9075d2c61e4ee5ba322ff4d5dbe718c.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	4c9f2f3c0a	alternator: implement tagging The following requests are implemented: - TagResource - UntagResource - ListTagsOfResource Also, more tests are added for validating inputs, for both arns, tag values and tag keys. Message-Id: <a7ce9534ca580736fea445813fafef75a6139e29.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	ea04b7fb04	alternator: add creating values from string views An additional override for rjson::from_string() is added for a std::string_view type. Message-Id: <3552ac3347b6a79dd22ca1215c831808450b1ef8.1579618972.git.sarna@scylladb.com>	2020-01-29 10:20:05 +01:00
Piotr Sarna	16688efad7	alternator,main: add tags schema extension A schema extension is introduced for alternator - tags. This schema extension can be used to store arbitrary tags for a table, in the form of a map<text, text>. Updating tags for a table is equivalent to the following CQL query: ALTER TABLE table WITH scylla_tags = {'key1':'v1', 'key2':'v2'} The extension, as all other extensions, is backed by the entry in the system_schema.tables table.	2020-01-29 10:20:05 +01:00
Pavel Solodovnikov	f2feeb4b10	cql3: Propagate "const" to some virtual methods in cql hierarchy Add "const" attributes to `assignment_testable::test_assignment` and `term::raw::prepare` methods. These should have been marked as "const" even before the change but for some reason were missing these qualifiers. Mark other supplementary methods with "const" attributes as necessary. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200127213215.494000-1-pa.solodovnikov@scylladb.com>	2020-01-29 00:23:40 +02:00
Avi Kivity	3343baf159	Merge "cql3: time_uuid_fcts: validate time UUID" from Benny " Throw an error in case we hit an invalid time UUID rather than hitting an assert. Fixes #5552 (Ref #5588 that was dequeued and fixed here) Test: UUID_test, cql_query_test(debug) " * 'validate-time-uuid' of https://github.com/bhalevy/scylla: cql3: abstract_function_selector: provide assignment_testable_source_context test: cql_query_test: add time uuid validation tests cql3: time_uuid_fcts: validate timestamp arg cql3: make_max_timeuuid_fct: delete outdated FIXME comment cql3: time_uuid_fcts: validate time UUID test: UUID_test: add tests for time uuid utils: UUID: create_time assert nanos_since validity utils/UUID_gen: make_nanos_since utils: UUID: assert UUID.is_timestamp	2020-01-29 00:11:17 +02:00
Avi Kivity	ec1687e4fe	Merge "Remove deprecated partitioners #5636 " from Piotr " This PR makes named_value respect allowed_values and then use it to transition away from old deprecated RandomPartitioner and ByteOrderedPartitioner. Then it removes the code that's no longer used. We want to remove deprecated partitioners because, on one hand, they lead to performance problems and hot nodes. Moreover, we're planning to unify the token representation which would allow per table partitioner support. That, in turn, is a feature helpful in multiple efforts like CDC, materialized views, secondary indexes and multi-tenancy. tests: unit(dev) " * 'remove_deprecated_partitioners' of https://github.com/haaawk/scylla: partitioners: remove random_partitioner partitioners: Make it impossible to use RandomPartitioner partitioners: remove byte_ordered_partitioner partitioners: Make it impossible to use ByteOrderedPartitioner partitioners: Remove leftovers of OrderPreservingPartitioner i_partitioner.cc: stop including byte_ordered_partitioner.hh i_partitioner.cc: stop including random_partitioner.hh config: use allowed_values to verify named_value input config: add operator<< for seed_provider_type	2020-01-29 00:11:17 +02:00
Avi Kivity	652d8a9b84	install-dependencies.sh: add lld Since we now default to lld if present, and since lld is a faster linker than either ld or gold, it makes sense to install it as a dependency and to make it available as part of the frozen toolchain.	2020-01-29 00:11:17 +02:00
Avi Kivity	17eaf552f0	Merge "Improve the accuracy of reader memory tracking" from Botond " Grab the lowest hanging fruits. This patch-set makes three important changes: * Consume the memory for I/O operations on tracked files, before they are forwarded to the underlying file. * Track memory consumed by buffers created for parsing in `continuous_data_consumer`. As this is the basis for the data, index and promoted index parsers, all three are covered now in this regard. * Track the index file. The remaining, not-so-low hanging fruits in order of gain/cost(performance) ratio: * Track in-memory index lists. * Track in-memory promoted index blocks. * Track reader buffer memory. Note that this ordering might change based on the workload and other environmental factors. Also included in this series is an infrastructure refactoring to make tracking memory easier and involve including lighter headers, as well as a manual test designed to allow testing and experimenting with the effects of changes to the accuracy of the tracking of reader memory consumption. Refs: #4176 Refs: #2778 Tests: unit(dev), manual(sstable_scan_footprint_test) The latter was run as: build/dev/test/manual/sstable_scan_footprint_test -c1 -m2G --reads=4000 --read-concurrency=1 --logger-log-level test=trace --collect-stats --stats-period-ms=20 This will trickle reads until the semaphore blocks, then wait until the wait queue drains before sending new reads. This way we are not testing the effectiveness of the pre-admission estimation (which is terribly optimistic) and instead check that with slowly ramping up read load the semaphore will block on memory preventing OOM. This now runs to completion without a single `std::bad_alloc`. The read concurrency semaphore allows between 15-30 reads, and is always blocked on memory. 
" * 'more-accurate-reader-resource-tracking/v1' of ssh://github.com/denesb/scylla: test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate tests/manual: Add test measuring memory demand of concurrent sstable reads index_reader: make the index file tracked sstables/continuous_data_consumer: track buffers used for parsing reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively reader_concurrency_semaphore: bye reader_resource_tracker treewide: replace reader_resource_tracer with reader_permit reader_permit: expose make_tracked_temporary_buffer() reader_permit: introduce make_tracked_file() reader_permit: introduce memory_units reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh reader_concurrency_semaphore: reader_permit: make it a value type reader_concurrency_semaphore: s/resources/reader_resources/ reader_concurrency_semaphore::reader_permit: move methods out-of-line	2020-01-29 00:11:17 +02:00
Gleb Natapov	8dc37277df	commitlog: remove unused variable Message-Id: <20200128132118.GH26048@scylladb.com>	2020-01-29 00:11:17 +02:00
Eliran Sinvani	57f90e34ea	alternator: run alternator processing loop in the statement scheduling group In Scylla all query processing activity should run under the "statement" scheduling group. The scheduling group is important for maintaining the balance between background and foreground tasks in Scylla. Testing: To test the correctness of the patch, the following assert was first inserted before any call to one of the executor functions in the http route: assert(current_scheduling_group().name() == "statement") Then all alternator tests were run and passed. The second stage was to change the name so that the assert would fail: assert(current_scheduling_group().name() == "no-statement") The tests were then run again, validating that Scylla coredumps. The asserts were then removed. Fixes #5008 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200127154341.10020-1-eliransin@scylladb.com>	2020-01-29 00:11:17 +02:00
Avi Kivity	e09ed81c23	Merge "Fix two corner cases in snapshots API" from Pavel " There seem to be two problems with handling snapshot API -- one on start and the other one on stop. Here's the set that addresses both. The fix moved snapshot API registration later in time that required Amnon's ACK. Now we have it :) so -- the rebase and resend. Tests: unit(dev), start-stop " * 'br-snapshot-bugs-2' of https://github.com/xemul/scylla: snapshot: Pass requests through gate api: Register snapshot API later api: Unwrap wrap_ks_cf	2020-01-29 00:11:17 +02:00
Avi Kivity	c0f412617e	Merge "Make the scylla build deterministic" from Rafael " With these changes and a binutils compiled with --enable-deterministic-archives, the only difference I get in the build directory if I build scylla twice from scratch are: * The various CMakeError.log because they have temporary file names. * The various CMakeOutput.log for the same reason. * .ninja_log and .ninja_deps. I am not sure what the contents are. " * 'espindola/fix-determinism' of https://github.com/espindola/scylla: build: remove timestamps from then antlr output build: Make the output of idl-compiler deterministic	2020-01-28 18:16:06 +02:00
Rafael Ávila de Espíndola	0e8bee0774	configure: Use lld if available This depends on the patch mk: avoid combining -r and -export-dynamic linker options being added to dpdk. I benchmarked this on top of my patches to get a reproducible build. I first compiled with ccache, deleted the build directory and recompiled so that all the "gcc -c" invocations were served by ccache. The times of the second "ninja release" invocations were: lld: ninja release 155.68s user 71.89s system 2077% cpu 10.953 total gold: ninja release 953.79s user 254.71s system 2533% cpu 47.699 total Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200127171516.26268-1-espindola@scylladb.com>	2020-01-28 18:15:50 +02:00
Avi Kivity	7440125cb1	Update seastar submodule > memory: add scoped_heap_profiling > build: add switch to enable heap profiling support > io_tester: do not abort on end of test > resource: clean up cgroups version determination. > prometheus: Silence a bogus gcc warning in http server > Update dpdk submodule > resource: Support cgroups v2 > net: Don't use variable length arrays > core/memory.hh: document set_heap_profiling_enabled() > Revert "net: Don't use variable length arrays" > cmake: fix pkgconfig boost deps > thread: Avoid confusing comment by switching value > net: posix-stack: fix allocator in ap listening sockets > net: posix-stack: fix passing allocator to new sockets > stall_detector: Add a counter for stall detector report > Merge "Don't use variable length arrays" from Rafael > treewide: fix minor issues reported by clang > thread: Call mprotect in make_stack > thread: Always allocate stack with aligned_alloc > build: Make SEASTAR_THREAD_STACK_GUARDS private > thread: Move code out of a header	2020-01-28 18:15:18 +02:00
Nadav Har'El	b06b34478e	merge: lwt: add lightweight transaction unit tests Merged patch series from Konstantin Osipov: This series sets cql_repl core count to 1 and adds LWT unit tests. test.py: invoke cql_repl with smp=1 lwt: add lightweight transactions unit tests	2020-01-28 12:39:23 +02:00
Nadav Har'El	30283f2544	merge: Alternator: return api_error instead of throwing Merged patch series from Piotr Sarna: In order to minimize the usage of throws and catches in code paths that are potentially hot, these paths instead return appropriate errors directly. The server layer is still able to catch and translate errors, but the preferred way is to return api_error directly in places that may be performance-sensitive. Tests: alternator-test(local) Fixes #5472 Piotr Sarna (3): alternator: change request return type to variant<value, error> alternator: elide throwing in condition checks alternator: replace top-level throws with returns in executor alternator/executor.hh \| 28 ++++---- alternator/server.hh \| 4 +- alternator/executor.cc \| 141 +++++++++++++++++++++-------------------- alternator/server.cc \| 44 ++++++++----- 4 files changed, 117 insertions(+), 100 deletions(-)	2020-01-28 12:39:23 +02:00
Konstantin Osipov	98c34ae750	test.py: always build cql_repl, do not strip Exclude cql_repl from the list of tests, since it's not a test. Build it as a separate app. Do not strip, so that any CQL test failure is easy to debug without a rebuild. All test-related targets are converted from lists to sets to avoid quadratic lookup cost in the check inside the loop which creates the ninja file.	2020-01-28 12:39:23 +02:00
Piotr Sarna	a81640d402	alternator: replace top-level throws with returns in executor In order to elide unnecessary throwing, all errors previously thrown from top-level executor methods (the ones that handle user requests) are now returned directly. Message-Id: <73e05d1057ee842576fae11be9d77265ffb2e96f.1579515640.git.sarna@scylladb.com>	2020-01-28 12:39:23 +02:00
Takuya ASADA	f21123b3ae	scylla_io_setup: Improve error message for unsupported EC2 instance types (#5561 ) Currently --ami does not check instance types and creates an invalid io_properties.yaml on unsupported instance types. This won't occur on AMI startup, since scylla_ami_setup only invokes scylla_io_setup --ami when the instance is supported, but we still hit the issue when we run scylla_io_setup manually. It's better to check the instance type in scylla_io_setup, too. Refs #5438	2020-01-28 12:39:23 +02:00
Piotr Sarna	854adf5b70	alternator: elide throwing in condition checks Conditional updates inform the user that the condition is not met by returning an error. An initial implementation was based on rethrowing these errors, but returning them directly is considered better for performance.	2020-01-28 12:39:23 +02:00
Gleb Natapov	0d0c05a569	lwt: allow only one paxos instance to run for each key simultaneously This will prevent contention in case of parallel updates of the same row by the same coordinator. The patch does it by introducing a new per-key lock map and taking the lock before running the PAXOS protocol (either for write or for read). Message-Id: <20200117101228.GA14816@scylladb.com>	2020-01-28 12:39:23 +02:00
Piotr Sarna	a6a65abc3c	alternator: change request return type to variant<value, error> In order to minimize the use of exceptions during normal operations, each request handler is now able to return either a proper JSON value, or an instance of api_error, which indicates that something went wrong, but without having to throw, catch and rethrow C++ exceptions. This is especially important for conditional updates, since it's expected to be common to return ConditionalCheckFailedException. Message-Id: <d8996a0a270eb0d9db8fdcfb7046930b96781e69.1579515640.git.sarna@scylladb.com>	2020-01-28 12:39:23 +02:00
Avi Kivity	897320f6ab	tools: toolchain: dbuild: relax process limit in container Docker restricts the number of processes in a container to some limit it calculates. This limit turns out to be too low on large machines, since we run multiple links in parallel, and each link runs many threads. Remove the limit by specifying --pids-limit -1. Since dbuild is meant to provide a build environment, not a security barrier, this is okay (the container is still restricted by host limits). I checked that --pids-limit is supported by old versions of docker and by podman. Fixes #5651. Message-Id: <20200127090807.3528561-1-avi@scylladb.com>	2020-01-28 12:39:23 +02:00
Avi Kivity	c7e0be75a5	Merge "Metrics for full scan" from Alejo " Final set of changes for full scan metrics. - allow filtering - full scan (Note: non-system tables only) - full scan without BYPASS CACHE option - tests for all metrics (bypass cache, allow filtering, full scan) - works with prepared statements (tested, too) " * 'as_full_scan_metrics' of https://github.com/alecco/scylla: Range scan query counter Counter of queries doing full scan. ALLOW FILTERING query counter	2020-01-28 12:39:23 +02:00
Botond Dénes	e4616f92fe	test/manual/sstable_scan_footprint_test: improve memory consumption diagnostics This test is all about tracking measured memory consumption vs. real memory consumption. To make this easier add additional diagnostics: * enable seastar heap profiler for the duration of the reads (seastar has to be compiled with `-DSEASTAR_HEAPPROF`). * Add a stats collector, which periodically collects stats such as non-LSA free/used memory, LSA free/used memory and memory tracked by the reader concurrency semaphore. These stats are written to a `.csv` file, allowing importing them into a spreadsheet and processing them.	2020-01-28 10:15:55 +02:00
Botond Dénes	9e9c59d125	tests/manual/sstable_scan_footprint_test: use the semaphore to determine read rate Currently the test fires the configured amount of reads at once. This is somewhat restricting in the number of testable scenarios. For example, it doesn't allow one to see if the semaphore correctly tracks the memory consumption of existing reads, by firing new reads after a while. Replace this algorithm by one which fires reads with a configured concurrency, then waits for the semaphore's queue (if any) to drain, before firing new reads. The test can now be configured with the total amount of reads to fire, and with the read-concurrency, i.e. the number of reads to fire at once in each iteration. This allows for much greater flexibility in the different test scenarios. The previous behaviour can still be achieved by configuring a concurrency of 100. This patch also adds better error handling. Reads are aborted on the first error and errors are caught and not allowed to bubble up past the test's main function and are logged instead. Extensive logging is also added to be able to monitor the system while the test is running.	2020-01-28 10:15:53 +02:00
Tomasz Grabiec	2eb88024c0	tests/manual: Add test measuring memory demand of concurrent sstable reads Allow manual experimentation with the effectiveness of the accuracy of the tracking of the resource consumption of readers, and hence the system's ability to prevent overload and the dreaded `std::bad_alloc`. This patch was originally developed by Tomasz Grabiec <tgrabiec@scylladb.com>, I only adapted it to compile and link on current master.	2020-01-28 08:13:16 +02:00
Botond Dénes	dfc66194c8	index_reader: make the index file tracked Track I/O going to the index file, similarly to how we already track I/O going to the data file.	2020-01-28 08:13:16 +02:00
Botond Dénes	936619a8d3	sstables/continuous_data_consumer: track buffers used for parsing Based on heap profiling, buffers used for storing half-parsed fields are a major contributor to the overall memory consumption of reads. This memory was completely "under the radar" before. Track it by using tracked `temporary_buffer` instances everywhere in `continuous_data_consumer`. As `continuous_data_consumer` is the basis for parsing all index and data files, adding the tracking here automatically covers all data, index and promoted index parsing. I'm almost convinced that there is a better place to store the `permit` than the three places now, but so far I was unable to completely decipher our data/index file parsing class hierarchy.	2020-01-28 08:13:16 +02:00
Botond Dénes	92fffe51d5	reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively Consume the memory before even submitting the I/O to the underlying `file` object. This is in line with the underlying `file` object allocating the buffer before it forwards the I/O request to the kernel. This extends the "visibility" over the memory consumed by I/O greatly, as it turns out buffers spend most time alive waiting for the I/O to complete and are parsed shortly afterwards.	2020-01-28 08:13:16 +02:00
Botond Dénes	4bb3c7b1f0	reader_concurrency_semaphore: bye reader_resource_tracker Replaced by `reader_permit`, of which it was a mere wrapper in the first place.	2020-01-28 08:13:16 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracer with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracking and updates all usage sites.	2020-01-28 08:13:16 +02:00
Botond Dénes	dea24ca859	reader_permit: expose make_tracked_temporary_buffer() Previously `tracking_file_impl::make_tracked_buf()`. In the next patches we plan on using this outside `tracking_file_impl`, so make it public and templatize on the char type.	2020-01-28 08:13:16 +02:00
Botond Dénes	16cea36a94	reader_permit: introduce make_tracked_file() Free function equivalent of `reader_resource_tracker::track_file()`, using a `reader_permit` directly.	2020-01-28 08:13:16 +02:00
Botond Dénes	1859a03629	reader_permit: introduce memory_units Similar to `seastar::semaphore_units`, this allows consuming and releasing memory via an RAII object. In addition to that, it also allows tracking changing values. This feature was designed to be used for tracking the ever changing memory consumption of the buffers of `flat_mutation_reader`:s. This is now the only supported way of consuming memory from a permit.	2020-01-28 08:13:16 +02:00
Botond Dénes	c0f96db2d9	reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh In the next patches we will replace `reader_resource_tracker` and have code use the `reader_permit` directly. In subsequent patches, the `reader_permit` will get even more usages as we attempt to make the tracking of reader resource more accurate by tracking more parts of it. So the grand plan is that the current `reader_concurrency_semaphore.hh` is split into two headers: * `reader_concurrency_semaphore.hh` - containing the semaphore proper. * `reader_permit.hh` - a very lightweight header, to be used by components which only want to track various parts of the resource consumption of reads.	2020-01-28 08:13:16 +02:00
Botond Dénes	2005495857	reader_concurrency_semaphore: reader_permit: make it a value type Currently `reader_permit` is passed around as `lw_shared_ptr<reader_permit>`, which is clunky to write and use and is also an unnecessary leak of details on how permit ownership is managed. Make `reader_permit` a simple value type, making it a little bit easier and safer to use. In the next patches we will get rid of `reader_resource_tracker` and instead have code use the permit instance directly, so this small improvement in usability will go a long way towards preventing eye sore.	2020-01-28 08:13:16 +02:00
Botond Dénes	932bc02730	reader_concurrency_semaphore: s/resources/reader_resources/ In preparation of making it a top-level class and moving it to another file.	2020-01-28 08:13:16 +02:00
Botond Dénes	89c5fd0c25	reader_concurrency_semaphore::reader_permit: move methods out-of-line In preparation for making the reader_permit a top-level class, and moving it to another file. It is also good practice to define non-performance critical methods out-of-line to reduce header bloat.	2020-01-28 08:13:16 +02:00
Konstantin Osipov	511ae023f0	lwt: add lightweight transactions unit tests These unit tests cover all CQL aspects of lightweight transactions, such as grammar, null semantics, batch semantics, result set format, and so on. For now, comment out unicode tests: test output depends on libjsoncpp version in use.	2020-01-27 23:09:57 +03:00
Konstantin Osipov	fef50b66a2	test.py: invoke cql_repl with smp=1 Since bounce_to_shard is not handled by cql_repl, invoke it with smp=1 until it is fixed.	2020-01-27 22:57:10 +03:00
Pavel Emelyanov	976463f620	snapshot: Pass requests through gate When the scylla process is stopped, no code waits for current snapshot operations to finish. Also, the API server is not stopped either, so new snapshot requests can creep in. In seastar there's a useful abstraction, the gate, that addresses both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Pavel Emelyanov	fd6b5efe75	api: Register snapshot API later In storage_service's snapshot code there are checks for _operation_mode being _not_ JOINING to proceed. The intention is apparently to allow snapshots only after the cluster join. However, here is what the start-up code looks like: 1. _operation_mode = STARTING in storage_service::constructor 2. snapshot API registered in api::set_server_storage_service 3. _operation_mode = JOINING in storage_service::join_token_ring So in between steps 2 and 3 snapshots can be taken. Although there's a quick and simple fix for that (check for the _operation_mode to be not STARTING either), I think it's better to register the snapshot API later instead. This will help greatly to de-bloat the storage_service, in particular to encapsulate the _operation_mode properly. Note that although the check for _operation_mode is made only for taking a snapshot, I move all snapshot ops registration to the later phase. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Pavel Emelyanov	4886c1db74	api: Unwrap wrap_ks_cf This is preparation for the next patch -- the lambda in question (and the used type) will be needed in two functions, so make the lambda a "real" function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Benny Halevy	10c912d3db	cql3: abstract_function_selector: provide assignment_testable_source_context Return function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	35e9538d49	test: cql_query_test: add time uuid validation tests Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	1078c86af9	cql3: time_uuid_fcts: validate timestamp arg Make sure that the timestamp argument does not overflow 60 bits when converted to units of 100 nanos since epoch, like with writetime() that returns microseconds since epoch in contrast to other time functions like unixtimestampof that return millis since epoch. Fixes #5552 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	fa0fa53bd3	cql3: make_max_timeuuid_fct: delete outdated FIXME comment Done in `86c09046fd` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	72e2ea47c1	cql3: time_uuid_fcts: validate time UUID Throw an error in case we hit an invalid time UUID rather than hitting an assert. Ref #5552 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	00bd1d32d3	test: UUID_test: add tests for time uuid Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	f8b079b599	utils: UUID: create_time assert nanos_since validity Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	cd3460cc88	utils/UUID_gen: make_nanos_since Safely convert millis to "nanos_since" (number of 100 nanoseconds since START_EPOCH) while type casting to uint64_t to avoid possible int overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:08:16 +02:00
Benny Halevy	22bac26023	utils: UUID: assert UUID.is_timestamp Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-26 18:54:36 +02:00
Avi Kivity	cc0222ec2d	Merge "Futurize get_changed_ranges_for_leaving" from Asias " Futurize get_changed_ranges_for_leaving to fix stalls like: 2019-12-17T15:18:33+00:00 ip-10-0-116-62 !INFO \| scylla: Reactor stalled for 4609 ms on shard 0. 0x0000000002accbd2 0x0000000002a4579b 0x0000000002a45cc2 0x0000000002a45ff7 0x00007ff0a609be7f 0x0000000001b0b500 0x0000000001b03185 0x0000000001af0d41 0x0000000001af027a 0x0000000001f7e89a 0x0000000001f9f55a 0x0000000001fc9c09 0x0000000001fcac08 0x00000000007dfee3 /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1041 (inlined by) seastar::reactor::block_notifier(int) at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1164 ?? ??:0 __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > > std::__lower_bound<__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token, __gnu_cxx::__ops::_Iter_less_val>(__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token const&, __gnu_cxx::__ops::_Iter_less_val) at crtstuff.c:? locator::token_metadata::first_token_index(dht::token const&) const at crtstuff.c:? locator::token_metadata::ring_range(dht::token const&, bool) const at crtstuff.c:? locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at crtstuff.c:? service::storage_service::get_changed_ranges_for_leaving(seastar::basic_sstring<char, unsigned int, 15u, true>, gms::inet_address) at crtstuff.c:? service::storage_service::unbootstrap() at crtstuff.c:? service::storage_service::decommission()::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const::{lambda()#1}::operator()() const [clone .isra.0] at storage_service.cc:? 
Refs: #5495 " * 'futurize_get_changed_ranges_for_leaving' of https://github.com/asias/scylla: storage_service: Yield in get_changed_ranges_for_leaving storage_service: Make get_changed_ranges_for_leaving run inside thread	2020-01-26 13:25:53 +02:00
Takuya ASADA	dd81fd3454	dist/debian: Use tilde for release candidate builds We need to add '~' to handle rcX version correctly on Debian variants (merged at `ae33e9f`), but when we moved to relocated package we mistakenly dropped the code, so add the code again. Fixes #5641	2020-01-26 13:25:53 +02:00
Ivan Prisyazhnyy	4c001553eb	dep/arch: better messages Tested on Arch 5.4.2-arch1-1 and docker archlinux. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200125122836.460811-1-ivan@scylladb.com>	2020-01-26 12:02:32 +02:00
Ivan Prisyazhnyy	98a8c36c60	cmake: fix seastar and gen include dirs lookup Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200125145926.545859-1-ivan@scylladb.com>	2020-01-26 12:02:32 +02:00
Dejan Mircevski	90b54c8c42	view_info: Drop partition_ranges() The method view_info::partition_ranges() is unused. Also drop the now-dead _partition_ranges data member. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-26 12:02:32 +02:00
Piotr Sarna	9fa88e26a9	Merge 'Alternator - LWT and ConditionExpression' from Nadav This is a fourth iteration of the patch series adding LWT usage (instead of the old naive - and wrong - read before write) to Alternator, as well as full support for the ConditionExpression syntax for conditional updates. Changes in v4: * Rebased to most recent master * Replaced 3 booleans which had 2^3 = 8 theoretical combinations, by just 4 options in enum write_isolation: FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW The four options are described in detail in comments. * Fix reversed assertion in FORBID_RMW case. * Two new metrics: write_using_lwt and shard_bounce_for_lwt. * Fail boot if alternator is enabled, but LWT isn't. * Add information about enabling LWT in docs/alternator/alternator.md * nyh/v4-lwt: alternator: add support for ConditionExpression alternator: reimplement read-modify-write operations using LWT alternator: make "executor" a peering_sharded_service	2020-01-26 12:02:32 +02:00
Alejo Sanchez	936cae6069	Range scan query counter Fixes #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 15:02:58 +01:00
Alejo Sanchez	f57513a809	Counter of queries doing full scan. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 14:25:19 +01:00
Alejo Sanchez	dbe8a54768	ALLOW FILTERING query counter Implements a counter of executions of SELECT queries with ALLOW FILTERING option. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-01-24 13:38:30 +01:00
Piotr Jastrzebski	682dfdafe1	partitioners: remove random_partitioner Previous patch makes it impossible to configure Scylla with RandomPartitioner so this code is effectively dead now. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	d80ac4c2d0	partitioners: Make it impossible to use RandomPartitioner RandomPartitioner has been deprecated for 2.5 years. Now we drop the support for it. There are two reasons for this. First, this partitioner can lead to uneven distribution of partitions among the nodes in the cluster which leads to hot nodes. Second, we're planning to unify the representation of tokens and fix it as int64_t. RandomPartitioner does not comply with this. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	7a86e2ff46	partitioners: remove byte_ordered_partitioner Previous patch makes it impossible to configure Scylla with ByteOrderedPartitioner so this code is effectively dead now. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	130eb91636	partitioners: Make it impossible to use ByteOrderedPartitioner ByteOrderedPartitioner has been deprecated for 2.5 years. Now we drop the support for it. There are two reasons for this. First, this partitioner can lead to uneven distribution of partitions among the nodes in the cluster which leads to hot nodes. Second, we're planning to unify the representation of tokens and fix it as int64_t. ByteOrderedPartitioner does not comply with this. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	4088be2056	partitioners: Remove leftovers of OrderPreservingPartitioner OrderPreservingPartitioner seems to be long gone and not supported so remove all the places it's still mentioned. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	1d345091f6	i_partitioner.cc: stop including byte_ordered_partitioner.hh Nothing from that header is used in i_partitioner.cc. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	44c9a71686	i_partitioner.cc: stop including random_partitioner.hh Nothing from that header is used in i_partitioner.cc. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:09:13 +01:00
Piotr Jastrzebski	6a2cd64b5c	config: use allowed_values to verify named_value input Even though we configure the set of accepted values for some config flags, named_value ignores them. This patch implements the checks that verify that a flag is not set to a value that's not on the list. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:08:59 +01:00
Nadav Har'El	b50274e8a7	alternator: add support for ConditionExpression This patch adds support for the ConditionExpression parameter of the item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem. We already supported conditional updates/put/delete using the "Expected" parameter. The ConditionExpression parameter implemented here provides a very similar feature, using a different - and also newer and more powerful - syntax. The implementation here reuses much of our existing expression-parsing infrastructure. Unsurprisingly, ConditionExpression's syntax has much in common with UpdateExpression (which we already support) and also many of the comparison functions already implemented for "Expected". However, it's still quite a bit of new code, because of the many different comparisons, functions, and syntax variations we need to support. This patch also expands alternator-test/test_condition_expression.py with a few additional corner cases discovered during the development of this patch. Almost all of the tests for this feature (35 out of 39) now pass. Two tests still fail because we don't yet support nested attributes (this is a missing feature across Alternator), and two tests fail because of minor idiosyncrasies in DynamoDB's error path that we chose not to duplicate yet (but still remember the difference in the form of an xfailing test). Fixes #5035 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:33 +02:00
Nadav Har'El	370b963ce5	alternator: reimplement read-modify-write operations using LWT In this patch, we re-implement the three read-modify-write operations - PutItem, UpdateItem, DeleteItem. All three operations may need to read the item before writing it to support conditional updates (the "Expected" parameter) and UpdateItem may also need the previous item's value for its update expression (e.g., a user may ask to "set a=a+1" or "set a=b"). Before this patch, the implementation of RMW operations simply did a read, and then a write - without any attempt to protect concurrent operations. In this patch, Scylla's LWT mechanism (storage_proxy::cas()) is used instead, to ensure that concurrent update operations are correctly isolated even if they are conditional. This means that Alternator now requires the experimental LWT feature to be enabled (and refuses to boot if it isn't). The version presented here is configured to always use LWT for every write, regardless of whether it has a condition or not. So it will significantly slow down write-only workloads like YCSB. But the code in this patch actually includes three other modes, which can be chosen by setting an enum constant in the code. In the future we will want to let the user configure this mode, globally, per table or per attribute. Note that read requests are NOT modified, and work exactly as they did before: i.e., strongly-consistent reads are done using a normal CL=LOCAL_QUORUM read - not via LWT. I believe this is good enough given Dynamo's guarantees, and critical for our read performance. Also note that this patch doesn't yet fix the BatchWriteItem operation. Although BatchWriteItem does not support any RMW operations - just pure writes - we may still need to do those pure writes using LWT. This should be fixed in a follow-up patch. Unfortunately, this patch involves a large amount of code movement and reorganization, because: 1. 
The cas operation requires each operation to be made into an object, with a separate apply() function, forcing a lot of code to move. 2. Moreover, we need to do this for three different operations (PutItem, UpdateItem, DeleteItem) so to avoid massive code duplication, I had to move some common code. 3. The cas operation also forced us to change some of the utility functions' APIs. The end result is that this patch focuses more on a compact and understandable end result than it does on an easy to understand patch, so reviewers - sorry about that. All alternator-test/ tests pass with this patch (and also with all of the different optional modes enabled). However, other than that, I did not yet do any real isolation tests (are concurrent operations really isolated correctly? or is LWT just faking it? :-) ), performance tests or stress tests - and I'll definitely need to do those as well. Fixes #5054 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:28 +02:00
Nadav Har'El	7dfd081e0d	alternator: make "executor" a peering_sharded_service Alternator uses a sharded<executor> for handling execution of Alternator requests on different shards. In this patch we make executor a subclass of peering_sharded_service, to allow one of these executors to run an executor method on a different shard: Any one of the shard-local executor instances can call container() to get the full sharded<executor>. We will need this capability later, when we need to bounce requests between shards because of requirements of the storage_proxy::cas (LWT) code. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:23 +02:00
Benny Halevy	5b0ea4c114	storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service Match subscription done in main() and avoid cross shard access to _lifecycle_subscribers vector. Fixes #5385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Acked-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>	2020-01-23 11:38:23 +02:00
Piotr Jastrzebski	df1b7d2805	config: add operator<< for seed_provider_type Following patch will start checking allowed_values in named_value and print errors for wrong values. This will require all the types used with named_value to have operator<< implemented. seed_provider_type is one such type. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-23 10:28:58 +01:00
Rafael Ávila de Espíndola	6058fe8007	build: remove timestamps from the antlr output The output of antlr always has the timestamp of when it was created. This expands the existing sed hack to remove that too. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 16:29:54 -08:00
Rafael Ávila de Espíndola	72e900291b	build: Make the output of idl-compiler deterministic If at any point during the topological sort we had more than one node with zero dependencies, the order they were printed was not deterministic. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 16:28:00 -08:00
Avi Kivity	46951f8b1a	Merge "Refactor migration_notifier listeners and gossip subscribers" from Rafael " This series refactors the code used by migration_notifier and gossiper into an atomic_vector type. " * 'espindola/gossiper_atomic_vector' of https://github.com/espindola/scylla: gossiper: Store subscribers in an atomic_vector load_broadcaster: Unregister from load_broadcaster::stop_broadcasting repair: add row_level::stop() locator: Return future from i_endpoint_snitch::reload_gossiper_state service: Refactor code into a atomic_vector class migration_manager: Fix typo load_meter: Use a shared_ptr to store a load_broadcaster	2020-01-22 18:58:15 +02:00
Rafael Ávila de Espíndola	845116dfaf	gossiper: Store subscribers in an atomic_vector The new guarantees are a bit better IMHO: Once a subscriber is removed, it is never notified. This was not true in the old code since it would iterate over a copy that would still have that subscriber. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	c62a33965d	load_broadcaster: Unregister from load_broadcaster::stop_broadcasting This is in preparation for unregistration returning a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	7390485e20	repair: add row_level::stop() Now unregister_ is called from stop(). This reduces the noise in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	085544f054	locator: Return future from i_endpoint_snitch::reload_gossiper_state This just reduces the noise of a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	d9a71a7cff	service: Refactor code into a atomic_vector class This templates the code for listener_vector, renames it to atomic_vector and moves it to the utils directory. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	baeb6744f6	migration_manager: Fix typo Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Rafael Ávila de Espíndola	9d4cf25c84	load_meter: Use a shared_ptr to store a load_broadcaster load_broadcaster::stop_broadcasting uses shared_from_this(). Since that is the only reference that the produced shared_ptr knows of, it is deleted immediately. Fix that by also using a shared_ptr in load_meter. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Pekka Enberg	0abb4e1742	Update seastar submodule * seastar afc46681...147d50b1 (6): > perftune.py: Use safe_load() for fix arbitrary code execution Fixes #5630 > clang: current_exception_as_future must be in namespaced > tests: add an expected failures version of thread fixture > Enable stack guards in Dev builds > net: posix: Introduce load_balancing_algorithm::fixed > stream: Move _next from subscription to stream	2020-01-22 17:54:14 +02:00
Pavel Solodovnikov	e1b22b6a4c	cql3: get rid of lw_shared_ptr for `variable_specifications` `parsed_statement::get_bound_variables` is assumed to always return a nonnull pointer to `variable_specifications` instance. In this case using a pointer is superfluous and can be safely replaced by a plain reference. Also add a default ctor and a utility method `set_bound_variables` to the `variable_specifications` class to actually reset the contents of the class instance. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>	2020-01-22 12:51:02 +02:00
Avi Kivity	5d78d511ad	Merge "cql: Simplify sum overflow" from Benny " As a followup to `0bde590` This series implements suggestions from @avikivity and @espindola It simplifies the template definitions for accumulator_for, adds some debug logging for the overflow values, and adds unit tests for float and double sum overflow. Test: unit(dev), paging_test:TestPagingWithIndexingAndAggregation.test_filter_{indexed,non_indexed,pk}_column(dev) " * 'simplify-sum-overflow' of https://github.com/bhalevy/scylla: test: cql_query_test: test float/double sum overflow cql3: aggregate_fcts: simplify accumulator_for template definitions	2020-01-22 11:30:25 +02:00
Asias He	be9d7c3b28	storage_service: Yield in get_changed_ranges_for_leaving It is always called inside a seastar thread. Call yield to prevent stalls. This patch fixes stalls like: 2019-12-17T15:18:33+00:00 ip-10-0-116-62 !INFO \| scylla: Reactor stalled for 4609 ms on shard 0. 0x0000000002accbd2 0x0000000002a4579b 0x0000000002a45cc2 0x0000000002a45ff7 0x00007ff0a609be7f 0x0000000001b0b500 0x0000000001b03185 0x0000000001af0d41 0x0000000001af027a 0x0000000001f7e89a 0x0000000001f9f55a 0x0000000001fc9c09 0x0000000001fcac08 0x00000000007dfee3 /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1041 (inlined by) seastar::reactor::block_notifier(int) at /jenkins/slave/workspace/scylla-3.2/build/scylla/seastar/src/core/reactor.cc:1164 ?? ??:0 __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > > std::__lower_bound<__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token, __gnu_cxx::__ops::_Iter_less_val>(__gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, __gnu_cxx::__normal_iterator<dht::token const, std::vector<dht::token, std::allocator<dht::token> > >, dht::token const&, __gnu_cxx::__ops::_Iter_less_val) at crtstuff.c:? locator::token_metadata::first_token_index(dht::token const&) const at crtstuff.c:? locator::token_metadata::ring_range(dht::token const&, bool) const at crtstuff.c:? locator::simple_strategy::calculate_natural_endpoints(dht::token const&, locator::token_metadata&) const at crtstuff.c:? service::storage_service::get_changed_ranges_for_leaving(seastar::basic_sstring<char, unsigned int, 15u, true>, gms::inet_address) at crtstuff.c:? service::storage_service::unbootstrap() at crtstuff.c:? 
service::storage_service::decommission()::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const::{lambda()#1}::operator()() const [clone .isra.0] at storage_service.cc:? Refs: #5495	2020-01-22 12:36:15 +08:00
Asias He	74b787c91a	storage_service: Make get_changed_ranges_for_leaving run inside thread It is the only place where get_changed_ranges_for_leaving is not running inside a thread. Preparing patch to futurize get_changed_ranges_for_leaving. Refs: #5495	2020-01-22 12:36:13 +08:00
Piotr Sarna	9b379e3d63	db,view: fix checking for secondary index special columns A mistake in handling legacy checks for special 'idx_token' column resulted in not recognizing materialized views backing secondary indexes properly. The mistake is really a typo, but with bad consequences - instead of checking the view schema for being an index, we asked for the base schema, which is definitely not an index of itself. Branches 3.1,3.2 (asap) Fixes #5621 Fixes #4744	2020-01-21 22:32:04 +02:00
Rafael Ávila de Espíndola	27bd3fe203	service: Add a lock around migration_notifier::_listeners Before this patch the iterations over migration_notifier::_listeners could race with listeners being added and removed. The addition side is not modified, since it is common to add a listener during construction and it would require a fairly big refactoring. Instead, the iteration is modified to use indexes instead of iterators so that it is still valid if another listener is added concurrently. For removal we use a rw lock, since removing an element invalidates indexes too. There are only a few places that needed refactoring to handle unregister_listener returning a future<>, so this is probably OK. Fixes #5541. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200120192819.136305-1-espindola@scylladb.com>	2020-01-20 22:14:02 +02:00
Avi Kivity	c317b952a3	Merge "cql_query_test: Fix abandoned failed futures" from Rafael " This series fixes all abandoned failed futures in cql_query_test and starts running it with --fail-on-abandoned-failed-futures to avoid regressions. " * 'espindola/fix-abandoned-failed-futures' of https://github.com/espindola/scylla: cql_query_test: Avoid new abandoned failed futures cql_query_test: Explicitly ignore a failed future cql_query_test: Remove duplicated do_with_cql_env_thread cql_query_test: Fix cql and values in test_int_sum_with_cast	2020-01-20 20:40:56 +02:00
Rafael Ávila de Espíndola	4ce7cb9aa6	cql_query_test: Avoid new abandoned failed futures Now that cql_query_test has no abandoned failed futures, run it with --fail-on-abandoned-failed-futures to avoid regressions. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:23:22 -08:00
Rafael Ávila de Espíndola	ef5cd107ea	cql_query_test: Explicitly ignore a failed future This avoids an abandoned future warning. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:20:46 -08:00
Rafael Ávila de Espíndola	b547659c07	cql_query_test: Remove duplicated do_with_cql_env_thread With this test_int_sum_with_cast now runs and passes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:19:08 -08:00
Rafael Ávila de Espíndola	9334514c7c	cql_query_test: Fix cql and values in test_int_sum_with_cast This test is not running because of the double do_with_cql_env_thread. Fix it before we remove the extra do_with_cql_env_thread. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-20 09:17:35 -08:00
Avi Kivity	7d64b0f478	Update seastar submodule * seastar 3f3e117de3...afc46681e5 (7): > json: add move assignment to json_return_type > net: do not check if an unsigned variabe is less than 0 > stack: add virtual destructor definition for class w/ virtual functions > future,json: add ":" at end of concept definition > Fixing a bug in the handling of abort_accept() > install-dependencies.sh: improve arch detect > metrics: Avoid a copy during unregistration	2020-01-20 18:52:36 +02:00
Botond Dénes	e8a948ece6	configure.py: enable alloc failure injection for dev and debug modes We have numerous tests that rely on the seastar alloc failure injection infrastructure to test the exception safety of different components. These tests are essentially useless when the said infrastructure is not enabled, which is currently the case for all build modes, allowing bugs to sneak in undetected. Enable the allocation failure injection infrastructure for the dev and debug modes. Sanitize is excluded as it produces some (suspected false positive) failures and is not run in gating either currently. Tests: unit(dev, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117104747.748866-1-bdenes@scylladb.com>	2020-01-20 18:07:33 +02:00
Kamil Braun	957fa8da11	dht: make i_partitioner::get_token method(s) const	2020-01-20 14:55:12 +02:00
Nadav Har'El	bd419ae723	merge: alternator: Add prerequisites for tagging Merged patch series from Piotr Sarna: This miniseries adds two simple prerequisites for implementing tagging: 1. A table is able to generate its Arn identifier 2. Simple tests for TagResource, UntagResource, ListTagsOfResource In general, tags should be stored in table metadata - either by expanding the schema of an existing schema table, e.g. scylla_tables, or by providing another meta-table - e.g. system_schema.alternator_tables, which stores alternator-specific metadata, like tags. Refs #5066 Tests: alternator-test(local, remote) Piotr Sarna (2): alternator: add Arn support for tables alternator-test: add basic tests for tags alternator-test/test_describe_table.py \| 1 - alternator-test/test_tag.py \| 88 ++++++++++++++++++++++++++ alternator/executor.cc \| 5 ++ 3 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 alternator-test/test_tag.py	2020-01-20 14:42:40 +02:00
Piotr Jastrzebski	9279a679da	keys.hh: make it independent from schema.hh This cuts build dependency keys.hh -> schema.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-20 14:25:17 +02:00
Piotr Sarna	b8277e43e5	alternator-test: add basic tests for tags TagResource, UntagResource and ListTagsOfResource validation tests are added. Refs #5066	2020-01-20 12:24:51 +01:00
Piotr Sarna	8c17b5aec4	alternator: add Arn support for tables Several API-s, e.g. TagResource, UntagResource and ListTagsOfResource rely on identifying tables by their "Arn". According to the docs, an Arn should uniquely identify a resource, so it's implemented as: arn:KEYSPACE_NAME:TABLE_NAME which is a minimal set of information that uniquely identifies a table in Scylla. The `arn:` prefix is needed for compatibility purposes. This commit adds a simple function for generating the Arn string, and also includes it in DescribeTable result under the TableArn attribute. Refs #5066	2020-01-20 12:24:51 +01:00
Botond Dénes	a74a82d4d2	flat_mutation_reader: mutation_fragment_stream_validator: add name Add a name parameter to the validator, so that the validator can be identified in log messages. Schema identity information is added to the name automatically. This should help pinpoint the problematic place where validation failed. Although at the moment we have a single validator, it still benefits from having a name, as we can now include in it the name of the sstable being written and hence trace the source of the bad data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117150616.895878-1-bdenes@scylladb.com>	2020-01-20 11:06:30 +01:00
Takuya ASADA	893dfbce59	dist/ami: update packer to 1.5.1 Update Packer to 1.5.1. Needed to rename clean_ami_name -> clean_resource_name on scylla.json, since the variable name had been changed. Also fixed checksum verification code, trimmed unwanted extra strings from sha256sum output.	2020-01-20 11:24:57 +02:00
Takuya ASADA	46386beba2	install.sh: convert relocate_python_scripts.py to a bash function Since we need to run relocate_python_scripts.py at install time, the python script may not be able to run in every environment. So convert the script to a bash function and merge it into install.sh.	2020-01-20 11:15:34 +02:00
Takuya ASADA	5627888b7c	scylla_post_install.sh: fix 'integer expression expected' error awk returns a float value on Debian, which causes the postinst script to fail since we compare it as an integer value. Replaced with sed + bash. Fixes #5569	2020-01-20 11:13:55 +02:00
Asias He	343986a70b	gossiper: Introduce gossip STATUS_UNKNOWN When a node does not have gossip STATUS application_state, we currently use an empty string to present such state in get_gossip_status. It is better to use an explicit "UNKNOWN" to present it. It makes the log easier to understand when the status is unknown. Before: 'gossip - InetAddress n2 is now UP, status =' After: 'gossip - InetAddress n2 is now UP, status = UNKNOWN' This patch is safe because the STATUS_UNKNOWN is never sent over the cluster. So the presentation is only internal to the node. Fixes #5520	2020-01-20 10:59:14 +02:00
Benny Halevy	2b383b404a	test: cql_query_test: test float/double sum overflow Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-20 10:42:03 +02:00
Ivan Prisyazhnyy	8fde8e3600	dep: support arch linux Support arch linux dependencies. Tested on Arch 5.4.2-arch1-1 and docker archlinux. Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Message-Id: <20200118162110.824317-1-ivan@scylladb.com>	2020-01-19 14:30:03 +02:00
Benny Halevy	476a102de0	cql3: aggregate_fcts: simplify accumulator_for template definitions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-19 08:26:40 +02:00
Avi Kivity	12bc965f71	atomic_cell: consistently use comma as separator in pretty-printers The atomic_cell pretty printers use a mix of commas and semicolons. This change makes them use commas everywhere, for consistency. Message-Id: <20200116133327.2610280-1-avi@scylladb.com>	2020-01-16 17:26:33 +01:00
Nadav Har'El	1ed21d70dc	merge: CDC: do mutation augmentation from storage proxy Merged pull request https://github.com/scylladb/scylla/pull/5567 from Calle Wilund: Fixes #5314 Instead of tying CDC handling into cql statement objects, this patch set moves it to storage proxy, i.e. shared code for mutating stuff. This means we automatically handle cdc for code paths outside cql (i.e. alternator). It also adds api handling (though initially inefficient) for batch statements. CDC is tied into storage proxy by giving the former a ref to the latter (per shard). Initially this is not a constructor parameter, because right now we have chicken and egg issues here. Hopefully, Pavel's refactoring of migration manager and notifications will untie these and this relationship can become nicer. The actual augmentation can (as stated above) be made much more efficient. Hopefully, the stream management refactoring will deal with expensive stream lookup, and eventually, we can maybe coalesce pre-image selects for batches. However, that is left as an exercise for when deemed needed. The augmentation API has an optional return value for a "post-image handler" to be used iff returned after mutation call is finished (and successful). It is not yet actually invoked from storage_proxy, but it is at least in the call chain.	2020-01-16 17:12:56 +02:00
Avi Kivity	e677f56094	Merge "Enable general centos RPM (not only centos7)" from Hagit	2020-01-16 14:13:24 +02:00
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use-after-free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Hagit Segev	d0405003bd	building-packages doc: Update no specific el7 on path	2020-01-16 12:49:08 +02:00
Rafael Ávila de Espíndola	c42a2c6f28	configure: Add -O1 when compiling generated parsers Enabling asan enables a few cleanup optimizations in gcc. The net result is that using -fsanitize=address -fno-sanitize-address-use-after-scope Produces code that uses a lot less stack than if the file is compiled with just -O0. This patch adds -O1 in addition to -fno-sanitize-address-use-after-scope to protect the unfortunate developer that decides to build in dev mode with --cflags='-O0 -g'. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-2-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	317e0228a8	configure: Put user flags after the mode flags It is sometimes convenient to build with flags that don't match any existing mode. Recently I was tracking a bug that would not reproduce with debug, but reproduced with dev, so I tried debugging the result of ./configure.py --cflags="-O0 -g" While the binary had debug info, it still had optimizations because configure.py put the mode flags after the user flags (-O0 -O1). This patch flips the order (-O1 -O0) so that the flags passed in the command line win. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200116012318.361732-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Gleb Natapov	51281bc8ad	lwt: fix write timeout exception reporting CQL transport code relies on an exception's C++ type to create correct reply, but in lwt we converted some mutation_timeout exceptions to more generic request_timeout while forwarding them which broke the protocol. Do not drop type information. Fixes #5598. Message-Id: <20200115180313.GQ9084@scylladb.com>	2020-01-16 12:05:50 +02:00
Piotr Jastrzębski	0c8c1ec014	config: fix description of enable_deprecated_partitioners Murmur3 is the default partitioner. ByteOrder and Random are the deprecated ones and should be mentioned in the description. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-16 12:05:50 +02:00
Nadav Har'El	9953a33354	merge "Adding a schema file when creating a snapshot" Merged pull request https://github.com/scylladb/scylla/pull/5294 from Amnon Heiman: To use a snapshot we need a schema file that is similar to the result of running cql DESCRIBE command. The DESCRIBE is implemented in the cql driver so the functionality needs to be re-implemented inside scylla. This series adds a describe method to the schema file and uses it when doing a snapshot. There are different approaches to handling materialized views and secondary indexes. This implementation creates each schema.cql file in its own relevant directory, so the schema for a materialized view, for example, will be placed in the snapshot directory of the table of that view. Fixes #4192	2020-01-16 12:05:50 +02:00
Piotr Dulikowski	c383652061	gossip: allow for aborting on sleep This commit makes most sleeps in gossip.cc abortable. It is now possible to quickly shut down a node during startup, most notably during the phase while it waits for gossip to settle.	2020-01-16 12:05:50 +02:00
Avi Kivity	e5e0642f2a	tools: toolchain: add dependencies for building debian and rpm packages This reduces network traffic and eliminates time for installation when building packages from the frozen toolchain, as well as isolating the build from updates to those dependencies which may cause breakage.	2020-01-16 12:05:50 +02:00
Pekka Enberg	da9dae3dbe	Merge 'test.py: add support for CQL tests' from Kostja This patch set adds support for CQL tests to test.py, as well as many other improvements: * --name is now a positional argument * test output is preserved in testlog/${mode} * concise output format * better color support * arbitrary number of test suites * per-suite yaml-based configuration * options --jenkins and --xunit are removed and xml files are generated for all runs A simple driver is written in C++ to read CQL for standard input, execute in embedded mode and produce output. The patch is checked with BYO. Reviewed-by: Dejan Mircevski <dejan@scylladb.com> * 'test.py' of github.com:/scylladb/scylla-dev: (39 commits) test.py: introduce BoostTest and virtualize custom boost arguments test.py: sort tests within a suite, and sort suites test.py: add a basic CQL test test.py: add CQL .reject files to gitignore test.py: print a colored unidiff in case of test failure test.py: add CqlTestSuite to run CQL tests test.py: initial import of CQL test driver, cql_repl test.py: remove custom colors and define a color palette test.py: split test output per test mode test.py: remove tests_to_run test.py: virtualize Test.run(), to introduce CqlTest.Run next test.py: virtualize test search pattern per TestSuite test.py: virtualize write_xunit_report() test.py: ensure print_summary() is agnostic of test type test.py: tidy up print_summary() test.py: introduce base class Test for CQL and Unit tests test.py: move the default arguments handling to UnitTestSuite test.py: move custom unit test command line arguments to suite.yaml test.py: move command line argument processing to UnitTestSuite test.py: introduce add_test(), which is suite-specific ...	2020-01-16 12:05:50 +02:00
Pekka Enberg	e8b659ec5d	dist/docker: Remove Ubuntu-based Docker image The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated since 2017. Let's remove it as unmaintained. Message-Id: <20200115102405.23567-1-penberg@scylladb.com>	2020-01-16 12:05:50 +02:00
Avi Kivity	546556b71b	Merge "allow commitlog to wait for specific entries to be flushed on disk" from Gleb " Currently commitlog supports two modes of operation. First is 'periodic' mode where all commitlog writes are ready the moment they are stored in a memory buffer and the memory buffer is flushed to a storage periodically. Second is a 'batch' mode where each write is flushed as soon as possible (after previous flush completed) and writes are only ready after they are flushed. The first option is not very durable, the second is not very efficient. This series adds an option to mark some writes as "more durable" in periodic mode meaning that they will be flushed immediately and reported complete only after the flush is complete (flushing a durable write also flushes all writes that came before it). It also changes paxos to use those durable writes to store paxos state. Note that strictly speaking the last patch is not needed since after writing to an actual table the code updates paxos table and the latter uses durable writes that make sure all previous writes are flushed. Given that both writes are supposed to run on the same shard this should be enough. But it feels right to make base table writes durable as well. " * 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev: paxos: immediately sync commitlog entries for writes made by paxos learn stage paxos: mark paxos table schema as "always sync" schema: allow schema to be marked as 'always sync to commitlog' commitlog: add test for per entry sync mode database: pass sync flag from db::apply function to the commitlog commitlog: add sync method to entry_writer	2020-01-16 12:05:50 +02:00
Rafael Ávila de Espíndola	2ebd1463b2	tests: Handle null and not present values differently Before this patch result_set_assertions was handling both null values and missing values in the same way. This patch changes the handling of missing values so that now checking for a null value is not the same as checking for a value not being present. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114184116.75546-1-espindola@scylladb.com>	2020-01-16 12:05:50 +02:00
Botond Dénes	0c52c2ba50	data: make cell::make_collection(): more consistent and safer `3ec889816` changed cell::make_collection() to take different code paths depending whether its `data` argument is nothrow copyable/movable or not. In case it is not, it is wrapped in a view to make it so (see the above mentioned commit for a full explanation), relying on the method's pre-existing requirement for callers to keep `data` alive while the created writer is in use. On closer look however it turns out that this requirement is neither respected, nor enforced, at least not on the code level. The real requirement is that the underlying data represented by `data` is kept alive. If `data` is a view, it is not expected to be kept alive and callers don't, it is instead copied into `make_collection()`. Non-views however are expected to be kept alive. This makes the API error prone. To avoid any future errors due to this ambiguity, require all `data` arguments to be nothrow copyable and movable. Callers are now required to pass views of nonconforming objects. This patch is a usability improvement and is not fixing a bug. The current code works as-is because it happens to conform to the underlying requirements. Refs: #5575 Refs: #5341 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>	2020-01-16 12:05:50 +02:00
Amnon Heiman	ac8aac2b53	tests/cql_query_test: Add schema describe tests This patch adds tests for the describe method. test_describe_simple_schema tests regular tables. test_describe_view_schema tests view and index. Each test creates a table, finds the schema, calls the describe method and compares the results to the string that was used to create the table. The view tests also verify that adding an index or view does not change the base table. When comparing results, leading and trailing whitespace is ignored and all combinations of whitespace and new lines are treated equally. Additional tests may be added at a future phase if required. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:07:57 +02:00
Amnon Heiman	028525daeb	database: add schema.cql file when creating a snapshot When creating a snapshot we need to add a schema.cql file in the snapshot directory that describes the table in that snapshot. This patch adds the file using the schema describe method. get_snapshot_details and manifest_json_filter were modified to ignore the schema.cql file. Fixes #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	82367b325a	schema: Add a describe method This patch adds a describe method to a table schema. It acts similarly to the DESCRIBE cql command that is implemented in a CQL driver. The method supports tables, secondary indexes, local indexes and materialized views. Relates to: #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Amnon Heiman	6f58d51c83	secondary_index_manager: add the index_name_from_table_name function index_name_from_table_name is the reverse of index_table_name: it gets a table name that was generated for an index and returns the name of the index that generated that table. Relates to #4192 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-01-15 15:06:00 +02:00
Pavel Emelyanov	555856b1cd	migration_manager: Use in-place value factory The factory is purely a state-less thing, there is no difference what instance of it to use, so we may omit referencing the storage_service in passive_announce This is 2nd simple migration_manager -> storage_service link to cut (more to come later). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	f129d8380f	migration_manager: Get database through storage_proxy There are several places where migration_manager needs storage_service reference to get the database from, thus forming the mutual dependency between them. This is the simplest case where the migration_manager link to the storage_service can be cut -- the database reference can be obtained from storage_proxy instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone, only the migration_manager needs the database, but not the vice-versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	3f84256853	migration_manager: Remove register/unregister helpers In the 2nd patch the migration_manager kept those for simpler patching, but now we can drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9e4b41c32a	tests: Switch on migration notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	9d31bc166b	cdc: Use migration_notifier to (un)register for events If no one provided -- get it from storage_service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:19 +03:00
Pavel Emelyanov	ecab51f8cc	storage_service: Use migration_notifier (and stop worrying) The storage_service needs migration_manager for notifications and carefully handles the manager's stop process not to demolish the listeners list from under itself. From now on this dependency is no longer valid (however the storage_service still seems to need the migration_manager, but this is a different story). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7814ed3c12	cql_server: Use migration_notifier in events_notifier This patch removes an implicit cql_server -> migration_manager dependency, as the former's event notifier uses the latter for notifications. This dependency also breaks a loop: storage_service -> cql_server -> migration_manager -> storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	d9edcb3f15	query_processor: Use migration_notifier This patch breaks one (probably harmless but still) dependency loop. The query_processor -> migration_manager -> storage_proxy -> tracing -> query_processor. The first link is not needed, as the query_processor needs the migration_manager purely to (un)subscribe on notifications. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	2735024a53	auth: Use migration_notifier The same as with view builder. The constructor still needs both, but the life-time reference is now for notifier only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	28f1250b8b	view_builder: Use migration notifier The migration manager itself is still needed on start to wait for schema agreement, but there's no longer the need for the life-time reference on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f45b23f088	storage_service: Keep migration_notifier The storage service will need this guy to initialize sub-services with. Also it registers itself with notifiers. That said, it's convenient to have the migration notifier on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a completely separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Pavel Emelyanov	074cc0c8ac	migration_manager: Helpers for on_before_ notifications Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Pavel Emelyanov	1992755c72	storage_service: Kill initialization helper from init.cc The helper just makes further patching more complex, so drop it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:27:27 +03:00
Konstantin Osipov	a665fab306	test.py: introduce BoostTest and virtualize custom boost arguments	2020-01-15 13:37:25 +03:00
Gleb Natapov	51672e5990	paxos: immediately sync commitlog entries for writes made by paxos learn stage	2020-01-15 12:15:42 +02:00
Gleb Natapov	0fc48515d8	paxos: mark paxos table schema as "always sync" We want all writes to paxos table to be persisted on a storage before declared completed.	2020-01-15 12:15:42 +02:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that uses this schema will be immediately persisted on a storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	0ce70c7a04	commitlog: add test for per entry sync mode	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request a mutation to be persisted on a disk before making future ready independent of which mode commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true commitlog should sync to file immediately after writing the entry and wait for flush to complete before returning.	2020-01-15 12:15:42 +02:00
Piotr Sarna	9aab75db60	alternator: clean up single value rjson comparator The comparator is refreshed to ensure the following: - null compares less to all other types; - null, true and false are comparable against each other, while other types are only comparable against themselves and null. Comparing mixed types is not currently reachable from the alternator API, because it's only used for sets, which can only use strings, binary blobs and numbers - thus, no new pytest cases are added. Fixes #5454	2020-01-15 10:57:49 +02:00
Juliusz Stasiewicz	d87d01b501	storage_proxy: intercept rpc::closed_error if counter leader is down (#5579) When counter mutation is about to be sent, a leader is elected, but if the leader fails after election, we get `rpc::closed_error`. The exception propagates high up, causing all connections to be dropped. This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters` and translates it to `mutation_write_failure_exception`. References #2859	2020-01-15 09:56:45 +01:00
Konstantin Osipov	a351ea57d5	test.py: sort tests within a suite, and sort suites This makes it easier to navigate the test artefacts. No need to sort suites since they are already stored in a dict.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	ba87e73f8e	test.py: add a basic CQL test	2020-01-15 11:41:19 +03:00
Konstantin Osipov	44d31db1fc	test.py: add CQL .reject files to gitignore To avoid accidental commit, add .reject files to .gitignore	2020-01-15 11:41:19 +03:00
Konstantin Osipov	4f64f0c652	test.py: print a colored unidiff in case of test failure Print a colored unidiff between result and reject files in case of test failure.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	d3f9e64028	test.py: add CqlTestSuite to run CQL tests Run the test and compare results. Manage temporary and .reject files. Now that there are CQL tests, improve logging. run_test success no longer means test success.	2020-01-15 11:41:19 +03:00
Konstantin Osipov	b114bfe0bd	test.py: initial import of CQL test driver, cql_repl cql_repl is a simple program which reads CQL from stdin, executes it, and writes results to stdout. It supports --input, --output and --log options. --log is directed to cql_test.log by default. --input is stdin by default. --output is stdout by default. The result set output is printed with a basic JSON visitor.	2020-01-15 11:41:16 +03:00
Konstantin Osipov	0ec27267ab	test.py: remove custom colors and define a color palette Using a standard Python module improves readability, and allows using colors easily in other output.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	0165413405	test.py: split test output per test mode Store test temporary files and logs in ${testdir}/${mode}. Remove --jenkins and --xunit, and always write XML files at a predefined location: ${testdir}/${mode}/xml/. Use .xunit.xml extension for tests whose XML output is in xunit format, and junit.xml for an accumulated output of all non-boost tests in junit format.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4095ab08c8	test.py: remove tests_to_run Avoid storing each test twice, use per-tests list to construct a global iterable.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	169128f80b	test.py: virtualize Test.run(), to introduce CqlTest.Run next	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d05f6c3cc7	test.py: virtualize test search pattern per TestSuite CQL tests have .cql extension, while unit tests have .cc.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	abcc182ab3	test.py: virtualize write_xunit_report() Make sure any non-boost test can participate in the report.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	18aafacfad	test.py: ensure print_summary() is agnostic of test type Introduce a virtual Test.print_summary() to print a failed test summary.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	21fbe5fa81	test.py: tidy up print_summary() Now that we have tabular output, make print_summary() more concise.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c171882b51	test.py: introduce base class Test for CQL and Unit tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	fd6897d53e	test.py: move the default arguments handling to UnitTestSuite Move UnitTest default seastar argument handling to UnitTestSuite (cleanup).	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d3126f08ed	test.py: move custom unit test command line arguments to suite.yaml Load the command line arguments, if any, from suite.yaml, rather than keep them hard-coded in test.py. This allows the operations team to have easier access to these. Note I had to sacrifice dynamic smp count for mutation_reader_test (the new smp count is fixed at 3) since this is part of test configuration now.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ef6cebcbd2	test.py: move command line argument processing to UnitTestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	4a20617be3	test.py: introduce add_test(), which is suite-specific	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7e10bebcda	test.py: move long test list to suite.yaml Use suite.yaml for long tests	2020-01-15 10:53:24 +03:00
Konstantin Osipov	32ffde91ba	test.py: move test id assignment to TestSuite Going forward finding and creating tests will be a responsibility of TestSuite, so the id generator needs to be shared.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b5b4944111	test.py: move repeat handling to TestSuite This way we can avoid iterating over all tests to handle --repeat. Besides, going forward the tests will be stored in two places: in the global list of all tests, for the runner, and per suite, for suite-based reporting, so it's easier if TestSuite is fully responsible for finding and adding tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	34a1b49fc3	test.py: move add_test_list() to TestSuite	2020-01-15 10:53:24 +03:00
Konstantin Osipov	44e1c4267c	test.py: introduce test suites - UnitTestSuite - for test/unit tests - BoostTestSuite - a tweak on UnitTestSuite, with options to log xml test output to a dedicated file	2020-01-15 10:53:24 +03:00
Konstantin Osipov	eed3201ca6	test.py: use path, rather than test kind, for search pattern Going forward there may be multiple suites of the same kind.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f95c97667f	test.py: support arbitrary number of test suites Scan entire test/ for folders that contain suite.yaml, and load tests from these folders. Skip the rest. Each folder with a suite.yaml is expected to have a valid suite configuration in the yaml file. A suite is a folder with tests of the same type. E.g. it can be a folder with unit tests, boost tests, or CQL tests. The harness will use suite.yaml to create an appropriate suite test driver, to execute tests in different formats.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	c1f8169cd4	test.py: add suite.yaml to boost and unit tests The plan is to move suite-specific settings to the configuration file.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	ec9ad04c8a	test.py: move 'success' to TestUnit class There will be other success attributes: program return status 0 doesn't mean the test is successful for all tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	b4aa4d35c3	test.py: save test output in tmpdir It is handy to have it so that a reference of a failed test is available without re-running it.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	f4efe03ade	test.py: always produce xml output, derive output paths from tmpdir It reduces the number of configurations to re-test when test.py is modified, and simplifies usage of test.py in build tools, since you no longer need to bother with extra arguments.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	d2b546d464	test.py: output job count in the log	2020-01-15 10:53:24 +03:00
Konstantin Osipov	233f921f9d	test.py: make test output brief&tabular New format: % ./test.py --verbose --mode=release ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [1/111] boost/UUID_test release [ PASS ] [2/111] boost/enum_set_test release [ PASS ] [3/111] boost/like_matcher_test release [ PASS ] [4/111] boost/observable_test release [ PASS ] [5/111] boost/allocation_strategy_test release [ PASS ] ^C % ./test.py foo ================================================================================ [N/TOTAL] TEST MODE RESULT ------------------------------------------------------------------------------ [3/3] unit/memory_footprint_test debug [ PASS ] ------------------------------------------------------------------------------	2020-01-15 10:53:24 +03:00
Konstantin Osipov	879bea20ab	test.py: add a log file Going forward I'd like to make terminal output brief&tabular, but some test details are necessary to preserve so that a failure is easy to debug. This information now goes to the log file. - open and truncate the log file on each harness start - log options of each invoked test in the log, so that a failure is easy to reproduce - log test result in the log Since tests are run concurrently, having an exact trace of concurrent execution also helps debugging flaky tests.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	cbee76fb95	test.py: gitignore the default ./test.py tmpdir, ./testlog	2020-01-15 10:53:24 +03:00
Konstantin Osipov	1de69228f1	test.py: add --tmpdir It will be used for test log files.	2020-01-15 10:53:24 +03:00
Konstantin Osipov	caf742f956	test.py: flake8 style fix	2020-01-15 10:53:24 +03:00
Konstantin Osipov	dab364c87d	test.py: sort imports	2020-01-15 10:53:24 +03:00
Konstantin Osipov	7ec4b98200	test.py: make name a positional argument. Accept multiple test names, treat test name as a substring, and if the same name is given multiple times, run the test multiple times.	2020-01-15 10:53:24 +03:00
Dejan Mircevski	bb2e04cc8b	alternator: Improve comments on comparators Some comparator methods in conditions.cc use unexpected operators; explain why. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-14 22:25:55 +02:00
Tomasz Grabiec	c8a5a27bd9	Merge "storage_service: Move load_broadcaster away" from Pavel E. The storage_service struct is a collection of diverse things, most of them required only on start and on stop and/or running on shard 0 (but it is nonetheless sharded). As part of cleaning up this structure and the inter-component dependencies it generates, here's the sanitation of load_broadcaster.	2020-01-14 19:26:06 +01:00
Calle Wilund	313ed91ab0	cdc: Listen for migration callbacks on all shards Fixes #5582 ... but only populate log on shard 0. Migration manager callbacks are slightly asymmetric. Notifications for pre-create/update mutations are sent only on initiating shard (necessary, because we consider the mutations mutable). But "created" callbacks are sent on all shards (immutable). We must subscribe on all shards, but still do population of cdc table only once, otherwise we can either miss table create or populate more than once. v2: - Add test case Message-Id: <20200113140524.14890-1-calle@scylladb.com>	2020-01-14 16:35:41 +01:00
Avi Kivity	2138657d3a	Update seastar submodule * seastar 36cf5c5ff0...3f3e117de3 (16): > memcached: don't use C++17-only std::optional > reactor: Comment why _backend is assigned in constructor body > log: restore --log-to-stdout for backward compatibility > used_size.hh: Include missing headers > core: Move some code from reactor.cc to future.cc > future-util: move parallel_for_each to future-util.cc > task: stop wrapping tasks with unique_ptr > Merge "Setup timer signal handler in backend constructor" from Pavel Fixes #5524 > future: avoid a branch in future's move constructor if type is trivial > utils: Expose used_size > stream: Call get_future early > future-util: Move parallel_for_each_state code to a .cc > memcached: log exceptions > stream: Delete dead code > core: Turn pollable_fd into a simple proxy over pollable_fd_state. > Merge "log to std::cerr" from Benny	2020-01-14 16:56:25 +02:00
Pavel Emelyanov	e1ed8f3f7e	storage_service: Remove _shadow_token_metadata This is part of de-bloating storage_service. The field in question is used to temporarily keep the _token_metadata value during shard-wide replication. There's no need to have it as class member, any "local" copy is enough. Also, as the size of token_metadata is huge, and invoke_on_all() copies the function for each shard, keep one local copy of metadata using do_with() and pass it into the invoke_on_all() by reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Asias He <asias@scylladb.com> Message-Id: <20200113171657.10246-1-xemul@scylladb.com>	2020-01-14 16:29:10 +02:00
Rafael Ávila de Espíndola	054f5761a7	types: Refactor code into a serialize_varint helper This is a bit cleaner and avoids a boost::multiprecision::cpp_int copy while serializing a decimal. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200110221422.35807-1-espindola@scylladb.com>	2020-01-14 16:28:27 +02:00
Avi Kivity	6c84dd0045	cql3: update_statement: do not set query option always_return_static_content for list read-before-write The query option always_return_static_content was added for lightweight transactions in commits `e0b31dd273` (infrastructure) and `65b86d155e` (actual use). However, the flag was added unconditionally to update_parameters::options. This caused it to be set for list read-modify-write operations, not just for lightweight transactions. This is a little wasteful, and worse, it breaks compatibility as old nodes do not understand the always_return_static_content flag and complain when they see it. To fix, remove the always_return_static_content from update_parameters::options and only set it from compare-and-swap operations that are used to implement lightweight transactions. Fixes #5593. Reviewed-by: Gleb Natapov <gleb@scylladb.com> Message-Id: <20200114135133.2338238-1-avi@scylladb.com>	2020-01-14 16:15:20 +02:00
Hagit Segev	ef88e1e822	CentOS RPMs: Remove target to enable general centos.	2020-01-14 14:31:03 +02:00
Alejo Sanchez	6909d4db42	cql3: BYPASS CACHE query counter This patch is the first part of requested full scan metrics. It implements a counter of SELECT queries with BYPASS CACHE option. In scope of #5209 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200113222740.506610-2-alejo.sanchez@scylladb.com>	2020-01-14 12:19:00 +02:00
Rafael Ávila de Espíndola	dca1bc480f	everywhere: Use serialized(foo) instead of data_value(foo).serialize() This is just a simple cleanup that reduces the size of another patch I am working on and is an independent improvement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200114051739.370127-1-espindola@scylladb.com>	2020-01-14 12:17:12 +02:00
Pavel Emelyanov	b9f28e9335	storage_service: Remove dead drain branch The drain_in_progress variable here is the future that's set by the drain() operation itself. Its promise is set when the drain() finishes. The check for this future in the beginning of drain() is pointless. No two drain()-s can run in parallel because of run_with_api_lock() protection. Doing the 2nd drain after successful 1st one is also impossible due to the _operation_mode check. The 2nd drain after _exceptioned_ (and thus incomplete) 1st one will deadlock, after this patch will try to drain for the 2nd time, but that should be ok. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200114094724.23876-1-xemul@scylladb.com>	2020-01-14 12:07:29 +02:00
Piotr Sarna	36ec43a262	Merge "add table with connected cql clients" from Juliusz This change introduces system.clients table, which provides information about CQL clients connected. PK is the client's IP address, CK consists of outgoing port number and client_type (which will be extended in future to thrift/alternator/redis). Table supplies also shard_id and username. Other columns, like connection_stage, driver_name, driver_version..., are currently empty but exist for C* compatibility and future use. This is an ordinary table (i.e. non-virtual) and it's updated upon accepting connections. This is also why C*'s column request_count was not introduced. In case of abrupt DB stop, the table should not persist, so it's being truncated on startup. Resolves #4820	2020-01-14 10:01:07 +02:00
Avi Kivity	1f46133273	Merge "data: make cell::make_collection() exception safe" from Botond " Most of the code in `cell` and the `imr` infrastructure it is built on is `noexcept`. This means that extra care must be taken to avoid rogue exceptions as they will bring down the node. The changes introduced by 0a453e5d3a did just that - introduced rogue `std::bad_alloc` into this code path by violating an undocumented and unvalidated assumption -- that fragment ranges passed to `cell::make_collection()` are nothrow copyable and movable. This series refactors `cell::make_collection()` such that it does not have this assumption anymore and is safe to use with any range. Note that the unit test included in this series, that was used to find all the possible exception sources will not be currently run in any of our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not being set. I plan to address this in a followup because setting this flag fails other tests using the failure injection mechanism. This is because these tests are normally run with the failure injection disabled so failures managed to lurk in without anyone noticing. Fixes: #5575 Refs: #5341 Tests: unit(dev, debug) " * 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla: test: mutation_test: add exception safety test for large collection serialization data/cell.hh: avoid accidental copies of non-nothrow copiable ranges utils/fragment_range.hh: introduce fragment_range_view	2020-01-14 10:01:06 +02:00
Nadav Har'El	5b08ec3d2c	alternator: error on unsupported ScanIndexForward=false We do not yet support the ScanIndexForward=false option for reversing the sort order of a Query operation, as reported in issue #5153. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter and giving the user wrong results. This is what this patch does. Before this patch, the reverse-order query in the xfailing test test_query.py::test_query_reverse seems to succeed - yet gives results in the wrong order. With this patch, the query itself fails - stating that the ScanIndexForward=false argument is not supported. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200105113719.26326-1-nyh@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	c4bf532d37	storage_service: Fix race in removenode/force_removenode/other Here's another theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and some other operation. Let's walk through them First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> At this point some other operation may start. Say, drain: run_with_api_lock _operation_in_progress = "drain" storage_service::drain ... go to sleep somewhere Now let's go back to the 1st op that wakes up from its sleep. The code it executes is while (!ss._operation_in_progress.empty()) { sleep_abortable() } and while the drain is running it will never exit. However (! and this is the core of the race) should the drain operation happen _before_ the force_removenode, another check for _operation_in_progress would have made the latter exit with the "Operation drain is in progress, try again" message. Fix this inconsistency by making the check for current operation every wake-up from the sleep_abortable. Fixes #5591 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
Pavel Emelyanov	cc92683894	storage_service: Fix race and deadlock in removenode/force_removenode Here's a theoretical problem, that involves 3 sequential calls to respectively removenode, force_removenode and removenode (again) operations. Let's walk through them First goes the removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now the force_removenode can run: run_with_no_api_lock storage_service::force_removenode check _operation_in_progress (not empty) _force_remove_completion = true sleep in _operation_in_progress.empty loop Now the 1st call wakes up and: if _force_remove_completion == true _force_remove_completion = false throw <some exception> .finally() handler in run_with_api_lock _operation_in_progress = <empty> ! at this point we have _force_remove_completion = false and _operation_in_progress = <empty>, which opens the following opportunity for the 3rd removenode: run_with_api_lock _operation_in_progress = "removenode" storage_service::remove_node sleep in replicating_nodes.empty() loop Now here's what we have in 2nd and 3rd ops: 1. _operation_in_progress = "removenode" (set by 3rd) prevents the force_removenode from exiting its loop 2. _force_remove_completion = false (set by 1st on exit) prevents the removenode from waiting on replicating_nodes list One can start the 4th call with force_removenode, it will proceed and wake up the 3rd op, but after it we'll have two force_removenode-s running in parallel and killing each other. I propose not to set _force_remove_completion to false in removenode, but just exit and let the owner of this flag unset it once it gets the control back. Fixes #5590 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-14 10:01:06 +02:00
Benny Halevy	ff55b5dca3	cql3: functions: limit sum overflow detection to integral types Other types do not have a wider accumulator at the moment. And static_cast<accumulator_type>(ret) != _sum evaluates as false for NaN/Inf floating point values. Fixes #5586 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>	2020-01-14 10:01:06 +02:00
Avi Kivity	e3310201dd	atomic_cell_or_collection: type-aware print atomic_cell or collection components Now that atomic_cell_view and collection_mutation_view have type-aware printers, we can use them in the type-aware atomic_cell_or_collection printer. Message-Id: <20191231142832.594960-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Avi Kivity	931b196d20	mutation_partition: row: resolve column name when in schema-aware printer Instead of printing the column id, print the full column name. Message-Id: <20191231142944.595272-1-avi@scylladb.com>	2020-01-14 10:01:06 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Takuya ASADA	5241deda2d	dist: nonroot: fix CLI tool path for nonroot (#5584 ) CLI tool path is hardcoded, need to specify correct path on nonroot.	2020-01-14 10:01:06 +02:00
Nadav Har'El	1511b945f8	merge: Handle multiple regular base columns in view pk Merged patch series from Piotr Sarna: "Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This series is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)" Piotr Sarna (3): db,view: fix checking if partition key is empty view: handle multiple regular base columns in view pk test: add a case for multiple base regular columns in view key alternator-test/test_gsi.py \| 1 - view_info.hh \| 5 +- cql3/statements/alter_table_statement.cc \| 2 +- db/view/view.cc \| 77 ++++++++++++++---------- mutation_partition.cc \| 2 +- test/boost/cql_query_test.cc \| 58 ++++++++++++++++++ 6 files changed, 109 insertions(+), 36 deletions(-)	2020-01-14 10:01:00 +02:00
Nadav Har'El	f16e3b0491	merge: bouncing lwt request to an owning shard Merged patch series from Gleb Natapov: "LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by the transport code that jumps to a correct shard and re-process incoming message there. The nicer way to achieve the same would be to jump to a right shard inside of the storage_proxy::cas(), but unfortunately with current implementation of the modification statements they are unusable by a shard different from where it was created, so the jump should happen before a modification statement for an cas() is created. When we fix our cql code to be more cross-shard friendly this can be reworked to do the jump in the storage_proxy." 
Gleb Natapov (4): transport: change make_result to takes a reference to cql result instead of shared_ptr storage_service: move start_native_transport into a thread lwt: Process lwt request on a owning shard lwt: drop invoke_on in paxos_state prepare and accept auth/service.hh \| 5 +- message/messaging_service.hh \| 2 +- service/client_state.hh \| 30 +++- service/paxos/paxos_state.hh \| 10 +- service/query_state.hh \| 6 + service/storage_proxy.hh \| 2 + transport/messages/result_message.hh \| 20 +++ transport/messages/result_message_base.hh \| 4 + transport/request.hh \| 4 + transport/server.hh \| 25 ++- cql3/statements/batch_statement.cc \| 6 + cql3/statements/modification_statement.cc \| 6 + cql3/statements/select_statement.cc \| 8 + message/messaging_service.cc \| 2 +- service/paxos/paxos_state.cc \| 48 ++--- service/storage_proxy.cc \| 47 ++++- service/storage_service.cc \| 120 +++++++------ test/boost/cql_query_test.cc \| 1 + thrift/handler.cc \| 3 + transport/messages/result_message.cc \| 5 + transport/server.cc \| 203 ++++++++++++++++------ 21 files changed, 377 insertions(+), 180 deletions(-)	2020-01-14 09:59:59 +02:00
Botond Dénes	300728120f	test: mutation_test: add exception safety test for large collection serialization Use `seastar::memory::local_failure_injector()` to inject all possible `std::bad_alloc`:s into the collection serialization code path. The test just checks that there are no `std::abort()`:s caused by any of the exceptions. The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is not defined.	2020-01-13 16:53:35 +02:00
Botond Dénes	3ec889816a	data/cell.hh: avoid accidental copies of non-nothrow copiable ranges `cell::make_collection()` assumes that all ranges passed to it are nothrow copyable and movable views. This is not guaranteed, is not expressed in the interface and is not mentioned in the comments either. The changes introduced by 0a453e5d3a to collection serialization, making it use fragmented buffers, fell into this trap, as it passes `bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy constructor allocates and hence can throw, triggering an `std::terminate()` inside `cell::make_collection()` as the latter is noexcept. To solve this issue, non-nothrow copyable and movable ranges are now wrapped in a `fragment_range_view` to make them so. `cell::make_collection()` already requires callers to keep alive the range for the duration of the call, so this does not introduce any new requirements to the callers. Additionally, to avoid any future accidents, do not accept temporaries for the `data` parameter. We don't ever want to move this param anyway, we will either have a trivially copyable view, or a potentially heavy-weight range that we will create a trivially copyable view of.	2020-01-13 16:53:35 +02:00
Botond Dénes	b52b4d36a2	utils/fragment_range.hh: introduce fragment_range_view A lightweight, trivially copyable and movable view for fragment ranges. Allows for uniform treatment of all kinds of ranges, i.e. treating all of them as a view. Currently `fragment_range.hh` provides lightweight, view-like adaptors for empty and single-fragment ranges (`bytes_view`). To allow code to treat owning multi-fragment ranges the same way as the former two, we need a view for the latter as well -- this is `fragment_range_view`.	2020-01-13 16:52:59 +02:00
Calle Wilund	75f2b2876b	cdc: Remove free function for mutation augmentation	2020-01-13 13:18:55 +00:00
Calle Wilund	3eda3122af	cdc: Move mutation augment from cql3::modification_statement to storage proxy Using the attached service object	2020-01-13 13:18:55 +00:00
Juliusz Stasiewicz	27dfda0b9e	main/transport: using the infrastructure of system.clients Resolves #4820. Execution path in main.cc now cleans up system.clients table if it exists (this is done on startup). Also, server.cc now calls functions that notify about cql clients connecting/disconnecting.	2020-01-13 14:07:04 +01:00
Pavel Emelyanov	148da64a7e	storage_servce: Move load_broadcaster away This simplifies the storage_service API and fixes the complaint about shared_ptr usage instead of unique_ptr. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:55:09 +03:00
Pavel Emelyanov	b6e1e6df64	misc_services: Introduce load_meter There's a lonely get_load_map() call on storage_service that needs only load broadcaster, always runs on shard 0 and that's it. Next patch will move this whole stuff into its own helper no-shard container and this is preparation for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:53:08 +03:00
Gleb Natapov	5753ab7195	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests are now running on an owning shard there is no longer a need to invoke cross shard call on paxos_state level. RPC calls may still arrive to a wrong shard so we need to make cross shard call there.	2020-01-13 10:26:02 +02:00
Gleb Natapov	d28dd4957b	lwt: Process lwt request on a owning shard LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by transport code that jumps to a correct shard and re-process incoming message there.	2020-01-13 10:26:02 +02:00
Piotr Sarna	3853594108	alternator-test: turn off TLS self-signed verification Two test cases did not ignore TLS self-signed warnings, which are used locally for testing HTTPS. Fixes #5557 Tests(test_health, test_authorization) Message-Id: <8bda759dc1597644c534f94d00853038c2688dd7.1578394444.git.sarna@scylladb.com>	2020-01-10 15:31:30 +02:00
Rafael Ávila de Espíndola	5313828ab8	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-2-espindola@scylladb.com>	2020-01-09 10:42:55 +02:00
Rafael Ávila de Espíndola	4da6dc1a7f	cql3: Change a lambda capture order to match another Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200109025855.10591-1-espindola@scylladb.com>	2020-01-09 10:42:49 +02:00
Avi Kivity	6d454d13ac	db/schema_tables: make gratuitous generic lambdas in do_merge_schema() concrete Those gratuitous lambdas make life harder for IDE users by hiding the actual types from the IDEs. Message-Id: <20200107154746.1918648-1-avi@scylladb.com>	2020-01-08 17:43:18 +01:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Gleb Natapov	feed544c5d	paxos: fix truncation time checking during learn stage The comparison is done in milliseconds, not microseconds. Fixes #5566 Message-Id: <20200108094927.GN9084@scylladb.com>	2020-01-08 14:37:07 +01:00
Gleb Natapov	2832f1d9eb	storage_service: move start_native_transport into a thread The code runs only once and it is simple if it runs in a seastar thread.	2020-01-08 14:57:57 +02:00
Gleb Natapov	7fb2e8eb9f	transport: change make_result to takes a reference to cql result instead of shared_ptr	2020-01-08 14:57:57 +02:00
Avi Kivity	0bde5906b3	Merge "cql3: detect and handle int overflow in aggregate functions #5537 " from Benny " Fix overflow handling in sum() and avg(). sum: - aggregated into __int128 - detect overflow when computing result and log a warning if found avg: - fix division function to divide the accumulator type _sum (__int128 for integers) by _count Add unit tests for both cases Test: - manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it. - unit(dev), cql_query_test(debug) Fixes #5536 " * 'cql3-sum-overflow' of https://github.com/bhalevy/scylla: test: cql_query_test: test avg overflow cql3: functions: protect against int overflow in avg test: cql_query_test: test sum overflow cql3: functions: detect and handle int overflow in sum exceptions: sort exception_code definitions exceptions: define additional cassandra CQL exceptions codes	2020-01-08 10:39:38 +02:00
Avi Kivity	d649371baa	Merge "Fix crash on SELECT SUM(udf(...))" from Rafael " We were failing to start a thread when the UDF call was nested in an aggregate function call like SUM. " * 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla: cql3: Fix indentation cql3: Add missing with_thread_if_needed call cql3: Implement abstract_function_selector::requires_thread remove make_ready_future call	2020-01-08 10:25:42 +02:00
Benny Halevy	dafbd88349	query: initialize read_command timestamp to now This was initialized to api::missing_timestamp but should be set to either a client provided-timestamp or the server's. Unlike write operations, this timestamp need not be unique as the one generated by client_state::get_timestamp. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>	2020-01-08 10:19:07 +02:00
Benny Halevy	39325cf297	storage_proxy: fix int overflow in service::abstract_read_executor::execute exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp, causing: service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int' Aborting on shard 1. Do not optimize cross-dc repair if read_timestamp is missing (or just negative) We're interested in reads that happen within write_timeout of a write. Fixes #5556 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>	2020-01-08 10:18:59 +02:00
Raphael S. Carvalho	390c8b9b37	sstables: Move STCS implementation to source file A header-only implementation can potentially create a problem with duplicate symbols. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>	2020-01-08 09:55:35 +02:00
Benny Halevy	20a0b1a0b6	test: cql_query_test: test avg overflow Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:50:50 +02:00
Benny Halevy	1c81422c1b	cql3: functions: protect against int overflow in avg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	9053ef90c7	test: cql_query_test: test sum overflow Add unit tests for summing up int's and bigint's with possible handling of overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	e97a111f64	cql3: functions: detect and handle int overflow in sum Detect integer overflow in cql sum functions and throw an error. Note that Cassandra quietly truncates the sum if it doesn't fit in the input type, but we would rather break compatibility in this case. See https://issues.apache.org/jira/browse/CASSANDRA-4914?focusedCommentId=14158400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14158400 Fixes #5536 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:33 +02:00
Benny Halevy	98260254df	exceptions: sort exception_code definitions Be compatible with Cassandra source. It's easier to maintain this way. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:48:21 +02:00
Benny Halevy	30d0f1df75	exceptions: define additional cassandra CQL exceptions codes As of `e9da85723a` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-08 09:40:57 +02:00
Rafael Ávila de Espíndola	282228b303	cql3: Fix indentation Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	4316bc2e18	cql3: Add missing with_thread_if_needed call This fixes an assert when doing sum(udf(...)). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:50 -08:00
Rafael Ávila de Espíndola	d301d31de0	cql3: Implement abstract_function_selector::requires_thread Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:14:24 -08:00
Rafael Ávila de Espíndola	dc9b3b8ff2	remove make_ready_future call Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-07 22:10:27 -08:00
Calle Wilund	9f6b22d882	cdc: Assign self to storage proxy object	2020-01-07 12:01:58 +00:00
Calle Wilund	fc5904372b	storage_proxy: Add (optional) cdc service object pointer member The cdc service is assigned from outside, post construction, mainly because of the chickens and eggs in main startup. Would be nice to have it unconditionally, but this is workable.	2020-01-07 12:01:58 +00:00
Calle Wilund	d6003253dd	storage_proxy: Move mutate_counters to private section It is (and shall) only be called from inside storage proxy, and we would like this to be reflected in the interface so our eventual moving of cdc logic into the mutate call chains become easier to verify and comprehend.	2020-01-07 12:01:58 +00:00
Calle Wilund	b6c788fccf	cdc: Add augmentation call to cdc service To eventually replace the free function. Main difference is this is built to both handle batches correctly and to eventually allow hanging cdc object on storage proxy, and caches on the cdc object.	2020-01-07 12:01:58 +00:00
Piotr Sarna	04dc8faec9	test: add a case for multiple base regular columns in view key The test case checks that having two base regular columns in the materialized view key (not obtainable via CQL), still works fine when values are inserted or deleted. If TTL was involved and these columns would have different expiration rules, the case would be more complicated, but it's not possible for a user to reach that case - neither with CQL, nor with alternator.	2020-01-07 12:19:06 +01:00
Piotr Sarna	155a47cc55	view: handle multiple regular base columns in view pk Previous assumption was that there can only be one regular base column in the view key. The assumption is still correct for tables created via CQL, but it's internally possible to create a view with multiple such columns - the new assumption is that if there are multiple columns, they share their liveness. This patch is vital for indexing to work properly on alternator, so it would be best to solve the issue upstream. I strived to leave the existing semantics intact as long as only up to one regular column is part of the materialized view primary key, which is the case for Scylla's materialized views. For alternator it may not be true, but all regular columns in alternator share liveness info (since alternator does not support per-column TTL), which is sufficient to compute view updates in a consistent way. Fixes #5006 Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo) Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>	2020-01-07 12:18:39 +01:00
Avi Kivity	6e0a073b2e	mutation_partition: use type-aware printing of the clustering row Now that position_in_partition_view has type-aware printing, use it to provide a human readable version of clustering keys. Message-Id: <20191231151315.602559-2-avi@scylladb.com>	2020-01-07 12:17:11 +01:00
Avi Kivity	488c42408a	position_in_partition_view: add type-aware printer If the position_in_partition_view represents a clustering key, we can now see it with the clustering key decoded according to the schema. Message-Id: <20191231151315.602559-1-avi@scylladb.com>	2020-01-07 12:15:09 +01:00
Piotr Sarna	54315f89cd	db,view: fix checking if partition key is empty Previous implementation did not take into account that a column in a partition key might exist in a mutation, but in a DEAD state - if it's deleted. There are no regressions for CQL, while for alternator and its capability of having two regular base columns in a view key, this additional check must be performed.	2020-01-07 12:05:36 +01:00
Avi Kivity	3a3c20d337	schema_tables: de-templatize diff_table_or_view() This reduces code bloat and makes the code friendlier for IDEs, as the IDE now understands the type of create_schema. Message-Id: <20191231134803.591190-1-avi@scylladb.com>	2020-01-07 11:56:54 +01:00
Avi Kivity	e5e42672f5	sstables: reduce bloat from sstables::write_simple() sstables::write_simple() has quite a lot of boilerplate which gets replicated into each template instance. Move all of that into a non-template do_write_simple(), leaving only things that truly depend on the component being written in the template, and encapsulating them with a noncopyable_function. An explicit template instantiation was added, since this is used in a header file. Before, it likely worked by accident and stopped working when the template became small enough to inline. Tests: unit (dev) Message-Id: <20200106135453.1634311-1-avi@scylladb.com>	2020-01-07 11:56:11 +01:00
Avi Kivity	8f7f56d6a0	schema_tables: make gratuitous generic lambda in create_tables_from_partitions() concrete The generic lambda made IDE searches for create_table_from_table_row() fail. Message-Id: <20191231135210.591972-1-avi@scylladb.com>	2020-01-07 11:49:10 +01:00
Avi Kivity	92fd83d3af	schema_tables: make gratuitoous generic lambda in create_table_from_name() concrete The lambda made IDE searches for read_table_mutations fail. Message-Id: <20191231135103.591741-1-avi@scylladb.com>	2020-01-07 11:48:56 +01:00
Avi Kivity	dd6dd97df9	schema_tables: make gratuitous generic lambda in merge_tables_and_views() concrete The generic lambda made IDE searches for create_table_from_mutations fail. Message-Id: <20191231135059.591681-1-avi@scylladb.com>	2020-01-07 11:48:39 +01:00
Avi Kivity	c63cf02745	canonical_mutation: add pretty printing Add type-aware printing of canonical_mutation objects.	2020-01-07 12:06:31 +02:00
Avi Kivity	e093121687	mutation_partition_view: add virtual visitor mutation_partition_view now supports a compile-time resolved visitor. This is performant but results in bloat when the performance is not needed. Furthermore, the template function that applies the object to the visitor is private and out-of-line, to reduce compile time. To allow visitation on mutation_partition_view objects, add a virtual visitor type and a non-template accept function. Note: mutation_partition_visitor is very similar to the new type, but different enough to break the template visitor which is used to implement the new visitor. The new visitor will be used to implement pretty printing for canonical_mutation.	2020-01-07 12:06:31 +02:00
Avi Kivity	75d9909b27	collection_mutation_view: add type-aware pretty printer Add a way for the user to associate a type with a collection_mutation_view and get a nice printout.	2020-01-07 12:06:29 +02:00
Rafael Ávila de Espíndola	b80852c447	main: Explicitly allow scylla core dumps I have not looked into the security reason for disabling it when a program has file capabilities. Fixes #5560 [avi: remove extraneous semicolon] Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200106231836.99052-1-espindola@scylladb.com>	2020-01-07 11:15:59 +02:00
Rafael Ávila de Espíndola	07f1cb53ea	tests: run with ASAN_OPTIONS='disable_coredump=0:abort_on_error=1' These are the same options we use in seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200107001513.122238-1-espindola@scylladb.com>	2020-01-07 11:11:49 +02:00
Takuya ASADA	238a25a0f4	docker: fix typo of scylla-jmx script path (#5551) The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx. Fixes #5542	2020-01-07 10:54:16 +02:00
Asias He	401854dbaf	repair: Avoid duplicated partition_end write Consider this: 1) Write partition_start of p1 2) Write clustering_row of p1 3) Write partition_end of p1 4) Repair is stopped due to error before writing partition_start of p2 5) Repair calls repair_row_level_stop() to tear down which calls wait_for_writer_done(). A duplicate partition_end is written. To fix, track the partition_start and partition_end written, avoid unpaired writes. Backports: 3.1 and 3.2 Fixes: #5527	2020-01-06 14:06:02 +02:00
Eliran Sinvani	e64445d7e5	debian-reloc: Propagate PRODUCT variable to renaming command in debian pkg commit `21dec3881c` introduced a bug that will cause scylla debian build to fail. This is because the commit relied on the environment PRODUCT variable to be exported (and as a result, to propagate to the rename command that is executed by find in a subshell). This commit fixes it by explicitly passing the PRODUCT variable into the rename command. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>	2020-01-06 12:31:58 +02:00
Asias He	38d4015619	gossiper: Remove HIBERNATE status from dead state In scylla, the replacing node is set as HIBERNATE status. It is the only place we use HIBERNATE status. The replacing node is supposed to be alive and updating its heartbeat, so it is not supposed to be in dead state. This patch fixes the following problem in replacing. 1) start n1, n2 2) n2 is down 3) start n3 to replace n2, but kill n3 in the middle of the replace 4) start n4 to replace n2 After step 3 and step 4, the old n3 will stay in gossip forever until a full cluster shutdown. Note n3 will only stay in gossip but in system.peers table. User will see the annoying and infinite logs like on all the nodes rpc - client $ip_of_n3:7000: fail to connect: Connection refused Fixes: #5449 Tests: replace_address_test.py + manual test	2020-01-06 11:47:31 +02:00
Amos Kong	c5ec1e3ddc	scylla_ntp_setup: check redhat variant version by parse_version (#5434) VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so scylla_ntp_setup fails on OEL7.7 with a ValueError. - ValueError: invalid literal for int() with base 10: '7.7' This patch changes redhat_version() to return a version string, and compares it with parse_version(). Fixes #5433 Signed-off-by: Amos Kong <amos@scylladb.com>	2020-01-06 11:43:14 +02:00
Asias He	145fd0313a	streaming: Fix map access in stream_manager::get_progress When the progress is queried, e.g., from nodetool netstats, the progress info might not be updated yet. Fix it by checking before accessing the map to avoid errors like: std::out_of_range (_Map_base::at) Fixes: #5437 Tests: nodetool_additional_test.py:TestNodetool.netstats_test	2020-01-06 10:31:15 +02:00
Rafael Ávila de Espíndola	98cd8eddeb	tests: Run with halt_on_error=1:abort_on_error=1 This depends on the just emailed fixes to undefined behavior in tests. With this change we should quickly notice if a change introduces undefined behavior. Fixes #4054 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222646.89628-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Rafael Ávila de Espíndola	dc5ecc9630	enum_option_test: Add explicit underlying types to enums We expect to be able to create variables with out of range values, so these enums needs explicit underlying types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200102173422.68704-1-espindola@scylladb.com>	2020-01-05 17:20:31 +02:00
Nadav Har'El	f0d8dd4094	merge: CDC rolling upgrade Merged pull request https://github.com/scylladb/scylla/pull/5538 from Avi Kivity and Piotr Jastrzębski. This series prepares CDC for rolling upgrade. This consists of reducing the footprint of cdc, when disabled, on the schema, adding a cluster feature, and redacting the cdc column when transferring it to other nodes. The latter is needed because we'll want to backport this to 3.2, which doesn't have canonical_mutations yet.	2020-01-05 17:13:12 +02:00
Gleb Natapov	720c0aa285	commitlog: update last sync timestamp when cycling a buffer If the in-memory buffer does not have enough space for an incoming mutation, it is written into a file, but the code missed updating the timestamp of the last sync, so we may sync too often. Message-Id: <20200102155049.21291-9-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	14746e4218	commitlog: drop segment gate The code that enters the gate never defers before leaving, so the gate behaves like a flag. Lets use existing flag to prohibit adding data to a closed segment. Message-Id: <20200102155049.21291-8-gleb@scylladb.com>	2020-01-05 16:13:59 +02:00
Gleb Natapov	f8c8a5bd1f	test: fix error reporting in commitlog_test Message-Id: <20200102155049.21291-7-gleb@scylladb.com>	2020-01-05 16:13:58 +02:00
Gleb Natapov	680330ae70	commitlog: introduce segment::close() function. Currently segment closing code is spread over several functions and activated based on the _closed flag. Make segment closing explicit by moving all the code into close() function and call it where _closed flag is set. Message-Id: <20200102155049.21291-6-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	a1ae08bb63	commitlog: remove unused segment::flush() parameter Message-Id: <20200102155049.21291-5-gleb@scylladb.com>	2020-01-05 16:13:55 +02:00
Gleb Natapov	1e15e1ef44	commitlog: cleanup segment sync() Call cycle() only once. Message-Id: <20200102155049.21291-4-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	3d3d2c572e	commitlog: move segment shutdown code from sync() Currently sync() does two completely different things based on the shutdown parameter. Separate the code into two different functions. Message-Id: <20200102155049.21291-3-gleb@scylladb.com>	2020-01-05 16:13:54 +02:00
Gleb Natapov	89afb92b28	commitlog: drop superfluous this Message-Id: <20200102155049.21291-2-gleb@scylladb.com>	2020-01-05 16:13:53 +02:00
Piotr Jastrzebski	95feeece0b	scylla_tables: treat empty cdc props as disabled Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	396e35bf20	cdc: add schema_change test for cdc_options The original "test_schema_digest_does_not_change" test case ensures that schema digests will match for older nodes that do not support all the features yet (including computed columns). The additional case uses sstables generated after CDC was enabled and a table with CDC enabled is created, in order to make sure that the digest computed including CDC column does not change spuriously as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	c08e6985cd	cdc: allow cluster rolling upgrade Addition of cdc column in scylla_tables changes how schema digests are calculated, and affect the ABI of schema update messages (adding a column changes other columns' indexes in frozen_mutation). To fix this, extend the schema_tables mechanism with support for the cdc column, and adjust schemas and mutations to remove that column when sending schemas during upgrade. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	caa0a4e154	tests: disable CDC in schema_change_tests Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	129af99b94	cdc: Return reference from cluster_supports_cdc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Piotr Jastrzebski	4639989964	cdc: Add CDC_OPTIONS schema_feature Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-05 14:39:23 +02:00
Avi Kivity	c150f2e5d7	schema_tables, cdc: don't store empty cdc columns in scylla_tables An empty cdc column in scylla_tables is hashed differently from a missing column. This causes schema mismatch when a schema is propagated to another node, because the other node will redact the schema column completely if the cluster feature isn't enabled, and an empty value is hashed differently from a missing value. Store a tombstone instead. Tombstones are removed before digesting, so they don't affect the outcome. This change also undoes the changes in `386221da84` ("schema_tables: handle 'cdc' options") to schema_change_test test_merging_does_not_alter_tables_which_didnt_change. That change enshrined the breakage into the test, instead of fixing the root cause, which was that we added an extra mutation to the schema (for cdc options, which were disabled).	2020-01-05 14:36:18 +02:00
Rafael Ávila de Espíndola	3d641d4062	lua: Use existing cpp_int cast logic Different versions of boost have different rules for what conversions from cpp_int to smaller integers are allowed. We already had a function that worked with all supported versions, but it was not being used by lua. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200104041028.215153-1-espindola@scylladb.com>	2020-01-05 12:10:54 +02:00
Rafael Ávila de Espíndola	88b5aadb05	tests: cql_test_env: wait for two futures starting internal services I noticed this while looking at the crashes next is currently experiencing. While I have no idea if this fixes the issue, it does avoid broken future warnings (for no_sharded_instance_exception) in a debug build. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200103201540.65324-1-espindola@scylladb.com>	2020-01-05 12:09:59 +02:00
Avi Kivity	4b8e2f5003	Update seastar submodule * seastar 0525bbb08...36cf5c5ff (6): > memcached: Fix use after free in shutdown > Revert "task: stop wrapping tasks with unique_ptr" > task: stop wrapping tasks with unique_ptr > http: Change exception formating to the generic seastar one > Merge "Avoid a few calls to ~exception_ptr" from Rafael > tests: fix core generation with asan	2020-01-03 15:48:53 +02:00
Nadav Har'El	44c2a44b54	alternator-test: test for ConditionExpression feature This patch adds a very comprehensive test for the ConditionExpression feature, i.e., the newer syntax of conditional writes replacing the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem operations. I wrote these tests while closely following the DynamoDB ConditionExpression documentation, and attempted to cover all conceivable features, subfeatures and subcases of the ConditionExpression syntax - to serve as a test for a future support for this feature in Alternator (see issue #5053). As usual, all these tests pass on AWS DynamoDB, but because we haven't yet implemented this feature in Alternator, all but one xfail on Alternator. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191229143556.24002-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	aad5eeab51	alternator: better error messages when Alternator port is taken If Alternator is requested to be enabled on a specific port but the port is already taken, the boot fails as expected - but the error log is confusing; It currently looks something like this: WARN 2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... (many more messages about the server shutting down) INFO 2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) There are two problems here. First, the "WARN" should really be an "ERROR", because it causes the server to be shut down and the user must see this error. Second, the final line in the log, something the user is likely to see first, contains only the ultimate cause for the exception (an address already in use) but not the information what this address was needed for. This patch solves both issues, and the log now looks like: ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) ... INFO 2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191224124127.7093-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	1f64a3bbc9	alternator: error on unsupported ReturnValues option We don't support yet the ReturnValues option on PutItem, UpdateItem or DeleteItem operations (see issue #5053), but if a user tries to use such an option anyway, we silently ignore this option. It's better to fail, reporting the unsupported option. In this patch we check the ReturnValues option and if it is anything but the supported default ("NONE"), we report an error. Also added a test to confirm this fix. The test verifies that "NONE" is allowed, and something which is unsupported (e.g., "DOG") is not ignored but rather causes an error. Refs #5053. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191216193310.20060-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	dc93228b66	reloc: Turn the default flags into common flags These are flags we always want to enable. In particular, we want them to be used by the bots, but the bots run this script with --configure-flags, so they were being discarded. We put the user option later so that they can override the common options. Fixes #5505 Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Takuya ASADA <syuu@scylladb.com> Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
Rafael Ávila de Espíndola	d4dfb6ff84	build-id: Handle the binary having multiple PT_NOTE headers There is no requirement that all notes be placed in a single PT_NOTE. It looks like recent lld's actually put each section in its own PT_NOTE. This change looks for build-id in all PT_NOTE headers. Fixes #5525 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191227000311.421843-1-espindola@scylladb.com>	2020-01-03 15:48:20 +02:00
Avi Kivity	1e9237d814	dist: redhat: use parallel compression for rpm payload rpm compression uses xz, which is painfully slow. Adjust the compression settings to run on all threads. The xz utility documentation suggests that 0 threads is equivalent to all CPUs, but apparently the library interface (which rpmbuild uses) doesn't think the same way. Message-Id: <20200101141544.1054176-1-avi@scylladb.com>	2020-01-03 15:48:20 +02:00
Nadav Har'El	de1171181c	user defined types: fix support for case-sensitive type names In the current code, support for case-sensitive (quoted) user-defined type names is broken. For example, a test doing: CREATE TYPE "PHone" (country_code int, number text) CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk)) Fails - the first line creates the type with the case-sensitive name PHone, but the second line wrongly ends up looking for the lowercased name phone, and fails with an exception "Unknown type ks.phone". The problem is in cql3_type_name_impl. This class is used to convert a type object into its proper CQL syntax - for example frozen<list<int>>. The problem is that for a user-defined type, we forgot to quote its name if not lowercase, and the result is wrong CQL; For example, a list of PHone will be written as list<PHone> - but this is wrong because the CQL parser, when it sees this expression, lowercases the unquoted type name PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>. The solution is for cql3_type_name_impl to use for a user-defined type its get_name_as_cql_string() method instead of get_name_as_string(). get_name_as_cql_string() is a new method which prints the name of the user type as it should be in a CQL expression, i.e., quoted if necessary. The bug in the above test was apparently caused when our code serialized the type name to disk as the string PHone (without any quoting), and then later deserialized it using the CQL type parser, which converted it into a lowercase phone. With this patch, the type's name is serialized as "PHone", with the quotes, and deserialized properly as the type PHone. While the extra quotes may seem excessive, they are necessary for the correct CQL type expression - remember that the type expression may be significantly more complex, e.g., frozen<list<"PHone">> and all of this, including the quotes, is necessary for our parser to be able to translate this string back into a type object. 
This patch may cause breakage to existing databases which used case-sensitive user-defined types, but I argue that these use cases were already broken (as demonstrated by this test) so we won't break anything that actually worked before. Fixes #5544 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200101160805.15847-1-nyh@scylladb.com>	2020-01-03 15:48:20 +02:00
Pavel Emelyanov	34f8762c4d	storage_service: Drop _update_jobs This field is write-only. Leftover from `83ffae1` (storage_service: Drop block_until_update_pending_ranges_finished) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226091210.20966-1-xemul@scylladb.com>	2020-01-03 15:48:20 +02:00
Pavel Emelyanov	f2b20e7083	cache_hitrate_calculator: Do not reinvent the peering_sharded_service The class in question wants to run its own instances on different shards, for this sake it keeps reference on sharded self to call invoke_on() on. There's a handy peering_sharded_service<> in seastar for the same, using it makes the code nicer and shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191226112401.23960-1-xemul@scylladb.com>	2020-01-03 15:48:19 +02:00
Rafael Ávila de Espíndola	bbed9cac35	cql3: move function creation to a .cc file We had a lot of code in a .hh file that, while using templates, was only used for creating functions during startup. This moves it to a new .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200101002158.246736-1-espindola@scylladb.com>	2020-01-03 15:48:19 +02:00
Benny Halevy	c0883407fe	scripts: Add cpp-name-format: pretty printer Pretty-print cpp-names, useful for deciphering complex backtraces. For example, the following line: service::storage_proxy::init_messaging_service()::{lambda(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>)#1}::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector<frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info>) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Is formatted as: service::storage_proxy::init_messaging_service()::{ lambda( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> )#1 }::operator()( seastar::rpc::client_info const&, seastar::rpc::opt_time_point, std::vector< frozen_mutation, std::allocator<frozen_mutation> >, db::consistency_level, std::optional<tracing::trace_info> ) const at /local/home/bhalevy/dev/scylla/service/storage_proxy.cc:4360 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191226142212.37260-1-bhalevy@scylladb.com>	2020-01-01 12:08:12 +02:00
Rafael Ávila de Espíndola	75817d1fe7	sstable: Add checks to help track problems with large_data_handler use after free I can't quite figure out how we were trying to write a sstable with the large data handler already stopped, but the backtrace suggests a good place to add extra checks. This patch adds two check. One at the start and one at the end of sstable::write_components. The first one should give us better backtraces if the large_data_handler is already stopped. The second one should help catch some race condition. Refs: #5470 Message-Id: <20191231173237.19040-1-espindola@scylladb.com>	2020-01-01 12:03:31 +02:00
Rafael Ávila de Espíndola	3c34e2f585	types: Avoid an unaligned load in json integer serialization The patch also adds a test that makes the fixed issue easier to reproduce. Fixes #5413 Message-Id: <20191231171406.15980-1-espindola@scylladb.com>	2019-12-31 19:23:42 +02:00
Gleb Natapov	bae5cb9f37	commitlog: remove unused argument during segment creation Since `99a5a77234` all segments are created equal and "active" argument is never true, so drop it. Message-Id: <20191231150639.GR9084@scylladb.com>	2019-12-31 17:14:03 +02:00
Rafael Ávila de Espíndola	aa535a385d	enum_option_test: Add an explicit underlying type to an enum We expect to be able to create a variable with an out of range value, so the enum needs an explicit underlying type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191230222029.88942-1-espindola@scylladb.com>	2019-12-31 16:59:00 +02:00
Nadav Har'El	48a914c291	Fix uninitialized members Merged pull request https://github.com/scylladb/scylla/pull/5532 from Benny Halevy: Initialize bool members in row_level_repair and _storage_service causing ubsan errors. Fixes #5531	2019-12-31 10:32:54 +02:00
Takuya ASADA	aa87169670	dist/debian: add procps on Depends We require procps package to use sysctl on postinst script for scylla-kernel-conf. Fixes #5494 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191218234100.37844-1-syuu@scylladb.com>	2019-12-30 19:30:35 +02:00
Avi Kivity	972127e3a8	atomic_cell: add type-aware pretty printing The standard printer for atomic_cell prints the value as hex, because atomic_cell does not include the type. Add a type-aware printer that allows the user to provide the type.	2019-12-30 18:27:04 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Eliran Sinvani	21dec3881c	debian-reloc: rename build product to the name specified in SCYLLA-VERSION-GEN When the product name is other than "scylla", the debian packaging scripts go over all files that start with "scylla-" and change the prefix to be the actual product name. However, if there are no such files in the directory the script will fail since the renaming command will get the wildcard string instead of an actual file name. This patch replaces the command with a command with an equivalent desired effect that only operates on files if there are any. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>	2019-12-30 17:45:50 +02:00
Takuya ASADA	263385cb4b	dist: stop replacing /usr/lib/scylla with symlink (#5530 ) Since we merged /usr/lib/scylla with /opt/scylladb, we removed /usr/lib/scylla and replace it with the symlink point to /opt/scylladb. However, RPM does not support replacing a directory with a symlink, we are doing some dirty hack using RPM scriptlet, but it causes multiple issues on upgrade/downgrade. (See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/) To minimize Scylla upgrading/downgrade issues on user side, it's better to keep /usr/lib/scylla directory. Instead of creating single symlink /usr/lib/scylla -> /opt/scylladb, we can create symlinks for each setup scripts like /usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>. Fixes #5522 Fixes #4585 Fixes #4611	2019-12-30 13:52:24 +02:00
Hagit Segev	9d454b7dc6	reloc/build_rpm.sh: Fix '--builddir' option handling (#5519 ) The '--builddir' option value is assigned to the "builddir" variable, which is wrong. The correct variable is "BUILDDIR" so use that instead to fix the '--builddir' option. Also, add logging to the script when executing the "dist/redhat_build.rpm.sh" script to simplify debugging.	2019-12-30 13:25:22 +02:00
Benny Halevy	8aa5d84dd8	storage_service: initialize _is_bootstrap_mode Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode: service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool' The use site is: service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519 While at it, initialize `_initialized` to false as well, just in case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Benny Halevy	474ffb6e54	repair: initialize row_level_repair: _zero_rows Avoid following UBSAN error: repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool' Fixes #5531 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-30 11:44:58 +02:00
Fabiano Lucchese	d7795b1efa	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5499 ) A Linux machine typically has multiple clocksources with distinct performances. Setting a high-performant clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing optimized Linux clocksource to Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in scylla_server configuration file. This parameter is read by perftune.py which, if set to "yes", proceeds to (non persistently) setting the clocksource. On x86, TSC clocksource is used. Fixes #4474 Fixes #5474 Fixes #5480	2019-12-30 10:54:14 +02:00
Avi Kivity	e223154268	cdc: options: return an empty options map when cdc is disabled This is compatible with 3.1 and below, which didn't have that schema field at all.	2019-12-29 16:34:37 +02:00
Benny Halevy	27e0aee358	docs/debugging.md: fix anchor links Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191229074136.13516-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Pavel Solodovnikov	aba9a11ff0	cql: pass variable_specifications via lw_shared_ptr Instances of `variable_specifications` are passed around as shared_ptr's, which are redundant in this case since the class is marked as `final`. Use `lw_shared_ptr` instead since we know for sure it's not a polymorphic pointer. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>	2019-12-29 16:26:26 +02:00
Benny Halevy	4c884908bb	directories: Keep a unique set of directories to initialize If any two directories of data/commitlog/hints/view_hints are the same we still end up running verify_owner_and_mode and disk_sanity(check_direct_io_support) in parallel on the same directories and hit #5510. This change uses std::set rather than std::vector to collect a unique set of directories that need initialization. Fixes #5510 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Gleb Natapov	60a851d3a5	commitlog: always flush segments atomically with writing db::commitlog::segment::batch_cycle() assumes that after a write for a certain position completes (as reported by _pending_ops.wait_for_pending()) it will also be flushed, but this is true only if writing and flushing are atomic wrt _pending_ops lock. It usually is unless flush_after is set to false when cycle() is called. In this case only writing is done under the lock. This is exactly what happens when a segment is closed. Flush is skipped because zero header is added after the last entry and then flushed, but this optimization breaks batch_cycle() assumption. Fix it by flushing after the write atomically even if a segment is being closed. Fixes #5496 Message-Id: <20191224115814.GA6398@scylladb.com>	2019-12-24 14:52:23 +02:00
Pavel Emelyanov	a5cdfea799	directories: Do not mess with per-shard base dir The hints and view_hints directory has per-shard sub-dirs, and the directories code tries to create, check and lock all of them, including the base one. The manipulations in question are excessive -- it's enough to check and lock either the base dir, or all the per-shard ones, but not everything. Let's take the latter approach for its simplicity. Fixes #5510 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Looks-good-to: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223142429.28448-1-xemul@scylladb.com>	2019-12-24 14:49:28 +02:00
Benny Halevy	f8f5db42ca	dbuild: try to pull image if not present locally Pekka Enberg <penberg@scylladb.com> wrote: > Image might not be present, but the subsequent "docker run" command will automatically pull it. Just letting "docker run" fail produces a rather confusing error message that refers to docker help, but we want to provide the user with our own help, so still fail early, just also try to pull the image if "docker image inspect" failed, indicating it's not present locally. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>	2019-12-24 11:13:23 +02:00
Benny Halevy	ee2f97680a	dbuild: just die when no image-id is provided Suggested-by: Pekka Enberg <penberg@scylladb.com> > This will print all the available Docker images, > many (most?) of them completely unrelated. > Why not just print an error saying that no image was specified, > and then perhaps print usage. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>	2019-12-24 11:13:22 +02:00
Benny Halevy	87b2f189f7	dbuild: s/usage/die/ Suggested-by: Dejan Mircevski <dejan@scylladb.com> > The use pattern of this function strongly suggests a name like `die`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223085219.1253342-2-bhalevy@scylladb.com>	2019-12-24 11:13:21 +02:00
Benny Halevy	718e9eb341	table: move_sstables_from_staging: fix use after free of shared_sstable Introduced in `4b3243f5b9` Reproducible with materialized_views_test:TestMaterializedViews.mv_populating_from_existing_data_during_node_remove_test and read_amplification_test:ReadAmplificationTest.no_read_amplification_on_repair_with_mv_test ==955382==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200023de18 at pc 0x00000051d788 bp 0x7f8a0563fcc0 sp 0x7f8a0563fcb0 READ of size 8 at 0x60200023de18 thread T1 (reactor-1) #0 0x51d787 in seastar::lw_shared_ptr<sstables::sstable>::lw_shared_ptr(seastar::lw_shared_ptr<sstables::sstable> const&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:289 #1 0x10ba189 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530 #2 0x109c4f1 in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>&, const seastar::lw_shared_ptr<sstables::sstabl e>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1556 #3 0x106941a in do_for_each<__gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >, table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda( std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:618 #4 0x1069203 in operator() 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future-util.hh:626 #5 0x10ba589 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #6 0x10ba668 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #7 0x10ba7c0 in apply<seastar::do_for_each(Iterator, Iterator, AsyncAction) [with Iterator = __gnu_cxx::__normal_iterator<const seastar::lw_shared_ptr<sstables::sstable>, std::vector<seastar::lw_shared_ptr<sstables::sstable> > >; AsyncAction = table::move_sstables_from_staging (std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>::<lambda(std::set<seastar::basic_sstring<char, unsigned int, 15> >&)>::<lambda(sstables::shared_sstable)>]::<lambda()>&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... 
0x60200023de18 is located 8 bytes inside of 16-byte region [0x60200023de10,0x60200023de20) freed by thread T1 (reactor-1) here: #0 0x7f8a153b796f in operator delete(void) (/lib64/libasan.so.5+0x11096f) #1 0x6ab4d1 in __gnu_cxx::new_allocator<seastar::lw_shared_ptr<sstables::sstable> >::deallocate(seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128 #2 0x612052 in std::allocator_traits<std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::deallocate(std::allocator<seastar::lw_shared_ptr<sstables::sstable> >&, seastar::lw_shared_ptr<sstables::sstable>, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470 #3 0x58fdfb in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::_M_deallocate(seastar::lw_shared_ptr<sstables::sstable>*, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351 #4 0x52a790 in std::_Vector_base<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332 #5 0x52a99b in std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680 #6 0xff60fa in ~<lambda> /local/home/bhalevy/dev/scylla/table.cc:2477 #7 0xff7202 in operator() /local/home/bhalevy/dev/scylla/table.cc:2496 #8 0x106af5b in apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1573 #9 0x102f5d5 in futurize_apply<table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1645 #10 0x102f9ee in operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > 
/local/home/bhalevy/dev/scylla/seastar/include/seastar/core/semaphore.hh:488 #11 0x109d2f1 in apply /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:36 #12 0x109d42c in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/apply.hh:44 #13 0x109d595 in apply<seastar::with_semaphore(seastar::basic_semaphore<ExceptionFactory, Clock>&, size_t, Func&&) [with ExceptionFactory = seastar::named_semaphore_exception_factory; Func = table::move_sstables_from_staging(std::vector<seastar::lw_shared_ptr<sstables::sstable> >)::<lambda()>; Clock = std::chrono::_V2::steady_clock]::<lambda(auto:51)>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&&> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1563 ... Fixes #5511 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191222214326.1229714-1-bhalevy@scylladb.com>	2019-12-23 15:20:41 +02:00
Konstantin Osipov	476fbc60be	test.py: prepare to remove custom colors Add dbuild dependency on python3-colorama, which will be used in test.py instead of a hand-made palette. [avi: update tools/toolchain/image] Message-Id: <20191223125251.92064-2-kostja@scylladb.com>	2019-12-23 15:13:22 +02:00
Pavel Emelyanov	d361894b9d	batchlog_manager: Speed up token_metadata endpoints counting a bit In this place we only need to know the number of endpoints, while current code additionally shuffles them before counting. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:45 +02:00
Pavel Emelyanov	6e06c88b4c	token_metadata: Remove unused helper There are two _identical_ methods in token_metadata class: get_all_endpoints_count() and number_of_endpoints(). The former one is used (called) the latter one is not used, so let's remove it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:43 +02:00
Pavel Emelyanov	2662d9c596	migration_manager: Remove run_may_throw() first argument It's unused in this function. Also this helps getting rid of global instances of components. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:42 +02:00
Pavel Emelyanov	703b16516a	storage_service: Remove unused helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:41 +02:00
Takuya ASADA	e0071b1756	reloc: don't archive dist/ami/files/.rpm on relocatable package We should skip archiving dist/ami/files/.rpm on relocatable package, since it isn't used. Also packer and variables.json. Fixes #5508 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191223121044.163861-1-syuu@scylladb.com>	2019-12-23 14:19:51 +02:00
Tomasz Grabiec	28dec80342	db/schema_tables: Add trace-level logging of schema digesting This greatly helps to narrow down the source of schema digest mismatch between nodes. Intended use is to enable this logger on disagreeing nodes and trigger schema digest recalculation and observe which mutations differ in digest and then examine their content. Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>	2019-12-23 12:28:22 +02:00
Konstantin Osipov	1116700bc9	test.py: do not return 0 if there are failed tests Fix a return value regression introduced when switching to asyncio. Message-Id: <20191222134706.16616-2-kostja@scylladb.com>	2019-12-22 16:14:32 +02:00
Asias He	7322b749e0	repair: Do not return working_row_buf_nr in get combined row hash verb In commit `b463d7039c` (repair: Introduce get_combined_row_hash_response), working_row_buf_nr is returned in REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It is scheduled to be part of 3.1 release. However it is not backported to 3.1 by accident. In order to be compatible between 3.1 and 3.2 repair. We need to drop the working_row_buf_nr in 3.2 release. Fixes: #5490 Backports: 3.2 Tests: Run repair in a mixed 3.1 and 3.2 cluster	2019-12-21 20:13:15 +02:00
Takuya ASADA	8eaecc5ed6	dist/common/scripts/scylla_setup: add swap existence check Show warnings when no swap is configured on the node. Closes #2511 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220080222.46607-1-syuu@scylladb.com>	2019-12-21 20:03:58 +02:00
Pavel Solodovnikov	5a15bed569	cql3: return `result_set` by cref in `cql3::result::result_set` Changes summary: * make `cql3::result_set` movable-only * change signature of `cql3::result::result_set` to return by cref * adjust available call sites to the aforementioned method to accept cref Motivation behind this change is elimination of dangerous API, which can easily set a trap for developers who don't expect that result_set would be returned by value. There is no point in copying the `result_set` around, so make `cql3::result::result_set` to cache `result_set` internally in a `unique_ptr` member variable and return a const reference so to minimize unnecessary copies here and there. Tests: unit(debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>	2019-12-21 16:56:42 +02:00
Takuya ASADA	3a6cb0ed8c	install.sh: drop limits.d from nonroot mode The file only required for root mode. Fixes #5507 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191220101940.52596-1-syuu@scylladb.com>	2019-12-21 15:26:08 +02:00
Botond Dénes	08bb0bd6aa	mutation_fragment_stream_validator: wrap exceptions into own exception type So a higher level component using the validator to validate a stream can catch only validation errors, and let any other incidental exception through. This allows building data correctors on top of the `mutation_fragment_stream_validator`, by filtering a fragment stream through a validator, catching invalid fragment stream exceptions and dropping the respective fragments from the stream. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>	2019-12-20 12:05:00 +01:00
Rafael Ávila de Espíndola	91c7f5bf44	Print build-id on startup Fixes #5426 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191218031556.120089-1-espindola@scylladb.com>	2019-12-19 15:43:04 +02:00
Avi Kivity	440ad6abcc	Revert "relocatable: Check that patchelf didn't mangle the PT_LOAD headers" This reverts commit `237ba74743`. While it works for the scylla executable, it fails for iotune, which is built by seastar. It should be reinstated after we pass the correct link parameters to the seastar build system.	2019-12-19 11:20:34 +02:00
Pekka Enberg	c0aea19419	Merge "Add a timeout for housekeeping for offline installs" from Amnon " These series solves an issue with scylla_setup and prevent it from waiting forever if housekeeping cannot look for the new Scylla version. Fixes #5302 It should be backported to versions that support offline installations. " * 'scylla_setup_timeout' of git://github.com/amnonh/scylla: scylla_setup: do not wait forever if no reply is return housekeeping scylla_util.py: Add optional timeout to out function	2019-12-19 08:18:19 +02:00
Rafael Ávila de Espíndola	8d777b3ad5	relocatable: Use a super long path for the dynamic linker Having a long path allows patchelf to change the interpreter without changing the PT_LOAD headers and therefore without moving the build-id out of the first page. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-1-espindola@scylladb.com>	2019-12-18 19:10:59 +02:00
Pavel Solodovnikov	c451f6d82a	LWT: Fix required participants calculation for LOCAL_SERIAL CL Suppose we have a multi-dc setup (e.g. 9 nodes distributed across 3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]). When a query that uses LWT is executed with LOCAL_SERIAL consistency level, the `storage_proxy::get_paxos_participants` function incorrectly calculates the number of required participants to serve the query. In the example above it's calculated to be 5 (i.e. the number of nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL, which is equivalent to LOCAL_QUORUM cl in this case). This behavior results in an exception being thrown when executing the following query with LOCAL_SERIAL cl: INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'} Tests: unit(dev), dtest(consistency_test.py) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>	2019-12-18 16:58:32 +01:00
Botond Dénes	cd6bf3cb28	scylla-gdb.py: static_vector: update for changed storage The actual buffer is now in a member called 'data'. Leave the old `dummy.dummy` and `dummy` as fall-back. This seems to change every Fedora release. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>	2019-12-18 17:39:56 +02:00
Tomasz Grabiec	5865d08d6c	migration_manager: Recalculate schema only on shard 0 Schema is node-global, update_schema_version_and_announce() updates all shards. We don't need to recalculate it from every shard, so install the listeners only on shard 0. Reduces noise in the logs. Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>	2019-12-18 16:43:26 +02:00
Pavel Emelyanov	998f51579a	storage_service: Rip join_ring config option The option in question apparently does not work, several sharded objects are start()-ed (and thus instantiated) in join_token_ring, while instances themselves of these objects are used during init of other stuff. This leads to broken seastar local_is_initialized assertion on sys_dist_ks, but reading the code shows more examples, e.g. the auth_service is started on join, but is used for thrift and cql servers initialization. The suggestion is to remove the option instead of fixing. The is_joined logic is kept since on-start joining still can take some time and it's safer to report real status from the API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191203140717.14521-1-xemul@scylladb.com>	2019-12-18 12:45:13 +02:00
Nadav Har'El	8157f530f5	merge: CDC: handle schema changes Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund: Moves schema creation/alter/drop awareness to use new "before" callbacks from migration manager, and adds/modifies log and streams table as part of the base table modification. Makes schema changes semi-atomic per node. While this does not deal with updates coming in before a schema change has propagated cluster, it now falls into the same pit as when this happens without CDC. Added side effect is also that now schemas are transparent across all subsystems, not just cql. Patches: cdc_test: Add small test for altering base schema (add column) cdc: Handle schema changes via migration manager callbacks migration_manager: Invoke "before" callbacks for table operations migration_listener: Add empty base class and "before" callbacks for tables cql_test_env: Include cdc service in cql tests cdc: Add sharded service that does nothing. cdc: Move "options" to separate header to avoid to much header inclusion cdc: Remove some code from header	2019-12-17 23:04:36 +02:00
Avi Kivity	1157ee16a5	Update seastar submodule * seastar 00da4c8760...0525bbb08f (7): > future: Simplify future_state_base::any move constructor > future: don't create temporary tuple on future::get(). > future: don't instantiate new future on future::then_wrapped(). > future: clean-up the Result handling in then_wrapped(). > Merge "Fix core dumps when asan is enabled" from Rafael > future: Move ignore to the base class > future: Don't delete in ignore	2019-12-17 19:47:50 +02:00
Botond Dénes	638623b56b	configure.py: make build.ninja target depend on SCYLLA-VERSION-GEN Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and hence changes done to it will not be picked up by ninja. To trigger a rebuild and hence version changes to appear in the `scylla` target binary, one has to do `touch configure.py`. This is counterintuitive and frustrating to people who don't know about it and wonder why their changed version is not appearing as the output of `scylla --version`. This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`, making the `build.ninja` target out-of-date whenever `SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of `configure.py` when the next target is built, allowing a build of e.g. `scylla` to pick up any changes done to the version automatically. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>	2019-12-17 17:40:04 +02:00
Avi Kivity	7152ba0c70	Merge "tests: automatically search for unit tests" from Kostja " This patch set rearranges the test files so that it is now possible to search for tests automatically, and adds this functionality to test.py " * 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev: cmake: update CMakeLists.txt to scan test/ rather than tests/ test.py: automatically lookup all unit and boost tests tests: move all test source files to their new locations tests: move a few remaining headers tests: move another set of headers to the new test layout tests: move .hh files and resources to new locations tests: remove executable property from data_listeners_test.cc	2019-12-17 17:32:18 +02:00
Amnon Heiman	dd42f83013	scylla_setup: do not wait forever if no reply is return housekeeping When scylla is installed without network connectivity, the test if a newer version is available can cause scylla_setup to wait forever. This patch adds a limit to the time scylla_setup will wait for a reply. When there is no reply, the relevant error will be shown that it was unable to check for newer version, but this will not block the setup script. Fixes #5302 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-17 14:56:47 +02:00
Nadav Har'El	aa1de5a171	merge: Synchronize snapshot and staging sstable deletion using sem Merged pull request https://github.com/scylladb/scylla/pull/5343 from Benny Halevy. Fixes #5340 Hold the sstable_deletion_sem table::move_sstables_from_subdirs to serialize access to the staging directory. It now synchronizes snapshot, compaction deletion of sstables, and view_update_generator moving of sstables from staging. Tests: unit (dev) [except test_user_function_timestamp_return that fails for me locally, but also on master] snapshot_test.py (dev)	2019-12-17 14:06:02 +02:00
Juliusz Stasiewicz	7fdc8563bf	system_keyspace: Added infrastructure for table `system.clients' I used the following as a reference: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java At this moment there is only info about IP, clients outgoing port, client 'type' (i.e. CQL/thrift/alternator), shard ID and username. Column `request_count' is NOT present and CK consists of (`port', `client_type'), contrary to what C's has: (`port'). Code that notifies `system.clients` about new connections goes to top-level files `connection_notifier.`. Currently only CQL clients are observed, but enum `client_type` can be used in future to notify about connections with other protocols.	2019-12-17 11:31:28 +01:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so not to move them under the feet of table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0446ce712a	view_update_generator::start: use variable binding Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	5d7c80c148	view_update_generator::start: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	02784f46b9	view_update_generator: handle errors when processing sstable Consumer may throw, in this case, break from the loop and retry. move_sstable_from_staging_in_thread may theoretically throw too, ignore the error in this case since the sstable was already processed, individual move failures are already ignored and moving from staging will be retried upon restart. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	abda12107f	sstables: move_to_new_dir: add do_sync_dirs param To be used for "batch" move of several sstables from staging to the base directory, allowing the caller to sync the directories once when all are moved rather than for each one of them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	6efef84185	sstable: return future from move_to_new_dir distributed_loader::probe_file needlessly creates a seastar thread for it and the next patch will use it as part of a parallel_for_each loop to move a list of sstables (and sync the directories once at the end). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Benny Halevy	0d2a7111b2	view_update_generator: sstable_with_table: std::move constructor args Just a small optimization. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:19:55 +02:00
Nadav Har'El	fc85c49491	alternator: error on unsupported parallel scan We do not yet support the parallel Scan options (TotalSegments, Segment), as reported in issue #5059. But even before implementing this feature, it is important that we produce an error if a user attempts to use it - instead of outright ignoring this parameter. This is what this patch does. The patch also adds a full test, test_scan.py::test_scan_parallel, for the parallel scan feature. The test passes on DynamoDB, and still xfails on Alternator after this patch - but now the Scan request fails immediately reporting the unsupported option - instead of what the pre-patch code did: returning the wrong results and the test failing just when the results do not match the expectations. Refs #5059. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191217084917.26191-1-nyh@scylladb.com>	2019-12-17 11:27:56 +02:00
Avi Kivity	f7d69b0428	Revert "Merge "bouncing lwt request to an owning shard" from Gleb" This reverts commit `64cade15cc`, reversing changes made to `9f62a3538c`. This commit is suspected of corrupting the response stream. Fixes #5479.	2019-12-17 11:06:10 +02:00
Rafael Ávila de Espíndola	237ba74743	relocatable: Check that patchelf didn't mangle the PT_LOAD headers Should avoid issue #4983 showing up again. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191213224803.316783-2-espindola@scylladb.com>	2019-12-16 20:18:32 +02:00
Avi Kivity	3b7aca3406	Merge "db: Don't create a reference to nullptr" from Rafael " Only the first patch is needed to fix the undefined behavior, but the followup ones simplify the memory management around user types. " * 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla: db: Don't use lw_shared_ptr for user_types_metadata user_types_metadata: don't implement enable_lw_shared_from_this cql3: pass a const user_types_metadata& to prepare_internal db: drop special case for top level UDTs db: simplify db::cql_type_parser::parse db: Don't create a reference to nullptr Add test for loading a schema with a non native type	2019-12-16 17:10:58 +02:00
Konstantin Osipov	d6bc7cae67	cmake: update CMakeLists.txt to scan test/ rather than tests/ A follow up on directory rename.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	e079a04f2a	test.py: automatically lookup all unit and boost tests	2019-12-16 17:47:42 +03:00
Konstantin Osipov	1c8736f998	tests: move all test source files to their new locations 1. Move tests to test (using singular seems to be a convention in the rest of the code base) 2. Move boost tests to test/boost, other (non-boost) unit tests to test/unit, tests which are expected to be run manually to test/manual. Update configure.py and test.py with new paths to tests.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	2fca24e267	tests: move a few remaining headers Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to test/lib or test/boost and update dependent .cc files. Move tests/perf_sstable.hh to test/perf/perf_sstable.hh	2019-12-16 17:47:42 +03:00
Konstantin Osipov	b9bf1fbede	tests: move another set of headers to the new test layout Move another small subset of headers to test/ with the same goals: - preserve bisectability - make the revision history traceable after a move Update dependent files.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	8047d24c48	tests: move .hh files and resources to new locations The plan is to move the unstructured content of tests/ directory into the following directories of test/: test/lib - shared header and source files for unit tests test/boost - boost unit tests test/unit - non-boost unit tests test/manual - tests intended to be run manually test/resource - binary test resources and configuration files In order to not break git bisect and preserve the file history, first move most of the header files and resources. Update paths to these files in .cc files, which are not moved.	2019-12-16 17:47:42 +03:00
Konstantin Osipov	644595e15f	tests: remove executable property from data_listeners_test.cc Executable flag must have been committed to git by mistake.	2019-12-16 17:47:41 +03:00
Benny Halevy	d2e00abe13	tests: commitlog_test: test_allocation_failure: improve error reporting We're seeing the following error from test from time to time: fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record This is not reproducible and the error string does not contain enough information to figure out what happened exactly, therefore this patch adds an exception if the call succeeded unexpectedly and also prints the unexpected exception if one was caught. Refs #4714 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>	2019-12-16 15:38:48 +01:00
Asias He	6b7344f6e5	streaming: Fix typo in stream_result_future::maybe_complete s/progess/progress/ Refs: #5437	2019-12-16 11:12:03 +02:00
Dejan Mircevski	f3883cd935	dbuild: Fix podman invocation (#5481) The is_podman check was depending on `docker -v` printing "podman" in the output, but that doesn't actually work, since podman prints $0. Use `docker --help` instead, which will output "podman". Also return podman's return status, which was previously being dropped. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-16 11:11:48 +02:00
Avi Kivity	00ae4af94c	Merge "Sanitize and speed-up (a bit) directories set up" from Pavel " On start there are two things that scylla does on data/commitlog/etc. dirs: locks and verifies permissions. Right now these two actions are managed by different approaches, it's convenient to merge them. Also the introduced in this set directories class makes a ground for better --workdir option handling. In particular, right now the db::config entries are modified after options parse to update directories with the workdir prefix. With the directories class at hands will be able to stop doing this. " * 'br-directories-cleanup' of https://github.com/xemul/scylla: directories: Make internals work on fs::path directories: Cleanup adding dirs to the vector to work on directories: Drop seastar::async usage directories: Do touch_and_lock and verify sequentially directories: Do touch_and_lock in parallel directories: Move the whole stuff into own .cc file directories: Move all the dirs code into .init method file_lock: Work with fs::path, not sstring	2019-12-15 16:02:46 +02:00
Takuya ASADA	5e502ccea9	install.sh: setup workdir correctly on nonroot mode Specify correct workdir on nonroot mode, to set correct path of data / commitlog / hints directories at once. Fixes #5475 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191213012755.194145-1-syuu@scylladb.com>	2019-12-15 16:00:57 +02:00
Avi Kivity	c25d51a4ea	Revert "scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379)" This reverts commit `4333b37f9e`. It breaks upgrades, and the user question is not informative enough for the user to make a correct decision. Fixes #5478. Fixes #5480.	2019-12-15 14:37:40 +02:00
Pavel Emelyanov	23a8d32920	directories: Make internals work on fs::path Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	373fcfdb3e	directories: Cleanup adding dirs to the vector to work on The unordered_set is turned into vector since for fs::path there's no hash() method that's needed for set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	14437da769	directories: Drop seastar::async usage Now the only future-able operation remained is the call to parallel_for_each(), all the rest is non-blocking preparation, so we can drop the seastar::async and just return the future from parallel_for_each. The indentation is now good, as in previous patch it was prepared just for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	06f4f3e6d8	directories: Do touch_and_lock and verify sequentially The goal is to drop the seastar::async() usage. Currently we have two places that return futures -- calls to parallel_for_each-s. We can either chain them together or, since both are working on the same set of directories, chain actions inside them. For code simplicity I propose to chain actions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	8d0c820aa1	directories: Do touch_and_lock in parallel The list of paths that should be touch-and-locked is already at hands, this shortens the code and makes it slightly faster (in theory). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	71a528d404	directories: Move the whole stuff into own .cc file In order not to pollute the root dir place the code in utils/ directory, "utils" namespace. While doing this -- move the touch_and_lock from the class declaration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Benny Halevy	9ec98324ed	messaging_service: unregister_handler: return rpc unregister_handler future Now that seastar returns it. Fixes https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>	2019-12-12 16:38:36 +02:00
Pavel Emelyanov	f2b3c17e66	directories: Move all the dirs code into .init method The seastar::async usage is temporary, added for bisect-safety, soon it will go away. For this reason the indentation in the .init method is not "canonical", but is prepared for one-patch drop of the seastar::async. The hinted_handoff_enabled arg is there, as it's not just a parameter on config, it had been parsed in main.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:33:11 +03:00
Pavel Emelyanov	82ef2a7730	file_lock: Work with fs::path, not sstring The main.cc code that converts sstring to fs::path will be patched soon, the file_desc::open belongs to seastar and works on sstrings. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:32:10 +03:00
Konstantin Osipov	bc482ee666	test.py: remove an unused option Message-Id: <20191204142622.89920-2-kostja@scylladb.com>	2019-12-12 15:53:35 +02:00
Avi Kivity	64cade15cc	Merge "bouncing lwt request to an owning shard" from Gleb " LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by the transport code that jumps to a correct shard and re-process incoming message there. " * 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev: lwt: take raw lock for entire cas duration lwt: drop invoke_on in paxos_state prepare and accept lwt: Process lwt request on a owning shard storage_service: move start_native_transport into a thread transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-12 15:50:22 +02:00
Nadav Har'El	9f62a3538c	alternator: fix BEGINS_WITH operator for blobs The implementation of Expected's BEGINS_WITH operator on blobs was incorrect, naively comparing the base64-encoded strings, which doesn't work. This patch fixes the code to compare the decoded strings. The reason why the BEGINS_WITH test missed this bug was that we forgot to check the blob case and only tested the string case; so this patch also adds the missing test - which reproduces this bug, and verifies its fix. Fixes #5457 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191211115526.29862-1-nyh@scylladb.com>	2019-12-12 14:02:56 +01:00
Dejan Mircevski	27b8b6fe9d	cql3: Fix needs_filtering() for clustering columns The LIKE operator requires filtering, so needs_filtering() must check is_LIKE(). This already happens for partition columns, but it was overlooked for clustering columns in the initial implementation of LIKE. Fixes #5400. Tests: unit(dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-12 01:19:13 +02:00
Benny Halevy	d1bcb39e7f	hinted handoff: log message after removing hints directory (#5372 ) To be used by dtest as an indicator that endpoint's hints were drained and hints directory is removed. Refs #5354 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-12 01:16:19 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	f7c2c60b07	cql3: pass a const user_types_metadata& to prepare_internal We never modify the user_types_metadata via prepare_internal, so we can pass it a const reference. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	99cb8965be	db: drop special case for top level UDTs This was originally done in `7f64a6ec4b`, but that commit was reverted in `8517eecc28`. The revert was done because the original change would call parse_raw for non UDT types. Unlike the old patch, this one doesn't change the behavior of non UDT types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	7ae9955c5f	db: simplify db::cql_type_parser::parse The variant of db::cql_type_parser::parse that has a user_types_metadata argument was only used from the variant that didn't. This inlines one in the other. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	2092e1ef6f	db: Don't create a reference to nullptr The user_types variable can be null during db startup since we have to create types before reading the system table defining user types. This avoids undefined behavior, but is unlikely that it was causing more serious problems since the variable is only used when creating user types and we don't create any until after all system tables are read, in which case the user_types variable is not null. Fixes #5193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	6143941535	Add test for loading a schema with a non native type This would have found the error with the previous version of the patch series. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:43:34 -08:00
Gleb Natapov	64cfb9b1f6	lwt: take raw lock for entire cas duration It will prevent parallel update by the same coordinator and should reduce contention.	2019-12-11 14:41:31 +02:00
Gleb Natapov	898d2330a2	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests are now running on an owning shard there is no longer a need to invoke cross shard call.	2019-12-11 14:41:31 +02:00
Gleb Natapov	964c532c4f	lwt: Process lwt request on a owning shard LWT is much more efficient if a request is processed on a shard that owns a token for the request. This is because otherwise the processing will bounce to an owning shard multiple times. The patch proposes a way to move request to correct shard before running lwt. It works by returning an error from lwt code if a shard is incorrect one specifying the shard the request should be moved to. The error is processed by transport code that jumps to a correct shard and re-process incoming message there.	2019-12-11 14:41:31 +02:00
Gleb Natapov	54be057af3	storage_service: move start_native_transport into a thread The code runs only once and it is simple if it runs in a seastar thread.	2019-12-11 14:41:31 +02:00
Gleb Natapov	007ba3e38e	transport: change make_result to takes a reference to cql result instead of shared_ptr	2019-12-11 14:41:31 +02:00
Nadav Har'El	9e5c6995a3	alternator-test: add tests for ReturnValues parameter This patch adds comprehensive tests for the ReturnValue parameter of the write operations (PutItem, UpdateItem, DeleteItem), which can return pre-write or post-write values of the modified item. The tests are in a new test file, alternator-test/test_returnvalues.py. This feature is not yet implemented in Alternator, so all the new tests xfail on Alternator (and all pass on AWS). Refs #5053 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127163735.19499-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Nadav Har'El	ab69bfc111	alternator-test: add xfailing tests for ScanIndexForward This patch adds tests for Query's "ScanIndexForward" parameter, which can be used to return items in reversed sort order. We test that a Limit works and returns the given number of last items in the sort order, and also that such reverse queries can be resumed, i.e., paging works in the reverse order. These tests pass against AWS DynamoDB, but fail against Alternator (which doesn't support ScanIndexForward yet), so it is marked xfail. Refs #5153. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191127114657.14953-1-nyh@scylladb.com>	2019-12-11 13:26:39 +01:00
Pekka Enberg	6bc18ba713	storage_proxy: Remove reference to MBean interface The JMX interface is implemented by the scylla-jmx project, not scylla. Therefore, let's remove this historical reference to MBeans from storage_proxy. Message-Id: <20191211121652.22461-1-penberg@scylladb.com>	2019-12-11 14:24:28 +02:00
Avi Kivity	63474a3380	Merge "Add `experimental_features` option" from Dejan " Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser. Fixes #5338 " * 'vecexper' of https://github.com/dekimir/scylla: config: Add `experimental_features` option utils: Add enum_option	2019-12-11 14:23:08 +02:00
Avi Kivity	56b9bdc90f	Update seastar submodule * seastar e440e831c8...00da4c8760 (7): > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi Fixes #5443. > install-dependencies.sh: fix arch dependencies > Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny > Merge "testing: improve the observability of abandoned failed futures" from Botond > rework the fair_queue tester > directory_test: Update to use run instead of run_deprecated > log: support fmt 6.0 branch with chrono.h for log	2019-12-11 14:17:49 +02:00
Benny Halevy	105c8ef5a9	messaging_service: wait on unregister_handler Prepare for returning future<> from seastar rpc unregister_handler. Refs https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>	2019-12-11 14:17:41 +02:00
Nadav Har'El	06c3802a1a	storage_proxy: avoid overflow in view-backlog delay calculation In the calculate_delay() code for view-backlog flow control, we calculate a delay and cap it at a "budget" - the remaining timeout. This timeout is measured in milliseconds, but the capping calculation converted it into microseconds, which overflowed if the timeout is very large. This causes some tests which enable the UB sanitizer to fail. We fix this problem by comparing the delay to the budget in millisecond resolution, not in microsecond resolution. Then, if the calculated delay is short enough, we return it using its full microsecond resolution. Fixes #5412 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191205131130.16793-1-nyh@scylladb.com>	2019-12-11 14:10:54 +02:00
Nadav Har'El	2824d8f6aa	Merge: alternator: Fix EQ operator for sets Merged pull request https://github.com/scylladb/scylla/pull/5453 from Piotr Sarna: Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 13:20:25 +02:00
Piotr Sarna	421db1dc9d	alternator-test: remove XFAIL from set EQ test With this series merged, test_update_expected_1_eq_set from test_expected.py suite starts passing.	2019-12-11 12:07:39 +01:00
Piotr Sarna	a8e45683cb	alternator: add EQ comparison for sets Checking the EQ relation for alternator attributes is usually performed simply by comparing underlying JSON objects, but sets (SS, BS, NS types) need a special routine, as we need to make sure that sets stored in a different order underneath are still equal, e.g: [1, 3, 2] == [1, 2, 3] Fixes #5021	2019-12-11 12:07:39 +01:00
Piotr Sarna	fb37394995	schema_tables: notify table deletions before creations If a set of mutations contains both an entry that deletes a table and an entry that adds a table with the same name, it's expected to be a replacement operation (delete old + create new), rather than a useless "try to create a table even though it exists already and then immediately delete the original one" operation. As such, notifications about the deletions should be performed before notifications about the creations. The place that originally suffered from this wrong order is view building - which in this case created an incorrect duplicated entry in the view building bookkeeping, and then immediately deleted it, resulting in having old, deprecated entries with stale UUIDS lying in the build queue and never proceeding, because the underlying table is long gone. The issue is fixed by ensuring the order of notifications: - drops are announced first, view drops are announced before table drops; - creations follow, table creations are announced before views; - finally, changes to tables and views are announced; Fixes #4382 Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test	2019-12-11 12:48:29 +02:00
Benny Halevy	d544df6c3c	dist/ami/build_ami.sh: support incremental build of rpms (#5191 ) Iterate over an array holding all rpm names to see if any of them is missing from `dist/ami/files`. If they are missing, look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh was run manually before dist/ami/build_ami.sh we can just collect the built rpms from its output dir. If we're still missing any rpms, then run reloc/build_rpm.sh and copy the required rpms from build/redhat/RPMS/x86_64. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2019-12-11 12:48:29 +02:00
Amnon Heiman	f43285f39a	api: replace swagger definition to use long instead of int (#5380) In swagger 1.2 int is defined as int32. We originally used int following the jmx definition, but in practice we use uint and int64 internally in many places. While the API formats the type correctly, an external system that uses a swagger-based code generator can run into type problems. This patch replaces all uses of int in a return type with long, which is defined as int64. Changing the return type has no impact on the system, but it does help external systems that use a code generator from swagger. Fixes #5347 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-11 12:48:29 +02:00
Nadav Har'El	2abac32f2e	Merged: alternator: Implement CONTAINS and NOT_CONTAINS in Expected Merged pull request https://github.com/scylladb/scylla/pull/5447 by Dejan Mircevski. Adds the last missing operators in the "Expected" parameter and re-enable their tests. Fixes #5034.	2019-12-11 12:48:29 +02:00
Cem Sancak	86b8036502	Fix DPDK mode in prepare script Fixes #5455.	2019-12-11 12:48:29 +02:00
Calle Wilund	35089da983	conf/config: Add better descriptive text on server/client encryption Provide some explanation on prio strings + direction to gnutls manual. Document client auth option. Remove confusing/misleading statement on "custom options" Message-Id: <20191210123714.12278-1-calle@scylladb.com>	2019-12-11 12:48:28 +02:00
Dejan Mircevski	32af150f1d	alternator: Implement NOT_CONTAINS operator in Expected Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of recognized operators, implement check_NOT_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	bd2bd3c7c8	alternator: Implement CONTAINS operator in Expected Enable existing CONTAINS test, implement check_CONTAINS, and hook it up to verify_expected_one(). Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 15:31:47 -05:00
Dejan Mircevski	5a56fd384c	config: Add `experimental_features` option When the user wants to turn on only some experimental features, they can use this new option. The existing `experimental` option is preserved for backwards compatibility. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-10 11:47:03 -05:00
Piotr Sarna	9504bbf5a4	alternator: move unwrap_set to serialization header The utility function for unwrapping a set is going to be useful across source files, so it's moved to serialization.hh/serialization.cc.	2019-12-10 15:08:47 +01:00
Piotr Sarna	4660e58088	alternator: move rjson value comparison to rjson.hh The comparison struct is going to be useful across source files, so it's moved into rjson header, where it conceptually belongs anyway.	2019-12-10 15:08:47 +01:00
Botond Dénes	db0e2d8f90	scylla-gdb.py: document and add safety net to seastar::thread related commands Almost all commands provided by `scylla-gdb.py` are safe to use. The worst that could happen if they fail is that you won't get the desired information. There is one notable exception: `scylla thread`. If anything goes wrong while this command is executed - gdb crashes, a bug in the command, etc. - there is a good chance the process under examination will crash. Sometimes this is fine, but other times e.g. when live debugging a production node, this is unacceptable. To avoid any accidents add documentation to all commands working with `seastar::thread`. And since most people don't read documentation, especially when debugging under pressure, add a safety net to the `scylla thread` command. When run, this command will now warn of the dangers and will ask for explicit acknowledgment of the risk of crash, by means of passing an `--iamsure` flag. When this flag is missing, it will refuse to run. I am sure this will be very annoying but I am also sure that the avoided crashes are worth it. As part of making `scylla thread` safe, its argument parsing code is migrated to `argparse`. This changes the usage but this should be fine because it is well documented. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>	2019-12-10 11:51:57 +02:00
Eliran Sinvani	765db5d14f	build_ami: Trim ami description attribute to the allowed size The ami description attribute is only allowed to be 255 characters long. When build_ami.sh generates an ami, it generates an ami description which is a concatenation of all of the components' version strings. It can happen that the description string is too long, which eventually causes the ami build to fail. This patch trims the description string to 255 characters. It is ok since the individual versions of the components are also saved in tags attached to the image. Tests: 1. Reproduced with a long description and validated that it doesn't fail after the fix. Fixes #5435 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>	2019-12-10 11:51:57 +02:00
Fabiano Lucchese	4333b37f9e	scylla_setup: Support for enforcing optimal Linux clocksource setting (#5379 ) A Linux machine typically has multiple clocksources with distinct performances. Setting a high-performant clocksource might result in better performance for ScyllaDB, so this should be considered whenever starting it up. This patch introduces the possibility of enforcing optimized Linux clocksource to Scylla's setup/start-up processes. It does so by adding an interactive question about enforcing clocksource setting to scylla_setup, which modifies the parameter "CLOCKSOURCE" in scylla_server configuration file. This parameter is read by perftune.py which, if set to "yes", proceeds to (non persistently) setting the clocksource. On x86, TSC clocksource is used. Fixes #4474	2019-12-10 11:51:57 +02:00
Pavel Emelyanov	3a21419fdb	features: Remove _FEATURE suffix from hinted_handoff feature name All the other features are named w/o one. The internal const-s are all different, but I'm fixing it separately. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191209154310.21649-1-xemul@scylladb.com>	2019-12-10 11:51:57 +02:00
Dejan Mircevski	a26bd9b847	utils: Add enum_option This allows us to accept command-line options with a predefined set of valid arguments. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-09 09:45:59 -05:00
Calle Wilund	7c5e4c527d	cdc_test: Add small test for altering base schema (add column)	2019-12-09 14:35:04 +00:00
Calle Wilund	cb0117eb44	cdc: Handle schema changes via migration manager callbacks This allows us to create/alter/drop log and desc tables "atomically" with the base, by including these mutations in the original mutation set, i.e. batch create/alter tables. Note that population does not happen until types are actually already put into database (duh), thus there _is_ still a gap between creating cdc and it being truly usable. This may or may not need handling later.	2019-12-09 14:35:04 +00:00
Rafael Ávila de Espíndola	761b19cee5	build: Split the build and host linker flags A general build system knows about 3 machines: * build: where the building is running * host: where the built software will run * target: the machine the software will produce code for The target machine is only relevant for compilers, so we can ignore it. Until now we could ignore the build and host distinction too. This patch adds the first difference: don't use host ld_flags when linking build tools (gen_crc_combine_table). The reason for this change is to make it possible to build with -Wl,--dynamic-linker pointing to a path that will exist on the host machine, but may not exist on the build machine. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191207030408.987508-1-espindola@scylladb.com>	2019-12-09 15:54:57 +02:00
Calle Wilund	27183f648d	migration_manager: Invoke "before" callbacks for table operations Potentially allowing (cdc) augmentation of mutations. Note: only does the listener part in seastar::thread, to avoid changing call behaviour.	2019-12-09 12:12:09 +00:00
Calle Wilund	f78a3bf656	migration_listener: Add empty base class and "before" callbacks for tables Empty base type makes for less boiler plate in implementations. The "before" callbacks are for listeners who need to potentially react/augment type creation/alteration _before_ actually committing type to schema tables (and holding the semaphore for this). I.e. it is for cdc to add/modify log/desc tables "atomically" with base.	2019-12-09 12:12:09 +00:00
Calle Wilund	4e406105b1	cql_test_env: Include cdc service in cql tests	2019-12-09 12:12:09 +00:00
Calle Wilund	a21e140169	cdc: Add sharded service that does nothing. But can be used to hang functionality into eventually.	2019-12-09 12:12:09 +00:00
Calle Wilund	2787b0c4f8	cdc: Move "options" to separate header to avoid too much header inclusion cdc should not contaminate the whole universe.	2019-12-09 12:12:09 +00:00
fastio	8f326b28f4	Redis: Combine all the source files redis/commands/* into redis/commands.{hh,cc} Fixes: #5394 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-08 13:54:33 +02:00
Avi Kivity	9c63cd8da5	sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417) The vm.swappiness sysctl controls the kernel's preference for swapping anonymous memory vs page cache. Since Scylla uses very large amounts of anonymous memory, and tiny amounts of page cache, the correct setting is to prefer swapping page cache. If the kernel swaps anonymous memory the reactor will stall until the page fault is satisfied. On the other hand, page cache pages usually belong to other applications, usually backup processes that read Scylla files. This setting has been used in production in Scylla Cloud for a while with good results. Users can opt out by not installing the scylla-kernel-conf package (same as with the other kernel tunables).	2019-12-08 13:04:25 +02:00
Avi Kivity	0e319e0359	Update seastar submodule * seastar 166061da3...e440e831c (8): > Fail tests on ubsan errors > future: make a couple of asserts more strict > future: Move make_ready out of line > config: Do not allow zero rates Fixes #5360 > future: add new state to avoid temporaries in get_available_state(). > future: avoid temporary future_state on get_available_state(). > future: inline future::abandoned > noncopyable_function: Avoid uninitialized warning on empty types	2019-12-06 18:33:23 +02:00
Piotr Sarna	0718ff5133	Merge 'min/max on collections returns human-readable result' from Juliusz Previously, scylla used min/max(blob)->blob overload for collections, tuples and UDTs; effectively making the results being printed as blobs. This PR adds "dynamically"-typed min()/max() functions for compound types. These types can be complicated, like map<int,set<tuple<..., and created in runtime, so functions for them are created on-demand, similarly to tojson(). The comparison remains unchanged - underneath this is still byte-by-byte weak lex ordering. Fixes #5139 * jul-stas/5139-minmax-bad-printing-collections: cql_query_tests: Added tests for min/max/count on collections cql3: min()/max() for collections/tuples/UDTs do not cast to blobs	2019-12-06 16:40:17 +01:00
Juliusz Stasiewicz	75955beb0b	cql_query_tests: Added tests for min/max/count on collections This tests the new min/max functions for collections and tuples. CFs in the test suite were named according to the types being tested, e.g. `cf_map<int,text>`, which is not a valid CF name. Therefore, these names required "escaping" of invalid characters, here: simply replacing with '_'.	2019-12-06 12:15:49 +01:00
Juliusz Stasiewicz	9efad36fb8	cql3: min()/max() for collections/tuples/UDTs do not cast to blobs Before: cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]); cqlsh> select max(val) from ks.list_types; system.max(val) ------------------------------------------------------------ 0x00000003000000040000000300000004000000040000000400000005 After: cqlsh> select max(val) from ks.list_types; system.max(val) -------------------- [3, 4, 5] This is accomplished similarly to `tojson()`/`fromjson()`: functions are generated on demand from within `cql3::functions::get()`. Because collections can have a variety of types, including UDTs and tuples, it would be impossible to statically define max(T t)->T for every T. Until now, max(blob)->blob overload was used. Because `impl_max/min_function_for` is templated with the input/output type, which can be defined in runtime, we need type-erased ("dynamic") versions of these functors. They work identically, i.e. they compare byte representations of lhs and rhs with `bytes::operator<`. Resolves #5139	2019-12-06 12:14:51 +01:00
Avi Kivity	a18a921308	docs: maintainer.md: use command line to merge multi-commit pull requests If you merge a pull request that contains multiple patches via the github interface, it will document itself as the committer. Work around this brain damage by using the command line.	2019-12-06 10:59:46 +01:00
Botond Dénes	7b37a700e1	configure.py: make tests explicitly depend on libseastar_testing.a So that changes to libseastar_testing.a make all test targets out of date. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191205142436.560823-1-bdenes@scylladb.com>	2019-12-05 19:30:34 +02:00
Piotr Sarna	3a46b1bb2b	Merge "handle hints on separate connection and scheduling group" from Piotr Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of requests saturate the connection, negatively impacting the other one. Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION. Fixes #4974. Tests: unit(release)	2019-12-05 17:25:26 +01:00
Calle Wilund	c11874d851	gms::inet_address: Use special ostream formatting to match Java To make gms::inet_address::to_string() similar in output to origin. The sole purpose being quick and easy fix of API/JMX ipv6 formatting of endpoints etc, where strings are used as lexical comparisons instead of textual representation. A better, but more work, solution is to fix the scylla-jmx bridge to do explicit parse + re-format of addresses, but there are many such callpoints. An even better solution would be to fix nodetool to not make this mistake of doing lexical comparisons, but then we risk breaking merge compatibility. But could be an option for a separate nodeprobe impl. Message-Id: <20191204135319.1142-1-calle@scylladb.com>	2019-12-05 17:01:26 +02:00
Gleb Natapov	4893bc9139	tracing: split adding prepared query parameters from stopping of a trace Currently the query_options object is passed to a trace stopping function, which makes it mandatory to keep it alive until the end of the query. The reason for that is to add prepared statement parameters to the trace. All other query options that we want to put in the trace are copied into trace_state::params_values, so let's copy prepared statement parameters there too. The trace-enabled case will become a little bit more expensive, but on the other hand we can drop a continuation that holds the query_options object alive from a fast path. It is safe to drop the call to stop_foreground_prepared() here since the tracing will be stopped in process_request_one(). Message-Id: <20191205102026.GJ9084@scylladb.com>	2019-12-05 17:00:47 +02:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Nadav Har'El	5b2f35a21a	Merge "Redis: fix the options related to Redis API, fix the DEL and GET command" Merged pull request https://github.com/scylladb/scylla/pull/5381 by Peng Jian, fixing multiple small issues with Redis: * Rename the options related to Redis API, and describe them clearly. * Rename redis_transport_port to redis_port * Rename redis_transport_port_ssl to redis_ssl_port * Rename redis_default_database_count to redis_database_count * Remove unnecessary option enable_redis_protocol * Modify the default value of the options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM * Fix the DEL command: support deleting multiple keys in one command. * Fix the GET command: return the empty string when the required key does not exist. * Fix the redis-test/test_del_non_existent_key: mark xfail.	2019-12-05 11:58:34 +02:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Piotr Sarna	79c3a508f4	table: Reduce read amplification in view update generation This commit makes sure that single-partition readers for read-before-write do not have fast-forwarding enabled, as it may lead to huge read amplification. The observed case was: 1. Creating an index. CREATE INDEX index1 ON myks2.standard1 ("C1"); 2. Running cassandra-stress in order to generate view updates. cassandra-stress write no-warmup n=1000000 cl=ONE -schema \ 'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \ keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors skip-read-validation -node 127.0.0.1; Without disabling fast-forwarding, single-partition readers were turned into scanning readers in cache, which resulted in reading 36GB (sic!) on a workload which generates less than 1GB of view updates. After applying the fix, the number dropped down to less than 1GB, as expected. Refs #5409 Fixes #4615 Fixes #5418	2019-12-05 11:58:34 +02:00
Konstantin Osipov	6a5e7c0e22	tests: reduce the number of iterations of dynamic_bitset_test This test execution time dominates by a serious margin test execution time in dev/release mode: reducing its execution time improves the test.py turnaround by over 70%. Message-Id: <20191204135315.86374-2-kostja@scylladb.com>	2019-12-05 11:58:34 +02:00
Avi Kivity	07427c89a2	gdb: change 'scylla thread' command to access fs_base register directly Currently, 'scylla thread' uses arch_prctl() to extract the value of fsbase, used to reference thread local variables. gdb 8 added support for directly accessing the value as $fs_base, so use that instead. This works from core dumps as well as live processes, as you don't need to execute inferior functions. The patch is required for debugging threads in core dumps, but not sufficient, as we still need to set $rip and $rsp, and gdb still[1] doesn't allow this. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370	2019-12-05 11:58:34 +02:00
Piotr Dulikowski	adfa7d7b8d	messaging_service: don't move `unsigned` values in handlers Performing std::move on integral types is pointless. This commit gets rid of moves of values of `unsigned` type in rpc handlers.	2019-12-05 00:58:31 +01:00
Piotr Dulikowski	77d2ceaeba	storage_proxy: handle hints through separate rpc verb	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	2609065090	storage_proxy: move register_mutation handler to local lambda This refactor makes it possible to reuse the lambda in following commits.	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	6198ee2735	hh: introduce HINTED_HANDOFF_SEPARATE_CONNECTION feature The feature introduced by this commit declares that hints can be sent using the new dedicated RPC verb. Before using the new verb, nodes need to know if other nodes in the cluster will be able to handle the new RPC verb.	2019-12-05 00:51:52 +01:00
Piotr Dulikowski	2e802ca650	hh: add HINT_MUTATION verb Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.	2019-12-05 00:51:49 +01:00
Avi Kivity	fd951a36e3	Merge "Let compaction wait on background deletions" from Benny " In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done. However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted. This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish. Fixes #4909 Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction "	2019-12-04 11:18:41 +02:00
Takuya ASADA	c9d8606786	dist/common/scripts/scylla_ntp_setup: relax RHEL version check We may be able to use the chrony setup script on future versions of RHEL/CentOS, so it is better to run chrony setup when RHEL version >= 8, not only 8. Note that Fedora still provides the ntp/ntpdate package, so we run ntp setup on it for now. (same on debian variants) Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191203192812.5861-1-syuu@scylladb.com>	2019-12-04 10:59:14 +02:00
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Avi Kivity	91d3f2afce	docs: maintainers.md: fix typo in git push --force-with-lease Just one lease, not many. Reported by Piotr Sarna.	2019-12-03 18:17:46 +01:00
Calle Wilund	56a5e0a251	commitlog_replayer: Ensure applied frozen_mutation is safe during apply Fixes #5211 In `79935df959` replay apply-call was changed from one with no continuation to one with. But the frozen mutation arg was still just lambda local. Change to use do_with for this case as well. Message-Id: <20191203162606.1664-1-calle@scylladb.com>	2019-12-03 18:28:01 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Amos Kong	e26b396f16	scylla-docker: fix default data_directories in scyllasetup.py (#5399) Use default data_file_directories if it's not assigned in scylla.yaml Fixes #5398 Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 13:58:17 +02:00
Rafael Ávila de Espíndola	1cd17887fa	build: strip debug when configured with --debuginfo 0 In a build configured with --debuginfo 0 the scylla binary still ends up with some debug info from the libraries that are statically linked in. We should avoid compiling subprojects (including seastar) with debug info when none is needed, but this at least avoids it showing up in the binary. The main motivation for this is that it is confusing to get a binary with some debug info in it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191127215843.44992-1-espindola@scylladb.com>	2019-12-03 12:41:04 +02:00
Tomasz Grabiec	0a453e5d30	Merge "Use fragmented buffers for collection de/serialization" from Botond This series refactors the collection de/serialization code to use fragmented buffers, avoiding the large allocations and the associated pains when working with large collections. Currently all operations that involve collections require deserializing them, executing the operation, then serializing them again to their internal storage format. The de/serialization operations happen in linearized buffers, which means that we have to allocate a buffer large enough to hold the entire collection. This can cause immense pressure on the memory allocator, which, in the face of memory fragmentation, might be unable to serve the allocation at all. We've seen this causing all sorts of nasty problems, including but not limited to: failing compactions, failing memtable flush, OOM crash and etc. Users are strongly discouraged from using large collections, yet they are still a fact of life and have been haunting us since forever. The proper solution for these problems would be to come up with an in-memory format for collections, however that is a major effort, with a lot of unknowns. This is something we plan on doing at some point but until it happens we should make life less painful for those with large collections. The goal of this series is to avoid the need of allocating these large buffers. Serialization now happens into a `bytes_ostream` which automatically fragments the values internally. Deserialization happens with `utils::linearizing_input_stream` (introduced by this series), which linearizes only the individual collection cells, but not the entire collection. An important goal of this series was to introduce the least amount of risk, and hence the least amount of code. This series does not try to make a revolution and completely revamp and optimize the de/serialization codepaths. These codepaths have their days numbered so investing a lot of effort into them is in vain. 
We can apply incremental optimizations where we deem it necessary. Fixes: #5341	2019-12-03 10:31:34 +01:00
fastio	01599ffbae	Redis API: Support the syntax of deleting multiple keys in one DEL command, fix the returning value for GET command. Support deleting multiple keys in one DEL command. Returning the number of keys actually deleted is still not supported. Return an empty string to the client for the GET command when the required key does not exist. Fixes: #5334 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:27:40 +08:00
fastio	039b83ad3b	Redis API: Rename options related to Redis API, describe them clearly, and remove unnecessary one. Rename option redis_transport_port to redis_port, which the redis transport listens on for clients. Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients. Rename option redis_database_count. Set the redis database count. Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace. Remove option enable_redis_protocol, which is unnecessary. Fixes: #5335 Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-12-03 17:13:35 +08:00
Nadav Har'El	7b93360c8d	Merge: redis: skip processing request of EOF Merged pull request https://github.com/scylladb/scylla/pull/5393/ by Amos Kong: ` When I test the redis cmd by echo and nc, there is a redundant error in the end. I checked by strace, currently if client read nothing from stdin, it will shutdown the socket, redis server will read nothing (0 byte) from socket. But it tries to process the empty command and returns an error. $ echo -n -e '1\r\n$4\r\nping\r\n' \|strace nc localhost 6379 \| ... \| read(0, "1\r\n$4\r\nping\r\n", 8192) = 14 \| select(5, [4], [4], [], NULL) = 1 (out [4]) \|>>> sendto(4, "1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 \| select(5, [0 4], [], [], NULL) = 1 (in [0]) \| recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) \| read(0, "", 8192) = 0 \|>>> shutdown(4, SHUT_WR) = 0 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 \| write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG \| -ERR unknown command '' \| ) = 32 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 \| close(1) = 0 \| close(4) = 0 Current result: $ echo -n -e '' \|nc localhost 6379 -ERR unknown command '' $ echo -n -e '1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' \|nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG	2019-12-03 10:40:20 +02:00
Avi Kivity	83feb9ea77	tools: toolchain: update frozen image Commit `96009881d8` added diffutils to the dependencies via Seastar's install-dependencies.sh, after it was inadvertently dropped in `1164ff5329` (update to Fedora 31; diffutils is no longer brought in as a side effect of something else). Regenerate the image to include diffutils. Ref #5401.	2019-12-03 10:36:55 +02:00
Amos Kong	fb9af2a86b	redis-test: add test_raw_cmd.py This patch added subtests for EOF process, it reads and writes the socket directly by using protocol cmds. We can add more tests in future, tests with Redis module will hide some protocol error. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
Amos Kong	4fa862adf4	redis: skip processing request of EOF When I test the redis cmd by echo and nc, there is a redundant error in the end. I checked by strace, currently if client read nothing from stdin, it will shutdown the socket, redis server will read nothing (0 byte) from socket. But it tries to process the empty command and returns an error. $ echo -n -e '1\r\n$4\r\nping\r\n' \|strace nc localhost 6379 \| ... \| read(0, "1\r\n$4\r\nping\r\n", 8192) = 14 \| select(5, [4], [4], [], NULL) = 1 (out [4]) \|>>> sendto(4, "1\r\n$4\r\nping\r\n", 14, 0, NULL, 0) = 14 \| select(5, [0 4], [], [], NULL) = 1 (in [0]) \| recvfrom(0, 0x7ffe4d5b6c70, 8192, 0, 0x7ffe4d5b6bf0, 0x7ffe4d5b6bec) = -1 ENOTSOCK (Socket operation on non-socket) \| read(0, "", 8192) = 0 \|>>> shutdown(4, SHUT_WR) = 0 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "+PONG\r\n-ERR unknown command ''\r\n", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 32 \| write(1, "+PONG\r\n-ERR unknown command ''\r\n", 32+PONG \| -ERR unknown command '' \| ) = 32 \| select(5, [4], [], [], NULL) = 1 (in [4]) \| recvfrom(4, "", 8192, 0, 0x7ffe4d5b6bf0, [0]) = 0 \| close(1) = 0 \| close(4) = 0 Current result: $ echo -n -e '' \|nc localhost 6379 -ERR unknown command '' $ echo -n -e '1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG -ERR unknown command '' Expected: $ echo -n -e '' \|nc localhost 6379 $ echo -n -e '*1\r\n$4\r\nping\r\n' \|nc localhost 6379 +PONG Signed-off-by: Amos Kong <amos@scylladb.com>	2019-12-03 10:47:56 +08:00
Rafael Ávila de Espíndola	bb114de023	dbuild: Fix confusion about relabeling podman needs to relabel directories in exactly the same cases docker does. The difference is that podman cannot relabel /tmp. The reason it was working before is that in practice anyone using dbuild has already relabeled any directories that need relabeling, with the exception of /tmp, since it is recreated on every boot. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-2-espindola@scylladb.com>	2019-12-02 18:38:16 +02:00
Rafael Ávila de Espíndola	867cdbda28	dbuild: Use a temporary directory for /tmp With this we don't have to use --security-opt label=disable. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191201235614.10511-1-espindola@scylladb.com>	2019-12-02 18:38:14 +02:00
Botond Dénes	1d1f8b0d82	tests: mutation_test: add large collection allocation test Checking that there are no large allocations when a large collection is de/serialized.	2019-12-02 17:13:53 +02:00
Avi Kivity	28355af134	docs: add maintainer's handbook (#5396) This is a list of recipes used by maintainers to maintain scylla.git.	2019-12-02 15:01:54 +02:00
Calle Wilund	8c6d6254cf	cdc: Remove some code from header	2019-12-02 13:00:19 +00:00
Botond Dénes	4c59487502	collection_mutation: don't linearize the buffer on deserialization Use `utils::linearizing_input_stream` for the deserialization of the collection. Allows for avoiding the linearization of the entire cell value, instead only linearizing individual values as they are deserialized from the buffer.	2019-12-02 10:10:31 +02:00
Botond Dénes	690e9d2b44	utils: introduce linearizing_input_stream `linearizing_input_stream` allows transparently reading linearized values from a fragmented buffer. This is done by linearizing on-the-fly only those read values that happen to be split across multiple fragments. This reduces the size of the largest allocation from the size of the entire buffer (when the entire buffer is linearized) to the size of the largest read value. This is a huge gain when the buffer contains loads of small objects, and a modest gain when the buffer contains few large objects. But even in the worst case the size of the largest allocation will be less than or equal to that of the case where the entire buffer is linearized. This stream is planned to be used as glue code between the fragmented cell value and the collection deserialization code which expects to be reading linearized values.	2019-12-02 10:10:31 +02:00
Botond Dénes	065d8d37eb	tests: random-utils: get_string(): add overload that takes engine parameter	2019-12-02 10:10:31 +02:00
Botond Dénes	2f9307c973	collection_mutation: use a fragmented buffer for serialization For the serialization `bytes_ostream` is used.	2019-12-02 10:10:31 +02:00
Botond Dénes	fc5b096f73	imr: value_writer::write_to_destination(): don't dereference chunk iterator eagerly Currently the loop which writes the data from the fragmented origin to the destination, moves to the next chunk eagerly after writing the value of the current chunk, if the current chunk is exhausted. This presents a problem when we are writing the last piece of data from the last chunk, as the chunk will be exhausted and we eagerly attempt to move to the next chunk, which doesn't exist and dereferencing it will fail. The solution is to not be eager about moving to the next chunk and only attempt it if we actually have more data to write and hence expect more chunks.	2019-12-02 10:10:31 +02:00
Botond Dénes	875314fc4b	bytes_ostream: make it a FragmentRange The presence of `const_iterator` seems to be a requirement as well although it is not part of the concept. But perhaps it is just an assumption made by code using it.	2019-12-02 10:10:31 +02:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespace this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Botond Dénes	07007edab9	bytes_ostream: add output_iterator To allow it being used for serialization code, which works in terms of output iterators.	2019-12-02 10:10:31 +02:00
Takuya ASADA	c5a95210fe	dist/common/scripts/scylla_setup: list virtio-blk devices correctly on interactive RAID setup Currently the interactive RAID setup prompt does not list virtio-blk devices for the following reasons: - We fail to match the '-p' option in 'lsblk --help' output due to misuse of a regex function, so list_block_devices() always skips using lsblk output. - We don't check the existence of /dev/vd* when we skip using lsblk. - We mistakenly excluded virtio-blk devices from 'lsblk -pnr' output using the '-e' option, but we actually needed them. To fix the problem we need to use re.search() instead of re.match() to match the '-p' option in 'lsblk --help', add '/dev/vd*' to the block device list, and stop passing the '-e 252' option to lsblk, which excludes virtio-blk. Additionally, it is better to parse the 'TYPE' field of lsblk output; we should skip 'loop' devices and 'rom' devices since these are not disk devices. Fixes #4066 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191201160143.219456-1-syuu@scylladb.com>	2019-12-01 18:36:48 +02:00
Takuya ASADA	124da83103	dist/common/scripts: use chrony as NTP server on RHEL8/CentOS8 We need to use chrony as NTP server on RHEL8/CentOS8, since it dropped ntpd/ntpdate. Fixes #4571 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20191101174032.29171-1-syuu@scylladb.com>	2019-12-01 18:35:03 +02:00
Nadav Har'El	b82417ba27	Merge "alternator: Implement Expected operators LE, GE, and BETWEEN" Merged pull request https://github.com/scylladb/scylla/pull/5392 from Dejan Mircevski. Refs #5034 The patches: alternator: Implement LE operator in Expected alternator: Implement GE operator in Expected alternator: Make cmp diagnostic a value, not funct utils: Add operator<< for big_decimal alternator: Implement BETWEEN operator in Expected	2019-12-01 16:11:11 +02:00
Nadav Har'El	8614c30bcf	Merge "implement echo command" Merged pull request https://github.com/scylladb/scylla/pull/5387 from Amos Kong: This patch implemented echo command, which return the string back to client. Reference: https://redis.io/commands/echo	2019-12-01 10:29:57 +02:00
Amos Kong	49fee4120e	redis-test: add test_echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:32:00 +08:00
Amos Kong	3e2034f07b	redis: implement echo command This patch implemented echo command, which return the string back to client. Reference: - https://redis.io/commands/echo Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-30 13:30:35 +08:00
Dejan Mircevski	dcb1b360ba	alternator: Implement BETWEEN operator in Expected Enable existing BETWEEN test, and add some more coverage to it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 16:47:21 -05:00
Dejan Mircevski	c43b286f35	utils: Add operator<< for big_decimal ... and remove an existing duplicate from lua.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:32:09 -05:00
Dejan Mircevski	e0d77739cc	alternator: Make cmp diagnostic a value, not funct All check_compare diagnostics are static strings, so there's no need to call functions to get them. Instead of a function, make diagnostic a simple value. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:09:05 -05:00
Dejan Mircevski	65cb84150a	alternator: Implement GE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 12:29:08 -05:00
Dejan Mircevski	f201f0eaee	alternator: Implement LE operator in Expected Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 11:59:52 -05:00
Avi Kivity	96009881d8	Update seastar submodule * seastar 8eb6a67a4...166061da3 (3): > install-dependencies.sh: add diffutils > reactor: replace std::optional (in _network_stack_ready) with compat::optional > noncopyable_function: disable -Wuninitialized warning in noncopyable_function_base Ref #5386.	2019-11-29 12:50:48 +02:00
Tomasz Grabiec	6562c60c86	Merge "test.py: terminate children upon signal" from Kostja Allows a signal to terminate the outstanding test tasks, to avoid dangling children.	2019-11-29 12:05:03 +02:00
Pekka Enberg	bb227cf2b4	Merge "Fix default directories in Scylla setup scripts" from Amos "Fix two problems in scylla_io_setup: - Problem 1: paths of default directories are invalid, introduced by commit `5ec1915` ("scylla_io_setup: assume default directories under /var/lib/scylla"). - Problem 2: wrong path join, introduced by commit `31ddb21` ("dist/common/scripts: support nonroot mode on setup scripts"). Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py: - Fixed default scylla directories when they aren't assigned in scylla.yaml" Fixes #5370 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> * 'scylla_io_setup' of git://github.com/amoskong/scylla: use parse_scylla_dirs_with_default to get scylla directories scylla_io_setup: fix data_file_directories check scylla_util: introduce helper to process the default scylla directories scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml scylla_io_setup: fix path join of default scylla directories	2019-11-29 12:05:03 +02:00
Ultrabug	61f1e6e99c	test.py: fix undefined variable 'options' in write_xunit_report()	2019-11-28 19:06:22 +03:00
Ultrabug	5bdc0386c4	test.py: comparison to False should be 'if cond is False:'	2019-11-28 19:06:22 +03:00
Ultrabug	737b1cff5e	test.py: use isinstance() for type comparison	2019-11-28 19:06:22 +03:00
Konstantin Osipov	c611325381	test.py: terminate children upon signal Use asyncio as a more modern way to work with concurrency. Process signals in an event loop and terminate all outstanding tests before exiting. Breaking change: this commit requires Python 3.7 or newer to run this script. The patch adds a version check and a message to enforce it.	2019-11-28 19:06:22 +03:00
Botond Dénes	cf24f4fe30	imr: move documentation to docs/ Where all the other documentation is, and hence where people would be looking for it. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191128144612.378244-1-bdenes@scylladb.com>	2019-11-28 16:47:52 +02:00
Avi Kivity	36dd0140a8	Update seastar submodule * seastar 5c25de907a...8eb6a67a4b (1): > util/backtrace.hh: add missing print.hh include	2019-11-28 16:47:16 +02:00
Benny Halevy	7aef39e400	tracing: one_session_records: keep local tracing ptr Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr in one_session_records when constructed so it can be used during shutdown. Fixes #5243 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-28 15:24:10 +01:00
Gleb Natapov	75499896ab	client_state: store _user as optional instead of shared_ptr _user cannot outlive client_state class instance, so there is no point in holding it in shared_ptr. Tested: debug test.py and dtest auth_test.py Message-Id: <20191128131217.26294-5-gleb@scylladb.com>	2019-11-28 15:48:59 +02:00
Gleb Natapov	1538cea043	cql: modification_statement: store _restrictions as optional instead of shared_ptr _restrictions can be optional since its lifetime is managed by modification_statement class explicitly. Message-Id: <20191128131217.26294-4-gleb@scylladb.com>	2019-11-28 15:48:54 +02:00
Gleb Natapov	ce5d6d5eee	storage_service: store thrift server as an optional instead of shared_ptr Only do_stop_rpc_server uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-3-gleb@scylladb.com>	2019-11-28 15:48:51 +02:00
Gleb Natapov	b9b99431a8	storage_service: store cql server as an optional instead of shared_ptr Only do_stop_native_transport() uses the shared_ptr to prolong server's lifetime until stop() completes, but do_with() can be used to achieve the same. Message-Id: <20191128131217.26294-2-gleb@scylladb.com>	2019-11-28 15:48:47 +02:00
Avi Kivity	2b7e97514a	Update seastar submodule * seastar 6f0ef32514...5c25de907a (7): > shared_future: Fix crash when all returned futures time out Fixes #5322. > future: don't create temporaries on get_value(). > reactor: lower the default stall threshold to 200ms > reactor: Simplify network initialization > reactor: Replace most std::function with noncopyable_function > futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode > inet_address: Make inet_address == operator ignore scope (again)	2019-11-28 14:48:01 +02:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Nadav Har'El	fde336a882	Merged "5139 minmax bad printing" Merged pull request https://github.com/scylladb/scylla/pull/5311 from Juliusz Stasiewicz: This is a partial solution to #5139 (only for two types) because of the above and because collections are much harder to do. They are coming in a separate PR.	2019-11-28 14:06:43 +02:00
Juliusz Stasiewicz	3b9ebca269	tests/cql_query_test: add test for aggregates on inet+time_type This is a test to max(), min() and count() system functions on the arguments of types: `net::inet_address` and `time_native_type`.	2019-11-28 11:20:43 +01:00
Juliusz Stasiewicz	9c23d89531	cql3/functions: add missing min/max/count for inet and time type References #5139. Aggregate functions, like max(), when invoked on `inet_address' and `time_native_type' used to choose max(blob)->blob overload, with casting of argument and result to bytes. This is because appropriate calls to `aggregate_fcts::make_XXX_function()' were missing. This commit adds them. Functioning remains the same but now clients see user-friendly representations of aggregate result, not binary. Comparing inet addresses without inet::operator< is performed by trick, where ADL is bypassed by wrapping the name of std::min/max and providing an overload of wrapper on inet type.	2019-11-28 11:18:31 +01:00
Pavel Emelyanov	8532093c61	cql: The cql_server does not need proxy reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191127153842.4098-1-xemul@scylladb.com>	2019-11-28 10:58:46 +01:00
Amos Kong	e2eb754d03	use parse_scylla_dirs_with_default to get scylla directories Use default data_file_directories/commitlog_directory if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:48:14 +08:00
Amos Kong	bd265bda4f	scylla_io_setup: fix data_file_directories check Use default data_file_directories if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 15:47:56 +08:00
Amos Kong	123c791366	scylla_util: introduce helper to process the default scylla directories Currently we support to assign workdir from scylla.yaml, and we use many hardcode '/var/lib/scylla' in setup scripts. Some setup scripts get scylla directories by parsing scylla.yaml, introduced parse_scylla_dirs_with_default() that adds default values if scylla directories aren't assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:54:32 +08:00
Amos Kong	b75061b4bc	scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:38:01 +08:00
Amos Kong	ada0e92b85	scylla_io_setup: fix path join of default scylla directories Currently we check invalid paths for some default scylla directories; the directories don't exist, so the tuning is always skipped. This is caused by two problems. Problem 1: paths of default directories are invalid Introduced by commit `5ec191536e`, we try to tune some scylla default directories if they exist. But the directory paths we try are wrong. For example: - What we check: /var/lib/scylla/commitlog_directory - Correct one: /var/lib/scylla/commitlog Problem 2: wrong path join Introduced by commit `31ddb2145a`, default_path might be replaced from '/var/lib/scylla/' to '/var/lib/scylla'. Our code tries to check an invalid path that is wrongly joined, e.g.: '/var/lib/scyllacommitlog' Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-28 14:37:58 +08:00
Amos Kong	d4a26f2ad0	scylla_util: get_scylla_dirs: return default data/commitlog directories if they aren't set (#5358) The default values of data_file_directories and commitlog_directory were commented out by commit `e0f40ed16a`. This causes scylla_util.py:get_scylla_dirs() to fail when checking the values. This patch changes get_scylla_dirs() to return default data/commitlog directories if they aren't set. Fixes #5358 Reviewed-by: Pavel Emelyanov <xemul@scylladb.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-27 13:52:05 +02:00
Nadav Har'El	cb1ed5eab2	alternator-test: test Query's Limit parameter Add a test, test_query.py::test_query_limit, to verify that the Limit parameter correctly limits the number of rows returned by the Query. This was supposed to already work correctly - but we never had a test for it. As we hoped, the test passes (on both Alternator and DynamoDB). Another test, test_query.py::test_query_limit_paging, verifies that paging can be done with any setting of Limit. We already had tests for paging of the Scan operation, but not for the Query operation. Refs #5153 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:27:26 +01:00
Nadav Har'El	c01ca661a0	alternator-test: Select parameter of Query and Scan This is a comprehensive test for the "Select" parameter of Query and Scan operations, but only for the base-table case, not index, so another future patch should add similar tests in test_gsi.py and test_lsi.py as well. The main use of the Select parameter is to allow returning just the count of items, instead of their content, but it also has other esoteric options, all of which we test here. The test currently succeeds on AWS DynamoDB, demonstrating that the test is correct, but fails on Alternator because the "Select" parameter is not yet supported. So the test is marked xfail. Refs #5058 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-27 12:22:33 +01:00
Botond Dénes	9d09f57ba5	scylla-gdb.py: scylla_smp_queues: use lazy initialization Currently the command tries to read all seastar smp queues in its initialization code in the constructor. This constructor is run each time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and sometimes also annoying errors because the sourcing happens in the wrong context and seastar symbols are not available. Avoid this by running this initializing code lazily, on the first invocation. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>	2019-11-27 12:04:57 +01:00
Tomasz Grabiec	87b72dad3e	Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov This patchset adds missing "const" function qualifiers throughout the Scylla code base, which would make code less error-prone. The changeset incorporates Kostja's work regarding const qualifiers in the cql code hierarchy along with a follow-up patch addressing the review comment of the corresponding patch set (the patch subject is "cql: propagate const property through prepared statement tree.").	2019-11-27 10:56:20 +01:00
Rafael Ávila de Espíndola	91b43f1f06	dbuild: fix podman with selinux enabled With this change I am able to run tests using docker-podman. The option also exists in docker. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194101.25221-1-espindola@scylladb.com>	2019-11-26 21:50:56 +02:00
Rafael Ávila de Espíndola	480055d3b5	dbuild: Fix missing docker options With the recent changes docker was missing a few options. In particular, it was missing -u. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126194347.25699-1-espindola@scylladb.com>	2019-11-26 21:45:31 +02:00
Rafael Ávila de Espíndola	c0a2cd70ff	lua: fix test with boost 1.66 The boost 1.67 release notes say: "Changed maximum supported year from 10000 to 9999 to resolve various issues". So change the test to use a larger number so that we get an exception with both boost 1.66 and boost 1.67. Fixes #5344 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191126180327.93545-1-espindola@scylladb.com>	2019-11-26 21:17:15 +02:00
Pavel Solodovnikov	55a1d46133	cql: some more missing const qualifiers There are several virtual functions in public interfaces named "is_*" that clearly should be marked as "const", so fix that.	2019-11-26 17:57:51 +03:00
Pavel Solodovnikov	412f1f946a	cql: remove "mutable" on _opts in select_statement _opts initialization can be safely done in the constructor, hence no need to make it mutable.	2019-11-26 17:55:10 +03:00
Piotr Sarna	d90dbd6ab0	Merge "support podman as a replacement to docker" from Avi Docker on Fedora 31 is flakey, and is not supported at all on RHEL 8. Podman is a drop-in replacement for docker; this series adds support for using podman in dbuild. Apart from actually working on Fedora 31 hosts, podman is nicer in being more secure and not requiring a daemon. Fixes #5332	2019-11-26 15:17:49 +01:00
Tomasz Grabiec	5c9fe83615	Merge "Sanitize sub-modules shutting down" from Pavel As suggested in issue #4586 here is the helper that prints "shutting down foo" message, then shuts the foo down, then prints the "[it] was successull" one. In between it catches the exception (if any) and warns this in logs. By "then" I mean literally then, not the seastar's then() :) Fixes: #4586	2019-11-26 15:14:22 +02:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
Avi Kivity	6fbb724140	conf: remove unsupported options from scylla.yaml (#5299 ) These unsupported options do nothing except to confuse users who try to tune them. Options removed: hinted_handoff_throttle_in_kb max_hints_delivery_threads batchlog_replay_throttle_in_kb key_cache_size_in_mb key_cache_save_period key_cache_keys_to_save row_cache_size_in_mb row_cache_save_period row_cache_keys_to_save counter_cache_size_in_mb counter_cache_save_period counter_cache_keys_to_save memory_allocator saved_caches_directory concurrent_reads concurrent_writes concurrent_counter_writes file_cache_size_in_mb index_summary_capacity_in_mb index_summary_resize_interval_in_minutes trickle_fsync trickle_fsync_interval_in_kb internode_authenticator native_transport_max_threads native_transport_max_concurrent_connections native_transport_max_concurrent_connections_per_ip rpc_server_type rpc_min_threads rpc_max_threads rpc_send_buff_size_in_bytes rpc_recv_buff_size_in_bytes internode_send_buff_size_in_bytes internode_recv_buff_size_in_bytes thrift_framed_transport_size_in_mb concurrent_compactors compaction_throughput_mb_per_sec sstable_preemptive_open_interval_in_mb inter_dc_stream_throughput_outbound_megabits_per_sec cross_node_timeout streaming_socket_timeout_in_ms dynamic_snitch_update_interval_in_ms dynamic_snitch_reset_interval_in_ms dynamic_snitch_badness_threshold request_scheduler request_scheduler_options throttle_limit default_weight weights request_scheduler_id	2019-11-26 15:14:21 +02:00
Amos Kong	817f34d1a9	ami: support new aws instance types: c5d, m5d, m5ad, r5d, z1d (#5330) Currently scylla_io_setup is skipped in scylla_setup, because we didn't support those new instance types. I manually executed scylla_io_setup, and the scylla-server started and worked well. Let's apply this patch first, then check if there is some new problem in ami-test. Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-26 15:14:21 +02:00
Konstantin Osipov	90346236ac	cql: propagate const property through prepared statement tree. cql_statement is a class representing a prepared statement in Scylla. It is used concurrently during execution, so it is important that its state is not changed by execution. Add const qualifier to the execution methods family, throughout the cql hierarchy. Mark a few places which do mutate prepared statement state during execution as mutable. While these are not affecting production today, as code ages, they may become a source of latent bugs and should be moved out of the prepared state or evaluated at prepare eventually: cf_property_defs::_compaction_strategy_class list_permissions_statement::_resource permission_altering_statement::_resource property_definitions::_properties select_statement::_opts	2019-11-26 14:18:17 +03:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Pavel Emelyanov	50a1ededde	main: Remove now unused defer-with-log helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	a0f92d40ee	main: Shut down sighup handler with verbose helper And (!) fix the misprinted variable name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	0719369d83	repair: Remove extra logging on shutdown The shutdown start/finish messages are already printed in verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	2d64fc3a3e	main: Shut down database with verbose_shutdown helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	636c300db5	main: Shut down prometheus with verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> --- v2: - Have stop earlier so that exceptions in start/listen do not prevent prometheus.stop from being called	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	804b152527	main: Sanitize shutting down callbacks As suggested in issue #4586 here is the helper that prints "shutting down foo" message, then shuts the foo down, then prints the "shutting down foo was successfull". In between it catches the exception (if any) and warns this in logs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:45:49 +03:00
Nadav Har'El	4160b3630d	Merge "Return preimage from CDC only when it's enabled" Merged pull request https://github.com/scylladb/scylla/pull/5218 from Piotr Jastrzębski: Users should be able to decide whether they need preimage or not. There is already an option for that but it's not respected by the implementation. This PR adds support for this functionality. Tests: unit(dev). Individual patches: cdc: Don't take storage_proxy as transformer::pre_image_select param cdc::append_log_mutations: use do_with instead of shared_ptr cdc::append_log_mutations: fix undefined behavior cdc: enable preimage in test_pre_image_logging test cdc: Return preimage only when it's requested cdc: test both enabled and disabled preimage in test_pre_image_logging	2019-11-25 14:32:17 +02:00
Pavel Emelyanov	f6ac969f1e	mm: Stop migration manager Before stopping the db itself, stop the migration service. It must be stopped before RPC, but RPC is not stopped yet itself, so we should be safe here. Here's the tail of the resulting logs: INFO 2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager INFO 2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished INFO 2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server INFO 2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete. Also -- stop the mm on drain before the commitlog is stopped. [Tomasz: mm needs the cl because pulling schema changes from other nodes involves applying them into the database. So cl/db needs to be stopped after mm is stopped.] The drain logs would look like ... INFO 2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED: and then on stop ... INFO 2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished INFO 2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server INFO 2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete. Fixes #5300 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191125080605.7661-1-xemul@scylladb.com>	2019-11-25 12:59:01 +01:00
Asias He	6ec602ff2c	repair: Fix rx_hashes_nr metrics (#5213) In get_full_row_hashes_with_rpc_stream and repair_get_row_diff_with_rpc_stream_process_op which were introduced in the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not updated correctly. In the test we have 3 nodes and run repair on node3; we make sure the following metrics are correct. assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'], node3_metrics['scylla_repair_rx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'], node3_metrics['scylla_repair_tx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'], node3_metrics['scylla_repair_rx_row_nr']) assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'], node3_metrics['scylla_repair_tx_row_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'], node3_metrics['scylla_repair_rx_row_bytes']) assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'], node3_metrics['scylla_repair_tx_row_bytes']) Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test Fixes: #5339 Backports: 3.2	2019-11-25 13:57:37 +02:00
Piotr Jastrzebski	2999cb5576	cdc: test both enabled and disabled preimage in test_pre_image_logging Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	222b94c707	cdc: Return preimage only when it's requested Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	c94a5947b7	cdc: enable preimage in test_pre_image_logging test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	595c9f9d32	cdc::append_log_mutations: fix undefined behavior The code was iterating over a collection that was modified at the same time. Iterators were used for that and collection modification can invalidate all iterators. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	f0f44f9c51	cdc::append_log_mutations: use do_with instead of shared_ptr This will not only save some allocations but also improve code readability. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Piotr Jastrzebski	b8d9158c21	cdc: Don't take storage_proxy as transformer::pre_image_select param transformer has access to storage_proxy through its _ctx field. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-11-25 12:43:39 +01:00
Nadav Har'El	3eab6cd549	Merged "toolchain: update to Fedora 31" Merged pull request https://github.com/scylladb/scylla/pull/5310 from Avi Kivity: This is a minor update as gcc and boost versions did not change. A notable update is patchelf 0.10, which adds support for large binaries. A few minor issues exposed by the update are fixed in preparatory patches. Patches: dist: rpm: correct systemd post-uninstall scriptlet build: force xz compression on rpm binary payload tools: toolchain: update to Fedora 31	2019-11-24 13:38:45 +02:00
Tomasz Grabiec	e3d025d014	row_cache: Fix abort on bad_alloc during cache update Since `90d6c0b`, cache will abort when trying to detach partition entries while they're updated. This should never happen. It can happen though, when the update fails on bad_alloc, because the cleanup guard invalidates the cache before it releases partition snapshots (held by "update" coroutine). Fix by destroying the coroutine first. Fixes #5327. Tests: - row_cache_test (dev) Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>	2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola	8599f8205b	rpmbuild: don't use dwz By default rpm uses dwz to merge the debug info from various binaries. Unfortunately, it looks like addr2line has not been updated to handle this: // This works $ addr2line -e build/release/scylla 0x1234567 $ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug // now this fails $ addr2line -e build/release/scylla 0x1234567 I think the issue is https://sourceware.org/bugzilla/show_bug.cgi?id=23652 Fixes #5289 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123015734.89331-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	25d5d39b3c	reloc: Force using sha1 for build-ids The default build-id used by lld is xxhash, which is 8 bytes long. rpm requires build-ids to be at least 16 bytes long (https://github.com/rpm-software-management/rpm/issues/950). We force using sha1 for now. That has no impact in gold and bfd since that is their default. We set it in here instead of configure.py to not slow down regular builds. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123020801.89750-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	b5667b9c31	build: don't compress debug info in executables By default we were compressing debug info only in release executables. The idea, if I understand it correctly, is that those are the ones we ship, so we want a more compact binary. I don't think that was doing anything useful. The compression is just gzip, so when we ship a .tar.xz, having the debug info compressed inside the scylla binary probably reduces the overall compression a bit. When building an rpm the situation is amusing. As part of the rpm build process the debug info is decompressed and extracted to an external file. Given that most of the link time goes to compressing debug info, it is probably a good idea to just skip that. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123022825.102837-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	d84859475e	Merge "Refactor test.py and cleanup resources" from Kostja Structure the code to be able to introduce futures. Apply trivial cleanups. Switch to asyncio and use it to work with processes and handle signals. Cleanup all processes upon signal.	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	e166fdfa26	Merge "Optimize LWT query phase" from Vladimir Davydov This patch implements a simple optimization for LWT: it makes PAXOS prepare phase query locally and return the current value of the modified key so that a separate query is not necessary. For more details see patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial preparatory refactoring.	2019-11-24 11:35:29 +02:00
Pavel Solodovnikov	4879db70a6	system_keyspace: support timeouts in queries to `system.paxos` table. Also introduce supplementary `execute_cql_with_timeout` function. Remove redundant comment for `execute_cql`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	bf5f864d80	paxos: piggyback result query on prepare response Current LWT implementation uses at least three network round trips: - first, execute PAXOS prepare phase - second, query the current value of the updated key - third, propose the change to participating replicas (there's also learn phase, but we don't wait for it to complete). The idea behind the optimization implemented by this patch is simple: piggyback the current value of the updated key on the prepare response to eliminate one round trip. To generate less network traffic, only the closest to the coordinator replica sends data while other participating replicas send digests which are used to check data consistency. Note, this patch changes the API of some RPC calls used by PAXOS, but this should be okay as long as the feature is in the early development stage and marked experimental. To assess the impact of this optimization on LWT performance, I ran a simple benchmark that starts a number of concurrent clients each of which updates its own key (uncontended case) stored in a cluster of three AWS i3.2xlarge nodes located in the same region (us-west-1) and measures the aggregate bandwidth and latency. The test uses shard-aware gocql driver. Here are the results: latency 99% (ms) bandwidth (rq/s) timeouts (rq/s) clients before after before after before after 1 2 2 626 637 0 0 5 4 3 2616 2843 0 0 10 3 3 4493 4767 0 0 50 7 7 10567 10833 0 0 100 15 15 12265 12934 0 0 200 48 30 13593 14317 0 0 400 185 60 14796 15549 0 0 600 290 94 14416 15669 0 0 800 568 118 14077 15820 2 0 1000 710 118 13088 15830 9 0 2000 1388 232 13342 15658 85 0 3000 1110 363 13282 15422 233 0 4000 1735 454 13387 15385 329 0 That is, this optimization improves max LWT bandwidth by about 15% and allows running 3-4x more clients while maintaining the same level of system responsiveness.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	6160b9017d	commitlog: make sure a file is closed If allocate or truncate throws, we have to close the file. Fixes #4877 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191114174810.49004-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that captured objects won't be destroyed until the future returned by the invoked function is resolved so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	cfb079b2c9	types: Refactor duplicated value_cast implementation The two implementations of value_cast were almost identical. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-3-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	ef2e96c47c	storage_proxy: factor out helper to sort endpoints by proximity We need it for PAXOS.	2019-11-24 11:35:29 +02:00
Nadav Har'El	854e6c8d7b	alternator-test: test_health_only_works_for_root_path: remove wrong check The test_health_only_works_for_root_path test checks that while Alternator's HTTP server responds to a "GET /" request with success ("health check"), it should respond to different URLs with failures (page not found). One of the URLs it tested was "/..", but unfortunately some versions of Python's HTTP client canonize this request to just a "/", causing the request to unexpectedly succeed - and the test to fail. So this patch just drops the "/.." check. A few other nonsense URLs are attempted by the test - e.g., "/abc". Fixes #5321 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	63d4590336	storage_proxy: move digest_algorithm upper We need it for PAXOS. Mark it as static inline while we are at it.	2019-11-24 11:35:29 +02:00
Nadav Har'El	43d3e8adaf	alternator: make DescribeTable return table schema One of the fields still missing in DescribeTable's response (Refs #5026) was the table's schema - KeySchema and AttributeDefinitions. This patch adds this missing feature, and enables the previously-xfailing test test_describe_table_schema. A complication of this patch is that in a table with secondary indexes, we need to return not just the base table's schema, but also the indexes' schema. The existing tests did not cover that feature, so we add here two more tests in test_gsi.py for that. One of these secondary-index schema tests, test_gsi_2_describe_table_schema, still fails, because it outputs a range-key which Scylla added to a view because of its own implementation needs, but wasn't in the user's definition of the GSI. I opened a separate issue #5320 for that. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	f5c2a23118	serializer: add reference_wrapper handling Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or std::tuple<> as T. We need it to avoid copying query::result while serializing paxos::promise.	2019-11-24 11:35:29 +02:00
Botond Dénes	89f9b89a89	scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0 Currently even if `-a` or `-s 0` is provided, `scylla task_histogram` will scan a limited amount of pages due to a bug in the scan loop's stop condition, which will trigger a stop once the default sample limit is reached. Fix the loop by skipping this check when the user wants to scan all tasks. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	1452653fbc	query_context: fix use after free of timeout_config in execute_cql_with_timeout timeout_config is used by reference by cql3::query_processor::process(), see cql3::query_options, so the caller must make sure it doesn't go away.	2019-11-24 11:35:29 +02:00
Avi Kivity	ff7e78330c	tools: toolchain: dbuild: work around "podman logs --follow" hang At least some versions of 'podman logs --follow' hang when the container eventually exits (also happens with docker on recent versions). Fortunately, we don't need to use 'podman logs --follow' and can use the more natural non-detached 'podman run', because podman does not proxy SIGTERM and instead shuts down the container when it receives it. So, to work around the problem, use the same code path in interactive and non-interactive runs, when podman is in use instead of docker.	2019-11-22 13:59:05 +02:00
Avi Kivity	702834d0e4	tools: dbuild: avoid uid/gid/selinux hacks when using podman With docker, we went to considerable lengths to ensure that access to mounted volume was done using the calling user, including supplementary groups. This avoids root-owned files being left around after a build, and ensures that access to group-shared files (like /var/cache/ccache) works as expected. All of this is unnecessary and broken when using podman. Podman uses a proxy to access files on behalf of the container, so naturally all access is done using the calling user's identity. Since it remaps user and group IDs, assigning the host uid/gid is meaningless. Using --userns host also breaks, because sudo no longer works. Fix this by making all the uid/gid/selinux games specific to docker and ignore them when using podman. To preserve the functionality of tools that depend on $HOME, set that according to the host setting.	2019-11-22 13:58:29 +02:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00
Tomasz Grabiec	fb28543116	lsa: Introduce operator bool() to occupancy_stats	2019-11-22 12:08:28 +01:00
Tomasz Grabiec	a69fda819c	lsa: Expose region_impl::evictable_occupancy in the region class	2019-11-22 12:08:10 +01:00
Avi Kivity	1c181c1b85	tools: dbuild: don't mount duplicate volumes podman refuses to start with duplicate volumes, which routinely happen if the toplevel directory is the working directory. Detect this and avoid the duplicate.	2019-11-22 10:13:30 +02:00
Konstantin Osipov	b8b5834cf1	test.py: simplify message output in run_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	90a8f79d7e	test.py: use UnitTest class where possible	2019-11-21 23:16:22 +03:00
Konstantin Osipov	8cd8cfc307	test.py: rename harness command line arguments to 'options' UnitTest class juggles with the name 'args' quite a bit to construct the command line for a unit test, so let's spread the harness command line arguments from the unit test command line arguments a bit apart by consistently calling the harness command line arguments 'options', and unit test command line arguments 'args'. Rename usage() to parse_cmd_line().	2019-11-21 23:16:22 +03:00
Konstantin Osipov	e5d624d055	test.py: consolidate argument handling in UnitTest constructor Create unique UnitTest objects in find_tests() for each found match, including repeat, to ensure each test has its own unique id. This will also be used to store execution state in the test.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	dd60673cef	test.py: move --collectd to standard args	2019-11-21 23:16:22 +03:00
Konstantin Osipov	fe12f73d7f	test.py: introduce class UnitTest	2019-11-21 23:16:22 +03:00
Konstantin Osipov	bbcdee37f7	test.py: add add_test_list() to find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	4723afa09c	test.py: add long tests with add_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	13f1e2abc6	test.py: store the non-default seastar arguments along with definition	2019-11-21 23:16:22 +03:00
Konstantin Osipov	72ef11eb79	test.py: introduce add_test() to find_tests() To avoid code duplication, and to build upon later.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b50b24a8a7	test.py: avoid an unnecessary loop in find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a5103d0092	test.py: move args.repeat processing to find_tests() It somewhat stands in the way of using asyncio. This patch also implements a more comprehensive fix for #5303, since we not only have --repeat, but run some tests in different configurations, in which case xml output is also overwritten.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0f0a49b811	test.py: introduce print_summary() and write_xunit_report() (One more moving of the code around).	2019-11-21 23:16:22 +03:00
Konstantin Osipov	22166771ef	test.py: rename test_to_run tests_to_run	2019-11-21 23:16:22 +03:00
Konstantin Osipov	1d94d9827e	test.py: introduce run_all_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	29087e1349	test.py: move out run_test() routine (Trivial code refactoring.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	79506fc5ab	test.py: introduce find_tests() Trivial code refactoring.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a44a1c4124	test.py: remove print_status_succint (Trivial code cleanup.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b9605c1d37	test.py: move mode list evaluation to usage()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0c4df5a548	test.py: add usage()	2019-11-21 23:16:22 +03:00
Pavel Emelyanov	e0f40ed16a	cli: Add the --workdir\|-W option When starting scylla daemon as non-root the initialization fails because standard /var/lib/scylla is not accessible by regular users. Making the default dir accessible for user is not very convenient either, as it will cause conflicts if two or more instances of scylla are in use. This problem can be resolved by specifying --commitlog-directory, --data-file-directories, etc on start, but it's too much typing. I propose to revive Nadav's --home option that allows to move all the directories under the same prefix in one go. Unlike Nadav's approach the --workdir option doesn't do any tricky manipulations with existing directories. Instead, as Pekka suggested, the individual directories are placed under the workdir if and only if the respective option is NOT provided. Otherwise the directory configuration is taken as is regardless of whether it's an absolute or relative path. The value substitution is done early on start. Avi suggested that this is unsafe wrt HUP config re-read and proper paths must be resolved on the fly, but this patch doesn't address that yet, here's why. First of all, the respective options are MustRestart now and the substitution is done before HUP handler is installed. Next, commitlog and data_file values are copied on start, so marking the options as LiveUpdate won't have any effect. Finally, the existing named_value::operator() returns a reference, so returning a calculated (and thus temporary) value is not possible (from my current understanding, correct me if I'm wrong). Thus if we want the _directory() to return calculated value all callers of them must be patched to call something different (e.g. _directory.get() ?) which will lead to more confusion and errors. 
Changes v3: - the option is --workdir back again - the existing *directory are only affected if unset - default config doesn't have any of these set - added the short -W alias Changes v2: - the option is --home now - all other paths are changed to be relative Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191119130059.18066-1-xemul@scylladb.com>	2019-11-21 15:07:39 +02:00
Rafael Ávila de Espíndola	5417c5356b	types: Move get_castas_fctn to cql3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-9-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	f06d6df4df	types: Simplify casts to string These now just use the to_string member functions, which makes it possible to move the code to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-8-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	786b1ec364	types: Move json code to its own file Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-7-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	af8e207491	types: Avoid using deserialize_value in json code This makes it independent of internal functions and makes it possible to move it to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-6-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	ed65e2c848	types: Move cql3_kind to the cql3 directory Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-5-espindola@scylladb.com>	2019-11-21 12:08:47 +02:00
Rafael Ávila de Espíndola	bd560e5520	types: Fix dynamic types of some data_value objects I found these mismatched types while converting some member functions to standalone functions, since they have to use the public API that has more type checks. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-4-espindola@scylladb.com>	2019-11-21 12:08:46 +02:00
Rafael Ávila de Espíndola	0d953d8a35	types: Add a test for value_cast We had no tests on when value_cast throws or when it moves the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-2-espindola@scylladb.com>	2019-11-21 12:08:45 +02:00
Konstantin Osipov	002ff51053	lua: make sure the latest master builds on Debian/Ubuntu Use pkg-config to search for Lua dependencies rather than hard-code include and link paths. Avoid using boost internals, not present in earlier versions of boost. Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com> Message-Id: <20191120170005.49649-1-kostja@scylladb.com>	2019-11-21 07:57:12 +02:00
Pavel Solodovnikov	d910899d61	configure.py: support multi-threaded linking via `gold` Use `-Wl,--threads` flag to enable multi-threaded linking when using `ld.gold` linker. Additional compilation test is required because it depends on whether or not the `gold` linker has been compiled with `--enable-threads` option. This patch introduces a substantial improvement to the link times of `scylla` binary in release and debug modes (around 30 percent). Local setup reports the following numbers with release build for linking only build/release/scylla: Single-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30 Multi-threaded mode: Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>	2019-11-20 19:28:00 +02:00
Nadav Har'El	89d6d668cb	Merge "Redis API in Scylla" Merged patch series from Peng Jian, adding optionally-enabled Redis API support to Scylla. This feature is experimental, and partial - the extent of this support is detailed in docs/redis/redis.md. Patches: Document: add docs/redis/redis.md redis: Redis API in Scylla Redis API: graft redis module to Scylla redis-test: add test cases for Redis API	2019-11-20 16:59:13 +02:00
Piotr Sarna	086e744f8f	scripts/find-maintainer: refresh maintainers list This commit attempts to make the maintainers list up-to-date to the best of my knowledge, because it got really stale over the time. Message-Id: <eab6d3f481712907eb83e91ed2b8dbfa0872155f.1574261533.git.sarna@scylladb.com>	2019-11-20 16:56:31 +02:00
Glauber Costa	73aff1fc95	api: export system uptime via REST This will be useful for tools like nodetool that want to query the uptime of the system. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190619110850.14206-1-glauber@scylladb.com>	2019-11-20 16:44:11 +02:00
Tomasz Grabiec	9a686ac551	Merge "scylla-gdb: active sstables: support k_l/mc sstable readers" from Benny Fixes #5277	2019-11-19 23:49:39 +01:00
Avi Kivity	1164ff5329	tools: toolchain: update to Fedora 31 This is a minor update as gcc and boost versions do not change. glibc-langpack-en no longer gets pulled in by default. As it is required by some locale use somewhere, it is added to the explicit dependencies.	2019-11-20 00:08:30 +02:00
Avi Kivity	301c835cbf	build: force xz compression on rpm binary payload Fedora 31 switched the default compression to zstd, which isn't readable by some older rpm distributions (CentOS 7 in particular). Tell it to use the older xz compression instead, so packages produced on Fedora 31 can be installed on older distributions.	2019-11-20 00:08:24 +02:00
Avi Kivity	3ebd68ef8a	dist: rpm: correct systemd post-uninstall scriptlet The post-uninstall scriptlet requires a parameter, but older versions of rpm survived without it. Fedora 31's rpm is more strict, so supply this parameter.	2019-11-20 00:03:49 +02:00
Peng Jian	e6adddd8ef	redis-test: add test cases for Redis API Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:56:16 +08:00
Peng Jian	f2801feb66	Redis API: graft redis module to Scylla In this document, the detailed design and implementation of Redis API in Scylla is provided. v2: build: work around ragel 7 generated code bug (suggested by Avi) Ragel 7 incorrectly emits some unused variables that don't compile. As a workaround, sed them away. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com> Signed-off-by: Amos Kong <amos@scylladb.com>	2019-11-20 04:55:58 +08:00
Peng Jian	0737d9e84d	redis: Redis API in Scylla Scylla has advantages and amazing features. If Redis is built on top of Scylla, it gets these features automatically. Scylla has achieved great progress in cluster master management, data persistence, failover and replication. The benefits to the users are that it is easy to use and develop in their production environment, while taking advantage of Scylla. Using Ragel to parse the Redis request, the server obtains the command name and the parameters from the request, invokes Scylla's internal API to read and write the data, then replies to the client. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-11-20 04:55:56 +08:00
Peng Jian	708a42c284	Document: add docs/redis/redis.md In this document, the detailed design and implementation of Redis API in Scylla is provided. Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>	2019-11-20 04:46:33 +08:00
Nadav Har'El	9b9609c65b	merge: row_marker: correct row expiry condition Merged patch set by Piotr Dulikowski: This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when expiry_timestamp <= now, while a row became expired when expiry_timestamp < now (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes #4263, Fixes #5290. Tests: unit(dev) python test described in issue #5290	2019-11-19 18:14:15 +02:00
Amnon Heiman	9df10e2d4b	scylla_util.py: Add optional timeout to out function It is useful to have an option to limit the execution time of a shell script. This patch adds an optional timeout parameter; if the parameter is provided, the command will return a failure once the duration has passed. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-11-19 17:30:28 +02:00
Nadav Har'El	b38c3f1288	Merge "Add separate counters for accesses to system tables" Merged patch series from Juliusz Stasiewicz: Welcome to my first PR to Scylla! The task was intended as a warm-up ("noob") exercise; its description is here: #4182 Sorry, I also couldn't help it and did some scouting: edited descriptions of some metrics and shortened a few annoyingly long LoC.	2019-11-19 15:21:56 +02:00
Piotr Dulikowski	9be842d3d8	row_marker: tests for row expiration	2019-11-19 13:45:30 +01:00
Tomasz Grabiec	5e4abd75cc	main: Abort on EBADF and ENOTSOCK by default Those are typically symptoms of use-after-free or memory corruption in the program. It's better to catch such error sooner than later. That situation is also dangerous since if a valid descriptor would land under the invalid access, not the one which was intended for the operation, then the operation may be performed on the wrong file and result in corruption. Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>	2019-11-19 13:07:33 +02:00
Piotr Dulikowski	589313a110	row_marker: correct expiration condition This change corrects condition on which a row was considered expired by its TTL. The logic that decides when a row becomes expired was inconsistent with the logic that decides if a single cell is expired. A single cell becomes expired when `expiry_timestamp <= now`, while a row became expired when `expiry_timestamp < now` (notice the strict inequality). For rows inserted with TTL, this caused non-key cells to expire (change their values to null) one second before the row disappeared. Now, row expiry logic uses non-strict inequality. Fixes: #4263, #5290. Tests: - unit(dev) - python test described in issue #5290	2019-11-19 11:46:59 +01:00
Pekka Enberg	505f2c1008	test.py: Append test repeat cycle to output XML filename Currently, we overwrite the same XML output file for each test repeat cycle. This can cause invalid XML to be generated if the XML contents don't match exactly for every iteration. Fix the problem by appending the test repeat cycle in the XML filename as follows: $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test $ ls -1 *.xml jenkins_test.release.vint_serialization_test.0.boost.xml jenkins_test.release.vint_serialization_test.1.boost.xml jenkins_test.release.vint_serialization_test.2.boost.xml Fixes #5303. Message-Id: <20191119092048.16419-1-penberg@scylladb.com>	2019-11-19 11:30:47 +02:00
Rafael Ávila de Espíndola	750adee6e3	lua: fix build with boost 1.67 and older vs fmt It is not completely clear why the fmt base code fails with boost 1.67, but it is easy to avoid. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191118210540.129603-1-espindola@scylladb.com>	2019-11-19 11:14:00 +02:00
Tomasz Grabiec	ff567649fa	Merge "gossip: Limit number of pending gossip ACK and ACK2 messages" from Asias In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ACK messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, unbounded ACK messages can consume an unlimited amount of memory, which eventually causes OOM. To fix, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from the peer has the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node.	2019-11-18 10:52:38 +01:00
Benny Halevy	f9e93bba38	sstables: compaction: move cleanup parameter to compaction_descriptor Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>	2019-11-18 10:52:20 +01:00
Avi Kivity	1fe062aed4	Merge "Add basic UDF support" from Rafael " This patch series adds only UDF support, UDA will be in the next patch series. With this all CQL types are mapped to Lua. Right now we setup a new lua state and copy the values for each argument and return. This will be optimized once profiled. We require --experimental to enable UDF in case there is some change to the table format. " * 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits) Lua: Document the conversions between Lua and CQL Lua: Implement decimal subtraction Lua: Implement decimal addition Lua: Implement support for returning decimal Lua: Implement decimal to string conversion Lua: Implement decimal to floating point conversion Lua: Implement support for decimal arguments Lua: Implement support for returning varint Lua: Implement support for returning duration Lua: Implement support for duration arguments Lua: Implement support for returning inet Lua: Implement support for inet arguments Lua: Implement support for returning time Lua: Implement support for time arguments Lua: Implement support for returning timeuuid Lua: Implement support for returning uuid Lua: Implement support for uuid and timeuuid arguments Lua: Implement support for returning date Lua: Implement support for date arguments Lua: Implement support for returning timestamp ...	2019-11-17 16:38:19 +02:00
Konstantin Osipov	48f3ca0fcb	test.py: use the configured build modes from ninja mode_list Add mode_list rule to ninja build and use it by default when searching for tests in test.py. Now it is no longer necessary to explicitly specify the test mode when invoking test.py. (cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)	2019-11-17 13:42:10 +01:00
Nadav Har'El	2fb2eb27a2	sstables: allow non-traditional characters in table name The goal of this patch is to fix issue #5280, a rather serious Alternator bug, where Scylla fails to restart when an Alternator table has secondary indexes (LSI or GSI). Traditionally, Cassandra allows table names to contain only alphanumeric characters and underscores. However, most of our internal implementation doesn't actually have this restriction. So Alternator uses the characters ':' and '!' in the table names to mark global and local secondary indexes, respectively. And this actually works. Or almost... This patch fixes a problem of listing, during boot, the sstables stored for tables with such non-traditional names. The sstable listing code needlessly assumes that the directory name, i.e., the CF names, matches the "\w+" regular expression. When an sstable is found in a directory not matching such regular expression, the boot fails. But there is no real reason to require such a strict regular expression. So this patch relaxes this requirement, and allows Scylla to boot with Alternator's GSI and LSI tables and their names which include the ":" and "!" characters, and in fact any other name allowed as a directory name. Fixes #5280. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191114153811.17386-1-nyh@scylladb.com>	2019-11-17 14:27:47 +02:00
Shlomi Livne	3e873812a4	Document backport queue and procedure (#5282) This document adds information about how fixes are tracked to be backported into releases and what is the procedure that is followed to backport those fixes. Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2019-11-17 01:45:24 -08:00
Benny Halevy	c215ad79a9	scylla-gdb: resolve: add startswith parameter Allow filtering the resolved addresses by a startswith string. The common use case is for resolving vtable ptrs, when resolving the output of `find_vptrs` that may be too long for the host (running gdb) memory size. In this case the number of vtable ptrs is considerably smaller than the total number of objects returned by find_ptrs (e.g. 462 vs. 69625 in an OOM core I examined from scylla --smp=2 --memory=1024M) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:40:54 +02:00
Benny Halevy	2f688dcf08	scylla-gdb.py: find_single_sstable_readers: fix support for sstable_mutation_reader provide template arguments for k_l and m readers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-11-17 11:02:05 +02:00
Kamil Braun	a67e887dea	sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285) CQL tracing would only report file I/O involving one sstable, even if multiple sstables were read from during the query. Steps to reproduce: create a table with NullCompactionStrategy; insert row, flush memtables; insert row, flush memtables; restart Scylla; tracing on; select * from table. The trace would only report DMA reads from one of the two sstables. Kudos to @denesb for catching this. Related issue: #4908	2019-11-17 00:38:37 -08:00
Tomasz Grabiec	a384d0af76	Merge "A set of cleanups over main() code" from Pavel E. There are ... signs of massive start/stop code rework in the main() function. While fixing the sub-modules interdependencies during start/stop I've polished these signs too, so here's the simplest ones.	2019-11-15 15:25:18 +01:00
Pavel Emelyanov	1dc490c81c	tracing: Move register_tracing_keyspace_backend forward decl into proper header Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	7e81df71ba	main: Shorten developer_mode() evaluation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	1bd68d87fc	main: Do not carry pctx all over the code v2: - do not use struct initialization extension Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	655b6d0d1e	main: Hide start_thrift Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	26f2b2ce5e	main,db: Kill some unused .hh includes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	f5b345604f	main: Factor out get_conf_sub Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Pavel Emelyanov	924d52573d	main: Remove unused return_value variable (and capture) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-14 19:59:03 +03:00
Juliusz Stasiewicz	1cfa458409	metrics: separate counters for `system' KS accesses Resolves #4182. Metrics per system tables are accumulated separately, depending on the origin of query (DB internals vs clients).	2019-11-14 13:14:39 +01:00
Juliusz Stasiewicz	b1e4d222ed	cql3: cosmetics - improved description of metrics	2019-11-14 10:35:42 +01:00
Rafael Ávila de Espíndola	10bcbaf348	Lua: Document the conversions between Lua and CQL Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6ffddeae5e	Lua: Implement decimal subtraction Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	aba8e531d1	Lua: Implement decimal addition Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bb84eabbb3	Lua: Implement support for returning decimal Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc17312a86	Lua: Implement decimal to string conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	e83d5bf375	Lua: Implement decimal to floating point conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b568bf4f54	Lua: Implement support for decimal arguments This is just the minimum to pass a value to Lua. Right now you can't actually do anything with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	6c3f050eb4	Lua: Implement support for returning varint Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dc377abd68	Lua: Implement support for returning duration Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	c3f021d2e4	Lua: Implement support for duration arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9208b2f498	Lua: Implement support for returning inet Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	64be94ab01	Lua: Implement support for inet arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	faf029d472	Lua: Implement support for returning time Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	772f2a4982	Lua: Implement support for time arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	484f498534	Lua: Implement support for returning timeuuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9c2daf6554	Lua: Implement support for returning uuid Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ae1a1a4085	Lua: Implement support for uuid and timeuuid arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8aeed5beb	Lua: Implement support for returning date Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	384effa54b	Lua: Implement support for date arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	63bc960152	Lua: Implement support for returning timestamp Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee95756f62	Lua: Implement support for timestamp arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	1c6d5507b4	Lua: Implement support for returning counter Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0d9d53b5da	Lua: Implement support for counter arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	74c4e58b6b	Lua: Add a test for nested types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b226511ce8	Lua: Implement support for returning maps Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5c8d1a797f	Lua: Implement support for map arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b5b15ce4e6	Lua: Implement support for returning set Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf7ba441e4	Lua: Implement support for set arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	02f076be43	Lua: Implement support for returning udt Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	92c8e94d9a	Lua: Implement support for udt arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	a7c3f6f297	Lua: Implement support for returning list Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	688736f5ff	Lua: Implement support for returning tuple Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ab5708a711	Lua: Implement support for list and tuple arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	534f29172c	Lua: Implement support for returning boolean Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	b03c580493	Lua: Implement support for boolean arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dcfe397eb6	Lua: Implement support for returning floating point Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	cf4b7ab39a	Lua: Implement support for returning blob Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3d22433cd4	Lua: Implement support for blob arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	dd754fcf01	Lua: Implement support for returning ascii Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	affb1f8efd	Lua: Implement support for returning text Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	f8ed347ee7	Lua: Implement support for string arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	0e4f047113	Lua: Implement a visitor for return values This adds support for all integer types. Followup commits will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	34b770e2fb	Lua: Push varint as decimal This makes it substantially simpler to support both varint and decimal, which will be implemented in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9b3cab8865	Lua: Implement support for varint to integer conversion Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	5a40264d97	Lua: Implement support for varint arguments Right now it is not possible to do anything with the value. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	3230b8bd86	Lua: Implement support for floating point arguments Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	9ad2cc2850	Lua: Implement a visitor for arguments With this we support all simple integer types. Followup patches will implement the missing types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ee1d87a600	Lua: Plug in the interpreter This adds a wrapper around the lua interpreter so that function executions are interruptible and return futures. With this patch it is possible to write and use simple UDFs that take and return integer values. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	bc3bba1064	Lua: Add lua.cc and lua.hh skeleton files Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	7015e219ca	Lua: Link with liblua Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	61200ebb04	Lua: Add config options This patch just adds the config options that we will expose for the lua runtime. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	d9337152f3	Use threads when executing user functions This adds a requires_thread predicate to functions and propagates that up until we get to code that already returns futures. We can then use the predicate to decide if we need to use seastar::async. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	52b48b415c	Test that schema digests with UDFs don't change This refactors test_schema_digest_does_not_change to also test a schema with user defined functions and user defined aggregates. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	fc72a64c67	Add schema propagation and storage for UDF With this it is possible to create user defined functions and aggregates and they are saved to disk and the schema change is propagated. It is just not possible to call them yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Rafael Ávila de Espíndola	ce6304d920	UDF: Add a feature and config option to track if udf is enabled It can only be enabled with --experimental. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:40:47 -08:00
Rafael Ávila de Espíndola	dd17dfcbef	Reject "OR REPLACE ... IF NOT EXISTS" in the grammar The parser now rejects having both OR REPLACE and IF NOT EXISTS in the same statement. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	e7e3dab4aa	Convert UDF parsing code to c++ For now this just constructs the corresponding c++ classes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5c45f3b573	Update UDF syntax This updates UDF syntax to the current specification. In particular, this removes DETERMINISTIC and adds "CALLED ON NULL INPUT" and "RETURNS NULL ON NULL INPUT". Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	c75cd5989c	transport: Add support for FUNCTION and AGGREGATE to schema_change While at it, modernize the code a bit and add a test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	dac3cf5059	Clear functions between cql_test_env runs At some point we should make the function list non static, but this allows us to write tests for now. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	de1a970b93	cql: convert functions to add, remove and replace functions Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	33f9d196f9	Add iterator version of functions::find This avoids allocating a std::vector and is more flexible since the iterator can be passed to erase. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	7f9dadee5c	Implement functions::type_equals. Since the types are uniqued we can just use ==. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	5cef5a1b38	types: Add a friend visitor over data_value This is a simple wrapper that allows code that is not in the types hierarchy to visit a data_value. Will be used by UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Rafael Ávila de Espíndola	9bf9a84e4d	types: Move the data_value visitor to a header It will be used by the UDF implementation. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:19:52 -08:00
Asias He	f32ae00510	gossip: Limit number of pending gossip ACK2 messages Similar to "gossip: Limit number of pending gossip ACK messages", limit the number of pending gossip ACK2 messages in gossiper::handle_ack_msg. Fixes #5210	2019-10-25 12:44:28 +08:00
Asias He	15148182ab	gossip: Limit number of pending gossip ACK messages In a cross-dc large cluster, the receiver node of the gossip SYN message might be slow to send the gossip ACK message. The ACK messages can be large if the payload of the application state is big, e.g., CACHE_HITRATES with a lot of tables. As a result, unbounded ACK messages can consume an unlimited amount of memory, which eventually causes OOM. To fix, this patch queues the SYN message and handles it later if the previous ACK message is still being sent. However, we only store the latest SYN message. Since the latest SYN message from the peer has the latest information, it is safe to drop the previous SYN message and keep only the latest one. After this patch, there can be at most 1 pending SYN message and 1 pending ACK message per peer node. Fixes #5210	2019-10-25 12:44:28 +08:00
Benny Halevy	7827e3f11d	tests: test_large_data: do not stop database Now that compaction returns only after the compacted sstables are deleted, we no longer need to stop the database to force waiting for deletes (that were previously done asynchronously) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	19b67d82c9	table::on_compaction_completion: fix indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	8dd6e13468	table::on_compaction_completion: wait for background deletes Don't let background deletes accumulate uncontrollably. Fixes #4909 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:38 +03:00
Benny Halevy	da6645dc2c	table: refresh_snapshot before deleting any sstables The row cache must not hold references to any sstable we're about to delete. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-02 12:15:29 +03:00

4291 changed files with 144440 additions and 53698 deletions

1

.dockerignore

@@ -1,3 +1,4 @@
 .git
 build
 seastar/build
 testlog

87

.github/CODEOWNERS vendored Normal file

@@ -0,0 +1,87 @@
 # AUTH
 auth/* @elcallio @vladzcloudius
 # CACHE
 row_cache* @tgrabiec @haaawk
 *mutation* @tgrabiec @haaawk
 tests/mvcc* @tgrabiec @haaawk
 # CDC
 cdc/* @haaawk @kbr- @elcallio @piodul @jul-stas
 test/cql/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
 test/boost/cdc_* @haaawk @kbr- @elcallio @piodul @jul-stas
 # COMMITLOG / BATCHLOG
 db/commitlog/* @elcallio
 db/batch* @elcallio
 # COORDINATOR
 service/storage_proxy* @gleb-cloudius
 # COMPACTION
 sstables/compaction* @raphaelsc @nyh
 # CQL TRANSPORT LAYER
 transport/* @penberg
 # CQL QUERY LANGUAGE
 cql3/* @tgrabiec @penberg @psarna
 # COUNTERS
 counters* @haaawk @jul-stas
 tests/counter_test* @haaawk @jul-stas
 # GOSSIP
 gms/* @tgrabiec @asias
 # DOCKER
 dist/docker/* @penberg
 # LSA
 utils/logalloc* @tgrabiec
 # MATERIALIZED VIEWS
 db/view/* @nyh @psarna
 cql3/statements/*view* @nyh @psarna
 test/boost/view_* @nyh @psarna
 # PACKAGING
 dist/* @syuu1228
 # REPAIR
 repair/* @tgrabiec @asias @nyh
 # SCHEMA MANAGEMENT
 db/schema_tables* @tgrabiec @nyh
 db/legacy_schema_migrator* @tgrabiec @nyh
 service/migration* @tgrabiec @nyh
 schema* @tgrabiec @nyh
 # SECONDARY INDEXES
 db/index/* @nyh @penberg @psarna
 cql3/statements/*index* @nyh @penberg @psarna
 test/boost/*index* @nyh @penberg @psarna
 # SSTABLES
 sstables/* @tgrabiec @raphaelsc @nyh
 # STREAMING
 streaming/* @tgrabiec @asias
 service/storage_service.* @tgrabiec @asias
 # ALTERNATOR
 alternator/* @nyh @psarna
 test/alternator/* @nyh @psarna
 # HINTED HANDOFF
 db/hints/* @haaawk @piodul @vladzcloudius
 # REDIS
 redis/* @nyh @syuu1228
 redis-test/* @nyh @syuu1228
 # READERS
 reader_* @denesb
 querier* @denesb
 test/boost/mutation_reader_test.cc @denesb
 test/boost/querier_cache_test.cc @denesb
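
To see how a path resolves to owners in the file above, here is a minimal Python sketch. It relies on GitHub's documented rule that the last matching pattern takes precedence. The matcher is a deliberate simplification and not GitHub's implementation: `*` is assumed not to cross `/`, and patterns without a `/` are compared against the file name only. The names `glob_to_regex`, `owners_for` and `RULES` are hypothetical, and `RULES` copies only a few entries from the file.

```python
import re

def glob_to_regex(pattern):
    # Simplified glob: '*' matches any run of characters except '/'.
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("[^/]*".join(parts) + r"\Z")

def owners_for(path, rules):
    """Return the owners from the last rule whose pattern matches `path`."""
    matched = []
    for pattern, owners in rules:
        # Assumption: patterns containing '/' are matched against the whole
        # path, all others against the file name only.
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if glob_to_regex(pattern).match(target):
            matched = owners  # later rules override earlier ones
    return matched

# A few rules copied from the CODEOWNERS file, in file order.
RULES = [
    ("auth/*", ["@elcallio", "@vladzcloudius"]),
    ("sstables/compaction*", ["@raphaelsc", "@nyh"]),
    ("sstables/*", ["@tgrabiec", "@raphaelsc", "@nyh"]),
    ("reader_*", ["@denesb"]),
]

if __name__ == "__main__":
    for p in ("auth/service.cc", "sstables/compaction.cc", "reader_permit.hh"):
        print(p, owners_for(p, RULES))
```

Under this reading, `sstables/compaction.cc` resolves to the owners of the later `sstables/*` rule rather than the earlier `sstables/compaction*` rule, because `# SSTABLES` appears after `# COMPACTION` in the file.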

									
33

.github/workflows/pages.yml vendored Normal file

@@ -0,0 +1,33 @@
 name: "CI Docs"
 on:
   push:
     branches:
     - master
     paths:
     - 'docs/**'
 jobs:
   release:
     name: Build
     runs-on: ubuntu-latest
     env:
       LATEST_VERSION: master
     steps:
     - name: Checkout
       uses: actions/checkout@v2
       with:
         persist-credentials: false
         fetch-depth: 0
     - name: Set up Python
       uses: actions/setup-python@v1
       with:
         python-version: 3.7
     - name: Build docs
       run: |
         export PATH=$PATH:~/.local/bin
         cd docs
         make multiversion
     - name: Deploy
       run : ./docs/_utils/deploy.sh
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

5

.gitignore vendored

@@ -22,3 +22,8 @@ resources
 .pytest_cache
 /expressions.tokens
 tags
 testlog
 test/*/*.reject
 .vscode
 docs/_build
 docs/poetry.lock

18

.gitmodules vendored

@@ -6,12 +6,18 @@
 	path = swagger-ui
 	url = ../scylla-swagger-ui
 	ignore = dirty
 [submodule "xxHash"]
 	path = xxHash
 	url = ../xxHash
 [submodule "libdeflate"]
 	path = libdeflate
 	url = ../libdeflate
 [submodule "zstd"]
 	path = zstd
 	url = ../zstd
 [submodule "abseil"]
 	path = abseil
 	url = ../abseil-cpp
 [submodule "scylla-jmx"]
 	path = tools/jmx
 	url = ../scylla-jmx
 [submodule "scylla-tools"]
 	path = tools/java
 	url = ../scylla-tools-java
 [submodule "scylla-python3"]
 	path = tools/python3
 	url = ../scylla-python3
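
The submodule URLs above are relative (`../xxHash`, `../scylla-jmx`, and so on). Git resolves them against the URL of the superproject's remote. The Python sketch below illustrates that for plain `../` and `./` prefixes only. The helper `resolve_submodule_url` is hypothetical, and the parent URL in the example is an assumption, not something this diff states.

```python
def resolve_submodule_url(parent_url, relative_url):
    """Resolve a './' or '../' submodule URL against the parent remote URL."""
    base = parent_url.rstrip("/")
    rest = relative_url
    while True:
        if rest.startswith("../"):
            # Drop the last path component of the parent URL.
            base = base.rsplit("/", 1)[0]
            rest = rest[3:]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return base + "/" + rest

if __name__ == "__main__":
    # Assumed parent remote, for illustration only.
    parent = "https://github.com/scylladb/scylla"
    for rel in ("../xxHash", "../abseil-cpp", "../scylla-tools-java"):
        print(rel, "->", resolve_submodule_url(parent, rel))
```

One consequence of relative URLs is that a fork or mirror of the superproject will look for the submodules next to itself, so those repositories need to be forked or mirrored as well.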

									
852

CMakeLists.txt

				@@ -1,142 +1,756 @@

				##

				## For best results, first compile the project using the Ninja build-system.

				##

				cmake_minimum_required(VERSION 3.18)

				cmake_minimum_required(VERSION 3.7)

				project(scylla)

				if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})

				    message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")

				if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)

				  message(STATUS "Setting build type to 'Release' as none was specified.")

				  set(CMAKE_BUILD_TYPE "Release" CACHE

				      STRING "Choose the type of build." FORCE)

				  # Set the possible values of build type for cmake-gui

				  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS

				    "Debug" "Release" "Dev" "Sanitize")

				endif()

				# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.

				set(SEASTAR_INCLUDE_DIRS "seastar")

				if(CMAKE_BUILD_TYPE)

				    string(TOLOWER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)

				else()

				    set(BUILD_TYPE "release")

				endif()

				# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while

				# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).

				set(SEASTAR_DPDK_INCLUDE_DIRS

				        seastar/dpdk/lib/librte_eal/common/include

				        seastar/dpdk/lib/librte_eal/common/include/generic

				        seastar/dpdk/lib/librte_eal/common/include/x86

				        seastar/dpdk/lib/librte_ether)

				find_package(PkgConfig REQUIRED)

				set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/seastar/build/release:$ENV{PKG_CONFIG_PATH}")

				pkg_check_modules(SEASTAR seastar)

				find_package(Boost COMPONENTS filesystem program_options system thread)

				##

				## Populate the names of all source and header files in the indicated paths in a designated variable.

				##

				## When RECURSIVE is specified, directories are traversed recursively.

				##

				## Use: scan_scylla_source_directories(VAR my_result_var [RECURSIVE] PATHS [path1 path2 ...])

				##

				function (scan_scylla_source_directories)

				    set(options RECURSIVE)

				    set(oneValueArgs VAR)

				    set(multiValueArgs PATHS)

				    cmake_parse_arguments(args "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")

				    set(globs "")

				    foreach (dir ${args_PATHS})

				        list(APPEND globs "${dir}/*.cc" "${dir}/*.hh")

				    endforeach()

				    if (args_RECURSIVE)

				        set(glob_kind GLOB_RECURSE)

				function(default_target_arch arch)

				    set(x86_instruction_sets i386 i686 x86_64)

				    if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)

				        set(${arch} "westmere" PARENT_SCOPE)

				    elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")

				        set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)

				    else()

				        set(glob_kind GLOB)

				        set(${arch} "" PARENT_SCOPE)

				    endif()

				endfunction()

				default_target_arch(target_arch)

				if(target_arch)

				    set(target_arch_flag "-march=${target_arch}")

				endif()

				    file(${glob_kind} var

				            ${globs})

				# Configure Seastar compile options to align with Scylla

				set(Seastar_CXX_FLAGS -fcoroutines ${target_arch_flag} CACHE INTERNAL "" FORCE)

				set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)

				    set(${args_VAR} ${var} PARENT_SCOPE)

				add_subdirectory(seastar)

				add_subdirectory(abseil)

				# Exclude absl::strerror from the default "all" target since it's not

				# used in Scylla build and, moreover, makes use of deprecated glibc APIs,

				# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,

				# which happens to be the case for recent Fedora distribution versions.

				#

				# Need to use the internal "absl_strerror" target name instead of namespaced

				# variant because `set_target_properties` does not understand the latter form,

				# unfortunately.

				set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)

				# System libraries dependencies

				find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)

				find_package(Lua REQUIRED)

				find_package(ZLIB REQUIRED)

				find_package(ICU COMPONENTS uc REQUIRED)

				set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")

				set(scylla_gen_build_dir "${scylla_build_dir}/gen")

				file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")

				# Place libraries, executables and archives in ${buildroot}/build/${mode}/

				foreach(mode RUNTIME LIBRARY ARCHIVE)

				    set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")

				endforeach()

				# Generate C++ source files from thrift definitions

				function(scylla_generate_thrift)

				    set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				    get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)

				    set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})

				    set(outputs

				        ${aux_out_file_name}_types.cpp

				        ${aux_out_file_name}_types.h

				        ${aux_out_file_name}_constants.cpp

				        ${aux_out_file_name}_constants.h

				        ${args_OUT_DIR}/${args_SERVICE}.cpp

				        ${args_OUT_DIR}/${args_SERVICE}.h)

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				            thrift

				        OUTPUT ${outputs}

				        COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}

				        COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")

				    add_custom_target(${args_TARGET}

				        DEPENDS ${outputs})

				    set(${args_VAR} ${outputs} PARENT_SCOPE)

				endfunction()

				## Although Seastar is an external project, it is common enough to explore the sources while doing

				## Scylla development that we'll treat the Seastar sources as part of this project for easier navigation.

				scan_scylla_source_directories(

				        VAR SEASTAR_SOURCE_FILES

				        RECURSIVE

				scylla_generate_thrift(

				    TARGET scylla_thrift_gen_cassandra

				    VAR scylla_thrift_gen_cassandra_files

				    IN_FILE interface/cassandra.thrift

				    OUT_DIR ${scylla_gen_build_dir}

				    SERVICE Cassandra)

				        PATHS

				          seastar/core

				          seastar/http

				          seastar/json

				          seastar/net

				          seastar/rpc

				          seastar/tests

				          seastar/util)

				# Parse antlr3 grammar files and generate C++ sources

				function(scylla_generate_antlr3)

				    set(one_value_args TARGET VAR IN_FILE OUT_DIR)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				scan_scylla_source_directories(

				        VAR SCYLLA_ROOT_SOURCE_FILES

				        PATHS .)

				    get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)

				    get_filename_component(stem ${in_file_pure_name} NAME_WE)

				scan_scylla_source_directories(

				        VAR SCYLLA_SUB_SOURCE_FILES

				        RECURSIVE

				    set(outputs

				        "${args_OUT_DIR}/${stem}Lexer.hpp"

				        "${args_OUT_DIR}/${stem}Lexer.cpp"

				        "${args_OUT_DIR}/${stem}Parser.hpp"

				        "${args_OUT_DIR}/${stem}Parser.cpp")

				        PATHS

				          api

				          auth

				          cql3

				          db

				          dht

				          exceptions

				          gms

				          index

				          io

				          locator

				          message

				          repair

				          service

				          sstables

				          streaming

				          tests

				          thrift

				          tracing

				          transport

				          utils)

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				        OUTPUT ${outputs}

				        # Remove #ifdef'ed code from the grammar source code

				        COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"

				        COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"

				        # We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.

				        # Because we add such a variable to every function, and because `ExceptionBaseType` is not a global

				        # name, we also add a global typedef to avoid compilation errors.

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"

				        COMMAND sed -i

				            -e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"

				            -e "/^.*On :.*$/d"

				            -e "1i using ExceptionBaseType = int;"

				            -e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"

				            "${args_OUT_DIR}/${stem}Parser.cpp"

				        VERBATIM)

				scan_scylla_source_directories(

				        VAR SCYLLA_GEN_SOURCE_FILES

				        RECURSIVE

				        PATHS build/release/gen)

				    add_custom_target(${args_TARGET}

				        DEPENDS ${outputs})

				set(SCYLLA_SOURCE_FILES

				        ${SCYLLA_ROOT_SOURCE_FILES}

				        ${SCYLLA_GEN_SOURCE_FILES}

				        ${SCYLLA_SUB_SOURCE_FILES})

				    set(${args_VAR} ${outputs} PARENT_SCOPE)

				endfunction()

				set(antlr3_grammar_files

				    cql3/Cql.g

				    alternator/expressions.g)

				set(antlr3_gen_files)

				foreach(f ${antlr3_grammar_files})

				    get_filename_component(grammar_file_name "${f}" NAME_WE)

				    get_filename_component(f_dir "${f}" DIRECTORY)

				    scylla_generate_antlr3(

				        TARGET scylla_antlr3_gen_${grammar_file_name}

				        VAR scylla_antlr3_gen_${grammar_file_name}_files

				        IN_FILE ${f}

				        OUT_DIR ${scylla_gen_build_dir}/${f_dir})

				    list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")

				endforeach()

				# Generate C++ sources from ragel grammar files

				seastar_generate_ragel(

				    TARGET scylla_ragel_gen_protocol_parser

				    VAR scylla_ragel_gen_protocol_parser_file

				    IN_FILE redis/protocol_parser.rl

				    OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)

				# Generate C++ sources from Swagger definitions

				set(swagger_files

				    api/api-doc/cache_service.json

				    api/api-doc/collectd.json

				    api/api-doc/column_family.json

				    api/api-doc/commitlog.json

				    api/api-doc/compaction_manager.json

				    api/api-doc/config.json

				    api/api-doc/endpoint_snitch_info.json

				    api/api-doc/error_injection.json

				    api/api-doc/failure_detector.json

				    api/api-doc/gossiper.json

				    api/api-doc/hinted_handoff.json

				    api/api-doc/lsa.json

				    api/api-doc/messaging_service.json

				    api/api-doc/storage_proxy.json

				    api/api-doc/storage_service.json

				    api/api-doc/stream_manager.json

				    api/api-doc/system.json

				    api/api-doc/utils.json)

				set(swagger_gen_files)

				foreach(f ${swagger_files})

				    get_filename_component(fname "${f}" NAME_WE)

				    get_filename_component(dir "${f}" DIRECTORY)

				    seastar_generate_swagger(

				        TARGET scylla_swagger_gen_${fname}

				        VAR scylla_swagger_gen_${fname}_files

				        IN_FILE "${f}"

				        OUT_DIR "${scylla_gen_build_dir}/${dir}")

				    list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")

				endforeach()

				# Create C++ bindings for IDL serializers

				function(scylla_generate_idl_serializer)

				    set(one_value_args TARGET VAR IN_FILE OUT_FILE)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				    get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)

				    set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")

				    find_package(Python3 COMPONENTS Interpreter)

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				            ${idl_compiler}

				        OUTPUT ${args_OUT_FILE}

				        COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}

				        COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})

				    add_custom_target(${args_TARGET}

				        DEPENDS ${args_OUT_FILE})

				    set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)

				endfunction()

				set(idl_serializers

				    idl/cache_temperature.idl.hh

				    idl/commitlog.idl.hh

				    idl/consistency_level.idl.hh

				    idl/frozen_mutation.idl.hh

				    idl/frozen_schema.idl.hh

				    idl/gossip_digest.idl.hh

				    idl/idl_test.idl.hh

				    idl/keys.idl.hh

				    idl/messaging_service.idl.hh

				    idl/mutation.idl.hh

				    idl/paging_state.idl.hh

				    idl/partition_checksum.idl.hh

				    idl/paxos.idl.hh

				    idl/query.idl.hh

				    idl/range.idl.hh

				    idl/read_command.idl.hh

				    idl/reconcilable_result.idl.hh

				    idl/replay_position.idl.hh

				    idl/result.idl.hh

				    idl/ring_position.idl.hh

				    idl/streaming.idl.hh

				    idl/token.idl.hh

				    idl/tracing.idl.hh

				    idl/truncation_record.idl.hh

				    idl/uuid.idl.hh

				    idl/view.idl.hh)

				set(idl_gen_files)

				foreach(f ${idl_serializers})

				    get_filename_component(idl_name "${f}" NAME)

				    get_filename_component(idl_target "${idl_name}" NAME_WE)

				    get_filename_component(idl_dir "${f}" DIRECTORY)

				    string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")

				    scylla_generate_idl_serializer(

				        TARGET scylla_idl_gen_${idl_target}

				        VAR scylla_idl_gen_${idl_target}_files

				        IN_FILE ${f}

				        OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})

				    list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")

				endforeach()

				set(scylla_sources

				    absl-flat_hash_map.cc

				    alternator/auth.cc

				    alternator/base64.cc

				    alternator/conditions.cc

				    alternator/executor.cc

				    alternator/expressions.cc

				    alternator/serialization.cc

				    alternator/server.cc

				    alternator/stats.cc

				    alternator/streams.cc

				    api/api.cc

				    api/cache_service.cc

				    api/collectd.cc

				    api/column_family.cc

				    api/commitlog.cc

				    api/compaction_manager.cc

				    api/config.cc

				    api/endpoint_snitch.cc

				    api/error_injection.cc

				    api/failure_detector.cc

				    api/gossiper.cc

				    api/hinted_handoff.cc

				    api/lsa.cc

				    api/messaging_service.cc

				    api/storage_proxy.cc

				    api/storage_service.cc

				    api/stream_manager.cc

				    api/system.cc

				    atomic_cell.cc

				    auth/allow_all_authenticator.cc

				    auth/allow_all_authorizer.cc

				    auth/authenticated_user.cc

				    auth/authentication_options.cc

				    auth/authenticator.cc

				    auth/common.cc

				    auth/default_authorizer.cc

				    auth/password_authenticator.cc

				    auth/passwords.cc

				    auth/permission.cc

				    auth/permissions_cache.cc

				    auth/resource.cc

				    auth/role_or_anonymous.cc

				    auth/roles-metadata.cc

				    auth/sasl_challenge.cc

				    auth/service.cc

				    auth/standard_role_manager.cc

				    auth/transitional.cc

				    bytes.cc

				    canonical_mutation.cc

				    cdc/cdc_partitioner.cc

				    cdc/generation.cc

				    cdc/log.cc

				    cdc/metadata.cc

				    cdc/split.cc

				    clocks-impl.cc

				    collection_mutation.cc

				    compress.cc

				    connection_notifier.cc

				    converting_mutation_partition_applier.cc

				    counters.cc

				    cql3/abstract_marker.cc

				    cql3/attributes.cc

				    cql3/cf_name.cc

				    cql3/column_condition.cc

				    cql3/column_identifier.cc

				    cql3/column_specification.cc

				    cql3/constants.cc

				    cql3/cql3_type.cc

				    cql3/expr/expression.cc

				    cql3/functions/aggregate_fcts.cc

				    cql3/functions/castas_fcts.cc

				    cql3/functions/error_injection_fcts.cc

				    cql3/functions/functions.cc

				    cql3/functions/user_function.cc

				    cql3/index_name.cc

				    cql3/keyspace_element_name.cc

				    cql3/lists.cc

				    cql3/maps.cc

				    cql3/operation.cc

				    cql3/query_options.cc

				    cql3/query_processor.cc

				    cql3/relation.cc

				    cql3/restrictions/statement_restrictions.cc

				    cql3/result_set.cc

				    cql3/role_name.cc

				    cql3/selection/abstract_function_selector.cc

				    cql3/selection/selectable.cc

				    cql3/selection/selection.cc

				    cql3/selection/selector.cc

				    cql3/selection/selector_factories.cc

				    cql3/selection/simple_selector.cc

				    cql3/sets.cc

				    cql3/single_column_relation.cc

				    cql3/statements/alter_keyspace_statement.cc

				    cql3/statements/alter_table_statement.cc

				    cql3/statements/alter_type_statement.cc

				    cql3/statements/alter_view_statement.cc

				    cql3/statements/authentication_statement.cc

				    cql3/statements/authorization_statement.cc

				    cql3/statements/batch_statement.cc

				    cql3/statements/cas_request.cc

				    cql3/statements/cf_prop_defs.cc

				    cql3/statements/cf_statement.cc

				    cql3/statements/create_function_statement.cc

				    cql3/statements/create_index_statement.cc

				    cql3/statements/create_keyspace_statement.cc

				    cql3/statements/create_table_statement.cc

				    cql3/statements/create_type_statement.cc

				    cql3/statements/create_view_statement.cc

				    cql3/statements/delete_statement.cc

				    cql3/statements/drop_function_statement.cc

				    cql3/statements/drop_index_statement.cc

				    cql3/statements/drop_keyspace_statement.cc

				    cql3/statements/drop_table_statement.cc

				    cql3/statements/drop_type_statement.cc

				    cql3/statements/drop_view_statement.cc

				    cql3/statements/function_statement.cc

				    cql3/statements/grant_statement.cc

				    cql3/statements/index_prop_defs.cc

				    cql3/statements/index_target.cc

				    cql3/statements/ks_prop_defs.cc

				    cql3/statements/list_permissions_statement.cc

				    cql3/statements/list_users_statement.cc

				    cql3/statements/modification_statement.cc

				    cql3/statements/permission_altering_statement.cc

				    cql3/statements/property_definitions.cc

				    cql3/statements/raw/parsed_statement.cc

				    cql3/statements/revoke_statement.cc

				    cql3/statements/role-management-statements.cc

				    cql3/statements/schema_altering_statement.cc

				    cql3/statements/select_statement.cc

				    cql3/statements/truncate_statement.cc

				    cql3/statements/update_statement.cc

				    cql3/statements/use_statement.cc

				    cql3/token_relation.cc

				    cql3/tuples.cc

				    cql3/type_json.cc

				    cql3/untyped_result_set.cc

				    cql3/update_parameters.cc

				    cql3/user_types.cc

				    cql3/ut_name.cc

				    cql3/util.cc

				    cql3/values.cc

				    cql3/variable_specifications.cc

				    data/cell.cc

				    database.cc

				    db/batchlog_manager.cc

				    db/commitlog/commitlog.cc

				    db/commitlog/commitlog_entry.cc

				    db/commitlog/commitlog_replayer.cc

				    db/config.cc

				    db/consistency_level.cc

				    db/cql_type_parser.cc

				    db/data_listeners.cc

				    db/extensions.cc

				    db/heat_load_balance.cc

				    db/hints/manager.cc

				    db/hints/resource_manager.cc

				    db/large_data_handler.cc

				    db/legacy_schema_migrator.cc

				    db/marshal/type_parser.cc

				    db/schema_tables.cc

				    db/size_estimates_virtual_reader.cc

				    db/snapshot-ctl.cc

				    db/sstables-format-selector.cc

				    db/system_distributed_keyspace.cc

				    db/system_keyspace.cc

				    db/view/row_locking.cc

				    db/view/view.cc

				    db/view/view_update_generator.cc

				    dht/boot_strapper.cc

				    dht/i_partitioner.cc

				    dht/murmur3_partitioner.cc

				    dht/range_streamer.cc

				    dht/token.cc

				    distributed_loader.cc

				    duration.cc

				    exceptions/exceptions.cc

				    flat_mutation_reader.cc

				    frozen_mutation.cc

				    frozen_schema.cc

				    gms/application_state.cc

				    gms/endpoint_state.cc

				    gms/failure_detector.cc

				    gms/feature_service.cc

				    gms/gossip_digest_ack.cc

				    gms/gossip_digest_ack2.cc

				    gms/gossip_digest_syn.cc

				    gms/gossiper.cc

				    gms/inet_address.cc

				    gms/version_generator.cc

				    gms/versioned_value.cc

				    hashers.cc

				    index/secondary_index.cc

				    index/secondary_index_manager.cc

				    init.cc

				    keys.cc

				    lister.cc

				    locator/abstract_replication_strategy.cc

				    locator/ec2_multi_region_snitch.cc

				    locator/ec2_snitch.cc

				    locator/everywhere_replication_strategy.cc

				    locator/gce_snitch.cc

				    locator/gossiping_property_file_snitch.cc

				    locator/local_strategy.cc

				    locator/network_topology_strategy.cc

				    locator/production_snitch_base.cc

				    locator/rack_inferring_snitch.cc

				    locator/simple_snitch.cc

				    locator/simple_strategy.cc

				    locator/snitch_base.cc

				    locator/token_metadata.cc

				    lua.cc

				    main.cc

				    memtable.cc

				    message/messaging_service.cc

				    multishard_mutation_query.cc

				    mutation.cc

				    raft/fsm.cc

				    raft/log.cc

				    raft/progress.cc

				    raft/raft.cc

				    raft/server.cc

				    mutation_fragment.cc

				    mutation_partition.cc

				    mutation_partition_serializer.cc

				    mutation_partition_view.cc

				    mutation_query.cc

				    mutation_reader.cc

				    mutation_writer/multishard_writer.cc

				    mutation_writer/shard_based_splitting_writer.cc

				    mutation_writer/timestamp_based_splitting_writer.cc

				    mutation_writer/feed_writers.cc

				    partition_slice_builder.cc

				    partition_version.cc

				    querier.cc

				    query-result-set.cc

				    query.cc

				    range_tombstone.cc

				    range_tombstone_list.cc

				    reader_concurrency_semaphore.cc

				    redis/abstract_command.cc

				    redis/command_factory.cc

				    redis/commands.cc

				    redis/keyspace_utils.cc

				    redis/lolwut.cc

				    redis/mutation_utils.cc

				    redis/options.cc

				    redis/query_processor.cc

				    redis/query_utils.cc

				    redis/server.cc

				    redis/service.cc

				    redis/stats.cc

				    repair/repair.cc

				    repair/row_level.cc

				    row_cache.cc

				    schema.cc

				    schema_mutations.cc

				    schema_registry.cc

				    service/client_state.cc

				    service/migration_manager.cc

				    service/migration_task.cc

				    service/misc_services.cc

				    service/pager/paging_state.cc

				    service/pager/query_pagers.cc

				    service/paxos/paxos_state.cc

				    service/paxos/prepare_response.cc

				    service/paxos/prepare_summary.cc

				    service/paxos/proposal.cc

				    service/priority_manager.cc

				    service/storage_proxy.cc

				    service/storage_service.cc

				    sstables/compaction.cc

				    sstables/compaction_manager.cc

				    sstables/compaction_strategy.cc

				    sstables/compress.cc

				    sstables/integrity_checked_file_impl.cc

				    sstables/kl/writer.cc

				    sstables/leveled_compaction_strategy.cc

				    sstables/m_format_read_helpers.cc

				    sstables/metadata_collector.cc

				    sstables/mp_row_consumer.cc

				    sstables/mx/writer.cc

				    sstables/partition.cc

				    sstables/prepended_input_stream.cc

				    sstables/random_access_reader.cc

				    sstables/size_tiered_compaction_strategy.cc

				    sstables/sstable_directory.cc

				    sstables/sstable_version.cc

				    sstables/sstables.cc

				    sstables/sstables_manager.cc

				    sstables/time_window_compaction_strategy.cc

				    sstables/writer.cc

				    streaming/progress_info.cc

				    streaming/session_info.cc

				    streaming/stream_coordinator.cc

				    streaming/stream_manager.cc

				    streaming/stream_plan.cc

				    streaming/stream_reason.cc

				    streaming/stream_receive_task.cc

				    streaming/stream_request.cc

				    streaming/stream_result_future.cc

				    streaming/stream_session.cc

				    streaming/stream_session_state.cc

				    streaming/stream_summary.cc

				    streaming/stream_task.cc

				    streaming/stream_transfer_task.cc

				    table.cc

				    table_helper.cc

				    thrift/controller.cc

				    thrift/handler.cc

				    thrift/server.cc

				    thrift/thrift_validation.cc

				    timeout_config.cc

				    tracing/trace_keyspace_helper.cc

				    tracing/trace_state.cc

				    tracing/traced_file.cc

				    tracing/tracing.cc

				    tracing/tracing_backend_registry.cc

				    transport/controller.cc

				    transport/cql_protocol_extension.cc

				    transport/event.cc

				    transport/event_notifier.cc

				    transport/messages/result_message.cc

				    transport/server.cc

				    types.cc

				    unimplemented.cc

				    utils/UUID_gen.cc

				    utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc

				    utils/array-search.cc

				    utils/ascii.cc

				    utils/big_decimal.cc

				    utils/bloom_calculations.cc

				    utils/bloom_filter.cc

				    utils/buffer_input_stream.cc

				    utils/build_id.cc

				    utils/config_file.cc

				    utils/directories.cc

				    utils/disk-error-handler.cc

				    utils/dynamic_bitset.cc

				    utils/error_injection.cc

				    utils/exceptions.cc

				    utils/file_lock.cc

				    utils/generation-number.cc

				    utils/gz/crc_combine.cc

				    utils/human_readable.cc

				    utils/i_filter.cc

				    utils/large_bitset.cc

				    utils/like_matcher.cc

				    utils/limiting_data_source.cc

				    utils/logalloc.cc

				    utils/managed_bytes.cc

				    utils/multiprecision_int.cc

				    utils/murmur_hash.cc

				    utils/rate_limiter.cc

				    utils/rjson.cc

				    utils/runtime.cc

				    utils/updateable_value.cc

				    utils/utf8.cc

				    utils/uuid.cc

				    validation.cc

				    vint-serialization.cc

				    zstd.cc

				    release.cc)

				set(scylla_gen_sources

				    "${scylla_thrift_gen_cassandra_files}"

				    "${scylla_ragel_gen_protocol_parser_file}"

				    "${swagger_gen_files}"

				    "${idl_gen_files}"

				    "${antlr3_gen_files}")

				add_executable(scylla

				        ${SEASTAR_SOURCE_FILES}

				        ${SCYLLA_SOURCE_FILES})

				    ${scylla_sources}

				    ${scylla_gen_sources})

				# Note that since CLion does not understand GCC6 concepts, we always disable them (even if users configure otherwise).

				# CLion seems to have trouble with `-U` (macro undefinition), so we do it this way instead.

				list(REMOVE_ITEM SEASTAR_CFLAGS "-DHAVE_GCC6_CONCEPTS")

				target_link_libraries(scylla PRIVATE

				    seastar

				    # Boost dependencies

				    Boost::filesystem

				    Boost::program_options

				    Boost::system

				    Boost::thread

				    Boost::regex

				    Boost::headers

				    # Abseil libs

				    absl::hashtablez_sampler

				    absl::raw_hash_set

				    absl::synchronization

				    absl::graphcycles_internal

				    absl::stacktrace

				    absl::symbolize

				    absl::debugging_internal

				    absl::demangle_internal

				    absl::time

				    absl::time_zone

				    absl::int128

				    absl::city

				    absl::hash

				    absl::malloc_internal

				    absl::spinlock_wait

				    absl::base

				    absl::dynamic_annotations

				    absl::raw_logging_internal

				    absl::exponential_biased

				    absl::throw_delegate

				    # System libs

				    ZLIB::ZLIB

				    ICU::uc

				    systemd

				    zstd

				    snappy

				    ${LUA_LIBRARIES}

				    thrift

				    crypt)

				# If the Seastar pkg-config information is available, append to the default flags.

				#

				# For ease of browsing the source code, we always pretend that DPDK is enabled.

				target_compile_options(scylla PUBLIC

				        -std=gnu++1z

				        -DHAVE_DPDK

				        -DHAVE_HWLOC

				        "${SEASTAR_CFLAGS}")

				target_link_libraries(scylla PRIVATE

				    -Wl,--build-id=sha1 # Force SHA1 build-id generation

				    # TODO: Use lld linker if it's available, otherwise gold, else bfd

				    -fuse-ld=lld)

				# TODO: patch dynamic linker to match configure.py behavior

				# The order matters here: prefer the "static" DPDK directories to any dynamic paths from pkg-config. Some files are only

				# available dynamically, though.

				target_include_directories(scylla PUBLIC

				        .

				        ${SEASTAR_DPDK_INCLUDE_DIRS}

				        ${SEASTAR_INCLUDE_DIRS}

				        ${Boost_INCLUDE_DIRS}

				        xxhash

				        libdeflate

				        build/release/gen)

				target_compile_options(scylla PRIVATE

				    -std=gnu++20

				    -fcoroutines # TODO: Clang does not have this flag, adjust to both variants

				    ${target_arch_flag})

				# Hacks needed to expose internal APIs for xxhash dependencies

				target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)

				target_include_directories(scylla PRIVATE

				    "${CMAKE_CURRENT_SOURCE_DIR}"

				    libdeflate

				    abseil

				    "${scylla_gen_build_dir}")

				###

				### Create crc_combine_table helper executable.

				### Use it to generate crc_combine_table.cc to be used in scylla at build time.

				###

				add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)

				target_link_libraries(crc_combine_table PRIVATE seastar)

				target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")

				target_compile_options(crc_combine_table PRIVATE

				    -std=gnu++20

				    -fcoroutines

				    ${target_arch_flag})

				add_dependencies(scylla crc_combine_table)

				# Generate an additional source file at build time that is needed for Scylla compilation

				add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"

				    COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"

				    DEPENDS crc_combine_table)

				target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")

				###

				### Generate version file and supply appropriate compile definitions for release.cc

				###

				execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN RESULT_VARIABLE scylla_version_gen_res)

				if(scylla_version_gen_res)

				    message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")

				endif()

				file(READ build/SCYLLA-VERSION-FILE scylla_version)

				string(STRIP "${scylla_version}" scylla_version)

				file(READ build/SCYLLA-RELEASE-FILE scylla_release)

				string(STRIP "${scylla_release}" scylla_release)

				get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)

				list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")

				set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")

				###

				### Custom command for building libdeflate. Link the library to scylla.

				###

				set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")

				add_custom_command(OUTPUT "${libdeflate_lib}"

				    COMMAND make -C libdeflate

				        BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/

				        CC=${CMAKE_C_COMPILER}

				        "CFLAGS=${target_arch_flag}"

				        ../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # Two slashes are important!

				# Hack to force generating custom command to produce libdeflate.a

				add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")

				target_link_libraries(scylla PRIVATE "${libdeflate_lib}")

				# TODO: create cmake/ directory and move utilities (generate functions etc) there

				# TODO: Build tests if BUILD_TESTING=on (using CTest module)

									
										10

CONTRIBUTING.md
									
												
				@@ -1,11 +1,13 @@

				# Asking questions or requesting help

				# Contributing

				## Asking questions or requesting help

				Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.

				# Reporting an issue

				## Reporting an issue

				Please use the [Issue Tracker](https://github.com/scylladb/scylla/issues/) to report issues.  Fill in as much information as you can in the issue template, especially for performance problems.

				# Contributing Code to Scylla

				## Contributing Code to Scylla

				To contribute code to Scylla, you need to sign the [Contributor License Agreement](http://www.scylladb.com/opensource/cla/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.

				To contribute code to Scylla, you need to sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.

									
										32

HACKING.md
									
												
				@@ -18,23 +18,35 @@ $ git submodule update --init --recursive

				### Dependencies

				Scylla depends on the system package manager for its development dependencies.

				Scylla is fairly fussy about its build environment, requiring a very recent

				version of the C++20 compiler and numerous tools and libraries to build.

				Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.

				Run `./install-dependencies.sh` (as root) to use your Linux distribution's

				package manager to install the appropriate packages on your build machine.

				However, this will only work on very recent distributions. For example,

				currently Fedora users must upgrade to Fedora 32, otherwise the C++ compiler

				will be too old, and not support the new C++20 standard that Scylla uses.

				On Ubuntu and Debian based Linux distributions, some packages

				required to build Scylla are missing in the official upstream:

				Alternatively, to avoid having to upgrade your build machine or install

				various packages on it, we provide another option - the **frozen toolchain**.

				This is a script, `./tools/toolchain/dbuild`, that can execute build or run

				commands inside a Docker image that contains exactly the right build tools and

				libraries. The `dbuild` technique is useful for beginners, but is also the way

				in which ScyllaDB produces official releases, so it is highly recommended.

				- libthrift-dev and libthrift

				- antlr3-c++-dev

				To use `dbuild`, you simply prefix any build or run command with it. Building

				and running Scylla becomes as easy as:

				Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,

				and get the missing packages from it.

				```bash

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1

				```

				### Build system

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native

				thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1. is

				thread, and up to 3 GB per native thread while linking. GCC >= 10 is

				required.
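
The memory figures in the note above can be turned into a rough parallelism estimate. The following is a minimal sketch, not part of Scylla's build scripts: the helper name `suggested_parallel_jobs` is hypothetical, it assumes "per native thread" means one parallel build job, and it budgets every job at the higher 3 GB linking figure to stay conservative.

```python
import os


def suggested_parallel_jobs(total_memory_gb: float, cpu_count: int,
                            gb_per_job: float = 3.0) -> int:
    """Estimate how many build jobs fit in memory.

    Hypothetical helper: budgets `gb_per_job` GB for each job (3 GB is the
    linking figure from the note), caps the result at the number of CPUs,
    and never returns less than one job.
    """
    jobs_by_memory = int(total_memory_gb // gb_per_job)
    return max(1, min(cpu_count, jobs_by_memory))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # 16 GB is an assumed example value; substitute the RAM of your machine.
    jobs = suggested_parallel_jobs(16, cpus)
    print(f"ninja -j{jobs} build/release/scylla")
```

The printed command passes the estimate to Ninja's `-j` option, which limits the number of jobs run in parallel.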

				Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.

				@@ -141,7 +153,7 @@ In v3:

				"Tests: unit ({mode}), dtest ({smp})"

				```

				The usual is "Tests: unit (release)", although running debug tests is encouraged.

				The usual is "Tests: unit (dev)", although running debug tests is encouraged.

				5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.

131

MAINTAINERS


@@ -1,131 +0,0 @@
 M: Maintainer with commit access
 R: Reviewer with subsystem expertise
 F: Filename, directory, or pattern for the subsystem
 ---
 AUTH
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 R: Vlad Zolotarov <vladz@scylladb.com>
 R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
 F: auth/*
 CACHE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 R: Piotr Jastrzebski <piotr@scylladb.com>
 F: row_cache*
 F: *mutation*
 F: tests/mvcc*
 COMMITLOG / BATCHLOG
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 F: db/commitlog/*
 F: db/batch*
 COORDINATOR
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Gleb Natapov <gleb@scylladb.com>
 F: service/storage_proxy*
 COMPACTION
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: sstables/compaction*
 CQL TRANSPORT LAYER
 M: Pekka Enberg <penberg@scylladb.com>
 F: transport/*
 CQL QUERY LANGUAGE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: cql3/*
 COUNTERS
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: counters*
 F: tests/counter_test*
 GOSSIP
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: gms/*
 DOCKER
 M: Pekka Enberg <penberg@scylladb.com>
 F: dist/docker/*
 LSA
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 F: utils/logalloc*
 MATERIALIZED VIEWS
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 R: Duarte Nunes <duarte@scylladb.com>
 F: db/view/*
 F: cql3/statements/*view*
 PACKAGING
 R: Takuya ASADA <syuu@scylladb.com>
 F: dist/*
 REPAIR
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: repair/*
 SCHEMA MANAGEMENT
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: db/schema_tables*
 F: db/legacy_schema_migrator*
 F: service/migration*
 F: schema*
 SECONDARY INDEXES
 M: Pekka Enberg <penberg@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 R: Pekka Enberg <penberg@scylladb.com>
 F: db/index/*
 F: cql3/statements/*index*
 SSTABLES
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: sstables/*
 STREAMING
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: streaming/*
 F: service/storage_service.*
 THRIFT TRANSPORT LAYER
 M: Duarte Nunes <duarte@scylladb.com>
 F: thrift/*
 THE REST
 M: Avi Kivity <avi@scylladb.com>
 M: Paweł Dziepak <pdziepak@scylladb.com>
 M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 F: *

4

NOTICE.txt


@@ -1,5 +1,7 @@
 This project includes code developed by the Apache Software Foundation (http://www.apache.org/),
 especially Apache Cassandra.
 It also includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
 It includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
 These files are located in utils/arch/powerpc/crc32-vpmsum. Their license may be found in licenses/LICENSE-crc32-vpmsum.TXT.
 It includes modified code from https://gitbox.apache.org/repos/asf?p=cassandra-dtest.git (owned by The Apache Software Foundation)

									
										151

README.md
									
												
				@@ -1,103 +1,110 @@

				# Scylla

				## Quick-start

				[![Slack](https://img.shields.io/badge/slack-scylla-brightgreen.svg?logo=slack)](http://slack.scylladb.com)

				[![Twitter](https://img.shields.io/twitter/follow/ScyllaDB.svg?style=social&label=Follow)](https://twitter.com/intent/follow?screen_name=ScyllaDB)

				To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)

				which would build and run Scylla using a pre-configured Docker image.

				Using the frozen toolchain will also isolate all of the installed

				dependencies in a Docker container.

				Assuming you have met the toolchain prerequisites, which is running

				Docker in user mode, building and running is as easy as:

				## What is Scylla?

				Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB.

				Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

				For more information, please see the [ScyllaDB web site].

				[ScyllaDB web site]: https://www.scylladb.com

				## Build Prerequisites

				Scylla is fairly fussy about its build environment, requiring very recent

				versions of the C++20 compiler and of many libraries to build. The document

				[HACKING.md](HACKING.md) includes detailed information on building and

				developing Scylla, but to get Scylla building quickly on (almost) any build

				machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).

				This is a pre-configured Docker image which includes recent versions of all

				the required compilers, libraries and build tools. Using the frozen toolchain

				allows you to avoid changing anything in your build machine to meet Scylla's

				requirements - you just need to meet the frozen toolchain's prerequisites

				(mostly, Docker or Podman being available).

				## Building Scylla

				Building Scylla with the frozen toolchain `dbuild` is as easy as:

				```bash

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1

				 ```

				$ git submodule update --init --force --recursive

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				```

				Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.

				For further information, please see:

				**Note**: GCC >= 8.1.1 is required to compile Scylla.

				* [Developer documentation] for more information on building Scylla.

				* [Build documentation] on how to build Scylla binaries, tests, and packages.

				* [Docker image build documentation] for information on how to build Docker images.

				[developer documentation]: HACKING.md

				[build documentation]: docs/building.md

				[docker image build documentation]: dist/docker/redhat/README.md

				## Running Scylla

				* Run Scylla

				```

				./build/release/scylla

				To start Scylla server, run:

				```bash

				$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

				```

				* run Scylla with one CPU and ./tmp as data directory

				This will start a Scylla node with one CPU core allocated to it and data files stored in the `tmp` directory.

				The `--developer-mode` option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations).

				Please note that you need to run Scylla with `dbuild` if you built it with the frozen toolchain.

				```

				./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1

				For more run options, run:

				```bash

				$ ./tools/toolchain/dbuild ./build/release/scylla --help

				```

				* For more run options:

				```

				./build/release/scylla --help

				```

				## Testing

				See [test.py manual](docs/testing.md).

				## Scylla APIs and compatibility

				By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and

				Thrift. There is also experimental support for the API of Amazon DynamoDB,

				but being experimental it needs to be explicitly enabled to be used. For more

				information on how to enable the experimental DynamoDB compatibility in Scylla,

				and the current limitations of this feature, see

				Thrift. There is also support for the API of Amazon DynamoDB™,

				which needs to be enabled and configured in order to be used. For more

				information on how to enable the DynamoDB™ API in Scylla,

				and the current compatibility of this feature as well as Scylla-specific extensions, see

				[Alternator](docs/alternator/alternator.md) and

				[Getting started with Alternator](docs/alternator/getting-started.md).

				## Documentation

				Documentation can be found in [./docs](./docs) and on the

				[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear

				definition of what goes where, so when looking for something be sure to check

				both.

				Documentation can be found [here](https://scylla.docs.scylladb.com).

				Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).

				User documentation can be found [here](https://docs.scylladb.com/).

				## Building Fedora RPM

				## Training 

				As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:

				```

				# Install mock:

				sudo yum install mock

				# Add user to the "mock" group:

				usermod -a -G mock $USER && newgrp mock

				```

				Then, to build an RPM, run:

				```

				./dist/redhat/build_rpm.sh

				```

				The built RPM is stored in ``/var/lib/mock/<configuration>/result`` directory.

				For example, on Fedora 21 mock reports the following:

				```

				INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds

				INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result

				```

				## Building Fedora-based Docker image

				Build a Docker image with:

				```

				cd dist/docker

				docker build -t <image-name> .

				```

				Run the image with:

				```

				docker run -p $(hostname -i):9042:9042 -i -t <image name>

				```

				Training material and online courses can be found at [Scylla University](https://university.scylladb.com/). 

				The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, 

				administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, 

				multi-datacenters and how Scylla integrates with third-party applications.

				## Contributing to Scylla

				[Hacking howto](HACKING.md)

				[Guidelines for contributing](CONTRIBUTING.md)

				If you want to report a bug or submit a pull request or a patch, please read the [contribution guidelines].

				If you are a developer working on Scylla, please read the [developer guidelines].

				[contribution guidelines]: CONTRIBUTING.md

				[developer guidelines]: HACKING.md

				## Contact

				* The [users mailing list] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.

				* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

				[Users mailing list]: https://groups.google.com/forum/#!forum/scylladb-users

				[Slack channel]: http://slack.scylladb.com/

				[Developers mailing list]: https://groups.google.com/forum/#!forum/scylladb-dev

10

SCYLLA-VERSION-GEN


@@ -1,7 +1,7 @@
 #!/bin/sh
 PRODUCT=scylla
 VERSION=3.2.5
 VERSION=4.4.9
 if test -f version
 then
@@ -19,6 +19,14 @@ else
 	SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
 fi
 if [ -f build/SCYLLA-RELEASE-FILE ]; then
 	RELEASE_FILE=$(cat build/SCYLLA-RELEASE-FILE)
 	GIT_COMMIT_FILE=$(cat build/SCYLLA-RELEASE-FILE |cut -d . -f 3)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
 fi
 echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
 mkdir -p build
 echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE

1

abseil Submodule

Submodule abseil added at 1e3d25b265

									
										26

absl-flat_hash_map.cc
									
										Normal file
									
												
				@@ -0,0 +1,26 @@

				/*

				 * Copyright (C) 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "absl-flat_hash_map.hh"

				size_t sstring_hash::operator()(std::string_view v) const noexcept {

				    return absl::Hash<std::string_view>{}(v);

				}

									
										47

absl-flat_hash_map.hh
									
										Normal file
									
												
				@@ -0,0 +1,47 @@

				/*

				 * Copyright (C) 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include <absl/container/flat_hash_map.h>

				#include <seastar/core/sstring.hh>

				using namespace seastar;

				struct sstring_hash {

				    using is_transparent = void;

				    size_t operator()(std::string_view v) const noexcept;

				};

				struct sstring_eq {

				    using is_transparent = void;

				    bool operator()(std::string_view a, std::string_view b) const noexcept {

				        return a == b;

				    }

				};

				template <typename K, typename V, typename... Ts>

				struct flat_hash_map : public absl::flat_hash_map<K, V, Ts...> {

				};

				template <typename V>

				struct flat_hash_map<sstring, V>

				    : public absl::flat_hash_map<sstring, V, sstring_hash, sstring_eq> {};

									
										40

alternator-test/test_condition_expression.py
									
											
				@@ -1,40 +0,0 @@

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the ConditionExpression parameter

				import pytest

				from botocore.exceptions import ClientError

				from util import random_string

				# Test that ConditionExpression works as expected

				@pytest.mark.xfail(reason="ConditionExpression not yet implemented")

				def test_update_condition_expression(test_table_s):

				    p = random_string()

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1',

				        ExpressionAttributeValues={':val1': 4})

				    test_table_s.update_item(Key={'p': p},

				        UpdateExpression='SET b = :val1',

				        ConditionExpression='b = :oldval',

				        ExpressionAttributeValues={':val1': 6, ':oldval': 4})

				    with pytest.raises(ClientError, match='ConditionalCheckFailedException.*'):

				        test_table_s.update_item(Key={'p': p},

				            UpdateExpression='SET b = :val1',

				            ConditionExpression='b = :oldval',

				            ExpressionAttributeValues={':val1': 8, ':oldval': 4})

				    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 6}

									
										358

alternator-test/test_query.py
									
											
				@@ -1,358 +0,0 @@

				# -*- coding: utf-8 -*-

				# Copyright 2019 ScyllaDB

				#

				# This file is part of Scylla.

				#

				# Scylla is free software: you can redistribute it and/or modify

				# it under the terms of the GNU Affero General Public License as published by

				# the Free Software Foundation, either version 3 of the License, or

				# (at your option) any later version.

				#

				# Scylla is distributed in the hope that it will be useful,

				# but WITHOUT ANY WARRANTY; without even the implied warranty of

				# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				# GNU General Public License for more details.

				#

				# You should have received a copy of the GNU Affero General Public License

				# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				# Tests for the Query operation

				import random

				import pytest

				from botocore.exceptions import ClientError

				from decimal import Decimal

				from util import random_string, random_bytes, full_query, multiset

				from boto3.dynamodb.conditions import Key, Attr

				# Test that scanning works fine with in-stock paginator

				def test_query_basic_restrictions(dynamodb, filled_test_table):

				    test_table, items = filled_test_table

				    paginator = dynamodb.meta.client.get_paginator('query')

				    # EQ

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)

				    # LT

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)

				    # LE

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)

				    # GT

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)

				    # GE

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)

				    # BETWEEN

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)

				    # BEGINS_WITH

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)

				# Test that KeyConditionExpression parameter is supported

				@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")

				def test_query_key_condition_expression(dynamodb, filled_test_table):

				    test_table, items = filled_test_table

				    paginator = dynamodb.meta.client.get_paginator('query')

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):

				        got_items += page['Items']

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)

				def test_begins_with(dynamodb, test_table):

				    paginator = dynamodb.meta.client.get_paginator('query')

				    items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # TODO(sarna): Once bytes type is supported, the \xFF character should be tested

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])

				    got_items = []

				    for page in paginator.paginate(TableName=test_table.name, KeyConditions={

				            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				            'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}

				        }):

				        got_items += page['Items']

				    print(got_items)

				    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])

				def test_begins_with_wrong_type(dynamodb, test_table_sn):

				    paginator = dynamodb.meta.client.get_paginator('query')

				    with pytest.raises(ClientError, match='ValidationException'):

				        for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={

				                'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},

				                'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}

				                }):

				            pass

				# Items returned by Query should be sorted by the sort key. The following

				# tests verify that this is indeed the case, for the three allowed key types:

				# strings, binary, and numbers. These tests test not just the Query operation,

				# but inherently that the sort-key sorting works.

				def test_query_sort_order_string(test_table):

				    # Insert a lot of random items in one new partition:

				    # str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.

				    p = random_string()

				    items = [{'p': p, 'c': str(i)} for i in range(128)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    assert len(items) == len(got_items)

				    # Extract just the sort key ("c") from the items

				    sort_keys = [x['c'] for x in items]

				    got_sort_keys = [x['c'] for x in got_items]

				    # Verify that got_sort_keys are already sorted (in string order)

				    assert sorted(got_sort_keys) == got_sort_keys

				    # Verify that got_sort_keys are a sorted version of the expected sort_keys

				    assert sorted(sort_keys) == got_sort_keys
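The "non-obvious sort order" mentioned in the test above can be seen without a database. The following is a minimal sketch, assuming only ASCII keys (where Python's string comparison and a byte-wise comparison agree); the variable names are illustrative and not part of the test suite.

```python
# String sort keys compare character by character, not numerically.
keys = [str(i) for i in range(128)]

string_order = sorted(keys)            # the order a string sort key is expected to have
numeric_order = sorted(keys, key=int)  # the order a reader might naively expect

# "100" sorts before "2" because '1' < '2' at the first character.
print(string_order[:4])                # ['0', '1', '10', '100']
print(string_order == numeric_order)   # False
```

This is why the test compares against `sorted(sort_keys)` on the strings themselves rather than on their integer values.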

				def test_query_sort_order_bytes(test_table_sb):

				    # Insert a lot of random items in one new partition:

				    # We arbitrarily use random_bytes with a fixed length of 10.

				    p = random_string()

				    items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]

				    with test_table_sb.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    assert len(items) == len(got_items)

				    sort_keys = [x['c'] for x in items]

				    got_sort_keys = [x['c'] for x in got_items]

				    # Boto3's "Binary" objects are sorted as if bytes are signed integers.

				    # This isn't the order that DynamoDB itself uses (byte 0 should be first,

				    # not byte -128). Sorting the byte array ".value" works.

				    assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys

				    assert sorted(sort_keys) == got_sort_keys
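The comment above distinguishes signed from unsigned byte ordering. A minimal sketch of the difference, using plain Python `bytes` (which compare as unsigned) and a hypothetical `signed_key` helper that reinterprets each byte as a signed 8-bit integer:

```python
import struct

def signed_key(b: bytes):
    # Hypothetical helper: view each byte as a signed 8-bit integer (-128..127).
    return struct.unpack(f"{len(b)}b", b)

keys = [bytes([0x80]), bytes([0x00]), bytes([0x7F]), bytes([0xFF])]

# Python bytes compare as unsigned values, so byte 0x00 comes first.
unsigned_order = sorted(keys)
# Under a signed interpretation, 0x80 (-128) would come first instead.
signed_order = sorted(keys, key=signed_key)

print(unsigned_order)
print(signed_order)
```

The test sorts on `.value` for the same reason: the raw byte array gives the unsigned order that the comment says DynamoDB uses.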

				def test_query_sort_order_number(test_table_sn):

				    # This is a list of numbers, sorted in correct order, and each suitable

				    # for accurate representation by Alternator's number type.

				    numbers = [

				        Decimal("-2e10"),

				        Decimal("-7.1e2"),

				        Decimal("-4.1"),

				        Decimal("-0.1"),

				        Decimal("-1e-5"),

				        Decimal("0"),

				        Decimal("2e-5"),

				        Decimal("0.15"),

				        Decimal("1"),

				        Decimal("1.00000000000000000000000001"),

				        Decimal("3.14159"),

				        Decimal("3.1415926535897932384626433832795028841"),

				        Decimal("31.4"),

				        Decimal("1.4e10"),

				    ]

				    # Insert these numbers, in random order, into one partition:

				    p = random_string()

				    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]

				    with test_table_sn.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    # Finally, verify that we get back exactly the same numbers (with identical

				    # precision), and in their original sorted order.

				    got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    got_sort_keys = [x['c'] for x in got_items]

				    assert got_sort_keys == numbers
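The test above depends on `Decimal` preserving every digit and comparing exactly. A small sketch of that property, using a shortened copy of the number list (the subset is chosen here only for illustration):

```python
from decimal import Decimal
import random

# A shortened, already-sorted subset of the numbers used in the test.
numbers = [
    Decimal("-2e10"),
    Decimal("-0.1"),
    Decimal("0"),
    Decimal("1"),
    Decimal("1.00000000000000000000000001"),
    Decimal("3.14159"),
]

# Shuffle, then sort: Decimal comparison is exact, so the original order returns.
shuffled = random.sample(numbers, len(numbers))
resorted = sorted(shuffled)

# A binary float cannot tell the two values near 1 apart; Decimal can.
collapses_as_float = float(numbers[4]) == float(numbers[3])
distinct_as_decimal = numbers[4] != numbers[3]
print(resorted == numbers, collapses_as_float, distinct_as_decimal)
```

This is why the expected list is written with `Decimal` literals rather than floats.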

				def test_query_filtering_attributes_equality(filled_test_table):

				    test_table, items = filled_test_table

				    query_filter = {

				        "attribute" : {

				            "AttributeValueList" : [ "xxxx" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)

				    query_filter = {

				        "attribute" : {

				            "AttributeValueList" : [ "xxxx" ],

				            "ComparisonOperator": "EQ"

				        },

				        "another" : {

				            "AttributeValueList" : [ "yy" ],

				            "ComparisonOperator": "EQ"

				        }

				    }

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)

				# Test that FilterExpression works as expected

				@pytest.mark.xfail(reason="FilterExpression not supported yet")

				def test_query_filter_expression(filled_test_table):

				    test_table, items = filled_test_table

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))

				    print(got_items)

				    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)

				# QueryFilter can only contain non-key attributes in order to be compatible with DynamoDB

				def test_query_filtering_key_equality(filled_test_table):

				    test_table, items = filled_test_table

				    with pytest.raises(ClientError, match='ValidationException'):

				        query_filter = {

				            "c" : {

				                "AttributeValueList" : [ "5" ],

				                "ComparisonOperator": "EQ"

				            }

				        }

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				        print(got_items)

				    with pytest.raises(ClientError, match='ValidationException'):

				        query_filter = {

				            "attribute" : {

				                "AttributeValueList" : [ "x" ],

				                "ComparisonOperator": "EQ"

				            },

				            "p" : {

				                "AttributeValueList" : [ "5" ],

				                "ComparisonOperator": "EQ"

				            }

				        }

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)

				        print(got_items)

				# Test Query with the AttributesToGet parameter. Result should include the

				# selected attributes only - if one wants the key attributes as well, one

				# needs to select them explicitly. When no key attributes are selected,

				# some items may have *none* of the selected attributes. Those items are

				# returned too, as empty items - they are not outright missing.

				def test_query_attributes_to_get(dynamodb, test_table):

				    p = random_string()

				    items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]

				    with test_table.batch_writer() as batch:

				        for item in items:

				            batch.put_item(item)

				    for wanted in [ ['a'],             # only non-key attributes

				                    ['c', 'a'],        # a key attribute (sort key) and non-key

				                    ['p', 'c'],        # entire key

				                    ['nonexistent']    # none of the items have this attribute!

				                   ]:

				        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)

				        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]

				        assert multiset(expected_items) == multiset(got_items)
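The expected result in the test above is built with a dict comprehension. A standalone sketch of the same projection, with a hypothetical `project` helper and a tiny item list, shows why items lacking all wanted attributes come back as empty dicts rather than disappearing:

```python
# Three items in one partition; names mirror the test but the data is made up.
items = [{'p': 'x', 'c': str(i), 'a': str(i * 10)} for i in range(3)]

def project(item, wanted):
    # Keep only the wanted attributes that the item actually has.
    return {k: item[k] for k in wanted if k in item}

only_a = [project(x, ['a']) for x in items]
key_and_a = [project(x, ['c', 'a']) for x in items]
missing = [project(x, ['nonexistent']) for x in items]

print(only_a)   # [{'a': '0'}, {'a': '10'}, {'a': '20'}]
print(missing)  # [{}, {}, {}]
```

The number of returned items stays the same for every projection; only their contents shrink.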

				# Test which keys a Query may use in a table with both a hash key and a

				# sort key: we can Query by the hash key alone, or by a combination of the

				# hash and sort keys, but *cannot* Query by just the sort key, and obviously

				# not by any non-key column.

				def test_query_which_key(test_table):

				    p = random_string()

				    c = random_string()

				    p2 = random_string()

				    c2 = random_string()

				    item1 = {'p': p, 'c': c}

				    item2 = {'p': p, 'c': c2}

				    item3 = {'p': p2, 'c': c}

				    for i in [item1, item2, item3]:

				        test_table.put_item(Item=i)

				    # Query by hash key only:

				    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})

				    expected_items = [item1, item2]

				    assert multiset(expected_items) == multiset(got_items)

				    # Query by hash key *and* sort key (this is basically a GetItem):

				    got_items = full_query(test_table, KeyConditions={

				        'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				        'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				    })

				    expected_items = [item1]

				    assert multiset(expected_items) == multiset(got_items)

				    # Query by sort key alone is not allowed. DynamoDB reports:

				    # "Query condition missed key schema element: p".

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # Query by a non-key isn't allowed, for the same reason - that the

				    # actual hash key (p) is missing in the query:

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # If we try both p and a non-key we get a complaint that the sort

				    # key is missing: "Query condition missed key schema element: c"

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })

				    # If we try p, c and another key, we get an error that

				    # "Conditions can be of length 1 or 2 only".

				    with pytest.raises(ClientError, match='ValidationException'):

				        full_query(test_table, KeyConditions={

				            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},

				            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},

				            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}

				        })
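The rules exercised by `test_query_which_key` can be summarized as a small validator. This is only a sketch: `validate_key_conditions` is a hypothetical function, the order of the checks is an assumption, and the messages are the ones quoted in the test comments, not a confirmed reproduction of what the server returns.

```python
def validate_key_conditions(conditions, hash_key, sort_key):
    """Hypothetical check of a KeyConditions dict against a hash+sort key schema."""
    # At most two conditions: one on the hash key, optionally one on the sort key.
    if len(conditions) not in (1, 2):
        raise ValueError("Conditions can be of length 1 or 2 only")
    # The hash key must always be present.
    if hash_key not in conditions:
        raise ValueError(f"Query condition missed key schema element: {hash_key}")
    # If there is a second condition, it must be on the sort key.
    if len(conditions) == 2 and sort_key not in conditions:
        raise ValueError(f"Query condition missed key schema element: {sort_key}")

eq = {'AttributeValueList': ['v'], 'ComparisonOperator': 'EQ'}
validate_key_conditions({'p': eq}, 'p', 'c')           # hash key only: accepted
validate_key_conditions({'p': eq, 'c': eq}, 'p', 'c')  # hash and sort key: accepted
```

Each rejected case in the test (`c` alone, `z` alone, `p` with `z`, and three conditions) maps to one of the three checks.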

									

alternator/auth.cc
									
				@@ -66,8 +66,9 @@ static std::string format_time_point(db_clock::time_point tp) {

				    time_t time_point_repr = db_clock::to_time_t(tp);

				    std::string time_point_str;

				    time_point_str.resize(17);

				    ::tm time_buf;

				    // strftime prints the terminating null character as well

				    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", std::gmtime(&time_point_repr));

				    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));

				    time_point_str.resize(16);

				    return time_point_str;

				}

				@@ -77,12 +78,12 @@ void check_expiry(std::string_view signature_date) {

				    std::string expiration_str = format_time_point(db_clock::now() - 15min);

				    std::string validity_str = format_time_point(db_clock::now() + 15min);

				    if (signature_date < expiration_str) {

				        throw api_error("InvalidSignatureException",

				        throw api_error::invalid_signature(

				                fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",

				                signature_date, expiration_str));

				    }

				    if (signature_date > validity_str) {

				        throw api_error("InvalidSignatureException",

				        throw api_error::invalid_signature(

				                fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",

				                signature_date, validity_str));

				    }
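The expiry check above compares timestamps as strings. That works because the `%Y%m%dT%H%M%SZ` format is fixed-width with the most significant field first, so lexicographic order matches chronological order. A Python sketch of the same idea follows; the function names mirror the C++ ones, but the explicit `now` parameter and the use of `ValueError` are assumptions made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y%m%dT%H%M%SZ"  # same format string as in format_time_point()

def format_time_point(tp: datetime) -> str:
    # Always format in UTC, like gmtime_r() in the C++ code.
    return tp.astimezone(timezone.utc).strftime(FMT)

def check_expiry(signature_date: str, now: datetime) -> None:
    expiration = format_time_point(now - timedelta(minutes=15))
    validity = format_time_point(now + timedelta(minutes=15))
    # Plain string comparison is enough for this fixed-width format.
    if signature_date < expiration:
        raise ValueError(f"Signature expired: {signature_date} is now earlier than {expiration}")
    if signature_date > validity:
        raise ValueError(f"Signature not yet current: {signature_date} is still later than {validity}")

now = datetime(2022, 3, 3, 10, 0, 0, tzinfo=timezone.utc)
print(format_time_point(now))           # 20220303T100000Z
check_expiry("20220303T095000Z", now)   # within the 15-minute window, no exception
```

The formatted string is always 16 characters long, which matches the final `resize(16)` in the C++ function.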

				@@ -93,13 +94,13 @@ std::string get_signature(std::string_view access_key_id, std::string_view secre

				        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {

				    auto amz_date_it = signed_headers_map.find("x-amz-date");

				    if (amz_date_it == signed_headers_map.end()) {

				        throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");

				        throw api_error::invalid_signature("X-Amz-Date header is mandatory for signature verification");

				    }

				    std::string_view amz_date = amz_date_it->second;

				    check_expiry(amz_date);

				    std::string_view datestamp = amz_date.substr(0, 8);

				    if (datestamp != orig_datestamp) {

				        throw api_error("InvalidSignatureException",

				        throw api_error::invalid_signature(

				                format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",

				                        orig_datestamp, datestamp));

				    }

				@@ -125,19 +126,19 @@ std::string get_signature(std::string_view access_key_id, std::string_view secre

				future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {

				    static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",

				            auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);

				            auth::meta::roles_table::qualified_name, auth::meta::roles_table::role_col_name);

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				    auto timeout = auth::internal_distributed_timeout_config();

				    return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {

				    auto& timeout = auth::internal_distributed_timeout_config();

				    return qp.execute_internal(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {

				        auto res = f.get0();

				        auto salted_hash = std::optional<sstring>();

				        if (res->empty()) {

				            throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));

				            throw api_error::unrecognized_client(fmt::format("User not found: {}", username));

				        }

				        salted_hash = res->one().get_opt<sstring>("salted_hash");

				        if (!salted_hash) {

				            throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));

				            throw api_error::unrecognized_client(fmt::format("No password found for user: {}", username));

				        }

				        return make_ready_future<std::string>(*salted_hash);

				    });

									

alternator/base64.cc
									
				@@ -32,13 +32,13 @@

				// and the character used in base64 encoding to represent it.

				static class base64_chars {

				public:

				    static constexpr const char* to =

				    static constexpr const char to[] =

				            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

				    int8_t from[255];

				    base64_chars() {

				        static_assert(strlen(to) == 64);

				        static_assert(sizeof(to) == 64 + 1);

				        for (int i = 0; i < 255; i++) {

				            from[i] = 255; // signal invalid character

				            from[i] = -1; // signal invalid character

				        }

				        for (int i = 0; i < 64; i++) {

				            from[(unsigned) to[i]] = i;

				@@ -77,7 +77,7 @@ std::string base64_encode(bytes_view in) {

				    return ret;

				}

				bytes base64_decode(std::string_view in) {

				static std::string base64_decode_string(std::string_view in) {

				    int i = 0;

				    int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;

				    std::string ret;

				@@ -104,8 +104,42 @@ bytes base64_decode(std::string_view in) {

				        if (i==3)

				            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);

				    }

				    return ret;

				}

				bytes base64_decode(std::string_view in) {

				    // FIXME: This copy is sad. The problem is that we need to return "bytes",

				    // but "bytes" doesn't have an efficient append, while std::string does.

				    // To fix this we need to use bytes' "uninitialized" feature.

				    std::string ret = base64_decode_string(in);

				    return bytes(ret.begin(), ret.end());

				}

				static size_t base64_padding_len(std::string_view str) {

				    size_t padding = 0;

				    padding += (!str.empty() && str.back() == '=');

				    padding += (str.size() > 1 && *(str.end() - 2) == '=');

				    return padding;

				}

				size_t base64_decoded_len(std::string_view str) {

				    return str.size() / 4 * 3 - base64_padding_len(str);

				}

				bool base64_begins_with(std::string_view base, std::string_view operand) {

				    if (base.size() < operand.size() || base.size() % 4 != 0 || operand.size() % 4 != 0) {

				        return false;

				    }

				    if (base64_padding_len(operand) == 0) {

				        return base.starts_with(operand);

				    }

				    const std::string_view unpadded_base_prefix = base.substr(0, operand.size() - 4);

				    const std::string_view unpadded_operand = operand.substr(0, operand.size() - 4);

				    if (unpadded_base_prefix != unpadded_operand) {

				        return false;

				    }

				    // Decode and compare last 4 bytes of base64-encoded strings

				    const std::string base_remainder = base64_decode_string(base.substr(operand.size() - 4, operand.size()));

				    const std::string operand_remainder = base64_decode_string(operand.substr(operand.size() - 4));

				    return base_remainder.starts_with(operand_remainder);

				}
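The idea behind `base64_begins_with()` is that a prefix test on binary data can mostly be done on the encoded text, with decoding needed only for the final, padded 4-character group. Below is a Python sketch of that logic using the standard `base64` module. It is a port for illustration, not the project's code, and unlike the C++ version it decodes exactly one 4-character group of `base` rather than a longer slice.

```python
import base64

def padding_len(s: str) -> int:
    # Number of trailing '=' padding characters (0, 1 or 2).
    return int(s.endswith("=")) + int(len(s) > 1 and s[-2] == "=")

def decoded_len(s: str) -> int:
    # Every 4 encoded characters carry 3 bytes, minus the padding.
    return len(s) // 4 * 3 - padding_len(s)

def begins_with(base: str, operand: str) -> bool:
    if len(base) < len(operand) or len(base) % 4 != 0 or len(operand) % 4 != 0:
        return False
    if padding_len(operand) == 0:
        # Whole groups only: the encoded text can be compared directly.
        return base.startswith(operand)
    cut = len(operand) - 4
    if base[:cut] != operand[:cut]:
        return False
    # Decode and compare only the last (padded) group of the operand.
    base_tail = base64.b64decode(base[cut:cut + 4])
    operand_tail = base64.b64decode(operand[cut:])
    return base_tail.startswith(operand_tail)

full = base64.b64encode(b"abcdef").decode()    # "YWJjZGVm"
prefix = base64.b64encode(b"abcd").decode()    # "YWJjZA=="
print(begins_with(full, prefix), decoded_len(prefix))  # True 4
```

Comparing encoded text first avoids decoding the whole value, which matters when the stored attribute is large and the operand is short.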

									

alternator/base64.hh
									
				@@ -23,7 +23,7 @@

				#include <string_view>

				#include "bytes.hh"

				#include "rjson.hh"

				#include "utils/rjson.hh"

				std::string base64_encode(bytes_view);

				@@ -32,3 +32,7 @@ bytes base64_decode(std::string_view);

				inline bytes base64_decode(const rjson::value& v) {

				  return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));

				}

				size_t base64_decoded_len(std::string_view str);

				bool base64_begins_with(std::string_view base, std::string_view operand);

									

alternator/conditions.cc
									
				@@ -26,9 +26,15 @@

				#include "alternator/error.hh"

				#include "cql3/constants.hh"

				#include <unordered_map>

				#include "rjson.hh"

				#include "utils/rjson.hh"

				#include "serialization.hh"

				#include "base64.hh"

				#include <stdexcept>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include "utils/overloaded_functor.hh"

				#include "expressions.hh"

				namespace alternator {

				@@ -47,61 +53,20 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_

				            {"NOT_NULL", comparison_operator_type::NOT_NULL},

				            {"BETWEEN", comparison_operator_type::BETWEEN},

				            {"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},

				    }; //TODO: CONTAINS

				            {"CONTAINS", comparison_operator_type::CONTAINS},

				            {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},

				    };

				    if (!comparison_operator.IsString()) {

				        throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				        throw api_error::validation(format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				    }

				    std::string op = comparison_operator.GetString();

				    auto it = ops.find(op);

				    if (it == ops.end()) {

				        throw api_error("ValidationException", format("Unsupported comparison operator {}", op));

				        throw api_error::validation(format("Unsupported comparison operator {}", op));

				    }

				    return it->second;

				}

				static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {

				    bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));

				    auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));

				    bytes raw_value = serialize_item(value);

				    auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));

				}

				static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {

				    bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));

				    auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));

				    return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));

				}

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {

				    clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));

				    auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);

				    for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {

				        std::string_view column_name(it->name.GetString(), it->name.GetStringLength());

				        const rjson::value& condition = it->value;

				        const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");

				        const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");

				        comparison_operator_type op = get_comparison_operator(comp_definition);

				        if (op != comparison_operator_type::EQ) {

				            throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");

				        }

				        if (attr_list.Size() != 1) {

				            throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));

				        }

				        if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {

				            // Primary key restriction

				            filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);

				        } else {

				            // Regular column restriction

				            filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);

				        }

				    }

				    return filtering_restrictions;

				}

				namespace {

				struct size_check {

				@@ -133,61 +98,209 @@ struct nonempty : public size_check {

				// Check that array has the expected number of elements

				static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {

				    if (!array && expected(0)) {

				        // If expected() allows an empty AttributeValueList, it is also fine

				        // that it is missing.

				        return;

				    }

				    if (!array || !array->IsArray()) {

				        throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");

				        throw api_error::validation("With ComparisonOperator, AttributeValueList must be given and an array");

				    }

				    if (!expected(array->Size())) {

				        throw api_error("ValidationException",

				        throw api_error::validation(

				                        format("{} operator requires AttributeValueList {}, instead found list size {}",

				                               op, expected.what(), array->Size()));

				    }

				}

				struct rjson_engaged_ptr_comp {

				    bool operator()(const rjson::value* p1, const rjson::value* p2) const {

				        return rjson::single_value_comp()(*p1, *p2);

				    }

				};

				// It's not enough to compare underlying JSON objects when comparing sets,

				// as internally they're stored in an array, and the order of elements is

				// not important in set equality. See issue #5021

				static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {

				    if (!set1.IsArray() || !set2.IsArray() || set1.Size() != set2.Size()) {

				        return false;

				    }

				    std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;

				    for (auto it = set1.Begin(); it != set1.End(); ++it) {

				        set1_raw.insert(&*it);

				    }

				    for (const auto& a : set2.GetArray()) {

				        if (!set1_raw.contains(&a)) {

				            return false;

				        }

				    }

				    return true;

				}

				// Moreover, the JSON being compared can be a nested document with outer

				// layers of lists and maps and some inner set - and we need to get to that

				// inner set to compare it correctly with check_EQ_for_sets() (issue #8514).

				static bool check_EQ(const rjson::value* v1, const rjson::value& v2);

				static bool check_EQ_for_lists(const rjson::value& list1, const rjson::value& list2) {

				    if (!list1.IsArray() || !list2.IsArray() || list1.Size() != list2.Size()) {

				        return false;

				    }

				    auto it1 = list1.Begin();

				    auto it2 = list2.Begin();

				    while (it1 != list1.End()) {

				        // Note: Alternator limits an item's depth (rjson::parse() limits

				        // it to around 37 levels), so this recursion is safe.

				        if (!check_EQ(&*it1, *it2)) {

				            return false;

				        }

				        ++it1;

				        ++it2;

				    }

				    return true;

				}

				static bool check_EQ_for_maps(const rjson::value& list1, const rjson::value& list2) {

				    if (!list1.IsObject() || !list2.IsObject() || list1.MemberCount() != list2.MemberCount()) {

				        return false;

				    }

				    for (auto it1 = list1.MemberBegin(); it1 != list1.MemberEnd(); ++it1) {

				        auto it2 = list2.FindMember(it1->name);

				        if (it2 == list2.MemberEnd() || !check_EQ(&it1->value, it2->value)) {

				            return false;

				        }

				    }

				    return true;

				}

				// Check if two JSON-encoded values match with the EQ relation

				static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {

				    return v1 && *v1 == v2;

				    if (v1 && v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {

				        auto it1 = v1->MemberBegin();

				        auto it2 = v2.MemberBegin();

				        if (it1->name != it2->name) {

				            return false;

				        }

				        if (it1->name == "SS" || it1->name == "NS" || it1->name == "BS") {

				            return check_EQ_for_sets(it1->value, it2->value);

				        } else if(it1->name == "L") {

				            return check_EQ_for_lists(it1->value, it2->value);

				        } else if(it1->name == "M") {

				            return check_EQ_for_maps(it1->value, it2->value);

				        } else {

				            // Other, non-nested types (number, string, etc.) can be compared

				            // literally, comparing their JSON representation.

				            return it1->value == it2->value;

				        }

				    } else {

				        // If v1 and/or v2 are missing (IsNull()) the result should be false.

				        // In the unlikely case that the object is malformed (issue #8070),

				        // let's also return false.

				        return false;

				    }

				}
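The recursive structure of the new `check_EQ()` is easier to follow in a compact form. Below is a Python sketch over DynamoDB-style typed JSON values (dicts such as `{"S": "x"}` or `{"SS": ["a", "b"]}`); `check_eq` is a hypothetical port written for illustration and assumes sets contain no duplicates, as the C++ helper also implicitly does.

```python
def check_eq(v1, v2) -> bool:
    # Both values must be single-member objects of the form {type: payload}.
    if not (isinstance(v1, dict) and isinstance(v2, dict)
            and len(v1) == 1 and len(v2) == 1):
        return False  # missing or malformed values never compare equal
    (t1, x1), = v1.items()
    (t2, x2), = v2.items()
    if t1 != t2:
        return False
    if t1 in ("SS", "NS", "BS"):
        # Sets: same size and same members, in any order.
        return (isinstance(x1, list) and isinstance(x2, list)
                and len(x1) == len(x2) and all(e in x1 for e in x2))
    if t1 == "L":
        # Lists: element-wise, order matters; recurse to reach nested sets.
        return (isinstance(x1, list) and isinstance(x2, list)
                and len(x1) == len(x2)
                and all(check_eq(a, b) for a, b in zip(x1, x2)))
    if t1 == "M":
        # Maps: same keys, values compared recursively.
        return (isinstance(x1, dict) and isinstance(x2, dict)
                and len(x1) == len(x2)
                and all(k in x2 and check_eq(v, x2[k]) for k, v in x1.items()))
    # Scalars (S, N, B, BOOL, ...) compare by their JSON representation.
    return x1 == x2

print(check_eq({"L": [{"SS": ["a", "b"]}]}, {"L": [{"SS": ["b", "a"]}]}))  # True
```

A plain structural comparison would report the two values in the usage line as different, which is the nested-set case the diff addresses.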

				// Check if two JSON-encoded values match with the NE relation

				static bool check_NE(const rjson::value* v1, const rjson::value& v2) {

				    return !v1 || *v1 != v2; // null is unequal to anything.

				    return !check_EQ(v1, v2);

				}

				// Check if two JSON-encoded values match with the BEGINS_WITH relation

				static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {

				    // BEGINS_WITH requires that its single operand (v2) be a string or

				    // binary - otherwise it's a validation error. However, problems with

				    // the stored attribute (v1) will just return false (no match).

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));

				    }

				    auto it2 = v2.MemberBegin();

				    if (it2->name != "S" && it2->name != "B") {

				        throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));

				    }

				bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,

				                       bool v1_from_query, bool v2_from_query) {

				    bool bad = false;

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				        if (v1_from_query) {

				            throw api_error::validation("begins_with() encountered malformed argument");

				        } else {

				            bad = true;

				        }

				    } else if (v1->MemberBegin()->name != "S" && v1->MemberBegin()->name != "B") {

				        if (v1_from_query) {

				            throw api_error::validation(format("begins_with supports only string or binary type, got: {}", *v1));

				        } else {

				            bad = true;

				        }

				    }

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        if (v2_from_query) {

				            throw api_error::validation("begins_with() encountered malformed argument");

				        } else {

				            bad = true;

				        }

				    } else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {

				        if (v2_from_query) {

				            throw api_error::validation(format("begins_with() supports only string or binary type, got: {}", v2));

				        } else {

				            bad = true;

				        }

				    }

				    if (bad) {

				        return false;

				    }

				    auto it1 = v1->MemberBegin();

				    auto it2 = v2.MemberBegin();

				    if (it1->name != it2->name) {

				        return false;

				    }

				    std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());

				    std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());

				    return val1.substr(0, val2.size()) == val2;

				    if (it2->name == "S") {

				        return rjson::to_string_view(it1->value).starts_with(rjson::to_string_view(it2->value));

				    } else /* it2->name == "B" */ {

				        return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));

				    }

				}

				static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {

				    return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");

				}

				// Check if two JSON-encoded values match with the CONTAINS relation

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv1.name == "S" && kv2.name == "S") {

				        return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;

				    } else if (kv1.name == "B" && kv2.name == "B") {

				        return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;

				    } else if (is_set_of(kv1.name, kv2.name)) {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (*i == kv2.value) {

				                return true;

				            }

				        }

				    } else if (kv1.name == "L") {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (!i->IsObject() || i->MemberCount() != 1) {

				                clogger.error("check_CONTAINS received a list whose element is malformed");

				                return false;

				            }

				            const auto& el = *i->MemberBegin();

				            if (el.name == kv2.name && el.value == kv2.value) {

				                return true;

				            }

				        }

				    }

				    return false;

				}

				// Check if two JSON-encoded values match with the NOT_CONTAINS relation

				static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (!v1) {

				        return false;

				    }

				    return !check_CONTAINS(v1, v2);

				}

				// Check if a JSON-encoded value equals any element of an array, which must have at least one element.

				static bool check_IN(const rjson::value* val, const rjson::value& array) {

				    if (!array[0].IsObject() || array[0].MemberCount() != 1) {

				        throw api_error("ValidationException",

				        throw api_error::validation(

				                        format("IN operator encountered malformed AttributeValue: {}", array[0]));

				    }

				    const auto& type = array[0].MemberBegin()->name;

				    if (type != "S" && type != "N" && type != "B") {

				        throw api_error("ValidationException",

				        throw api_error::validation(

				                        "IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");

				    }

				    if (!val) {

				@@ -196,7 +309,7 @@ static bool check_IN(const rjson::value* val, const rjson::value& array) {

				    bool have_match = false;

				    for (const auto& elem : array.GetArray()) {

				        if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {

				            throw api_error("ValidationException",

				            throw api_error::validation(

				                            "IN operator requires all AttributeValueList elements to have the same type ");

				        }

				        if (!have_match && *val == elem) {

				@@ -207,6 +320,19 @@ static bool check_IN(const rjson::value* val, const rjson::value& array) {

				    return have_match;

				}

				// Another variant of check_IN, this one for ConditionExpression. It needs to

				// check whether the first element in the given vector is equal to any of the

				// others.

				static bool check_IN(const std::vector<rjson::value>& array) {

				    const rjson::value* first = &array[0];

				    for (unsigned i = 1; i < array.size(); i++) {

				        if (check_EQ(first, array[i])) {

				            return true;

				        }

				    }

				    return false;

				}

				static bool check_NULL(const rjson::value* val) {

				    return val == nullptr;

				}

				@@ -215,29 +341,45 @@ static bool check_NOT_NULL(const rjson::value* val) {

				    return val != nullptr;

				}

				// Only types S, N or B (string, number or bytes) may be compared by the

				// various comparison operators - lt, le, gt, ge, and between.

				// Note that in particular, if the value is missing (v->IsNull()), this

				// check returns false.

				static bool check_comparable_type(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return false;

				    }

				    const rjson::value& type = v.MemberBegin()->name;

				    return type == "S" || type == "N" || type == "B";

				}

				// Check if two JSON-encoded values match with cmp.

				template <typename Comparator>

				bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {

				    if (!v2.IsObject() || v2.MemberCount() != 1) {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic()));

				bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp,

				                   bool v1_from_query, bool v2_from_query) {

				    bool bad = false;

				    if (!v1 || !check_comparable_type(*v1)) {

				        if (v1_from_query) {

				            throw api_error::validation(format("{} allow only the types String, Number, or Binary", cmp.diagnostic));

				        }

				        bad = true;

				    }

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {

				        throw api_error("ValidationException",

				                        format("{} requires a single AttributeValue of type String, Number, or Binary",

				                               cmp.diagnostic()));

				    if (!check_comparable_type(v2)) {

				        if (v2_from_query) {

				            throw api_error::validation(format("{} allow only the types String, Number, or Binary", cmp.diagnostic));

				        }

				        bad = true;

				    }

				    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {

				    if (bad) {

				        return false;

				    }

				    const auto& kv1 = *v1->MemberBegin();

				    const auto& kv2 = *v2.MemberBegin();

				    if (kv1.name != kv2.name) {

				        return false;

				    }

				    if (kv1.name == "N") {

				        return cmp(unwrap_number(*v1, cmp.diagnostic()), unwrap_number(v2, cmp.diagnostic()));

				        return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));

				    }

				    if (kv1.name == "S") {

				        return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),

				@@ -246,21 +388,105 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara

				    if (kv1.name == "B") {

				        return cmp(base64_decode(kv1.value), base64_decode(kv2.value));

				    }

				    clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");

				    // cannot reach here, as check_comparable_type() verifies the type is one

				    // of the above options.

				    return false;

				}

				struct cmp_lt {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }

				    const char* diagnostic() const { return "LT operator"; }

				    // We cannot use the normal comparison operators like "<" on the bytes

				    // type, because they treat individual bytes as signed but we need to

				    // compare them as *unsigned*. So we need a specialization for bytes.

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) < 0; }

				    static constexpr const char* diagnostic = "LT operator";

				};

				struct cmp_le {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs <= rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) <= 0; }

				    static constexpr const char* diagnostic = "LE operator";

				};

				struct cmp_ge {

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs >= rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) >= 0; }

				    static constexpr const char* diagnostic = "GE operator";

				};

				struct cmp_gt {

				    // bytes only has <

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }

				    const char* diagnostic() const { return "GT operator"; }

				    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs > rhs; }

				    bool operator()(const bytes& lhs, const bytes& rhs) const { return compare_unsigned(lhs, rhs) > 0; }

				    static constexpr const char* diagnostic = "GT operator";

				};

				// True if v is between lb and ub, inclusive.  Throws or returns false

				// (depending on bounds_from_query parameter) if lb > ub.

				template <typename T>

				static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from_query) {

				    if (cmp_lt()(ub, lb)) {

				        if (bounds_from_query) {

				            throw api_error::validation(

				                format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));

				        } else {

				            return false;

				        }

				    }

				    return cmp_ge()(v, lb) && cmp_le()(v, ub);

				}

				static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub,

				                          bool v_from_query, bool lb_from_query, bool ub_from_query) {

				    if ((v && v_from_query && !check_comparable_type(*v)) ||

				        (lb_from_query && !check_comparable_type(lb)) ||

				        (ub_from_query && !check_comparable_type(ub))) {

				        throw api_error::validation("between allow only the types String, Number, or Binary");

				    }

				    if (!v || !v->IsObject() || v->MemberCount() != 1 ||

				        !lb.IsObject() || lb.MemberCount() != 1 ||

				        !ub.IsObject() || ub.MemberCount() != 1) {

				        return false;

				    }

				    const auto& kv_v = *v->MemberBegin();

				    const auto& kv_lb = *lb.MemberBegin();

				    const auto& kv_ub = *ub.MemberBegin();

				    bool bounds_from_query = lb_from_query && ub_from_query;

				    if (kv_lb.name != kv_ub.name) {

				        if (bounds_from_query) {

				           throw api_error::validation(

				                format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",

				                       kv_lb.name, kv_ub.name));

				        } else {

				            return false;

				        }

				    }

				    if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.

				        return false;

				    }

				    if (kv_v.name == "N") {

				        const char* diag = "BETWEEN operator";

				        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag), bounds_from_query);

				    }

				    if (kv_v.name == "S") {

				        return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),

				                             std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),

				                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()),

				                             bounds_from_query);

				    }

				    if (kv_v.name == "B") {

				        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value), bounds_from_query);

				    }

				    if (v_from_query) {

				        throw api_error::validation(

				            format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",

				               kv_lb.name));

				    } else {

				        return false;

				    }

				}

				// Verify one Expect condition on one attribute (whose content is "got")

				// for the verify_expected() below.

				// This function returns true or false depending on whether the condition

				@@ -276,24 +502,24 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu

				    // and requires a different combination of parameters in the request

				    if (value) {

				        if (exists && (!exists->IsBool() || exists->GetBool() != true)) {

				            throw api_error("ValidationException", "Cannot combine Value with Exists!=true");

				            throw api_error::validation("Cannot combine Value with Exists!=true");

				        }

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");

				            throw api_error::validation("Cannot combine Value with ComparisonOperator");

				        }

				        return check_EQ(got, *value);

				    } else if (exists) {

				        if (comparison_operator) {

				            throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");

				            throw api_error::validation("Cannot combine Exists with ComparisonOperator");

				        }

				        if (!exists->IsBool() || exists->GetBool() != false) {

				            throw api_error("ValidationException", "Exists!=false requires Value");

				            throw api_error::validation("Exists!=false requires Value");

				        }

				        // Remember Exists=false, so we're checking that the attribute does *not* exist:

				        return !got;

				    } else {

				        if (!comparison_operator) {

				            throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");

				            throw api_error::validation("Missing ComparisonOperator, Value or Exists");

				        }

				        comparison_operator_type op = get_comparison_operator(*comparison_operator);

				        switch (op) {

				@@ -305,13 +531,19 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu

				            return check_NE(got, (*attribute_value_list)[0]);

				        case comparison_operator_type::LT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_lt{});

				            return check_compare(got, (*attribute_value_list)[0], cmp_lt{}, false, true);

				        case comparison_operator_type::LE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_le{}, false, true);

				        case comparison_operator_type::GT:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_gt{});

				            return check_compare(got, (*attribute_value_list)[0], cmp_gt{}, false, true);

				        case comparison_operator_type::GE:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_compare(got, (*attribute_value_list)[0], cmp_ge{}, false, true);

				        case comparison_operator_type::BEGINS_WITH:

				            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				            return check_BEGINS_WITH(got, (*attribute_value_list)[0]);

				            return check_BEGINS_WITH(got, (*attribute_value_list)[0], false, true);

				        case comparison_operator_type::IN:

				            verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);

				            return check_IN(got, *attribute_value_list);

				@@ -321,67 +553,198 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu

				        case comparison_operator_type::NOT_NULL:

				            verify_operand_count(attribute_value_list, empty(), *comparison_operator);

				            return check_NOT_NULL(got);

				        default:

				            // FIXME: implement all the missing types, so there will be no default here.

				            throw api_error("ValidationException", format("ComparisonOperator {} is not yet supported", *comparison_operator));

				        case comparison_operator_type::BETWEEN:

				            verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);

				            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1],

				                                 false, true, true);

				        case comparison_operator_type::CONTAINS:

				            {

				                verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				                // Expected's "CONTAINS" has this artificial limitation.

				                // ConditionExpression's "contains()" does not...

				                const rjson::value& arg = (*attribute_value_list)[0];

				                const auto& argtype = (*arg.MemberBegin()).name;

				                if (argtype != "S" && argtype != "N" && argtype != "B") {

				                    throw api_error::validation(

				                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                                    "got {} instead", argtype));

				                }

				                return check_CONTAINS(got, arg);

				            }

				        case comparison_operator_type::NOT_CONTAINS:

				            {

				                verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);

				                // Expected's "NOT_CONTAINS" has this artificial limitation.

				                // ConditionExpression's "contains()" does not...

				                const rjson::value& arg = (*attribute_value_list)[0];

				                const auto& argtype = (*arg.MemberBegin()).name;

				                if (argtype != "S" && argtype != "N" && argtype != "B") {

				                    throw api_error::validation(

				                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                                    "got {} instead", argtype));

				                }

				                return check_NOT_CONTAINS(got, arg);

				            }

				        }

				        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));

				    }

				}

				// Verify that the existing values of the item (previous_item) match the

				conditional_operator_type get_conditional_operator(const rjson::value& req) {

				    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");

				    if (!conditional_operator) {

				        return conditional_operator_type::MISSING;

				    }

				    if (!conditional_operator->IsString()) {

				        throw api_error::validation("'ConditionalOperator' parameter, if given, must be a string");

				    }

				    auto s = rjson::to_string_view(*conditional_operator);

				    if (s == "AND") {

				        return conditional_operator_type::AND;

				    } else if (s == "OR") {

				        return conditional_operator_type::OR;

				    } else {

				        throw api_error::validation(

				                format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));

				    }

				}

				// Check if the existing values of the item (previous_item) match the

				// conditions given by the Expected and ConditionalOperator parameters

				// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).

				// This function will throw a ConditionalCheckFailedException API error

				// if the values do not match the condition, or ValidationException if there

				// This function can throw a ValidationException API error if there

				// are errors in the format of the condition itself.

				void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {

				bool verify_expected(const rjson::value& req, const rjson::value* previous_item) {

				    const rjson::value* expected = rjson::find(req, "Expected");

				    auto conditional_operator = get_conditional_operator(req);

				    if (conditional_operator != conditional_operator_type::MISSING &&

				        (!expected || (expected->IsObject() && expected->GetObject().ObjectEmpty()))) {

				            throw api_error::validation("'ConditionalOperator' parameter cannot be specified for missing or empty Expression");

				    }

				    if (!expected) {

				        return;

				        return true;

				    }

				    if (!expected->IsObject()) {

				        throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");

				    }

				    // ConditionalOperator can be "AND" for requiring all conditions, or

				    // "OR" for requiring one condition, and defaults to "AND" if missing.

				    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");

				    bool require_all = true;

				    if (conditional_operator) {

				        if (!conditional_operator->IsString()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");

				        }

				        std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());

				        if (s == "AND") {

				            // require_all is already true

				        } else if (s == "OR") {

				            require_all = false;

				        } else {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");

				        }

				        if (expected->GetObject().ObjectEmpty()) {

				            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");

				        }

				        throw api_error::validation("'Expected' parameter, if given, must be an object");

				    }

				    bool require_all = conditional_operator != conditional_operator_type::OR;

				    return verify_condition(*expected, require_all, previous_item);

				}

				    for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {

				bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item) {

				    for (auto it = condition.MemberBegin(); it != condition.MemberEnd(); ++it) {

				        const rjson::value* got = nullptr;

				        if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {

				            got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));

				        if (previous_item) {

				            got = rjson::find(*previous_item, rjson::to_string_view(it->name));

				        }

				        bool success = verify_expected_one(it->value, got);

				        if (success && !require_all) {

				            // When !require_all, one success is enough!

				            return;

				            return true;

				        } else if (!success && require_all) {

				            // When require_all, one failure is enough!

				            throw api_error("ConditionalCheckFailedException", "Failed condition.");

				            return false;

				        }

				    }

				    // If we got here and require_all, none of the checks failed, so succeed.

				    // If we got here and !require_all, all of the checks failed, so fail.

				    if (!require_all) {

				        throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");

				    return require_all;

				}

				static bool calculate_primitive_condition(const parsed::primitive_condition& cond,

				        const rjson::value* previous_item) {

				    std::vector<rjson::value> calculated_values;

				    calculated_values.reserve(cond._values.size());

				    for (const parsed::value& v : cond._values) {

				        calculated_values.push_back(calculate_value(v,

				                cond._op == parsed::primitive_condition::type::VALUE ?

				                        calculate_value_caller::ConditionExpressionAlone :

				                        calculate_value_caller::ConditionExpression,

				                previous_item));

				    }

				    switch (cond._op) {

				    case parsed::primitive_condition::type::BETWEEN:

				        if (calculated_values.size() != 3) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));

				        }

				        return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2],

				                             cond._values[0].is_constant(), cond._values[1].is_constant(), cond._values[2].is_constant());

				    case parsed::primitive_condition::type::IN:

				        return check_IN(calculated_values);

				    case parsed::primitive_condition::type::VALUE:

				        if (calculated_values.size() != 1) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error(format("Wrong number of values {} in VALUE primitive_condition", cond._values.size()));

				        }

				        // Unwrap the boolean wrapped as the value (if it is a boolean)

				        if (calculated_values[0].IsObject() && calculated_values[0].MemberCount() == 1) {

				            auto it = calculated_values[0].MemberBegin();

				            if (it->name == "BOOL" && it->value.IsBool()) {

				                return it->value.GetBool();

				            }

				        }

				        throw api_error::validation(

				                format("ConditionExpression: condition results in a non-boolean value: {}",

				                        calculated_values[0]));

				    default:

				        // All the rest of the operators have exactly two parameters (and unless

				        // we have a bug in the parser, that's what we have in the parsed object):

				        if (calculated_values.size() != 2) {

				            throw std::logic_error(format("Wrong number of values {} in primitive_condition object", cond._values.size()));

				        }

				    }

				    switch (cond._op) {

				    case parsed::primitive_condition::type::EQ:

				        return check_EQ(&calculated_values[0], calculated_values[1]);

				    case parsed::primitive_condition::type::NE:

				        return check_NE(&calculated_values[0], calculated_values[1]);

				    case parsed::primitive_condition::type::GT:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{},

				            cond._values[0].is_constant(), cond._values[1].is_constant());

				    case parsed::primitive_condition::type::GE:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{},

				            cond._values[0].is_constant(), cond._values[1].is_constant());

				    case parsed::primitive_condition::type::LT:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{},

				            cond._values[0].is_constant(), cond._values[1].is_constant());

				    case parsed::primitive_condition::type::LE:

				        return check_compare(&calculated_values[0], calculated_values[1], cmp_le{},

				            cond._values[0].is_constant(), cond._values[1].is_constant());

				    default:

				        // Shouldn't happen unless we have a bug in the parser

				        throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));

				    }

				}

				// Check if the existing values of the item (previous_item) match the

				// conditions given by the given parsed ConditionExpression.

				bool verify_condition_expression(

				        const parsed::condition_expression& condition_expression,

				        const rjson::value* previous_item) {

				    if (condition_expression.empty()) {

				        return true;

				    }

				    bool ret = std::visit(overloaded_functor {

				        [&] (const parsed::primitive_condition& cond) -> bool {

				            return calculate_primitive_condition(cond, previous_item);

				        },

				        [&] (const parsed::condition_expression::condition_list& list) -> bool {

				            auto verify_condition = [&] (const parsed::condition_expression& e) {

				                return verify_condition_expression(e, previous_item);

				            };

				            switch (list.op) {

				            case '&':

				                return boost::algorithm::all_of(list.conditions, verify_condition);

				            case '|':

				                return boost::algorithm::any_of(list.conditions, verify_condition);

				            default:

				                // Shouldn't happen unless we have a bug in the parser

				                throw std::logic_error("bad operator in condition_list");

				            }

				        }

				    }, condition_expression._expression);

				    return condition_expression._negated ? !ret : ret;

				}

				}
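
The recursion in verify_condition_expression() above can be sketched in isolation as follows. This is a minimal sketch, not Alternator's code: the `condition` and `condition_list` structs, the `make_*` helpers and `evaluate()` are hypothetical stand-ins, a leaf holds an already-computed boolean instead of a parsed::primitive_condition, `std::all_of`/`std::any_of` replace the Boost range versions, and the early return for an empty expression is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Builds one visitor out of several lambdas. The diff uses Scylla's
// overloaded_functor for this; the local copy here is only illustrative.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

struct condition;  // forward declaration, completed below

// Hypothetical stand-in for parsed::condition_expression::condition_list.
struct condition_list {
    char op;                            // '&' for AND, '|' for OR
    std::vector<condition> conditions;  // sub-expressions joined by op
};

// Hypothetical stand-in for parsed::condition_expression.
struct condition {
    std::variant<bool, condition_list> expression;
    bool negated;
};

condition make_leaf(bool value, bool negated = false) {
    return condition{value, negated};
}

condition make_and(std::vector<condition> children) {
    return condition{condition_list{'&', std::move(children)}, false};
}

condition make_or(std::vector<condition> children) {
    return condition{condition_list{'|', std::move(children)}, false};
}

// Evaluates the tree recursively and applies the negation flag last.
bool evaluate(const condition& c) {
    bool ret = std::visit(overloaded{
        [](bool leaf) -> bool { return leaf; },
        [](const condition_list& list) -> bool {
            switch (list.op) {
            case '&':
                return std::all_of(list.conditions.begin(), list.conditions.end(), evaluate);
            case '|':
                return std::any_of(list.conditions.begin(), list.conditions.end(), evaluate);
            default:
                // Mirrors the "bug in the parser" branch of the original.
                throw std::logic_error("bad operator in condition_list");
            }
        }
    }, c.expression);
    return c.negated ? !ret : ret;
}
```

Both the AND and OR branches short-circuit, so in this sketch a sub-expression after the deciding one is never evaluated.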

									
										18

alternator/conditions.hh
									
				@@ -33,17 +33,29 @@

				#include "cql3/restrictions/statement_restrictions.hh"

				#include "serialization.hh"

				#include "expressions_types.hh"

				namespace alternator {

				enum class comparison_operator_type {

				    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH

				    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH

				};

				comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);

				::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);

				enum class conditional_operator_type {

				    AND, OR, MISSING

				};

				conditional_operator_type get_conditional_operator(const rjson::value& req);

				void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);

				bool verify_expected(const rjson::value& req, const rjson::value* previous_item);

				bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);

				bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);

				bool verify_condition_expression(

				        const parsed::condition_expression& condition_expression,

				        const rjson::value* previous_item);

				}

									
										53

alternator/error.hh
									
				@@ -26,12 +26,15 @@

				namespace alternator {

				// DynamoDB's error messages are described in detail in

				// api_error contains a DynamoDB error message to be returned to the user.

				// It can be returned by value (see executor::request_return_type) or thrown.

				// The DynamoDB's error messages are described in detail in

				// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

				// An error message has a "type", e.g., "ResourceNotFoundException", a coarser

				// HTTP code (almost always, 400), and a human readable message. Eventually these

				// will be wrapped into a JSON object returned to the client.

				class api_error : public std::exception {

				// An error message has an HTTP code (almost always 400), a type, e.g.,

				// "ResourceNotFoundException", and a human readable message.

				// Eventually alternator::api_handler will convert a returned or thrown

				// api_error into a JSON object, and that is returned to the user.

				class api_error final {

				public:

				    using status_type = httpd::reply::status_type;

				    status_type _http_code;

				@@ -42,8 +45,44 @@ public:

				        , _type(std::move(type))

				        , _msg(std::move(msg))

				    { }

				    api_error() = default;

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				    // Factory functions for some common types of DynamoDB API errors

				    static api_error validation(std::string msg) {

				        return api_error("ValidationException", std::move(msg));

				    }

				    static api_error resource_not_found(std::string msg) {

				        return api_error("ResourceNotFoundException", std::move(msg));

				    }

				    static api_error resource_in_use(std::string msg) {

				        return api_error("ResourceInUseException", std::move(msg));

				    }

				    static api_error invalid_signature(std::string msg) {

				        return api_error("InvalidSignatureException", std::move(msg));

				    }

				    static api_error missing_authentication_token(std::string msg) {

				        return api_error("MissingAuthenticationTokenException", std::move(msg));

				    }

				    static api_error unrecognized_client(std::string msg) {

				        return api_error("UnrecognizedClientException", std::move(msg));

				    }

				    static api_error unknown_operation(std::string msg) {

				        return api_error("UnknownOperationException", std::move(msg));

				    }

				    static api_error access_denied(std::string msg) {

				        return api_error("AccessDeniedException", std::move(msg));

				    }

				    static api_error conditional_check_failed(std::string msg) {

				        return api_error("ConditionalCheckFailedException", std::move(msg));

				    }

				    static api_error expired_iterator(std::string msg) {

				        return api_error("ExpiredIteratorException", std::move(msg));

				    }

				    static api_error trimmed_data_access_exception(std::string msg) {

				        return api_error("TrimmedDataAccessException", std::move(msg));

				    }

				    static api_error internal(std::string msg) {

				        return api_error("InternalServerError", std::move(msg), reply::status_type::internal_server_error);

				    }

				};

				}
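
The comments in this hunk say that an api_error can either be returned by value (see executor::request_return_type) or thrown. The sketch below illustrates both paths under simplifying assumptions: the HTTP status is a plain `int` instead of httpd::reply::status_type, the success alternative is a `std::string` instead of json::json_return_type, and `handle_request()`, `call_and_catch()` and `to_reply()` are hypothetical names that do not appear in the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Simplified stand-in for alternator::api_error. As in the new version of
// the class, it is final and no longer derives from std::exception.
class api_error final {
public:
    int _http_code;
    std::string _type;
    std::string _msg;
    api_error(std::string type, std::string msg, int http_code = 400)
        : _http_code(http_code), _type(std::move(type)), _msg(std::move(msg)) {}
    // Two of the factory functions from the diff.
    static api_error validation(std::string msg) {
        return api_error("ValidationException", std::move(msg));
    }
    static api_error internal(std::string msg) {
        return api_error("InternalServerError", std::move(msg), 500);
    }
};

// Hypothetical analogue of executor::request_return_type.
using request_return_type = std::variant<std::string, api_error>;

// Returns the error by value: no exception is thrown on the failure path.
request_return_type handle_request(const std::string& table_name) {
    if (table_name.empty()) {
        return api_error::validation("TableName must not be empty");
    }
    return std::string("described table ") + table_name;
}

// A thrown api_error is caught and folded into the same variant.
request_return_type call_and_catch(const std::string& table_name) {
    try {
        if (table_name == "boom") {
            throw api_error::internal("unexpected failure");
        }
        return handle_request(table_name);
    } catch (const api_error& e) {
        return e;
    }
}

// Converts either outcome into a (status, body) pair. The body layout is
// illustrative only and is not the JSON that Alternator produces.
std::pair<int, std::string> to_reply(const request_return_type& r) {
    if (const api_error* e = std::get_if<api_error>(&r)) {
        return {e->_http_code, e->_type + ": " + e->_msg};
    }
    return {200, std::get<std::string>(r)};
}
```

Returning the error inside the variant keeps the common validation failures off the exception path, while the `catch` shows that throwing still works for a class that is not derived from std::exception.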

3801

alternator/executor.cc

File diff suppressed because it is too large.

									
										191

alternator/executor.hh
									
				@@ -25,47 +25,202 @@

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				#include <seastar/json/json_elements.hh>

				#include <seastar/core/sharded.hh>

				#include "service/storage_proxy.hh"

				#include "service/migration_manager.hh"

				#include "service/client_state.hh"

				#include "db/timeout_clock.hh"

				#include "alternator/error.hh"

				#include "stats.hh"

				#include "utils/rjson.hh"

				namespace db {

				    class system_distributed_keyspace;

				}

				namespace query {

				class partition_slice;

				class result;

				}

				namespace cql3::selection {

				    class selection;

				}

				namespace service {

				    class storage_service;

				}

				namespace alternator {

				class executor {

				class rmw_operation;

				struct make_jsonable : public json::jsonable {

				    rjson::value _value;

				public:

				    explicit make_jsonable(rjson::value&& value);

				    std::string to_json() const override;

				};

				struct json_string : public json::jsonable {

				    std::string _value;

				public:

				    explicit json_string(std::string&& value);

				    std::string to_json() const override;

				};

				namespace parsed {

				class path;

				};

				// An attribute_path_map object is used to hold data for various attribute

				// paths (parsed::path) in a hierarchy of attribute paths. Each attribute path

				// has a root attribute, which is then modified by member and index operators -

				// for example in "a.b[2].c" we have "a" as the root, then ".b" member, then

				// "[2]" index, and finally ".c" member.

				// Data can be added to an attribute_path_map using the add() function, but

				// requires that attributes with data not be *overlapping* or *conflicting*:

				//

				// 1. Two attribute paths which are identical or an ancestor of one another

				//    are considered *overlapping* and not allowed. If a.b.c has data,

				//    we can't add more data in a.b.c or any of its descendants like a.b.c.d.

				//

				// 2. Two attribute paths which need the same parent to have both a member and

				//    an index are considered *conflicting* and not allowed. E.g., if a.b has

				//    data, you can't add a[1]. The meaning of adding both would be that the

				//    attribute a is both a map and an array, which isn't sensible.

				//

				// These two requirements are common to the two places where Alternator uses

				// this abstraction to describe how a hierarchical item is to be transformed:

				//

				// 1. In ProjectionExpression: for filtering from a full top-level attribute

				//    only the parts the user asked for in ProjectionExpression.

				//

				// 2. In UpdateExpression: for taking the previous value of a top-level

				//    attribute, and modifying it based on the instructions the user

				//    wrote in UpdateExpression.

				template<typename T>

				class attribute_path_map_node {

				public:

				    using data_t = T;

				    // We need the extra shared_ptr<> here because libstdc++ unordered_map

				    // doesn't work with incomplete types :-( We couldn't use lw_shared_ptr<>

				    // because it doesn't work for incomplete types either. We couldn't use

				    // std::unique_ptr<> because it makes the entire object uncopyable. We

				    // don't often need to copy such a map, but we do have some code that

				    // copies an attrs_to_get object, and is hard to find and remove.

				    // The shared_ptr should never be null.

				    using members_t =  std::unordered_map<std::string, seastar::shared_ptr<attribute_path_map_node<T>>>;

				    // The indexes list is sorted because DynamoDB requires handling writes

				    // beyond the end of a list in index order.

				    using indexes_t = std::map<unsigned, seastar::shared_ptr<attribute_path_map_node<T>>>;

				    // The prohibition on "overlap" and "conflict" explained above means

				    // that only one of data, members or indexes is non-empty.

				    std::optional<std::variant<data_t, members_t, indexes_t>> _content;

				    bool is_empty() const { return !_content; }

				    bool has_value() const { return _content && std::holds_alternative<data_t>(*_content); }

				    bool has_members() const { return _content && std::holds_alternative<members_t>(*_content); }

				    bool has_indexes() const { return _content && std::holds_alternative<indexes_t>(*_content); }

				    // get_members() assumes that has_members() is true

				    members_t& get_members() { return std::get<members_t>(*_content); }

				    const members_t& get_members() const { return std::get<members_t>(*_content); }

				    indexes_t& get_indexes() { return std::get<indexes_t>(*_content); }

				    const indexes_t& get_indexes() const { return std::get<indexes_t>(*_content); }

				    T& get_value() { return std::get<T>(*_content); }

				    const T& get_value() const { return std::get<T>(*_content); }

				};

				template<typename T>

				using attribute_path_map = std::unordered_map<std::string, attribute_path_map_node<T>>;

				using attrs_to_get_node = attribute_path_map_node<std::monostate>;

				using attrs_to_get = attribute_path_map<std::monostate>;

				class executor : public peering_sharded_service<executor> {

				    service::storage_proxy& _proxy;

				    service::migration_manager& _mm;

				    db::system_distributed_keyspace& _sdks;

				    service::storage_service& _ss;

				    // An smp_service_group to be used for limiting the concurrency when

				    // forwarding Alternator requests between shards - if necessary for LWT.

				    smp_service_group _ssg;

				public:

				    using client_state = service::client_state;

				    using request_return_type = std::variant<json::json_return_type, api_error>;

				    stats _stats;

				    static constexpr auto ATTRS_COLUMN_NAME = ":attrs";

				    static constexpr auto KEYSPACE_NAME = "alternator";

				    static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";

				    static constexpr std::string_view INTERNAL_TABLE_PREFIX = ".scylla.alternator.";

				    executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}

				    executor(service::storage_proxy& proxy, service::migration_manager& mm, db::system_distributed_keyspace& sdks, service::storage_service& ss, smp_service_group ssg)

				        : _proxy(proxy), _mm(mm), _sdks(sdks), _ss(ss), _ssg(ssg) {}

				    future<json::json_return_type> create_table(client_state& client_state, std::string content);

				    future<json::json_return_type> describe_table(client_state& client_state, std::string content);

				    future<json::json_return_type> delete_table(client_state& client_state, std::string content);

				    future<json::json_return_type> put_item(client_state& client_state, std::string content);

				    future<json::json_return_type> get_item(client_state& client_state, std::string content);

				    future<json::json_return_type> delete_item(client_state& client_state, std::string content);

				    future<json::json_return_type> update_item(client_state& client_state, std::string content);

				    future<json::json_return_type> list_tables(client_state& client_state, std::string content);

				    future<json::json_return_type> scan(client_state& client_state, std::string content);

				    future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);

				    future<json::json_return_type> batch_write_item(client_state& client_state, std::string content);

				    future<json::json_return_type> batch_get_item(client_state& client_state, std::string content);

				    future<json::json_return_type> query(client_state& client_state, std::string content);

				    future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header);

				    future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request);

				    future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request);

				    future<> start();

				    future<> stop() { return make_ready_future<>(); }

				    future<> maybe_create_keyspace();

				    future<> create_keyspace(std::string_view keyspace_name);

				    static void maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);

				    static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);

				    static sstring table_name(const schema&);

				    static db::timeout_clock::time_point default_timeout();

				    static void set_default_timeout(db::timeout_clock::duration timeout);

				private:

				    static db::timeout_clock::duration s_default_timeout;

				public:

				    static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);

				private:

				    friend class rmw_operation;

				    static bool is_alternator_keyspace(const sstring& ks_name);

				    static sstring make_keyspace_name(const sstring& table_name);

				    static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr);

				    static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&);

				public:    

				    static std::optional<rjson::value> describe_single_item(schema_ptr,

				        const query::partition_slice&,

				        const cql3::selection::selection&,

				        const query::result&,

				        const attrs_to_get&);

				    static void describe_single_item(const cql3::selection::selection&,

				        const std::vector<bytes_opt>&,

				        const attrs_to_get&,

				        rjson::value&,

				        bool = false);

				    void add_stream_options(const rjson::value& stream_spec, schema_builder&) const;

				    void supplement_table_info(rjson::value& descr, const schema& schema) const;

				    void supplement_table_stream_info(rjson::value& descr, const schema& schema) const;

				};

				}
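
The "overlapping" and "conflicting" rules described in the attribute_path_map comment can be sketched as below. The diff only mentions an add() function without showing it, so `add_path()` here is a hypothetical illustration: it stores an `int` payload, uses `std::shared_ptr` in place of `seastar::shared_ptr`, takes the path as a root name plus a vector of operators, and reports violations with `std::invalid_argument`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// One step in a path after the root: ".member" or "[index]".
using path_operator = std::variant<std::string, unsigned>;

// Simplified analogue of attribute_path_map_node<int>.
struct path_node {
    using members_t = std::unordered_map<std::string, std::shared_ptr<path_node>>;
    using indexes_t = std::map<unsigned, std::shared_ptr<path_node>>;
    // Holds data, members or indexes, never more than one of them.
    std::optional<std::variant<int, members_t, indexes_t>> content;
};

using path_map = std::unordered_map<std::string, path_node>;

// Hypothetical add(): stores value at root + ops, rejecting paths that
// overlap or conflict with paths already in the map.
void add_path(path_map& map, const std::string& root,
              const std::vector<path_operator>& ops, int value) {
    path_node* n = &map[root];
    for (const path_operator& op : ops) {
        if (n->content && std::holds_alternative<int>(*n->content)) {
            // An ancestor of the new path already holds data.
            throw std::invalid_argument("overlapping paths");
        }
        std::shared_ptr<path_node>* child = nullptr;
        if (const std::string* member = std::get_if<std::string>(&op)) {
            if (!n->content) {
                n->content = path_node::members_t{};
            }
            auto* members = std::get_if<path_node::members_t>(&*n->content);
            if (!members) {
                // The parent is already indexed like a list.
                throw std::invalid_argument("conflicting paths");
            }
            child = &(*members)[*member];
        } else {
            if (!n->content) {
                n->content = path_node::indexes_t{};
            }
            auto* indexes = std::get_if<path_node::indexes_t>(&*n->content);
            if (!indexes) {
                // The parent is already accessed like a map.
                throw std::invalid_argument("conflicting paths");
            }
            child = &(*indexes)[std::get<unsigned>(op)];
        }
        if (!*child) {
            *child = std::make_shared<path_node>();
        }
        n = child->get();
    }
    if (n->content) {
        // The same path, or one of its descendants, already holds data.
        throw std::invalid_argument("overlapping paths");
    }
    n->content = value;
}
```

Because a node stores a single variant, the "only one of data, members or indexes" invariant holds by construction, and both checks reduce to asking which alternative is currently held.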

									
										677

alternator/expressions.cc
									
				@@ -20,15 +20,24 @@

				 */

				#include "expressions.hh"

				#include "serialization.hh"

				#include "base64.hh"

				#include "conditions.hh"

				#include "alternator/expressionsLexer.hpp"

				#include "alternator/expressionsParser.hpp"

				#include "utils/overloaded_functor.hh"

				#include "error.hh"

				#include <seastarx.hh>

				#include "seastarx.hh"

				#include <seastar/core/print.hh>

				#include <seastar/util/log.hh>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <functional>

				#include <unordered_map>

				namespace alternator {

				@@ -65,13 +74,19 @@ parse_projection_expression(std::string query) {

				    }

				}

				template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };

				template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

				parsed::condition_expression

				parse_condition_expression(std::string query) {

				    try {

				        return do_with_parser(query,  std::mem_fn(&expressionsParser::condition_expression));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing ConditionExpression '{}': {}", query, std::current_exception()));

				    }

				}

				namespace parsed {

				void update_expression::add(update_expression::action a) {

				    std::visit(overloaded {

				    std::visit(overloaded_functor {

				        [&] (action::set&)    { seen_set = true; },

				        [&] (action::remove&) { seen_remove = true; },

				        [&] (action::add&)    { seen_add = true; },

				@@ -94,5 +109,659 @@ void update_expression::append(update_expression other) {

				    seen_del |= other.seen_del;

				}

				void condition_expression::append(condition_expression&& a, char op) {

				    std::visit(overloaded_functor {

				        [&] (condition_list& x) {

				            // If 'a' has a single condition, we could, instead of inserting

				            // it, insert its single condition (possibly negated if a._negated).

				            // But considering that we don't evaluate these expressions many

				            // times, this optimization is not worth the extra code complexity.

				            if (!x.conditions.empty() && x.op != op) {

				                // Shouldn't happen unless we have a bug in the parser

				                throw std::logic_error("condition_expression::append called with mixed operators");

				            }

				            x.conditions.push_back(std::move(a));

				            x.op = op;

				        },

				        [&] (primitive_condition& x) {

				            // Shouldn't happen unless we have a bug in the parser

				            throw std::logic_error("condition_expression::append called on primitive_condition");

				        }

				    }, _expression);

				}

				void path::check_depth_limit() {

				    if (1 + _operators.size() > depth_limit) {

				        throw expressions_syntax_error(format("Document path exceeded {} nesting levels", depth_limit));

				    }

				}

				std::ostream& operator<<(std::ostream& os, const path& p) {

				    os << p.root();

				    for (const auto& op : p.operators()) {

				        std::visit(overloaded_functor {

				            [&] (const std::string& member) {

				                os << '.' << member;

				            },

				            [&] (unsigned index) {

				                os << '[' << index << ']';

				            }

				        }, op);

				    }

				    return os;

				}

				} // namespace parsed

				// The following resolve_*() functions resolve references in parsed

				// expressions of different types. Resolving a parsed expression means

				// replacing:

				//  1. In parsed::path objects, replace references like "#name" with the

				//     attribute name from ExpressionAttributeNames,

				//  2. In parsed::constant objects, replace references like ":value" with

				//     the value from ExpressionAttributeValues.

				// These functions also track which name and value references were used, to

				// allow complaining if some remain unused.

				// Note that the resolve_*() functions modify the expressions in-place,

				// so if we ever intend to cache parsed expression, we need to pass a copy

				// into this function.

				//

				// Doing the "resolving" stage before the evaluation stage has two benefits.

				// First, it allows us to be compatible with DynamoDB in catching unused

				// names and values (see issue #6572). Second, in the FilterExpression case,

				// we need to resolve the expression just once but then use it many times

				// (once for each item to be filtered).

				static std::optional<std::string> resolve_path_component(const std::string& column_name,

				        const rjson::value* expression_attribute_names,

				        std::unordered_set<std::string>& used_attribute_names) {

				    if (column_name.size() > 0 && column_name.front() == '#') {

				        if (!expression_attribute_names) {

				            throw api_error::validation(

				                    format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));

				        }

				        const rjson::value* value = rjson::find(*expression_attribute_names, column_name);

				        if (!value || !value->IsString()) {

				            throw api_error::validation(

				                    format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));

				        }

				        used_attribute_names.emplace(column_name);

				        return std::string(rjson::to_string_view(*value));

				    }

				    return std::nullopt;

				}

				static void resolve_path(parsed::path& p,

				        const rjson::value* expression_attribute_names,

				        std::unordered_set<std::string>& used_attribute_names) {

				    std::optional<std::string> r = resolve_path_component(p.root(), expression_attribute_names, used_attribute_names);

				    if (r) {

				        p.set_root(std::move(*r));

				    }

				    for (auto& op : p.operators()) {

				        std::visit(overloaded_functor {

				            [&] (std::string& s) {

				                r = resolve_path_component(s, expression_attribute_names, used_attribute_names);

				                if (r) {

				                    s = std::move(*r);

				                }

				            },

				            [&] (unsigned index) {

				                // nothing to resolve

				            }

				        }, op);

				    }

				}

				static void resolve_constant(parsed::constant& c,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_values) {

				    std::visit(overloaded_functor {

				        [&] (const std::string& valref) {

				            if (!expression_attribute_values) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));

				            }

				            const rjson::value* value = rjson::find(*expression_attribute_values, valref);

				            if (!value) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues missing entry '{}' required by expression", valref));

				            }

				            if (value->IsNull()) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));

				            }

				            validate_value(*value, "ExpressionAttributeValues");

				            used_attribute_values.emplace(valref);

				            c.set(*value);

				        },

				        [&] (const parsed::constant::literal& lit) {

				            // Nothing to do, already resolved

				        }

				    }, c._value);

				}

				void resolve_value(parsed::value& rhs,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values) {

				    std::visit(overloaded_functor {

				        [&] (parsed::constant& c) {

				            resolve_constant(c, expression_attribute_values, used_attribute_values);

				        },

				        [&] (parsed::value::function_call& f) {

				            for (parsed::value& value : f._parameters) {

				                resolve_value(value, expression_attribute_names, expression_attribute_values,

				                        used_attribute_names, used_attribute_values);

				            }

				        },

				        [&] (parsed::path& p) {

				            resolve_path(p, expression_attribute_names, used_attribute_names);

				        }

				    }, rhs._value);

				}

				void resolve_set_rhs(parsed::set_rhs& rhs,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values) {

				    resolve_value(rhs._v1, expression_attribute_names, expression_attribute_values,

				            used_attribute_names, used_attribute_values);

				    if (rhs._op != 'v') {

				        resolve_value(rhs._v2, expression_attribute_names, expression_attribute_values,

				                used_attribute_names, used_attribute_values);

				    }

				}

				void resolve_update_expression(parsed::update_expression& ue,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values) {

				    for (parsed::update_expression::action& action : ue.actions()) {

				        resolve_path(action._path, expression_attribute_names, used_attribute_names);

				        std::visit(overloaded_functor {

				            [&] (parsed::update_expression::action::set& a) {

				                resolve_set_rhs(a._rhs, expression_attribute_names, expression_attribute_values,

				                        used_attribute_names, used_attribute_values);

				            },

				            [&] (parsed::update_expression::action::remove& a) {

				                // nothing to do

				            },

				            [&] (parsed::update_expression::action::add& a) {

				                resolve_constant(a._valref, expression_attribute_values, used_attribute_values);

				            },

				            [&] (parsed::update_expression::action::del& a) {

				                resolve_constant(a._valref, expression_attribute_values, used_attribute_values);

				            }

				        }, action._action);

				    }

				}

				static void resolve_primitive_condition(parsed::primitive_condition& pc,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values) {

				    for (parsed::value& value : pc._values) {

				        resolve_value(value,

				                expression_attribute_names, expression_attribute_values,

				                used_attribute_names, used_attribute_values);

				    }

				}

				void resolve_condition_expression(parsed::condition_expression& ce,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values) {

				    std::visit(overloaded_functor {

				        [&] (parsed::primitive_condition& cond) {

				            resolve_primitive_condition(cond,

				                    expression_attribute_names, expression_attribute_values,

				                    used_attribute_names, used_attribute_values);

				        },

				        [&] (parsed::condition_expression::condition_list& list) {

				            for (parsed::condition_expression& cond : list.conditions) {

				                resolve_condition_expression(cond,

				                        expression_attribute_names, expression_attribute_values,

				                        used_attribute_names, used_attribute_values);

				            }

				        }

				    }, ce._expression);

				}

				void resolve_projection_expression(std::vector<parsed::path>& pe,

				        const rjson::value* expression_attribute_names,

				        std::unordered_set<std::string>& used_attribute_names) {

				    for (parsed::path& p : pe) {

				        resolve_path(p, expression_attribute_names, used_attribute_names);

				    }

				}
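
The "resolve, then report unused references" idea behind the resolve_*() functions above can be sketched without rjson as follows. Everything here is a simplified assumption: ExpressionAttributeNames is modelled as a plain `std::unordered_map`, a path is a flat vector of name components, errors are `std::invalid_argument` rather than api_error::validation, and `resolve_component()`, `resolve_all()` and `find_unused()` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in for the ExpressionAttributeNames JSON object: "#ref" -> name.
using names_map = std::unordered_map<std::string, std::string>;

// Returns the real attribute name if `name` is a "#ref" reference, or
// std::nullopt if it is a plain name that needs no resolving.
std::optional<std::string> resolve_component(const std::string& name,
        const names_map* names, std::unordered_set<std::string>& used) {
    if (name.empty() || name.front() != '#') {
        return std::nullopt;
    }
    if (!names) {
        throw std::invalid_argument(
                "ExpressionAttributeNames missing, entry '" + name + "' required by expression");
    }
    auto it = names->find(name);
    if (it == names->end()) {
        throw std::invalid_argument(
                "ExpressionAttributeNames missing entry '" + name + "' required by expression");
    }
    used.insert(name);  // remember that this reference was consumed
    return it->second;
}

// Resolves every component in place, like resolve_path() does for a path.
void resolve_all(std::vector<std::string>& components,
        const names_map* names, std::unordered_set<std::string>& used) {
    for (std::string& c : components) {
        if (std::optional<std::string> r = resolve_component(c, names, used)) {
            c = std::move(*r);
        }
    }
}

// Lists the references that no expression consumed, so that the caller
// can reject the request.
std::vector<std::string> find_unused(const names_map& names,
        const std::unordered_set<std::string>& used) {
    std::vector<std::string> unused;
    for (const auto& entry : names) {
        if (used.count(entry.first) == 0) {
            unused.push_back(entry.first);
        }
    }
    return unused;
}
```

Since resolution rewrites the components in place, a caller that wants to reuse a parsed expression would have to resolve a copy, which is the caveat the original comment raises about caching.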

				// condition_expression_on() checks whether a condition_expression places any

				// condition on the given attribute. It can be useful, for example, for

				// checking whether the condition tries to restrict a key column.

				static bool value_on(const parsed::value& v, std::string_view attribute) {

				    return std::visit(overloaded_functor {

				        [&] (const parsed::constant& c) {

				            return false;

				        },

				        [&] (const parsed::value::function_call& f) {

				            for (const parsed::value& value : f._parameters) {

				                if (value_on(value, attribute)) {

				                    return true;

				                }

				            }

				            return false;

				        },

				        [&] (const parsed::path& p) {

				            return p.root() == attribute;

				        }

				    }, v._value);

				}

				static bool primitive_condition_on(const parsed::primitive_condition& pc, std::string_view attribute) {

				    for (const parsed::value& value : pc._values) {

				        if (value_on(value, attribute)) {

				            return true;

				        }

				    }

				    return false;

				}

				bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute) {

				    return std::visit(overloaded_functor {

				        [&] (const parsed::primitive_condition& cond) {

				            return primitive_condition_on(cond, attribute);

				        },

				        [&] (const parsed::condition_expression::condition_list& list) {

				            for (const parsed::condition_expression& cond : list.conditions) {

				                if (condition_expression_on(cond, attribute)) {

				                    return true;

				                }

				            }

				            return false;

				        }

				    }, ce._expression);

				}

				// for_condition_expression_on() runs a given function over all the attributes

				// mentioned in the expression. If the same attribute is mentioned more than

				// once, the function will be called more than once for the same attribute.

				static void for_value_on(const parsed::value& v, const noncopyable_function<void(std::string_view)>& func) {

				    std::visit(overloaded_functor {

				        [&] (const parsed::constant& c) { },

				        [&] (const parsed::value::function_call& f) {

				            for (const parsed::value& value : f._parameters) {

				                for_value_on(value, func);

				            }

				        },

				        [&] (const parsed::path& p) {

				            func(p.root());

				        }

				    }, v._value);

				}

				void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func) {

				    std::visit(overloaded_functor {

				        [&] (const parsed::primitive_condition& cond) {

				            for (const parsed::value& value : cond._values) {

				                for_value_on(value, func);

				            }

				        },

				        [&] (const parsed::condition_expression::condition_list& list) {

				            for (const parsed::condition_expression& cond : list.conditions) {

				                for_condition_expression_on(cond, func);

				            }

				        }

				    }, ce._expression);

				}

				// The following calculate_value() functions calculate, or evaluate, a parsed

				// expression. The parsed expression is assumed to have been "resolved", with

				// the matching resolve_* function.

				// Take two JSON-encoded list values (remember that a list value is

				// {"L": [...the actual list]}) and return the concatenation, again as

				// a list value.

				static rjson::value list_concatenate(const rjson::value& v1, const rjson::value& v2) {

				    const rjson::value* list1 = unwrap_list(v1);

				    const rjson::value* list2 = unwrap_list(v2);

				    if (!list1 || !list2) {

				        throw api_error::validation("UpdateExpression: list_append() given a non-list");

				    }

				    rjson::value cat = rjson::copy(*list1);

				    for (const auto& a : list2->GetArray()) {

				        rjson::push_back(cat, rjson::copy(a));

				    }

				    rjson::value ret = rjson::empty_object();

				    rjson::set(ret, "L", std::move(cat));

				    return ret;

				}

				// calculate_size() is ConditionExpression's size() function, i.e., it takes

				// a JSON-encoded value and returns its "size" as defined differently for the

				// different types - also as a JSON-encoded number.

				// It returns a JSON-encoded "null" value if this value's type has no size

				// defined. Comparisons against this non-numeric value will later fail.

				static rjson::value calculate_size(const rjson::value& v) {

				    // NOTE: If v is improperly formatted for our JSON value encoding, it

				    // must come from the request itself, not from the database, so it makes

				    // sense to throw a ValidationException if we see such a problem.

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        throw api_error::validation(format("invalid object: {}", v));

				    }

				    auto it = v.MemberBegin();

				    int ret;

				    if (it->name == "S") {

				        if (!it->value.IsString()) {

				            throw api_error::validation(format("invalid string: {}", v));

				        }

				        ret = it->value.GetStringLength();

				    } else if (it->name == "NS" || it->name == "SS" || it->name == "BS" || it->name == "L") {

				        if (!it->value.IsArray()) {

				            throw api_error::validation(format("invalid set: {}", v));

				        }

				        ret = it->value.Size();

				    } else if (it->name == "M") {

				        if (!it->value.IsObject()) {

				            throw api_error::validation(format("invalid map: {}", v));

				        }

				        ret = it->value.MemberCount();

				    } else if (it->name == "B") {

				        if (!it->value.IsString()) {

				            throw api_error::validation(format("invalid byte string: {}", v));

				        }

				        ret = base64_decoded_len(rjson::to_string_view(it->value));

				    } else {

				        rjson::value json_ret = rjson::empty_object();

				        rjson::set(json_ret, "null", rjson::value(true));

				        return json_ret;

				    }

				    rjson::value json_ret = rjson::empty_object();

				    rjson::set(json_ret, "N", rjson::from_string(std::to_string(ret)));

				    return json_ret;

				}

				static const rjson::value& calculate_value(const parsed::constant& c) {

				    return std::visit(overloaded_functor {

				        [&] (const parsed::constant::literal& v) -> const rjson::value& {

				            return *v;

				        },

				        [&] (const std::string& valref) -> const rjson::value& {

				            // Shouldn't happen, we should have called resolve_value() earlier

				            // and replaced the value reference by the literal constant.

				            throw std::logic_error("calculate_value() called before resolve_value()");

				        }

				    }, c._value);

				}

				static rjson::value to_bool_json(bool b) {

				    rjson::value json_ret = rjson::empty_object();

				    rjson::set(json_ret, "BOOL", rjson::value(b));

				    return json_ret;

				}

				static bool known_type(std::string_view type) {

				    static thread_local const std::unordered_set<std::string_view> types = {

				            "N", "S", "B", "NS", "SS", "BS", "L", "M", "NULL", "BOOL"

				    };

				    return types.contains(type);

				}

				using function_handler_type = rjson::value(calculate_value_caller, const rjson::value*, const parsed::value::function_call&);

				static const

				std::unordered_map<std::string_view, function_handler_type*> function_handlers {

				    {"list_append", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::UpdateExpression) {

				                throw api_error::validation(

				                        format("{}: list_append() not allowed here", caller));

				            }

				            if (f._parameters.size() != 2) {

				                throw api_error::validation(

				                        format("{}: list_append() accepts 2 parameters, got {}", caller, f._parameters.size()));

				            }

				            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);

				            return list_concatenate(v1, v2);

				        }

				    },

				    {"if_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::UpdateExpression) {

				                throw api_error::validation(

				                        format("{}: if_not_exists() not allowed here", caller));

				            }

				            if (f._parameters.size() != 2) {

				                throw api_error::validation(

				                        format("{}: if_not_exists() accepts 2 parameters, got {}", caller, f._parameters.size()));

				            }

				            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {

				                throw api_error::validation(

				                        format("{}: if_not_exists() must include path as its first argument", caller));

				            }

				            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);

				            return v1.IsNull() ? std::move(v2) : std::move(v1);

				        }

				    },

				    {"size", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpression) {

				                throw api_error::validation(

				                        format("{}: size() not allowed here", caller));

				            }

				            if (f._parameters.size() != 1) {

				                throw api_error::validation(

				                        format("{}: size() accepts 1 parameter, got {}", caller, f._parameters.size()));

				            }

				            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);

				            return calculate_size(v);

				        }

				    },

				    {"attribute_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpressionAlone) {

				                throw api_error::validation(

				                        format("{}: attribute_exists() not allowed here", caller));

				            }

				            if (f._parameters.size() != 1) {

				                throw api_error::validation(

				                        format("{}: attribute_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));

				            }

				            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {

				                throw api_error::validation(

				                        format("{}: attribute_exists()'s parameter must be a path", caller));

				            }

				            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);

				            return to_bool_json(!v.IsNull());

				        }

				    },

				    {"attribute_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpressionAlone) {

				                throw api_error::validation(

				                        format("{}: attribute_not_exists() not allowed here", caller));

				            }

				            if (f._parameters.size() != 1) {

				                throw api_error::validation(

				                        format("{}: attribute_not_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));

				            }

				            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {

				                throw api_error::validation(

				                        format("{}: attribute_not_exists()'s parameter must be a path", caller));

				            }

				            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);

				            return to_bool_json(v.IsNull());

				        }

				    },

				    {"attribute_type", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpressionAlone) {

				                throw api_error::validation(

				                        format("{}: attribute_type() not allowed here", caller));

				            }

				            if (f._parameters.size() != 2) {

				                throw api_error::validation(

				                        format("{}: attribute_type() accepts 2 parameters, got {}", caller, f._parameters.size()));

				            }

				            // There is no real reason for the following check (not

				            // allowing the type to come from a document attribute), but

				            // DynamoDB does this check, so we do too...

				            if (!f._parameters[1].is_constant()) {

				                throw api_error::validation(

				                        format("{}: attribute_type()'s second parameter must be an expression attribute", caller));

				            }

				            rjson::value v0 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v1 = calculate_value(f._parameters[1], caller, previous_item);

				            if (v1.IsObject() && v1.MemberCount() == 1 && v1.MemberBegin()->name == "S") {

				                // If the type parameter is not one of the legal types

				                // we should generate an error, not a failed condition:

				                if (!known_type(rjson::to_string_view(v1.MemberBegin()->value))) {

				                    throw api_error::validation(

				                            format("{}: attribute_type()'s second parameter, {}, is not a known type",

				                                    caller, v1.MemberBegin()->value));

				                }

				                if (v0.IsObject() && v0.MemberCount() == 1) {

				                    return to_bool_json(v1.MemberBegin()->value == v0.MemberBegin()->name);

				                } else {

				                    return to_bool_json(false);

				                }

				            } else {

				                throw api_error::validation(

				                        format("{}: attribute_type() second parameter must refer to a string, got {}", caller, v1));

				            }

				        }

				    },

				    {"begins_with", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpressionAlone) {

				                throw api_error::validation(

				                        format("{}: begins_with() not allowed here", caller));

				            }

				            if (f._parameters.size() != 2) {

				                throw api_error::validation(

				                        format("{}: begins_with() accepts 2 parameters, got {}", caller, f._parameters.size()));

				            }

				            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);

				            return to_bool_json(check_BEGINS_WITH(v1.IsNull() ? nullptr : &v1,  v2,

				                                    f._parameters[0].is_constant(), f._parameters[1].is_constant()));

				        }

				    },

				    {"contains", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {

				            if (caller != calculate_value_caller::ConditionExpressionAlone) {

				                throw api_error::validation(

				                        format("{}: contains() not allowed here", caller));

				            }

				            if (f._parameters.size() != 2) {

				                throw api_error::validation(

				                        format("{}: contains() accepts 2 parameters, got {}", caller, f._parameters.size()));

				            }

				            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);

				            return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1,  v2));

				        }

				    },

				};

				// Given a parsed::path and an item read from the table, extract the value

				// of a certain attribute path, such as "a" or "a.b.c[3]". Returns a null

				// value if the item or the requested attribute does not exist.

				// Note that the item is assumed to be encoded in JSON using DynamoDB

				// conventions - each level of a nested document is a map with one key -

				// a type (e.g., "M" for map) - and its value is the representation of

				// that value.

				static rjson::value extract_path(const rjson::value* item,

				        const parsed::path& p, calculate_value_caller caller) {

				    if (!item) {

				        return rjson::null_value();

				    }

				    const rjson::value* v = rjson::find(*item, p.root());

				    if (!v) {

				        return rjson::null_value();

				    }

				    for (const auto& op : p.operators()) {

				        if (!v->IsObject() || v->MemberCount() != 1) {

				            // This shouldn't happen. We shouldn't have stored malformed

				            // objects. But today Alternator does not validate the structure

				            // of nested documents before storing them, so this can happen on

				            // read.

				            throw api_error::validation(format("{}: malformed item read: {}", caller, *item));

				        }

				        const char* type = v->MemberBegin()->name.GetString();

				        v = &(v->MemberBegin()->value);

				        std::visit(overloaded_functor {

				            [&] (const std::string& member) {

				                if (type[0] == 'M' && v->IsObject()) {

				                    v = rjson::find(*v, member);

				                } else {

				                    v = nullptr;

				                }

				            },

				            [&] (unsigned index) {

				                if (type[0] == 'L' && v->IsArray() && index < v->Size()) {

				                    v = &(v->GetArray()[index]);

				                } else {

				                    v = nullptr;

				                }

				            }

				        }, op);

				        if (!v) {

				            return rjson::null_value();

				        }

				    }

				    return rjson::copy(*v);

				}

				// Given a parsed::value, which can refer either to a constant value from

				// ExpressionAttributeValues, to the value of some attribute, or to a function

				// of other values, this function calculates the resulting value.

				// "caller" determines which expression - ConditionExpression or

				// UpdateExpression - is asking for this value. We need to know this because

				// DynamoDB allows a different choice of functions for different expressions.

				rjson::value calculate_value(const parsed::value& v,

				        calculate_value_caller caller,

				        const rjson::value* previous_item) {

				    return std::visit(overloaded_functor {

				        [&] (const parsed::constant& c) -> rjson::value {

				            return rjson::copy(calculate_value(c));

				        },

				        [&] (const parsed::value::function_call& f) -> rjson::value {

				            auto function_it = function_handlers.find(std::string_view(f._function_name));

				            if (function_it == function_handlers.end()) {

				                throw api_error::validation(

				                        format("{}: unknown function '{}' called.", caller, f._function_name));

				            }

				            return function_it->second(caller, previous_item, f);

				        },

				        [&] (const parsed::path& p) -> rjson::value {

				            return extract_path(previous_item, p, caller);

				        }

				    }, v._value);

				}

				// Same as calculate_value() above, except takes a set_rhs, which may be

				// either a single value, or v1+v2 or v1-v2.

				rjson::value calculate_value(const parsed::set_rhs& rhs,

				        const rjson::value* previous_item) {

				    switch (rhs._op) {

				    case 'v':

				        return calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);

				    case '+': {

				        rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);

				        rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);

				        return number_add(v1, v2);

				    }

				    case '-': {

				        rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);

				        rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);

				        return number_subtract(v1, v2);

				    }

				    }

				    // Can't happen

				    return rjson::null_value();

				}

				} // namespace alternator
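
The visitors above (value_on(), for_value_on(), calculate_value()) all follow the same shape: std::visit over a three-way variant, with recursion into function parameters. The following is a minimal standalone sketch of that shape, not the Alternator code itself. The types `constant`, `path`, `function_call` and `value` are simplified, hypothetical stand-ins for the parsed:: types, and `overloaded` is a local equivalent of overloaded_functor.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Merges several lambdas into one callable, so std::visit can pick the
// overload matching the alternative currently held by the variant.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Hypothetical, simplified stand-ins for the parsed:: types.
struct constant { std::string valref; };
struct path { std::string root; };
struct value;
struct function_call {
    std::string name;
    std::vector<value> parameters;
};
struct value {
    std::variant<constant, path, function_call> v;
};

// Returns true if the value mentions the given top-level attribute,
// either directly as a path or inside any (nested) function parameter.
bool value_on(const value& val, std::string_view attribute) {
    return std::visit(overloaded {
        [&] (const constant&) { return false; },
        [&] (const path& p) { return p.root == attribute; },
        [&] (const function_call& f) {
            for (const value& param : f.parameters) {
                if (value_on(param, attribute)) {
                    return true;
                }
            }
            return false;
        }
    }, val.v);
}
```

For a value such as size(b), value_on() reports a match for "b" but not for "a", which is the property a caller needs when checking whether a condition touches a key column.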


alternator/expressions.g


@@ -145,6 +145,12 @@ REMOVE: R E M O V E;
 ADD: A D D;
 DELETE: D E L E T E;
 AND: A N D;
 OR: O R;
 NOT: N O T;
 BETWEEN: B E T W E E N;
 IN: I N;
 fragment ALPHA: 'A'..'Z' | 'a'..'z';
 fragment DIGIT: '0'..'9';
 fragment ALNUM: ALPHA | DIGIT | '_';
@@ -165,19 +171,19 @@ path returns [parsed::path p]:
       | '[' INTEGER ']'           { $p.add_index(std::stoi($INTEGER.text)); }
     )*;
 update_expression_set_value returns [parsed::value v]:
       VALREF                             { $v.set_valref($VALREF.text); }
     | path                               { $v.set_path($path.p); }
     | NAME                               { $v.set_func_name($NAME.text); }
      '(' x=update_expression_set_value   { $v.add_func_parameter($x.v); }
      (',' x=update_expression_set_value  { $v.add_func_parameter($x.v); })*
 value returns [parsed::value v]:
       VALREF       { $v.set_valref($VALREF.text); }
     | path         { $v.set_path($path.p); }
     | NAME         { $v.set_func_name($NAME.text); }
      '(' x=value   { $v.add_func_parameter($x.v); }
      (',' x=value  { $v.add_func_parameter($x.v); })*
      ')'
     ;
 update_expression_set_rhs returns [parsed::set_rhs rhs]:
     v=update_expression_set_value  { $rhs.set_value(std::move($v.v)); }
     (   '+' v=update_expression_set_value  { $rhs.set_plus(std::move($v.v)); }
       | '-' v=update_expression_set_value  { $rhs.set_minus(std::move($v.v)); }
     v=value  { $rhs.set_value(std::move($v.v)); }
     (   '+' v=value  { $rhs.set_plus(std::move($v.v)); }
       | '-' v=value  { $rhs.set_minus(std::move($v.v)); }
     )?
     ;
@@ -212,3 +218,48 @@ update_expression returns [parsed::update_expression e]:
 projection_expression returns [std::vector<parsed::path> v]:
     p=path      { $v.push_back(std::move($p.p)); }
     (',' p=path { $v.push_back(std::move($p.p)); } )* EOF;
 primitive_condition returns [parsed::primitive_condition c]:
       v=value         { $c.add_value(std::move($v.v));
                         $c.set_operator(parsed::primitive_condition::type::VALUE); }
       (  (  '='       { $c.set_operator(parsed::primitive_condition::type::EQ); }
           | '<' '>'   { $c.set_operator(parsed::primitive_condition::type::NE); }
           | '<'       { $c.set_operator(parsed::primitive_condition::type::LT); }
           | '<' '='   { $c.set_operator(parsed::primitive_condition::type::LE); }
           | '>'       { $c.set_operator(parsed::primitive_condition::type::GT); }
           | '>' '='   { $c.set_operator(parsed::primitive_condition::type::GE); }
          )
          v=value      { $c.add_value(std::move($v.v)); }
        | BETWEEN      { $c.set_operator(parsed::primitive_condition::type::BETWEEN); }
          v=value      { $c.add_value(std::move($v.v)); }
          AND
          v=value      { $c.add_value(std::move($v.v)); }
        | IN '('       { $c.set_operator(parsed::primitive_condition::type::IN); }
          v=value      { $c.add_value(std::move($v.v)); }
          (',' v=value { $c.add_value(std::move($v.v)); })*
          ')'
       )?
     ;
 // The following rules for parsing boolean expressions are verbose and
 // somewhat strange because of Antlr 3's limitations on recursive rules,
 // common rule prefixes, and (lack of) support for operator precedence.
 // These rules could have been written more clearly using a more powerful
 // parser generator - such as Yacc.
 boolean_expression returns [parsed::condition_expression e]:
 	  b=boolean_expression_1       { $e.append(std::move($b.e), '|'); }
 	  (OR b=boolean_expression_1   { $e.append(std::move($b.e), '|'); } )*
 	;
 boolean_expression_1 returns [parsed::condition_expression e]:
 	  b=boolean_expression_2       { $e.append(std::move($b.e), '&'); }
 	  (AND b=boolean_expression_2  { $e.append(std::move($b.e), '&'); } )*
 	;
 boolean_expression_2 returns [parsed::condition_expression e]:
 	  p=primitive_condition        { $e.set_primitive(std::move($p.c)); }
 	| NOT b=boolean_expression_2   { $e = std::move($b.e); $e.apply_not(); }
 	| '(' b=boolean_expression ')' { $e = std::move($b.e); }
     ;
 condition_expression returns [parsed::condition_expression e]:
     boolean_expression { e=std::move($boolean_expression.e); } EOF;
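
The three-level rule structure above (boolean_expression, boolean_expression_1, boolean_expression_2) encodes operator precedence: NOT binds tighter than AND, which binds tighter than OR. The sketch below shows the same layering as a hand-written recursive-descent evaluator. It is a toy under stated assumptions, not the generated ANTLR parser: the class `bool_parser` and the function `evaluate` are hypothetical names, operands are only the words "true" and "false", tokens must be separated by whitespace, and keywords are upper-case only (the grammar accepts any case).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy evaluator with one method per precedence level.
class bool_parser {
    std::vector<std::string> _tokens;
    std::size_t _pos = 0;

    bool at(const std::string& t) const {
        return _pos < _tokens.size() && _tokens[_pos] == t;
    }
    // Lowest precedence: terms separated by OR (cf. boolean_expression).
    bool parse_or() {
        bool result = parse_and();
        while (at("OR")) {
            ++_pos;
            bool rhs = parse_and(); // always parse, so the tokens are consumed
            result = result || rhs;
        }
        return result;
    }
    // Middle precedence: factors separated by AND (cf. boolean_expression_1).
    bool parse_and() {
        bool result = parse_not();
        while (at("AND")) {
            ++_pos;
            bool rhs = parse_not();
            result = result && rhs;
        }
        return result;
    }
    // Highest precedence: NOT, parentheses, or an operand
    // (cf. boolean_expression_2).
    bool parse_not() {
        if (at("NOT")) { ++_pos; return !parse_not(); }
        if (at("(")) {
            ++_pos;
            bool inner = parse_or();
            if (!at(")")) { throw std::invalid_argument("expected ')'"); }
            ++_pos;
            return inner;
        }
        if (at("true")) { ++_pos; return true; }
        if (at("false")) { ++_pos; return false; }
        throw std::invalid_argument("unexpected token");
    }
public:
    explicit bool_parser(const std::string& text) {
        std::istringstream in(text);
        for (std::string tok; in >> tok; ) { _tokens.push_back(tok); }
    }
    bool evaluate() {
        _pos = 0;
        bool result = parse_or();
        if (_pos != _tokens.size()) { throw std::invalid_argument("trailing tokens"); }
        return result;
    }
};

bool evaluate(const std::string& text) {
    return bool_parser(text).evaluate();
}
```

With this layering, "true OR false AND false" is read as true OR (false AND false), and "NOT false AND false" as (NOT false) AND false.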

									

alternator/expressions.hh
									
												
				@@ -24,8 +24,13 @@

				#include <string>

				#include <stdexcept>

				#include <vector>

				#include <unordered_set>

				#include <string_view>

				#include <seastar/util/noncopyable_function.hh>

				#include "expressions_types.hh"

				#include "utils/rjson.hh"

				namespace alternator {

				@@ -36,6 +41,62 @@ public:

				parsed::update_expression parse_update_expression(std::string query);

				std::vector<parsed::path> parse_projection_expression(std::string query);

				parsed::condition_expression parse_condition_expression(std::string query);

				void resolve_update_expression(parsed::update_expression& ue,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values);

				void resolve_projection_expression(std::vector<parsed::path>& pe,

				        const rjson::value* expression_attribute_names,

				        std::unordered_set<std::string>& used_attribute_names);

				void resolve_condition_expression(parsed::condition_expression& ce,

				        const rjson::value* expression_attribute_names,

				        const rjson::value* expression_attribute_values,

				        std::unordered_set<std::string>& used_attribute_names,

				        std::unordered_set<std::string>& used_attribute_values);

				void validate_value(const rjson::value& v, const char* caller);

				bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute);

				// for_condition_expression_on() runs the given function on the attributes

				// that the expression uses. It may run for the same attribute more than once

				// if the same attribute is used more than once in the expression.

				void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func);

				// calculate_value() behaves slightly differently (in particular, it supports

				// different functions) when used in different types of expressions, as

				// enumerated in this enum:

				enum class calculate_value_caller {

				    UpdateExpression, ConditionExpression, ConditionExpressionAlone

				};

				inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {

				    switch (caller) {

				        case calculate_value_caller::UpdateExpression:

				            out << "UpdateExpression";

				            break;

				        case calculate_value_caller::ConditionExpression:

				            out << "ConditionExpression";

				            break;

				        case calculate_value_caller::ConditionExpressionAlone:

				            out << "ConditionExpression";

				            break;

				        default:

				            out << "unknown type of expression";

				            break;

				    }

				    return out;

				}

				rjson::value calculate_value(const parsed::value& v,

				        calculate_value_caller caller,

				        const rjson::value* previous_item);

				rjson::value calculate_value(const parsed::set_rhs& rhs,

				        const rjson::value* previous_item);

				} /* namespace alternator */
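
Each handler in function_handlers (expressions.cc) starts by comparing the caller against the single calculate_value_caller value it accepts. Those checks can be summarized as a lookup table, as in the sketch below. This is only an illustration of the rules visible in the handlers, not a proposed replacement: `caller_kind`, `allowed_callers` and `is_function_allowed` are hypothetical names, and std::invalid_argument stands in for api_error::validation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

// Mirrors the calculate_value_caller enum declared in expressions.hh.
enum class caller_kind { UpdateExpression, ConditionExpression, ConditionExpressionAlone };

// Hypothetical table: the one caller each function accepts, copied from
// the "caller != ..." checks at the top of each handler.
static const std::unordered_map<std::string_view, caller_kind>& allowed_callers() {
    static const std::unordered_map<std::string_view, caller_kind> table = {
        {"list_append", caller_kind::UpdateExpression},
        {"if_not_exists", caller_kind::UpdateExpression},
        {"size", caller_kind::ConditionExpression},
        {"attribute_exists", caller_kind::ConditionExpressionAlone},
        {"attribute_not_exists", caller_kind::ConditionExpressionAlone},
        {"attribute_type", caller_kind::ConditionExpressionAlone},
        {"begins_with", caller_kind::ConditionExpressionAlone},
        {"contains", caller_kind::ConditionExpressionAlone},
    };
    return table;
}

// Returns whether the named function may be called from the given kind of
// expression. Throws for a function name that is not in the table.
bool is_function_allowed(std::string_view name, caller_kind caller) {
    auto it = allowed_callers().find(name);
    if (it == allowed_callers().end()) {
        throw std::invalid_argument("unknown function '" + std::string(name) + "'");
    }
    return it->second == caller;
}
```

Read this way, size() is accepted only with the ConditionExpression caller, while the boolean-valued functions such as attribute_exists() are accepted only with ConditionExpressionAlone.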

									

alternator/expressions_types.hh
									
												
				@@ -25,6 +25,10 @@

				#include <string>

				#include <variant>

				#include <seastar/core/shared_ptr.hh>

				#include "utils/rjson.hh"

				/*

				 * Parsed representation of expressions and their components.

				 *

				@@ -45,15 +49,23 @@ class path {

				    // dot (e.g., ".xyz").

				    std::string _root;

				    std::vector<std::variant<std::string, unsigned>> _operators;

				    // It is useful to limit the depth of a user-specified path, because it

				    // allows us to use recursive algorithms without worrying about recursion

				    // depth. DynamoDB officially limits the length of paths to 32 components

				    // (including the root) so let's use the same limit.

				    static constexpr unsigned depth_limit = 32;

				    void check_depth_limit();

				public:

				    void set_root(std::string root) {

				        _root = std::move(root);

				    }

				    void add_index(unsigned i) {

				        _operators.emplace_back(i);

				        check_depth_limit();

				    }

				    void add_dot(std::string name) {

				        _operators.emplace_back(std::move(name));

				        check_depth_limit();

				    }

				    const std::string& root() const {

				        return _root;

				@@ -61,12 +73,36 @@ public:

				    bool has_operators() const {

				        return !_operators.empty();

				    }

				    const std::vector<std::variant<std::string, unsigned>>& operators() const {

				        return _operators;

				    }

				    std::vector<std::variant<std::string, unsigned>>& operators() {

				        return _operators;

				    }

				    friend std::ostream& operator<<(std::ostream&, const path&);

				};

				// When an expression is first parsed, all constants are references, like

				// ":val1", into ExpressionAttributeValues. This uses the std::string variant alternative.

				// The resolve_value() function replaces these constants by the JSON item

				// extracted from the ExpressionAttributeValues.

				struct constant {

				    // We use lw_shared_ptr<rjson::value> just to make rjson::value copyable,

				    // to make this entire object copyable as ANTLR needs.

				    using literal = lw_shared_ptr<rjson::value>;

				    std::variant<std::string, literal> _value;

				    void set(const rjson::value& v) {

				        _value = make_lw_shared<rjson::value>(rjson::copy(v));

				    }

				    void set(std::string& s) {

				        _value = s;

				    }

				};

				// "value" is a value used in the right hand side of an assignment

				// expression, "SET a = ...". It can be a reference to a value included in

				// the request (":val"), a path to an attribute from the existing item

				// (e.g., "a.b[3].c"), or a function of other such values.

				// expression, "SET a = ...". It can be a constant (a reference to a value

				// included in the request, e.g., ":val"), a path to an attribute from the

				// existing item (e.g., "a.b[3].c"), or a function of other such values.

				// Note that the real right-hand-side of an assignment is actually a bit

				// more general - it allows either a value, or a value+value or value-value -

				// see class set_rhs below.

				@@ -75,9 +111,12 @@ struct value {

				        std::string _function_name;

				        std::vector<value> _parameters;

				    };

				    std::variant<std::string, path, function_call> _value;

				    std::variant<constant, path, function_call> _value;

				    void set_constant(constant c) {

				        _value = std::move(c);

				    }

				    void set_valref(std::string s) {

				        _value = std::move(s);

				        _value = constant { std::move(s) };

				    }

				    void set_path(path p) {

				        _value = std::move(p);

				@@ -88,6 +127,15 @@ struct value {

				    void add_func_parameter(value v) {

				        std::get<function_call>(_value)._parameters.emplace_back(std::move(v));

				    }

				    bool is_constant() const {

				        return std::holds_alternative<constant>(_value);

				    }

				    bool is_path() const {

				        return std::holds_alternative<path>(_value);

				    }

				    bool is_func() const {

				        return std::holds_alternative<function_call>(_value);

				    }

				};

				// The right-hand-side of a SET in an update expression can be either a

				@@ -121,10 +169,10 @@ public:

				        struct remove {

				        };

				        struct add {

				            std::string _valref;

				            constant _valref;

				        };

				        struct del {

				            std::string _valref;

				            constant _valref;

				        };

				        std::variant<set, remove, add, del> _action;

				@@ -138,11 +186,11 @@ public:

				        }

				        void assign_add(path p, std::string v) {

				            _path = std::move(p);

				            _action = add { std::move(v) };

				            _action = add { constant { std::move(v) } };

				        }

				        void assign_del(path p, std::string v) {

				            _path = std::move(p);

				            _action = del { std::move(v) };

				            _action = del { constant { std::move(v) } };

				        }

				    };

				private:

				@@ -160,6 +208,62 @@ public:

				    const std::vector<action>& actions() const {

				        return _actions;

				    }

				    std::vector<action>& actions() {

				        return _actions;

				    }

				};

				// A primitive_condition is a condition expression involving one condition,

				// while the full condition_expression below adds boolean logic over these

				// primitive conditions.

				// The supported primitive conditions are:

				// 1. Binary operators - v1 OP v2, where OP is =, <>, <, <=, >, or >= and

				//    v1 and v2 are values - from the item (an attribute path), the query

				//    (a ":val" reference), or a function of the above (only the size()

				//    function is supported).

				// 2. Ternary operator - v1 BETWEEN v2 and v3 (means v1 >= v2 AND v1 <= v3).

				// 3. N-ary operator - v1 IN ( v2, v3, ... )

				// 4. A single function call (attribute_exists etc.). The parser actually

				//    accepts a more general "value" here but later stages reject a value

				//    which is not a function call (because DynamoDB does it too).

				class primitive_condition {

				public:

				    enum class type {

				        UNDEFINED, VALUE, EQ, NE, LT, LE, GT, GE, BETWEEN, IN

				    };

				    type _op = type::UNDEFINED;

				    std::vector<value> _values;

				    void set_operator(type op) {

				        _op = op;

				    }

				    void add_value(value&& v) {

				        _values.push_back(std::move(v));

				    }

				    bool empty() const {

				        return _op == type::UNDEFINED;

				    }

				};

				class condition_expression {

				public:

				    bool _negated = false; // If true, the entire condition is negated

				    struct condition_list {

				        char op = '|'; // '&' or '|'

				        std::vector<condition_expression> conditions;

				    };

				    std::variant<primitive_condition, condition_list> _expression = condition_list();

				    void set_primitive(primitive_condition&& p) {

				        _expression = std::move(p);

				    }

				    void append(condition_expression&& c, char op);

				    void apply_not() {

				        _negated = !_negated;

				    }

				    bool empty() const {

				        return std::holds_alternative<condition_list>(_expression) &&

				               std::get<condition_list>(_expression).conditions.empty();

				    }

				};

				} // namespace parsed
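The `condition_expression` class above is a tree: each node is either a single primitive condition or a list of sub-expressions joined by '&' or '|', and any node may be negated. The sketch below shows one way such a tree could be walked. It is an illustration under stated assumptions, not Alternator's evaluator: `cond`, `cond_list`, `leaf`, `combine`, `apply_not` and `evaluate` are hypothetical names, leaves are plain booleans instead of `primitive_condition` objects, and an empty '|' list is assumed to evaluate to false.

```cpp
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

struct cond;

// Hypothetical counterpart of condition_expression::condition_list.
struct cond_list {
    char op;                    // '&' or '|'
    std::vector<cond> children; // std::vector accepts an incomplete type since C++17
};

// Hypothetical counterpart of condition_expression with boolean leaves.
struct cond {
    bool negated = false;
    std::variant<bool, cond_list> expr;
};

cond leaf(bool v) { return cond{false, v}; }

cond combine(char op, std::vector<cond> children) {
    return cond{false, cond_list{op, std::move(children)}};
}

cond apply_not(cond c) {
    c.negated = !c.negated;
    return c;
}

// Recursively evaluates the tree: AND folds from true, OR folds from false,
// and the node's own negation flag is applied last.
bool evaluate(const cond& c) {
    bool result;
    if (const bool* v = std::get_if<bool>(&c.expr)) {
        result = *v;
    } else {
        const cond_list& l = std::get<cond_list>(c.expr);
        assert(l.op == '&' || l.op == '|');
        result = (l.op == '&');
        for (const cond& child : l.children) {
            const bool v = evaluate(child);
            result = (l.op == '&') ? (result && v) : (result || v);
        }
    }
    return c.negated ? !result : result;
}
```

For example, `evaluate(combine('&', {leaf(true), apply_not(leaf(false))}))` yields true, because both operands of the AND are true once the negation is applied.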

									
										120

alternator/rjson.cc
									
											
				@@ -1,120 +0,0 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "rjson.hh"

				#include "error.hh"

				#include <seastar/core/print.hh>

				namespace rjson {

				static allocator the_allocator;

				std::string print(const rjson::value& value) {

				    string_buffer buffer;

				    writer writer(buffer);

				    value.Accept(writer);

				    return std::string(buffer.GetString());

				}

				rjson::value copy(const rjson::value& value) {

				    return rjson::value(value, the_allocator);

				}

				rjson::value parse(const std::string& str) {

				    return parse_raw(str.c_str(), str.size());

				}

				rjson::value parse_raw(const char* c_str, size_t size) {

				    rjson::document d;

				    d.Parse(c_str, size);

				    if (d.HasParseError()) {

				        throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));

				    }

				    rjson::value& v = d;

				    return std::move(v);

				}

				rjson::value& get(rjson::value& value, rjson::string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    if (member_it != value.MemberEnd())

				        return member_it->value;

				    else {

				        throw rjson::error(format("JSON parameter {} not found", name));

				    }

				}

				rjson::value from_string(const std::string& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const sstring& str) {

				    return rjson::value(str.c_str(), str.size(), the_allocator);

				}

				rjson::value from_string(const char* str, size_t size) {

				    return rjson::value(str, size, the_allocator);

				}

				const rjson::value* find(const rjson::value& value, string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				rjson::value* find(rjson::value& value, string_ref_type name) {

				    auto member_it = value.FindMember(name);

				    return member_it != value.MemberEnd() ? &member_it->value : nullptr;

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);

				}

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {

				    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {

				    base.AddMember(name, std::move(member), the_allocator);

				}

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {

				    base.AddMember(name, rjson::value(member), the_allocator);

				}

				void push_back(rjson::value& base_array, rjson::value&& item) {

				    base_array.PushBack(std::move(item), the_allocator);

				}

				} // end namespace rjson

				std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {

				    return os << rjson::print(v);

				}
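The `find()` and `get()` helpers in the removed file differ only in how they report a missing member: `find()` returns a null pointer, while `get()` throws. The sketch below reproduces that contract on a plain `std::map` so it can be compiled without rapidjson. The names `object`, `find_member` and `get_member` are hypothetical, and `std::runtime_error` stands in for `rjson::error`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a JSON object with string members.
using object = std::map<std::string, std::string>;

// Returns a pointer to the member if it exists, nullptr otherwise.
const std::string* find_member(const object& obj, const std::string& name) {
    auto it = obj.find(name);
    return it != obj.end() ? &it->second : nullptr;
}

// Returns a reference to the member if it exists, throws otherwise.
const std::string& get_member(const object& obj, const std::string& name) {
    if (const std::string* v = find_member(obj, name)) {
        return *v;
    }
    throw std::runtime_error("JSON parameter " + name + " not found");
}

// Usage: optional parameters go through find_member(), mandatory ones
// through get_member().
void lookup_usage() {
    object request{{"TableName", "t"}};
    assert(get_member(request, "TableName") == "t");
    assert(find_member(request, "Limit") == nullptr);
}
```

The pointer-returning variant suits optional request fields, where absence is not an error; the throwing variant keeps call sites short when the field is required.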

									
										159

alternator/rjson.hh
									
											
				@@ -1,159 +0,0 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				/*

				 * rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.

				 *

				 * rapidjson has strict copy elision policies, which, among other things, involves

				 * using provided char arrays without copying them and allows copying objects only explicitly.

				 * As such, one should be careful when passing strings with limited liveness

				 * (e.g. data underneath local std::strings) to rjson functions, because created JSON objects

				 * may end up relying on dangling char pointers. All rjson functions that create JSON values from strings

				 * have APIs both for string_ref_type (more optimal, used when the string is known to live

				 * at least as long as the object, e.g. a static char array) and for std::strings. The more optimal

				 * variants should be used *only* if the liveness of the string is guaranteed, otherwise it will

				 * result in undefined behaviour.

				 * Also, bear in mind that methods exposed by rjson::value are generic, but some of them

				 * work fine only for specific types. In case the type does not match, an rjson::error will be thrown.

				 * Examples of such mismatched usage are calling MemberCount() on a JSON value not of object type

				 * or calling Size() on a non-array value.

				 */

				#include <string>

				#include <stdexcept>

				namespace rjson {

				class error : public std::exception {

				    std::string _msg;

				public:

				    error() = default;

				    error(const std::string& msg) : _msg(msg) {}

				    virtual const char* what() const noexcept override { return _msg.c_str(); }

				};

				}

				// rapidjson configuration macros

				#define RAPIDJSON_HAS_STDSTRING 1

				// The default rapidjson policy is to use assert() - which is dangerous for two reasons:

				// 1. assert() can be turned off with -DNDEBUG

				// 2. assert() crashes a program

				// Fortunately, the default policy can be overridden, and so rapidjson errors will

				// throw an rjson::error exception instead.

				#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)

				#include <rapidjson/document.h>

				#include <rapidjson/writer.h>

				#include <rapidjson/stringbuffer.h>

				#include <rapidjson/error/en.h>

				#include <seastar/core/sstring.hh>

				#include "seastarx.hh"

				namespace rjson {

				using allocator = rapidjson::CrtAllocator;

				using encoding = rapidjson::UTF8<>;

				using document = rapidjson::GenericDocument<encoding, allocator>;

				using value = rapidjson::GenericValue<encoding, allocator>;

				using string_ref_type = value::StringRefType;

				using string_buffer = rapidjson::GenericStringBuffer<encoding>;

				using writer = rapidjson::Writer<string_buffer, encoding>;

				using type = rapidjson::Type;

				// Returns an object representing JSON's null

				inline rjson::value null_value() {

				    return rjson::value(rapidjson::kNullType);

				}

				// Returns an empty JSON object - {}

				inline rjson::value empty_object() {

				    return rjson::value(rapidjson::kObjectType);

				}

				// Returns an empty JSON array - []

				inline rjson::value empty_array() {

				    return rjson::value(rapidjson::kArrayType);

				}

				// Returns an empty JSON string - ""

				inline rjson::value empty_string() {

				    return rjson::value(rapidjson::kStringType);

				}

				// Convert the JSON value to a string with JSON syntax, the opposite of parse().

				// The representation is dense - without any redundant indentation.

				std::string print(const rjson::value& value);

				// Copies given JSON value - involves allocation

				rjson::value copy(const rjson::value& value);

				// Parses a JSON value from given string or raw character array.

				// The string/char array liveness does not need to be persisted,

				// as both parse() and parse_raw() will allocate member names and values.

				// Throws rjson::error if parsing failed.

				rjson::value parse(const std::string& str);

				rjson::value parse_raw(const char* c_str, size_t size);

				// Creates a JSON value (of JSON string type) out of internal string representations.

				// The string value is copied, so str's liveness does not need to be persisted.

				rjson::value from_string(const std::string& str);

				rjson::value from_string(const sstring& str);

				rjson::value from_string(const char* str, size_t size);

				// Returns a pointer to JSON member if it exists, nullptr otherwise

				rjson::value* find(rjson::value& value, rjson::string_ref_type name);

				const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);

				// Returns a reference to JSON member if it exists, throws otherwise

				rjson::value& get(rjson::value& value, rjson::string_ref_type name);

				const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);

				// Sets a member in given JSON object by moving the member - allocates the name.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);

				// Sets a string member in given JSON object by assigning its reference - allocates the name.

				// NOTICE: member string liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);

				// Sets a member in given JSON object by moving the member.

				// NOTICE: name liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);

				// Sets a string member in given JSON object by assigning its reference.

				// NOTICE: name liveness must be ensured to be at least as long as base's.

				// NOTICE: member liveness must be ensured to be at least as long as base's.

				// Throws if base is not a JSON object.

				void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);

				// Adds a value to a JSON list by moving the item to its end.

				// Throws if base_array is not a JSON array.

				void push_back(rjson::value& base_array, rjson::value&& item);

				} // end namespace rjson

				namespace std {

				std::ostream& operator<<(std::ostream& os, const rjson::value& v);

				}
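The header above overrides `RAPIDJSON_ASSERT` so that a failed library check throws an exception instead of calling `assert()`. The following sketch isolates that technique so it can be tried without rapidjson. `sketch_error`, `SKETCH_ASSERT` and `checked_at` are hypothetical names introduced only for illustration; they mirror `rjson::error` and the macro override but belong to no library.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Hypothetical exception type, modelled on rjson::error.
class sketch_error : public std::exception {
    std::string _msg;
public:
    explicit sketch_error(std::string msg) : _msg(std::move(msg)) {}
    const char* what() const noexcept override { return _msg.c_str(); }
};

// Throws instead of aborting, and stays active even when NDEBUG is defined.
// The stringified condition (#x) ends up in the exception message.
#define SKETCH_ASSERT(x) \
    do { if (!(x)) throw sketch_error(std::string("condition not met: ") + #x); } while (0)

// A toy accessor guarded by the macro: an out-of-range index becomes a
// catchable exception rather than a crash.
int checked_at(const int* data, int size, int index) {
    SKETCH_ASSERT(index >= 0 && index < size);
    return data[index];
}

// Usage: the in-range access succeeds, the out-of-range one throws.
void throwing_assert_usage() {
    const int data[] = {10, 20, 30};
    assert(checked_at(data, 3, 1) == 20);
    bool threw = false;
    try {
        checked_at(data, 3, 5);
    } catch (const sketch_error&) {
        threw = true;
    }
    assert(threw);
}
```

The `do { ... } while (0)` wrapper makes the macro behave like a single statement, so it can be used safely inside an unbraced `if`/`else`.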

									
										128

alternator/rmw_operation.hh
									
										Normal file
									
												
				@@ -0,0 +1,128 @@

				/*

				 * Copyright 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "seastarx.hh"

				#include "service/storage_proxy.hh"

				#include "service/storage_proxy.hh"

				#include "utils/rjson.hh"

				#include "executor.hh"

				namespace alternator {

				// An rmw_operation encapsulates the common logic of all the item update

				// operations which may involve a read of the item before the write

				// (so-called Read-Modify-Write operations). These operations include PutItem,

				// UpdateItem and DeleteItem: All of these may be conditional operations (the

				// "Expected" parameter) which require a read before the write, and UpdateItem

				// may also have an update expression which refers to the item's old value.

				//

				// The code below supports running the read and the write together as one

				// transaction using LWT (this is why rmw_operation is a subclass of

				// cas_request, as required by storage_proxy::cas()), but also has optional

				// modes not using LWT.

				class rmw_operation : public service::cas_request, public enable_shared_from_this<rmw_operation> {

				public:

				    // The following options choose which mechanism to use for isolating

				    // parallel write operations:

				    // * The FORBID_RMW option forbids RMW (read-modify-write) operations

				    //   such as conditional updates. For the remaining write-only

				    //   operations, ordinary quorum writes are isolated enough.

				    // * The LWT_ALWAYS option always uses LWT (lightweight transactions)

				    //   for any write operation - whether or not it also has a read.

				    // * The LWT_RMW_ONLY option uses LWT only for RMW operations, and uses

				    //   ordinary quorum writes for write-only operations.

				    //   This option is not safe if the user may send both RMW and write-only

				    //   operations on the same item.

				    // * The UNSAFE_RMW option does read-modify-write operations as separate

				    //   read and write. It is unsafe - concurrent RMW operations are not

				    //   isolated at all. This option will likely be removed in the future.

				    enum class write_isolation {

				        FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW

				    };

				    static constexpr auto WRITE_ISOLATION_TAG_KEY = "system:write_isolation";

				    static write_isolation get_write_isolation_for_schema(schema_ptr schema);

				    static write_isolation default_write_isolation;

				public:

				    static void set_default_write_isolation(std::string_view mode);

				protected:

				    // The full request JSON

				    rjson::value _request;

				    // All RMW operations involve a single item with a specific partition

				    // and optional clustering key, in a single table, so the following

				    // information is common to all of them:

				    schema_ptr _schema;

				    partition_key _pk = partition_key::make_empty();

				    clustering_key _ck = clustering_key::make_empty();

				    write_isolation _write_isolation;

				    // All RMW operations can have a ReturnValues parameter from the following

				    // choices. But note that only UpdateItem actually supports all of them:

				    enum class returnvalues {

				        NONE, ALL_OLD, UPDATED_OLD, ALL_NEW, UPDATED_NEW

				    } _returnvalues;

				    static returnvalues parse_returnvalues(const rjson::value& request);

				    // When _returnvalues != NONE, apply() should store here, in JSON form,

				    // the values which are to be returned in the "Attributes" field.

				    // The default null JSON means do not return an Attributes field at all.

				    // This field is marked "mutable" so that the const apply() can modify

				    // it (see explanation below), but note that because apply() may be

				    // called more than once, if apply() will sometimes set this field it

				    // must set it (even if just to the default empty value) every time.

				    mutable rjson::value _return_attributes;

				public:

				    // The constructor of a rmw_operation subclass should parse the request

				    // and try to discover as many input errors as it can before really

				    // attempting the read or write operations.

				    rmw_operation(service::storage_proxy& proxy, rjson::value&& request);

				    // rmw_operation subclasses (update_item_operation, put_item_operation

				    // and delete_item_operation) shall implement an apply() function which

				    // takes the previous value of the item (if it was read) and creates the

				    // write mutation. If the previous value of item does not pass the needed

				    // conditional expression, apply() should return an empty optional.

				    // apply() may throw if it encounters input errors not discovered during

				    // the constructor.

				    // apply() may be called more than once in case of contention, so it must

				    // not change the state saved in the object (issue #7218 was caused by

				    // violating this). We mark apply() "const" to let the compiler validate

				    // this for us. The output-only field _return_attributes is marked

				    // "mutable" above so that apply() can still write to it.

				    virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;

				    // Convert the above apply() into the signature needed by cas_request:

				    virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts) override;

				    virtual ~rmw_operation() = default;

				    schema_ptr schema() const { return _schema; }

				    const rjson::value& request() const { return _request; }

				    rjson::value&& move_request() && { return std::move(_request); }

				    future<executor::request_return_type> execute(service::storage_proxy& proxy,

				            service::client_state& client_state,

				            tracing::trace_state_ptr trace_state,

				            service_permit permit,

				            bool needs_read_before_write,

				            stats& stats);

				    std::optional<shard_id> shard_for_execute(bool needs_read_before_write);

				};

				} // namespace alternator
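The comments on `write_isolation` describe how each mode treats write-only and read-modify-write requests. The sketch below restates those rules as a single decision function. It is derived only from the comments above, not from the implementation of `execute()`. The `write_path` enum and `choose_write_path` are hypothetical names, and `std::invalid_argument` stands in for whatever error the real code reports when an RMW operation is forbidden.

```cpp
#include <cassert>
#include <stdexcept>

// Same enumerators as rmw_operation::write_isolation.
enum class write_isolation { FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW };

// Hypothetical: the mechanism a request would end up using.
enum class write_path { plain_write, lwt, separate_read_then_write };

// Maps (isolation mode, does the request need a read first?) to a mechanism,
// following the option descriptions in the header comments.
write_path choose_write_path(write_isolation mode, bool needs_read_before_write) {
    switch (mode) {
    case write_isolation::FORBID_RMW:
        if (needs_read_before_write) {
            throw std::invalid_argument("read-modify-write operations are forbidden");
        }
        return write_path::plain_write;
    case write_isolation::LWT_ALWAYS:
        return write_path::lwt;
    case write_isolation::LWT_RMW_ONLY:
        return needs_read_before_write ? write_path::lwt : write_path::plain_write;
    case write_isolation::UNSAFE_RMW:
        return needs_read_before_write ? write_path::separate_read_then_write
                                       : write_path::plain_write;
    }
    throw std::logic_error("unknown write_isolation value");
}

// Usage: only LWT_ALWAYS pays the LWT cost for a write-only request.
void write_path_usage() {
    assert(choose_write_path(write_isolation::LWT_ALWAYS, false) == write_path::lwt);
    assert(choose_write_path(write_isolation::LWT_RMW_ONLY, false) == write_path::plain_write);
}
```

Laid out this way, the hazard noted for `LWT_RMW_ONLY` is visible: the same item can be written through two different mechanisms depending on the request.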

									
										199

alternator/serialization.cc
									
												
				@@ -25,13 +25,14 @@

				#include "error.hh"

				#include "rapidjson/writer.h"

				#include "concrete_types.hh"

				#include "cql3/type_json.hh"

				static logging::logger slogger("alternator-serialization");

				namespace alternator {

				type_info type_info_from_string(std::string type) {

				    static thread_local const std::unordered_map<std::string, type_info> type_infos = {

				type_info type_info_from_string(std::string_view type) {

				    static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {

				        {"S", {alternator_type::S, utf8_type}},

				        {"B", {alternator_type::B, bytes_type}},

				        {"BOOL", {alternator_type::BOOL, boolean_type}},

				@@ -64,7 +65,7 @@ struct from_json_visitor {

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };

				    void operator()(const string_type_impl& t) {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				        bo.write(t.from_string(rjson::to_string_view(v)));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        bo.write(base64_decode(v));

				@@ -73,23 +74,27 @@ struct from_json_visitor {

				        bo.write(boolean_type->decompose(v.GetBool()));

				    }

				    void operator()(const decimal_type_impl& t) const {

				        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));

				        try {

				            bo.write(t.from_string(rjson::to_string_view(v)));

				        } catch (const marshal_exception& e) {

				            throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", v));

				        }

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        bo.write(t.from_json_object(Json::Value(rjson::print(v)), cql_serialization_format::internal()));

				        bo.write(from_json_object(t, v, cql_serialization_format::internal()));

				    }

				};

				bytes serialize_item(const rjson::value& item) {

				    if (item.IsNull() || item.MemberCount() != 1) {

				        throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));

				        throw api_error::validation(format("An item can contain only one attribute definition: {}", item));

				    }

				    auto it = item.MemberBegin();

				    type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings

				    type_info type_info = type_info_from_string(rjson::to_string_view(it->name)); // JSON keys are guaranteed to be strings

				    if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal serialization of type {}", it->name.GetString());

				        slogger.trace("Non-optimal serialization of type {}", it->name);

				        return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));

				    }

				@@ -107,7 +112,7 @@ struct to_json_visitor {

				    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };

				    void operator()(const decimal_type_impl& t) const {

				        auto s = decimal_type->to_json_string(bytes(bv));

				        auto s = to_json_string(*decimal_type, bytes(bv));

				        //FIXME(sarna): unnecessary copy

				        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));

				    }

				@@ -120,14 +125,14 @@ struct to_json_visitor {

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));

				        rjson::set_with_string_name(deserialized, type_ident, rjson::parse(to_json_string(t, bytes(bv))));

				    }

				};

				rjson::value deserialize_item(bytes_view bv) {

				    rjson::value deserialized(rapidjson::kObjectType);

				    if (bv.empty()) {

				        throw api_error("ValidationException", "Serialized value empty");

				        throw api_error::validation("Serialized value empty");

				    }

				    alternator_type atype = alternator_type(bv[0]);

				@@ -135,7 +140,7 @@ rjson::value deserialize_item(bytes_view bv) {

				    if (atype == alternator_type::NOT_SUPPORTED_YET) {

				        slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));

				        return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());

				        return rjson::parse(std::string_view(reinterpret_cast<const char *>(bv.data()), bv.size()));

				    }

				    type_representation type_representation = represent_type(atype);

				    visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});

				@@ -152,34 +157,48 @@ std::string type_to_string(data_type type) {

				    };

				    auto it = types.find(type);

				    if (it == types.end()) {

				        throw std::runtime_error(format("Unknown type {}", type->name()));

				        // fall back to string, in order to be able to present

				        // internal Scylla types in a human-readable way

				        return "S";

				    }

				    return it->second;

				}

				bytes get_key_column_value(const rjson::value& item, const column_definition& column) {

				    std::string column_name = column.name_as_text();

				    std::string expected_type = type_to_string(column.type);

				    const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {

				        throw api_error("ValidationException",

				                format("Missing or invalid value object for key column {}: {}", column_name, item));

				    const rjson::value* key_typed_value = rjson::find(item, column_name);

				    if (!key_typed_value) {

				        throw api_error::validation(format("Key column {} not found", column_name));

				    }

				    return get_key_from_typed_value(key_typed_value, column, expected_type);

				    return get_key_from_typed_value(*key_typed_value, column);

				}

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {

				// Parses the JSON encoding for a key value, which is a map with a single

				// entry, whose key is the type (expected to match the key column's type)

				// and the value is the encoded value.

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column) {

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||

				            !key_typed_value.MemberBegin()->value.IsString()) {

				        throw api_error::validation(

				                format("Malformed value object for key column {}: {}",

				                        column.name_as_text(), key_typed_value));

				    }

				    auto it = key_typed_value.MemberBegin();

				    if (it->name.GetString() != expected_type) {

				        throw api_error("ValidationException",

				    if (it->name != type_to_string(column.type)) {

				        throw api_error::validation(

				                format("Type mismatch: expected type {} for key column {}, got type {}",

				                        expected_type, column.name_as_text(), it->name.GetString()));

				                        type_to_string(column.type), column.name_as_text(), it->name));

				    }

				    std::string_view value_view = rjson::to_string_view(it->value);

				    if (value_view.empty()) {

				        throw api_error::validation(

				                format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));

				    }

				    if (column.type == bytes_type) {

				        return base64_decode(it->value);

				    } else {

				        return column.type->from_string(it->value.GetString());

				        return column.type->from_string(rjson::to_string_view(it->value));

				    }

				}
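The new `get_key_from_typed_value()` performs three checks in order: the value must be a single-entry object, its type tag must match the key column's type, and the encoded value must not be empty. The sketch below replays those checks on a simplified data model so the order of validation is easy to follow. It is an assumption-laden illustration, not the Alternator code: `typed_value` and `extract_key` are hypothetical names, a vector of string pairs stands in for the JSON object, `std::invalid_argument` stands in for `api_error::validation`, and base64 decoding of "B" keys is left out.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JSON object such as {"S": "abc"}:
// a list of (type tag, encoded value) members.
using typed_value = std::vector<std::pair<std::string, std::string>>;

// Validates a typed key value and returns the encoded value.
std::string extract_key(const typed_value& v,
                        std::string_view expected_type,
                        std::string_view column_name) {
    const std::string column(column_name);
    // 1. Exactly one member is allowed.
    if (v.size() != 1) {
        throw std::invalid_argument("Malformed value object for key column " + column);
    }
    const auto& [type, value] = v.front();
    // 2. The type tag must match the key column's type.
    if (type != expected_type) {
        throw std::invalid_argument("Type mismatch: expected type " +
                std::string(expected_type) + " for key column " + column +
                ", got type " + type);
    }
    // 3. Key attributes cannot be empty strings.
    if (value.empty()) {
        throw std::invalid_argument("Empty value for key column " + column);
    }
    return value;
}

// Usage: a well-formed {"S": "abc"} key passes all three checks.
void extract_key_usage() {
    typed_value ok;
    ok.emplace_back("S", "abc");
    assert(extract_key(ok, "S", "p") == "abc");
}
```

Checking the shape before the type tag means a malformed object never reaches the type comparison, which keeps each error message specific to one failure.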

				@@ -194,11 +213,14 @@ rjson::value json_key_column_value(bytes_view cell, const column_definition& col

				        // FIXME: use specialized Alternator number type, not the more

				        // general "decimal_type". A dedicated type can be more efficient

				        // in storage space and in parsing speed.

				        auto s = decimal_type->to_json_string(bytes(cell));

				        auto s = to_json_string(*decimal_type, bytes(cell));

				        return rjson::from_string(s);

				    } else {

				        // We shouldn't get here, we shouldn't see such key columns.

				        throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));

				        // Support for arbitrary key types is useful for parsing values of virtual tables,

				        // which can involve any type supported by Scylla.

				        // In order to guarantee that the returned type is parsable by alternator clients,

				        // they are represented simply as strings.

				        return rjson::from_string(column.type->to_string(bytes(cell)));

				    }

				}

				@@ -229,20 +251,125 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        throw api_error("ValidationException", format("{}: invalid number object", diagnostic));

				        throw api_error::validation(format("{}: invalid number object", diagnostic));

				    }

				    auto it = v.MemberBegin();

				    if (it->name != "N") {

				        throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));

				        throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));

				    }

				    if (it->value.IsNumber()) {

				         // FIXME(sarna): should use big_decimal constructor with numeric values directly:

				        return big_decimal(rjson::print(it->value));

				    try {

				        if (it->value.IsNumber()) {

				             // FIXME(sarna): should use big_decimal constructor with numeric values directly:

				            return big_decimal(rjson::print(it->value));

				        }

				        if (!it->value.IsString()) {

				            throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));

				        }

				        return big_decimal(rjson::to_string_view(it->value));

				    } catch (const marshal_exception& e) {

				        throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", it->value));

				    }

				    if (!it->value.IsString()) {

				        throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));

				}

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return {"", nullptr};

				    }

				    return big_decimal(it->value.GetString());

				    auto it = v.MemberBegin();

				    const std::string it_key = it->name.GetString();

				    if (it_key != "SS" && it_key != "BS" && it_key != "NS") {

				        return {"", nullptr};

				    }

				    return std::make_pair(it_key, &(it->value));

				}

				const rjson::value* unwrap_list(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return nullptr;

				    }

				    auto it = v.MemberBegin();

				    if (it->name != std::string("L")) {

				        return nullptr;

				    }

				    return &(it->value);

				}

				// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the

				// sum, again as a JSON-encoded number.

				rjson::value number_add(const rjson::value& v1, const rjson::value& v2) {

				    auto n1 = unwrap_number(v1, "UpdateExpression");

				    auto n2 = unwrap_number(v2, "UpdateExpression");

				    rjson::value ret = rjson::empty_object();

				    std::string str_ret = std::string((n1 + n2).to_string());

				    rjson::set(ret, "N", rjson::from_string(str_ret));

				    return ret;

				}

				rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2) {

				    auto n1 = unwrap_number(v1, "UpdateExpression");

				    auto n2 = unwrap_number(v2, "UpdateExpression");

				    rjson::value ret = rjson::empty_object();

				    std::string str_ret = std::string((n1 - n2).to_string());

				    rjson::set(ret, "N", rjson::from_string(str_ret));

				    return ret;

				}

				// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and

				// return the sum of both sets, again as a set value.

				rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {

				    auto [set1_type, set1] = unwrap_set(v1);

				    auto [set2_type, set2] = unwrap_set(v2);

				    if (set1_type != set2_type) {

				        throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));

				    }

				    if (!set1 || !set2) {

				        throw api_error::validation("UpdateExpression: ADD operation for sets must be given sets as arguments");

				    }

				    rjson::value sum = rjson::copy(*set1);

				    std::set<rjson::value, rjson::single_value_comp> set1_raw;

				    for (auto it = sum.Begin(); it != sum.End(); ++it) {

				        set1_raw.insert(rjson::copy(*it));

				    }

				    for (const auto& a : set2->GetArray()) {

				        if (!set1_raw.contains(a)) {

				            rjson::push_back(sum, rjson::copy(a));

				        }

				    }

				    rjson::value ret = rjson::empty_object();

				    rjson::set_with_string_name(ret, set1_type, std::move(sum));

				    return ret;

				}

				// Take two JSON-encoded set values (e.g. {"SS": [...the actual list]}) and

				// return the difference of s1 - s2, again as a set value.

				// DynamoDB does not allow empty sets, so if resulting set is empty, return

				// an unset optional instead.

				std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2) {

				    auto [set1_type, set1] = unwrap_set(v1);

				    auto [set2_type, set2] = unwrap_set(v2);

				    if (set1_type != set2_type) {

				        throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));

				    }

				    if (!set1 || !set2) {

				        throw api_error::validation("UpdateExpression: DELETE operation can only be performed on a set");

				    }

				    std::set<rjson::value, rjson::single_value_comp> set1_raw;

				    for (auto it = set1->Begin(); it != set1->End(); ++it) {

				        set1_raw.insert(rjson::copy(*it));

				    }

				    for (const auto& a : set2->GetArray()) {

				        set1_raw.erase(a);

				    }

				    if (set1_raw.empty()) {

				        return std::nullopt;

				    }

				    rjson::value ret = rjson::empty_object();

				    rjson::set_with_string_name(ret, set1_type, rjson::empty_array());

				    rjson::value& result_set = ret[set1_type];

				    for (const auto& a : set1_raw) {

				        rjson::push_back(result_set, rjson::copy(a));

				    }

				    return ret;

				}

				}
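The set helpers added above (set_sum and set_diff) can be illustrated without rjson. The following is a minimal sketch, assuming a hypothetical typed_set struct in place of a JSON value such as {"SS": ["a", "b"]}; the names typed_set, sketch_set_sum and sketch_set_diff are illustrative and do not appear in the patch. As in the patch, the union keeps the first set's order and appends unseen elements of the second, and the difference returns an empty optional when nothing is left, because DynamoDB does not allow empty sets.

```cpp
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a DynamoDB set value such as {"SS": ["a", "b"]}.
struct typed_set {
    std::string type;                // "SS", "NS" or "BS"
    std::vector<std::string> items;  // elements in their original order
};

// Union: copy v1, then append every element of v2 that v1 does not contain.
typed_set sketch_set_sum(const typed_set& v1, const typed_set& v2) {
    if (v1.type != v2.type) {
        throw std::invalid_argument("Mismatched set types");
    }
    const std::set<std::string> in_v1(v1.items.begin(), v1.items.end());
    typed_set sum = v1;
    for (const auto& item : v2.items) {
        if (in_v1.count(item) == 0) {
            sum.items.push_back(item);
        }
    }
    return sum;
}

// Difference v1 - v2. An empty result is reported as std::nullopt.
// The result order here is lexicographic (std::set iteration order).
std::optional<typed_set> sketch_set_diff(const typed_set& v1, const typed_set& v2) {
    if (v1.type != v2.type) {
        throw std::invalid_argument("Mismatched set types");
    }
    std::set<std::string> remaining(v1.items.begin(), v1.items.end());
    for (const auto& item : v2.items) {
        remaining.erase(item);
    }
    if (remaining.empty()) {
        return std::nullopt;
    }
    return typed_set{v1.type, {remaining.begin(), remaining.end()}};
}
```

A caller that receives std::nullopt from the difference would be expected to delete the attribute rather than store an empty set.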

									

alternator/serialization.hh
									
												
				@@ -24,9 +24,9 @@

				#include <string>

				#include <string_view>

				#include "types.hh"

				#include "schema.hh"

				#include "schema_fwd.hh"

				#include "keys.hh"

				#include "rjson.hh"

				#include "utils/rjson.hh"

				#include "utils/big_decimal.hh"

				namespace alternator {

				@@ -45,7 +45,7 @@ struct type_representation {

				    data_type dtype;

				};

				type_info type_info_from_string(std::string type);

				type_info type_info_from_string(std::string_view type);

				type_representation represent_type(alternator_type atype);

				bytes serialize_item(const rjson::value& item);

				@@ -54,7 +54,7 @@ rjson::value deserialize_item(bytes_view bv);

				std::string type_to_string(data_type type);

				bytes get_key_column_value(const rjson::value& item, const column_definition& column);

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);

				bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column);

				rjson::value json_key_column_value(bytes_view cell, const column_definition& column);

				partition_key pk_from_json(const rjson::value& item, schema_ptr schema);

				@@ -63,4 +63,27 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);

				// If v encodes a number (i.e., it is a {"N": [...]}, returns an object representing it.  Otherwise,

				// raises ValidationException with diagnostic.

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);

				// Check if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or "NS", "BS"

				// and returns set's type and a pointer to that set. If the object does not encode a set,

				// returned value is {"", nullptr}

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);

				// Check if a given JSON object encodes a list (i.e., it is a {"L": [...]}

				// and returns a pointer to that list.

				const rjson::value* unwrap_list(const rjson::value& v);

				// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the

				// sum, again as a JSON-encoded number.

				rjson::value number_add(const rjson::value& v1, const rjson::value& v2);

				rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2);

				// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and

				// return the sum of both sets, again as a set value.

				rjson::value set_sum(const rjson::value& v1, const rjson::value& v2);

				// Take two JSON-encoded set values (e.g. {"SS": [...the actual list]}) and

				// return the difference of s1 - s2, again as a set value.

				// DynamoDB does not allow empty sets, so if resulting set is empty, return

				// an unset optional instead.

				std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2);

				}

									

alternator/server.cc
									
												
				@@ -23,12 +23,14 @@

				#include "log.hh"

				#include <seastar/http/function_handlers.hh>

				#include <seastar/json/json_elements.hh>

				#include <seastarx.hh>

				#include "seastarx.hh"

				#include "error.hh"

				#include "rjson.hh"

				#include "utils/rjson.hh"

				#include "auth.hh"

				#include <cctype>

				#include "cql3/query_processor.hh"

				#include "service/storage_service.hh"

				#include "utils/overloaded_functor.hh"

				static logging::logger slogger("alternator-server");

				@@ -65,68 +67,121 @@ inline std::vector<std::string_view> split(std::string_view text, char separator

				// Internal Server Error.

				class api_handler : public handler_base {

				public:

				    api_handler(const future_json_function& _handle) : _f_handle(

				         [_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {

				         return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {

				    api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle) : _f_handle(

				         [this, _handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {

				         return seastar::futurize_invoke(_handle, std::move(req)).then_wrapped([this, rep = std::move(rep)](future<executor::request_return_type> resf) mutable {

				             if (resf.failed()) {

				                 // Exceptions of type api_error are wrapped as JSON and

				                 // returned to the client as expected. Other types of

				                 // exceptions are unexpected, and returned to the user

				                 // as an internal server error:

				                 api_error ret;

				                 try {

				                     resf.get();

				                 } catch (api_error &ae) {

				                     ret = ae;

				                     generate_error_reply(*rep, ae);

				                 } catch (rjson::error & re) {

				                     ret = api_error("ValidationException", re.what());

				                     generate_error_reply(*rep,

				                             api_error::validation(re.what()));

				                 } catch (...) {

				                     ret = api_error(

				                             "Internal Server Error",

				                             format("Internal server error: {}", std::current_exception()),

				                             reply::status_type::internal_server_error);

				                     generate_error_reply(*rep,

				                             api_error::internal(format("Internal server error: {}", std::current_exception())));

				                 }

				                 // FIXME: what is this version number?

				                 rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +

				                         "\"message\":\"" + ret._msg + "\"}";

				                 rep->_status = ret._http_code;

				                 slogger.trace("api_handler error case: {}", rep->_content);

				                 return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				             }

				             slogger.trace("api_handler success case");

				             auto res = resf.get0();

				             if (res._body_writer) {

				                 rep->write_body("json", std::move(res._body_writer));

				             } else {

				                 rep->_content += res._res;

				             }

				             std::visit(overloaded_functor {

				                 [&] (const json::json_return_type& json_return_value) {

				                     slogger.trace("api_handler success case");

				                     if (json_return_value._body_writer) {

				                         // Unfortunately, write_body() forces us to choose

				                         // from a fixed and irrelevant list of "mime-types"

				                         // at this point. But we'll override it with the

				                         // one (application/x-amz-json-1.0) below.

				                         rep->write_body("json", std::move(json_return_value._body_writer));

				                     } else {

				                         rep->_content += json_return_value._res;

				                     }

				                 },

				                 [&] (const api_error& err) {

				                     generate_error_reply(*rep, err);

				                 }

				             }, res);

				             return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				         });

				    }), _type("json") { }

				    }) { }

				    api_handler(const api_handler&) = default;

				    future<std::unique_ptr<reply>> handle(const sstring& path,

				            std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        return _f_handle(std::move(req), std::move(rep)).then(

				                [this](std::unique_ptr<reply> rep) {

				                    rep->done(_type);

				                    rep->set_mime_type("application/x-amz-json-1.0");

				                    rep->done();

				                    return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				                });

				    }

				protected:

				    void generate_error_reply(reply& rep, const api_error& err) {

				        rep._content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + err._type + "\"," +

				                "\"message\":\"" + err._msg + "\"}";

				        rep._status = err._http_code;

				        slogger.trace("api_handler error case: {}", rep._content);

				    }

				    future_handler_function _f_handle;

				    sstring _type;

				};
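The new api_handler dispatches on a result that is either a JSON body or an api_error, using std::visit with an overloaded functor. The sketch below shows that pattern with standard C++17 only. The types ok_body, error_body and reply, and the function fill_reply, are hypothetical stand-ins for json::json_return_type, api_error, the Seastar reply and generate_error_reply(); the local overloaded template plays the role of utils/overloaded_functor.hh.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for json::json_return_type and api_error.
struct ok_body { std::string json; };
struct error_body { std::string type; std::string msg; int http_code; };
using request_result = std::variant<ok_body, error_body>;

// Standard C++17 "overloaded" idiom: merge several lambdas into one visitor.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical, simplified HTTP reply.
struct reply {
    std::string content;
    int status = 200;
};

// Fill the reply from either alternative of the result, mirroring the
// success branch and the error branch of the handler in the diff.
void fill_reply(reply& rep, const request_result& res) {
    std::visit(overloaded{
        [&](const ok_body& ok) {
            rep.content += ok.json;
        },
        [&](const error_body& err) {
            rep.content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + err.type +
                           "\",\"message\":\"" + err.msg + "\"}";
            rep.status = err.http_code;
        }
    }, res);
}
```

Returning the error as a value, instead of throwing it, lets the common error paths skip exception unwinding while still sharing one reply formatter.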

				class health_handler : public handler_base {

				    virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				class gated_handler : public handler_base {

				    seastar::gate& _gate;

				public:

				    gated_handler(seastar::gate& gate) : _gate(gate) {}

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) = 0;

				    virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) final override {

				        return with_gate(_gate, [this, &path, req = std::move(req), rep = std::move(rep)] () mutable {

				            return do_handle(path, std::move(req), std::move(rep));

				        });

				    }

				};

				class health_handler : public gated_handler {

				public:

				    health_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}

				protected:

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        rep->set_status(reply::status_type::ok);

				        rep->write_body("txt", format("healthy: {}", req->get_header("Host")));

				        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				    }

				};

				class local_nodelist_handler : public gated_handler {

				public:

				    local_nodelist_handler(seastar::gate& pending_requests) : gated_handler(pending_requests) {}

				protected:

				    virtual future<std::unique_ptr<reply>> do_handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        rjson::value results = rjson::empty_array();

				        // It's very easy to get a list of all live nodes on the cluster,

				        // using gms::get_local_gossiper().get_live_members(). But getting

				        // just the list of live nodes in this DC needs more elaborate code:

				        sstring local_dc = locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(

				                utils::fb_utilities::get_broadcast_address());

				        std::unordered_set<gms::inet_address> local_dc_nodes =

				                service::get_local_storage_service().get_token_metadata().

				                get_topology().get_datacenter_endpoints().at(local_dc);

				        for (auto& ip : local_dc_nodes) {

				            if (gms::get_local_gossiper().is_alive(ip)) {

				                rjson::push_back(results, rjson::from_string(ip.to_sstring()));

				            }

				        }

				        rep->set_status(reply::status_type::ok);

				        rep->set_content_type("json");

				        rep->_content = rjson::print(results);

				        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				    }

				};

				future<> server::verify_signature(const request& req) {

				    if (!_enforce_authorization) {

				        slogger.debug("Skipping authorization");

				@@ -134,31 +189,38 @@ future<> server::verify_signature(const request& req) {

				    }

				    auto host_it = req._headers.find("Host");

				    if (host_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");

				        throw api_error::invalid_signature("Host header is mandatory for signature verification");

				    }

				    auto authorization_it = req._headers.find("Authorization");

				    if (host_it == req._headers.end()) {

				        throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");

				    if (authorization_it == req._headers.end()) {

				        throw api_error::missing_authentication_token("Authorization header is mandatory for signature verification");

				    }

				    std::string host = host_it->second;

				    std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');

				    std::string_view authorization_header = authorization_it->second;

				    auto pos = authorization_header.find_first_of(' ');

				    if (pos == std::string_view::npos || authorization_header.substr(0, pos) != "AWS4-HMAC-SHA256") {

				        throw api_error::invalid_signature(format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));

				    }

				    authorization_header.remove_prefix(pos+1);

				    std::string credential;

				    std::string user_signature;

				    std::string signed_headers_str;

				    std::vector<std::string_view> signed_headers;

				    for (std::string_view entry : credentials_raw) {

				    do {

				        // Either one of a comma or space can mark the end of an entry

				        pos = authorization_header.find_first_of(" ,");

				        std::string_view entry = authorization_header.substr(0, pos);

				        if (pos != std::string_view::npos) {

				            authorization_header.remove_prefix(pos + 1);

				        }

				        if (entry.empty()) {

				            continue;

				        }

				        std::vector<std::string_view> entry_split = split(entry, '=');

				        if (entry_split.size() != 2) {

				            if (entry != "AWS4-HMAC-SHA256") {

				                throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));

				            }

				            continue;

				        }

				        std::string_view auth_value = entry_split[1];

				        // Commas appear as an additional (quite redundant) delimiter

				        if (auth_value.back() == ',') {

				            auth_value.remove_suffix(1);

				        }

				        if (entry_split[0] == "Credential") {

				            credential = std::string(auth_value);

				        } else if (entry_split[0] == "Signature") {

				@@ -168,10 +230,11 @@ future<> server::verify_signature(const request& req) {

				            signed_headers = split(auth_value, ';');

				            std::sort(signed_headers.begin(), signed_headers.end());

				        }

				    }

				    } while (pos != std::string_view::npos);

				    std::vector<std::string_view> credential_split = split(credential, '/');

				    if (credential_split.size() != 5) {

				        throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));

				        throw api_error::validation(format("Incorrect credential information format: {}", credential));

				    }

				    std::string user(credential_split[0]);

				    std::string datestamp(credential_split[1]);

				@@ -192,8 +255,8 @@ future<> server::verify_signature(const request& req) {

				        }

				    }

				    auto cache_getter = [] (std::string username) {

				        return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));

				    auto cache_getter = [&qp = _qp] (std::string username) {

				        return get_key_from_roles(qp, std::move(username));

				    };

				    return _key_cache.get_ptr(user, cache_getter).then([this, &req,

				                                                    user = std::move(user),

				@@ -209,31 +272,46 @@ future<> server::verify_signature(const request& req) {

				        if (signature != std::string_view(user_signature)) {

				            _key_cache.remove(user);

				            throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");

				            throw api_error::unrecognized_client("The security token included in the request is invalid.");

				        }

				    });

				}
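The reworked Authorization parsing in verify_signature() first checks the algorithm prefix and then walks entries separated by either a space or a comma. A standalone sketch of that loop follows, assuming a hypothetical helper parse_sigv4_header that collects the key=value entries into a map. It is not the function from the patch, which throws api_error and fills separate variables instead of returning an optional map.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the key=value entries of an AWS4-HMAC-SHA256
// Authorization header, or std::nullopt if the algorithm prefix is wrong.
std::optional<std::map<std::string, std::string>>
parse_sigv4_header(std::string_view header) {
    auto pos = header.find_first_of(' ');
    if (pos == std::string_view::npos || header.substr(0, pos) != "AWS4-HMAC-SHA256") {
        return std::nullopt;
    }
    header.remove_prefix(pos + 1);

    std::map<std::string, std::string> fields;
    do {
        // Either a comma or a space can mark the end of an entry.
        pos = header.find_first_of(" ,");
        std::string_view entry = header.substr(0, pos);
        if (pos != std::string_view::npos) {
            header.remove_prefix(pos + 1);
        }
        if (entry.empty()) {
            continue;  // e.g. the space that follows a comma
        }
        auto eq = entry.find('=');
        if (eq == std::string_view::npos) {
            continue;  // not a key=value entry; ignored in this sketch
        }
        fields.emplace(std::string(entry.substr(0, eq)),
                       std::string(entry.substr(eq + 1)));
    } while (pos != std::string_view::npos);
    return fields;
}
```

Treating both delimiters uniformly means headers written as "a=1, b=2", "a=1,b=2" and "a=1 b=2" all yield the same entries, which appears to be the motivation for replacing the earlier split on spaces.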

				future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {

				future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {

				    _executor._stats.total_operations++;

				    sstring target = req->get_header(TARGET);

				    std::vector<std::string_view> split_target = split(target, '.');

				    //NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)

				    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());

				    slogger.trace("Request: {} {}", op, req->content);

				    slogger.trace("Request: {} {} {}", op, req->content, req->_headers);

				    return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {

				        auto callback_it = _callbacks.find(op);

				        if (callback_it == _callbacks.end()) {

				            _executor.local()._stats.unsupported_operations++;

				            throw api_error("UnknownOperationException",

				                    format("Unsupported operation {}", op));

				            _executor._stats.unsupported_operations++;

				            throw api_error::unknown_operation(format("Unsupported operation {}", op));

				        }

				        //FIXME: Client state can provide more context, e.g. client's endpoint address

				        // We use unique_ptr because client_state cannot be moved or copied

				        return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {

				            client_state->set_raw_keyspace(executor::KEYSPACE_NAME);

				            executor::maybe_trace_query(*client_state, op, req->content);

				            tracing::trace(client_state->get_trace_state(), op);

				            return callback_it->second(_executor.local(), *client_state, std::move(req));

				        return with_gate(_pending_requests, [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] () mutable {

				            //FIXME: Client state can provide more context, e.g. client's endpoint address

				            // We use unique_ptr because client_state cannot be moved or copied

				            return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()),

				                    [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {

				                tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);

				                tracing::trace(trace_state, op);

				                // JSON parsing can allocate up to roughly 2x the size of the raw document, + a couple of bytes for maintenance.

				                // FIXME: by this time, the whole HTTP request was already read, so some memory is already occupied.

				                // Once HTTP allows working on streams, we should grab the permit *before* reading the HTTP payload.

				                size_t mem_estimate = req->content.size() * 3 + 8000;

				                auto units_fut = get_units(*_memory_limiter, mem_estimate);

				                if (_memory_limiter->waiters()) {

				                    ++_executor._stats.requests_blocked_memory;

				                }

				                return units_fut.then([this, callback_it = std::move(callback_it), &client_state, trace_state, req = std::move(req)] (semaphore_units<> units) mutable {

				                    return _json_parser.parse(req->content).then([this, callback_it = std::move(callback_it), &client_state, trace_state,

				                            units = std::move(units), req = std::move(req)] (rjson::value json_request) mutable {

				                        return callback_it->second(_executor, *client_state, trace_state, make_service_permit(std::move(units)), std::move(json_request), std::move(req)).finally([trace_state] {});

				                    });

				                });

				            });

				        });

				    });

				}

				@@ -243,35 +321,104 @@ void server::set_routes(routes& r) {

				        return handle_api_request(std::move(req));

				    });

				    r.add(operation_type::POST, url("/"), req_handler);

				    r.add(operation_type::GET, url("/"), new health_handler);

				    r.put(operation_type::POST, "/", req_handler);

				    r.put(operation_type::GET, "/", new health_handler(_pending_requests));

				    // The "/localnodes" request is a new Alternator feature, not supported by

				    // DynamoDB and not required for DynamoDB compatibility. It allows a

				    // client to enquire - using a trivial HTTP request without requiring

				    // authentication - the list of all live nodes in the same data center of

				    // the Alternator cluster. The client can use this list to balance its

				    // request load to all the nodes in the same geographical region.

				    // Note that this API exposes - openly without authentication - the

				    // information on the cluster's members inside one data center. We do not

				    // consider this to be a security risk, because an attacker can already

				    // scan an entire subnet for nodes responding to the health request,

				    // or even just scan for open ports.

				    r.put(operation_type::GET, "/localnodes", new local_nodelist_handler(_pending_requests));

				}

				//FIXME: A way to immediately invalidate the cache should be considered,

				// e.g. when the system table which stores the keys is changed.

				// For now, this propagation may take up to 1 minute.

				server::server(seastar::sharded<executor>& e)

				        : _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)

				server::server(executor& exec, cql3::query_processor& qp)

				        : _http_server("http-alternator")

				        , _https_server("https-alternator")

				        , _executor(exec)

				        , _qp(qp)

				        , _key_cache(1024, 1min, slogger)

				        , _enforce_authorization(false)

				        , _enabled_servers{}

				        , _pending_requests{}

				      , _callbacks{

				        {"CreateTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) {

				            return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req)] { return e.create_table(client_state, req->content); }); }

				        },

				        {"DescribeTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_table(client_state, req->content); }},

				        {"DeleteTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_table(client_state, req->content); }},

				        {"PutItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.put_item(client_state, req->content); }},

				        {"UpdateItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.update_item(client_state, req->content); }},

				        {"GetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.get_item(client_state, req->content); }},

				        {"DeleteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_item(client_state, req->content); }},

				        {"ListTables", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},

				        {"Scan", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.scan(client_state, req->content); }},

				        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},

				        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, req->content); }},

				        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, req->content); }},

				        {"Query", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.query(client_state, req->content); }},

				        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.list_tables(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"));

				        }},

				        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				        {"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.tag_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.untag_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.list_streams(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.describe_stream(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request));

				        }},

				        {"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				        }},

				    } {

				}

				future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {

				future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				        bool enforce_authorization, semaphore* memory_limiter) {

				    _memory_limiter = memory_limiter;

				    _enforce_authorization = enforce_authorization;

				    if (!port && !https_port) {

				        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"

				@@ -279,33 +426,86 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:

				    }

				    return seastar::async([this, addr, port, https_port, creds] {

				        try {

				            _executor.invoke_on_all([] (executor& e) {

				                return e.start();

				            }).get();

				            _executor.start().get();

				            if (port) {

				                _control.start().get();

				                _control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();

				                _control.listen(socket_address{addr, *port}).get();

				                slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);

				                set_routes(_http_server._routes);

				                _http_server.set_content_length_limit(server::content_length_limit);

				                _http_server.listen(socket_address{addr, *port}).get();

				                _enabled_servers.push_back(std::ref(_http_server));

				            }

				            if (https_port) {

				                _https_control.start().get();

				                _https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();

				                _https_control.server().invoke_on_all([creds] (http_server& serv) {

				                    return serv.set_tls_credentials(creds->build_server_credentials());

				                }).get();

				                _https_control.listen(socket_address{addr, *https_port}).get();

				                slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);

				                set_routes(_https_server._routes);

				                _https_server.set_content_length_limit(server::content_length_limit);

				                _https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {

				                    if (ep) {

				                        slogger.warn("Exception loading {}: {}", files, ep);

				                    } else {

				                        slogger.info("Reloaded {}", files);

				                    }

				                }).get0());

				                _https_server.listen(socket_address{addr, *https_port}).get();

				                _enabled_servers.push_back(std::ref(_https_server));

				            }

				        } catch (...) {

				            slogger.warn("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",

				            slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",

				                    addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());

				            throw;

				            std::throw_with_nested(std::runtime_error(

				                    format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",

				                            addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));

				        }

				    });

				}

				future<> server::stop() {

				    return parallel_for_each(_enabled_servers, [] (http_server& server) {

				        return server.stop();

				    }).then([this] {

				        return _pending_requests.close();

				    }).then([this] {

				        return _json_parser.stop();

				    });

				}

				server::json_parser::json_parser() : _run_parse_json_thread(async([this] {

				        while (true) {

				            _document_waiting.wait().get();

				            if (_as.abort_requested()) {

				                return;

				            }

				            try {

				                _parsed_document = rjson::parse_yieldable(_raw_document);

				                _current_exception = nullptr;

				            } catch (...) {

				                _current_exception = std::current_exception();

				            }

				            _document_parsed.signal();

				        }

				    })) {

				}

				future<rjson::value> server::json_parser::parse(std::string_view content) {

				    if (content.size() < yieldable_parsing_threshold) {

				        return make_ready_future<rjson::value>(rjson::parse(content));

				    }

				    return with_semaphore(_parsing_sem, 1, [this, content] {

				        _raw_document = content;

				        _document_waiting.signal();

				        return _document_parsed.wait().then([this] {

				            if (_current_exception) {

				                return make_exception_future<rjson::value>(_current_exception);

				            }

				            return make_ready_future<rjson::value>(std::move(_parsed_document));

				        });

				    });

				}

				future<> server::json_parser::stop() {

				    _as.request_abort();

				    _document_waiting.signal();

				    _document_parsed.broken();

				    return std::move(_run_parse_json_thread);

				}

				}

									
										48

alternator/server.hh
									
												
				@@ -26,28 +26,58 @@

				#include <seastar/http/httpd.hh>

				#include <seastar/net/tls.hh>

				#include <optional>

				#include <alternator/auth.hh>

				#include "alternator/auth.hh"

				#include "utils/small_vector.hh"

				#include <seastar/core/units.hh>

				namespace alternator {

				class server {

				    using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, std::unique_ptr<request>)>;

				    static constexpr size_t content_length_limit = 16*MB;

				    using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,

				            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<request>)>;

				    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;

				    seastar::httpd::http_server_control _control;

				    seastar::httpd::http_server_control _https_control;

				    seastar::sharded<executor>& _executor;

				    http_server _http_server;

				    http_server _https_server;

				    executor& _executor;

				    cql3::query_processor& _qp;

				    key_cache _key_cache;

				    bool _enforce_authorization;

				    utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;

				    gate _pending_requests;

				    alternator_callbacks_map _callbacks;

				public:

				    server(seastar::sharded<executor>& executor);

				    seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);

				    semaphore* _memory_limiter;

				    class json_parser {

				        static constexpr size_t yieldable_parsing_threshold = 16*KB;

				        std::string_view _raw_document;

				        rjson::value _parsed_document;

				        std::exception_ptr _current_exception;

				        semaphore _parsing_sem{1};

				        condition_variable _document_waiting;

				        condition_variable _document_parsed;

				        abort_source _as;

				        future<> _run_parse_json_thread;

				    public:

				        json_parser();

				        future<rjson::value> parse(std::string_view content);

				        future<> stop();

				    };

				    json_parser _json_parser;

				public:

				    server(executor& executor, cql3::query_processor& qp);

				    future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				            bool enforce_authorization, semaphore* memory_limiter);

				    future<> stop();

				private:

				    void set_routes(seastar::httpd::routes& r);

				    future<> verify_signature(const seastar::httpd::request& r);

				    future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);

				    future<executor::request_return_type> handle_api_request(std::unique_ptr<request>&& req);

				};

				}

									
										16

alternator/stats.cc
									
												
				@@ -20,7 +20,7 @@

				 */

				#include "stats.hh"

				#include "utils/histogram_metrics_helper.hh"

				#include <seastar/core/metrics.hh>

				namespace alternator {

				@@ -37,7 +37,8 @@ stats::stats() : api_operations{} {

				                        seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),

				#define OPERATION_LATENCY(name, CamelCaseName) \

				                seastar::metrics::make_histogram("op_latency", \

				                        seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),

				                        seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name);}),

				            OPERATION(batch_get_item, "BatchGetItem")

				            OPERATION(batch_write_item, "BatchWriteItem")

				            OPERATION(create_backup, "CreateBackup")

				            OPERATION(create_global_table, "CreateGlobalTable")

				@@ -77,6 +78,11 @@ stats::stats() : api_operations{} {

				            OPERATION_LATENCY(get_item_latency, "GetItem")

				            OPERATION_LATENCY(delete_item_latency, "DeleteItem")

				            OPERATION_LATENCY(update_item_latency, "UpdateItem")

				            OPERATION(list_streams, "ListStreams")

				            OPERATION(describe_stream, "DescribeStream")

				            OPERATION(get_shard_iterator, "GetShardIterator")

				            OPERATION(get_records, "GetRecords")

				            OPERATION_LATENCY(get_records_latency, "GetRecords")

				    });

				    _metrics.add_group("alternator", {

				            seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,

				@@ -85,6 +91,12 @@ stats::stats() : api_operations{} {

				                    seastar::metrics::description("number of total operations via Alternator API")),

				            seastar::metrics::make_total_operations("reads_before_write", reads_before_write,

				                    seastar::metrics::description("number of performed read-before-write operations")),

				            seastar::metrics::make_total_operations("write_using_lwt", write_using_lwt,

				                    seastar::metrics::description("number of writes that used LWT")),

				            seastar::metrics::make_total_operations("shard_bounce_for_lwt", shard_bounce_for_lwt,

				                    seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements")),

				            seastar::metrics::make_total_operations("requests_blocked_memory", requests_blocked_memory,

				                    seastar::metrics::description("Counts a number of requests blocked due to memory pressure.")),

				            seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,

				                    seastar::metrics::description("number of rows read during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,

									
										16

alternator/stats.hh
									
												
				@@ -74,16 +74,24 @@ public:

				        uint64_t update_item = 0;

				        uint64_t update_table = 0;

				        uint64_t update_time_to_live = 0;

				        uint64_t list_streams = 0;

				        uint64_t describe_stream = 0;

				        uint64_t get_shard_iterator = 0;

				        uint64_t get_records = 0;

				        utils::estimated_histogram put_item_latency;

				        utils::estimated_histogram get_item_latency;

				        utils::estimated_histogram delete_item_latency;

				        utils::estimated_histogram update_item_latency;

				        utils::time_estimated_histogram put_item_latency;

				        utils::time_estimated_histogram get_item_latency;

				        utils::time_estimated_histogram delete_item_latency;

				        utils::time_estimated_histogram update_item_latency;

				        utils::time_estimated_histogram get_records_latency;

				    } api_operations;

				    // Miscellaneous event counters

				    uint64_t total_operations = 0;

				    uint64_t unsupported_operations = 0;

				    uint64_t reads_before_write = 0;

				    uint64_t write_using_lwt = 0;

				    uint64_t shard_bounce_for_lwt = 0;

				    uint64_t requests_blocked_memory = 0;

				    // CQL-derived stats

				    cql3::cql_stats cql_stats;

				private:

1116

alternator/streams.cc Normal file


File diff suppressed because it is too large

									
										53

alternator/tags_extension.hh
									
										Normal file
									
												
				@@ -0,0 +1,53 @@

				/*

				 * Copyright 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU Affero General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "serializer.hh"

				#include "schema.hh"

				#include "db/extensions.hh"

				namespace alternator {

				class tags_extension : public schema_extension {

				public:

				    static constexpr auto NAME = "scylla_tags";

				    tags_extension() = default;

				    explicit tags_extension(const std::map<sstring, sstring>& tags) : _tags(std::move(tags)) {}

				    explicit tags_extension(bytes b) : _tags(tags_extension::deserialize(b)) {}

				    explicit tags_extension(const sstring& s) {

				        throw std::logic_error("Cannot create tags from string");

				    }

				    bytes serialize() const override {

				        return ser::serialize_to_buffer<bytes>(_tags);

				    }

				    static std::map<sstring, sstring> deserialize(bytes_view buffer) {

				        return ser::deserialize_from_buffer(buffer, boost::type<std::map<sstring, sstring>>());

				    }

				    const std::map<sstring, sstring>& tags() const {

				        return _tags;

				    }

				private:

				    std::map<sstring, sstring> _tags;

				};

				}

									
										30

api/api-doc/cache_service.json
									
												
				@@ -13,7 +13,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache save period in seconds",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -35,7 +35,7 @@

				                     "description":"row cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache save period in seconds",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_key_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -70,7 +70,7 @@

				                     "description":"key cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -83,7 +83,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache save period in seconds",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_counter_cache_save_period_in_seconds",

				               "produces":[

				                  "application/json"

				@@ -105,7 +105,7 @@

				                     "description":"counter cache save period in seconds",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -118,7 +118,7 @@

				            {

				               "method":"GET",

				               "summary":"get row cache keys to save",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -140,7 +140,7 @@

				                     "description":"row cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -153,7 +153,7 @@

				            {

				               "method":"GET",

				               "summary":"get key cache keys to save",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_key_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -175,7 +175,7 @@

				                     "description":"key cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -188,7 +188,7 @@

				            {

				               "method":"GET",

				               "summary":"get counter cache keys to save",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_counter_cache_keys_to_save",

				               "produces":[

				                  "application/json"

				@@ -210,7 +210,7 @@

				                     "description":"counter cache keys to save",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -448,7 +448,7 @@

				        {

				          "method": "GET",

				          "summary": "Get key entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_key_entries",

				          "produces": [

				            "application/json"

				@@ -568,7 +568,7 @@

				        {

				          "method": "GET",

				          "summary": "Get row entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_row_entries",

				          "produces": [

				            "application/json"

				@@ -688,7 +688,7 @@

				        {

				          "method": "GET",

				          "summary": "Get counter entries",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_counter_entries",

				          "produces": [

				            "application/json"

									
										128

api/api-doc/column_family.json
									
												
				@@ -70,7 +70,7 @@

				            {

				               "method":"POST",

				               "summary":"Force a major compaction of this column family",

				               "type":"string",

				               "type":"void",

				               "nickname":"force_major_compaction",

				               "produces":[

				                  "application/json"

				@@ -121,7 +121,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -172,7 +172,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -223,7 +223,7 @@

				                     "description":"The maximum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -231,7 +231,7 @@

				                     "description":"The minimum number of sstables in queue before compaction kicks off",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -380,16 +380,54 @@

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"check if the auto compaction disabled",

				               "summary":"check if the auto_compaction property is enabled for a given table",

				               "type":"boolean",

				               "nickname":"is_auto_compaction_disabled",

				               "nickname":"get_auto_compaction",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"name",

				                     "description":"The column family name in keyspace:name format",

				                     "description":"The table name in keyspace:name format",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            },

				            {

				               "method":"POST",

				               "summary":"Enable table auto compaction",

				               "type":"void",

				               "nickname":"enable_auto_compaction",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"name",

				                     "description":"The table name in keyspace:name format",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            },

				            {

				               "method":"DELETE",

				               "summary":"Disable table auto compaction",

				               "type":"void",

				               "nickname":"disable_auto_compaction",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"name",

				                     "description":"The table name in keyspace:name format",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -544,7 +582,7 @@

				               "summary":"sstable count for each level. empty unless leveled compaction is used",

				               "type":"array",

				               "items":{

				                  "type":"int"

				                  "type": "long"

				               },

				               "nickname":"get_sstable_count_per_level",

				               "produces":[

				@@ -636,7 +674,7 @@

				                     "description":"Duration (in milliseconds) of monitoring operation",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -644,7 +682,7 @@

				                    "description":"number of the top partitions to list",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type":"int",

				                    "type": "long",

				                    "paramType":"query"

				                 },

				                 {

				@@ -652,7 +690,7 @@

				                    "description":"capacity of stream summary: determines amount of resources used in query processing",

				                    "required":false,

				                    "allowMultiple":false,

				                    "type":"int",

				                    "type": "long",

				                    "paramType":"query"

				                 }

				              ]

				@@ -921,7 +959,7 @@

				            {

				               "method":"GET",

				               "summary":"Get memtable switch count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -945,7 +983,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all memtable switch count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_memtable_switch_count",

				               "produces":[

				                  "application/json"

				@@ -1082,7 +1120,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1235,7 +1273,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1251,7 +1289,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1275,7 +1313,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_range_latency",

				               "produces":[

				                  "application/json"

				@@ -1291,7 +1329,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1444,7 +1482,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1460,7 +1498,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1484,7 +1522,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending flushes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_flushes",

				               "produces":[

				                  "application/json"

				@@ -1500,7 +1538,7 @@

				            {

				               "method":"GET",

				               "summary":"Get pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1524,7 +1562,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all pending compactions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_pending_compactions",

				               "produces":[

				                  "application/json"

				@@ -1540,7 +1578,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1564,7 +1602,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live ss table count",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_ss_table_count",

				               "produces":[

				                  "application/json"

				@@ -1580,7 +1618,7 @@

				            {

				               "method":"GET",

				               "summary":"Get live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1604,7 +1642,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all live disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_live_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1620,7 +1658,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -1644,7 +1682,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total disk space used",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_disk_space_used",

				               "produces":[

				                  "application/json"

				@@ -2100,7 +2138,7 @@

				            {

				               "method":"GET",

				               "summary":"Get speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2124,7 +2162,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all speculative retries",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_speculative_retries",

				               "produces":[

				                  "application/json"

				@@ -2204,7 +2242,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2228,7 +2266,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit out of range",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit_out_of_range",

				               "produces":[

				                  "application/json"

				@@ -2244,7 +2282,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2268,7 +2306,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache hit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_hit",

				               "produces":[

				                  "application/json"

				@@ -2284,7 +2322,7 @@

				            {

				               "method":"GET",

				               "summary":"Get row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2308,7 +2346,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all row cache miss",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_row_cache_miss",

				               "produces":[

				                  "application/json"

				@@ -2324,7 +2362,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas prepare",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_prepare",

				               "produces":[

				                  "application/json"

				@@ -2348,7 +2386,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas propose",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_propose",

				               "produces":[

				                  "application/json"

				@@ -2372,7 +2410,7 @@

				            {

				               "method":"GET",

				               "summary":"Get cas commit",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_cas_commit",

				               "produces":[

				                  "application/json"

				@@ -2887,6 +2925,10 @@

				         "id":"toppartitions_query_results",

				         "description":"nodetool toppartitions query results",

				         "properties":{

				            "read_cardinality":{

				               "type":"long",

				               "description":"Number of the unique operations in the sample set"

				            },

				            "read":{

				               "type":"array",

				               "items":{

				@@ -2894,6 +2936,10 @@

				               },

				               "description":"Read results"

				            },

				            "write_cardinality":{

				               "type":"long",

				               "description":"Number of the unique operations in the sample set"

				            },

				            "write":{

				               "type":"array",

				               "items":{

									
										6

api/api-doc/compaction_manager.json
									
				@@ -118,7 +118,7 @@

				        {

				          "method": "GET",

				          "summary": "Get pending tasks",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_pending_tasks",

				          "produces": [

				            "application/json"

				@@ -181,7 +181,7 @@

				        {

				          "method": "GET",

				          "summary": "Get bytes compacted",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_bytes_compacted",

				          "produces": [

				            "application/json"

				@@ -197,7 +197,7 @@

				         "description":"A row merged information",

				         "properties":{

				            "key":{

				               "type":"int",

				               "type": "long",

				               "description":"The number of sstable"

				            },

				            "value":{

									
										90

api/api-doc/error_injection.json
									
				@@ -0,0 +1,90 @@

				{

				   "apiVersion":"0.0.1",

				   "swaggerVersion":"1.2",

				   "basePath":"{{Protocol}}://{{Host}}",

				   "resourcePath":"/error_injection",

				   "produces":[

				      "application/json"

				   ],

				   "apis":[

				      {

				         "path":"/v2/error_injection/injection/{injection}",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Activate an injection that triggers an error in code",

				               "type":"void",

				               "nickname":"enable_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name, should correspond to an injection added in code",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  },

				                  {

				                     "name":"one_shot",

				                     "description":"boolean flag indicating whether the injection should be enabled to trigger only once",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            },

				            {

				               "method":"DELETE",

				               "summary":"Deactivate an injection previously activated by the API",

				               "type":"void",

				               "nickname":"disable_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/v2/error_injection/injection",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"List all enabled injections on all shards, i.e. injections that will trigger an error in the code",

				               "type":"array",

				               "items":{

				                  "type":"string"

				               },

				               "nickname":"get_enabled_injections_on_all",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            },

				            {

				               "method":"DELETE",

				               "summary":"Deactivate all injections previously activated on all shards by the API",

				               "type":"void",

				               "nickname":"disable_on_all",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      }

				   ]

				}
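
The new `error_injection.json` spec above declares four operations on two paths. As a rough illustration only, the sketch below builds the method and URL for each of them without sending anything. The base address `http://localhost:10000`, the helper names, and the `true` encoding of the `one_shot` flag are assumptions, not taken from the spec.

```python
from urllib.parse import quote, urlencode

# Assumption: address of the node's REST API; the spec only gives
# "{{Protocol}}://{{Host}}" as basePath.
BASE_URL = "http://localhost:10000"
INJECTION_PATH = "/v2/error_injection/injection"


def enable_injection_request(name, one_shot=False):
    """Return (method, url) for the enable_injection operation."""
    url = f"{BASE_URL}{INJECTION_PATH}/{quote(name, safe='')}"
    if one_shot:
        # Assumption: the boolean query parameter is spelled "true".
        url += "?" + urlencode({"one_shot": "true"})
    return "POST", url


def disable_injection_request(name):
    """Return (method, url) for the disable_injection operation."""
    return "DELETE", f"{BASE_URL}{INJECTION_PATH}/{quote(name, safe='')}"


def list_injections_request():
    """Return (method, url) for get_enabled_injections_on_all."""
    return "GET", f"{BASE_URL}{INJECTION_PATH}"


def disable_all_request():
    """Return (method, url) for disable_on_all."""
    return "DELETE", f"{BASE_URL}{INJECTION_PATH}"


if __name__ == "__main__":
    # "my_injection" is a hypothetical injection name.
    for request in (
        enable_injection_request("my_injection", one_shot=True),
        list_injections_request(),
        disable_injection_request("my_injection"),
        disable_all_request(),
    ):
        print(*request)
```

The helpers return plain tuples so they can be passed to whichever HTTP client is in use.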

									
										12

api/api-doc/failure_detector.json
									
				@@ -110,7 +110,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count down endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_down_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -126,7 +126,7 @@

				            {

				               "method":"GET",

				               "summary":"Get count up endpoint",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_up_endpoint_count",

				               "produces":[

				                  "application/json"

				@@ -180,11 +180,11 @@

				                    "description": "The endpoint address"

				                },

				                "generation": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat generation"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The heart beat version"

				                },

				                "update_time": {

				@@ -209,7 +209,7 @@

				           "description": "Holds a version value for an application state",

				               "properties": {

				                "application_state": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state enum index"

				                },

				                "value": {

				@@ -217,7 +217,7 @@

				                    "description": "The version value"

				                },

				                "version": {

				                    "type": "int",

				                    "type": "long",

				                    "description": "The application state version"

				                }

				            }

									
										28

api/api-doc/gossiper.json
									
				@@ -75,7 +75,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -99,7 +99,7 @@

				            {

				               "method":"GET",

				               "summary":"Get heart beat version for a node",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_heart_beat_version",

				               "produces":[

				                  "application/json"

				@@ -148,6 +148,30 @@

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/gossiper/force_remove_endpoint/{addr}",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Force remove an endpoint from gossip",

				               "type":"void",

				               "nickname":"force_remove_endpoint",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"addr",

				                     "description":"The endpoint address",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            }

				         ]

				      }

				   ]

				}
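
The hunk above adds a `force_remove_endpoint` operation: a `POST` to `/gossiper/force_remove_endpoint/{addr}` with the endpoint address as a path parameter. A minimal sketch of preparing such a request with the Python standard library follows. The default base URL and the function name are assumptions, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def force_remove_endpoint_request(addr, base_url="http://localhost:10000"):
    """Build (but do not send) the POST request for force_remove_endpoint.

    base_url is an assumed REST API address; addr is the endpoint address
    placed in the {addr} path segment.
    """
    url = f"{base_url}/gossiper/force_remove_endpoint/{quote(addr, safe='')}"
    return Request(url, method="POST")


if __name__ == "__main__":
    # 127.0.0.2 is a placeholder address used only for illustration.
    req = force_remove_endpoint_request("127.0.0.2")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Because the operation removes an endpoint from gossip, it is probably best tried against a test cluster first.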

									
										4

api/api-doc/hinted_handoff.json
									
				@@ -99,7 +99,7 @@

				        {

				          "method": "GET",

				          "summary": "Get create hint count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_create_hint_count",

				          "produces": [

				            "application/json"

				@@ -123,7 +123,7 @@

				        {

				          "method": "GET",

				          "summary": "Get not stored hints count",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_not_stored_hints_count",

				          "produces": [

				            "application/json"

									
										4

api/api-doc/messaging_service.json
									
				@@ -191,7 +191,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the version number",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_version",

				               "produces":[

				                  "application/json"

				@@ -249,7 +249,7 @@

				                 "MIGRATION_REQUEST",

				                 "PREPARE_MESSAGE",

				                 "PREPARE_DONE_MESSAGE",

				                 "STREAM_MUTATION",

				                 "UNUSED__STREAM_MUTATION",

				                 "STREAM_MUTATION_DONE",

				                 "COMPLETE_MESSAGE",

				                 "REPAIR_CHECKSUM_RANGE",

									
										51

api/api-doc/storage_proxy.json
									
				@@ -68,7 +68,7 @@

				               "summary":"Get the hinted handoff enabled by dc",

				               "type":"array",

				               "items":{

				                  "type":"mapper_list"

				                  "type":"array"

				               },

				               "nickname":"get_hinted_handoff_enabled_by_dc",

				               "produces":[

				@@ -105,7 +105,7 @@

				            {

				               "method":"GET",

				               "summary":"Get the max hint window",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hint_window",

				               "produces":[

				                  "application/json"

				@@ -128,7 +128,7 @@

				                     "description":"max hint window in ms",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -141,7 +141,7 @@

				            {

				               "method":"GET",

				               "summary":"Get max hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_max_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -164,7 +164,7 @@

				                     "description":"max hints in progress",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -177,7 +177,7 @@

				            {

				               "method":"GET",

				               "summary":"get hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -602,7 +602,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -632,7 +632,7 @@

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_condition_not_met",

				          "produces": [

				            "application/json"

				@@ -641,13 +641,28 @@

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_write/failed_read_round_optimization",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas write metrics",

				          "type": "long",

				          "nickname": "get_cas_write_metrics_failed_read_round_optimization",

				          "produces": [

				            "application/json"

				          ],

				          "parameters": []

				        }

				      ]

				    },

				    {

				      "path": "/storage_proxy/metrics/cas_read/unfinished_commit",

				      "operations": [

				        {

				          "method": "GET",

				          "summary": "Get cas read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_cas_read_metrics_unfinished_commit",

				          "produces": [

				            "application/json"

				@@ -677,7 +692,7 @@

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -692,7 +707,7 @@

				        {

				          "method": "GET",

				          "summary": "Get read metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_read_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -827,7 +842,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -842,7 +857,7 @@

				        {

				          "method": "GET",

				          "summary": "Get range metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_range_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -887,7 +902,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_timeouts",

				          "produces": [

				            "application/json"

				@@ -902,7 +917,7 @@

				        {

				          "method": "GET",

				          "summary": "Get write metrics",

				          "type": "int",

				          "type": "long",

				          "nickname": "get_write_metrics_unavailables",

				          "produces": [

				            "application/json"

				@@ -1008,7 +1023,7 @@

				            {

				               "method":"GET",

				               "summary":"Get read latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_read_latency",

				               "produces":[

				                  "application/json"

				@@ -1040,7 +1055,7 @@

				            {

				               "method":"GET",

				               "summary":"Get write latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_write_latency",

				               "produces":[

				                  "application/json"

				@@ -1072,7 +1087,7 @@

				            {

				               "method":"GET",

				               "summary":"Get range latency",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_range_latency",

				               "produces":[

				                  "application/json"

									
										116

api/api-doc/storage_service.json
									
				@@ -458,7 +458,7 @@

				            {

				               "method":"GET",

				               "summary":"Return the generation value for this node.",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_current_generation_number",

				               "produces":[

				                  "application/json"

				@@ -511,6 +511,21 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/cdc_streams_check_and_repair",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Checks that CDC streams reflect current cluster topology and regenerates them if not.",

				               "type":"void",

				               "nickname":"cdc_streams_check_and_repair",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/snapshots",

				         "operations":[

				@@ -582,7 +597,15 @@

				                  },

				                  {

				                     "name":"kn",

				                     "description":"Comma seperated keyspaces name to snapshot",

				                     "description":"Comma seperated keyspaces name that their snapshot will be deleted",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"cf",

				                     "description":"an optional table name that its snapshot will be deleted",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -646,7 +669,7 @@

				            {

				               "method":"POST",

				               "summary":"Trigger a cleanup of keys on a single keyspace",

				               "type":"int",

				               "type": "long",

				               "nickname":"force_keyspace_cleanup",

				               "produces":[

				                  "application/json"

				@@ -678,7 +701,7 @@

				            {

				               "method":"GET",

				               "summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",

				               "type":"int",

				               "type": "long",

				               "nickname":"scrub",

				               "produces":[

				                  "application/json"

				@@ -726,7 +749,7 @@

				            {

				               "method":"GET",

				               "summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",

				               "type":"int",

				               "type": "long",

				               "nickname":"upgrade_sstables",

				               "produces":[

				                  "application/json"

				@@ -800,7 +823,7 @@

				               "summary":"Return an array with the ids of the currently active repairs",

				               "type":"array",

				               "items":{

				                  "type":"int"

				                  "type": "long"

				               },

				               "nickname":"get_active_repair_async",

				               "produces":[

				@@ -810,13 +833,50 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/repair_status/",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Query the repair status and return when the repair is finished or timeout",

				               "type":"string",

				               "enum":[

				                  "RUNNING",

				                  "SUCCESSFUL",

				                  "FAILED"

				               ],

				               "nickname":"repair_await_completion",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"id",

				                     "description":"The repair ID to check for status",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type": "long",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"timeout",

				                     "description":"Seconds to wait before the query returns even if the repair is not finished. The value -1 or not providing this parameter means no timeout",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/repair_async/{keyspace}",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",

				               "type":"int",

				               "type": "long",

				               "nickname":"repair_async",

				               "produces":[

				                  "application/json"

				@@ -947,7 +1007,7 @@

				                     "description":"The repair ID to check for status",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1277,18 +1337,18 @@

				                  },

				                  {

				                     "name":"dynamic_update_interval",

				                     "description":"integer, in ms (default 100)",

				                     "description":"interval in ms (default 100)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"dynamic_reset_interval",

				                     "description":"integer, in ms (default 600,000)",

				                     "description":"interval in ms (default 600,000)",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"integer",

				                     "type":"long",

				                     "paramType":"query"

				                  },

				                  {

				@@ -1493,7 +1553,7 @@

				                     "description":"Stream throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1501,7 +1561,7 @@

				            {

				               "method":"GET",

				               "summary":"Get stream throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_stream_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1517,7 +1577,7 @@

				            {

				               "method":"GET",

				               "summary":"get compaction throughput mb per sec",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_compaction_throughput_mb_per_sec",

				               "produces":[

				                  "application/json"

				@@ -1539,7 +1599,7 @@

				                     "description":"compaction throughput",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1943,7 +2003,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for warning of queries with many tombstones",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_warn_threshold",

				               "produces":[

				                  "application/json"

				@@ -1965,7 +2025,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -1978,7 +2038,7 @@

				            {

				               "method":"GET",

				               "summary":"",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_tombstone_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2000,7 +2060,7 @@

				                     "description":"tombstone debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2013,7 +2073,7 @@

				            {

				               "method":"GET",

				               "summary":"Returns the threshold for rejecting queries due to a large batch size",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_batch_size_failure_threshold",

				               "produces":[

				                  "application/json"

				@@ -2035,7 +2095,7 @@

				                     "description":"batch size debug threshold",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2059,7 +2119,7 @@

				                     "description":"throttle in kb",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"int",

				                     "type": "long",

				                     "paramType":"query"

				                  }

				               ]

				@@ -2072,7 +2132,7 @@

				            {

				               "method":"GET",

				               "summary":"Get load",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_metrics_load",

				               "produces":[

				                  "application/json"

				@@ -2088,7 +2148,7 @@

				            {

				               "method":"GET",

				               "summary":"Get exceptions",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_exceptions",

				               "produces":[

				                  "application/json"

				@@ -2104,7 +2164,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints in progress",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints_in_progress",

				               "produces":[

				                  "application/json"

				@@ -2120,7 +2180,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total hints",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_hints",

				               "produces":[

				                  "application/json"

				@@ -2408,7 +2468,7 @@

				            "version":{

				               "type":"string",

				               "enum":[

				                  "ka", "la", "mc"

				                  "ka", "la", "mc", "md"

				               ],

				               "description":"SSTable version"

				            },

									
										16

api/api-doc/stream_manager.json
									
												
				@@ -32,7 +32,7 @@

				            {

				               "method":"GET",

				               "summary":"Get number of active outbound streams",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_active_streams_outbound",

				               "produces":[

				                  "application/json"

				@@ -48,7 +48,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -72,7 +72,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total incoming bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_incoming_bytes",

				               "produces":[

				                  "application/json"

				@@ -88,7 +88,7 @@

				            {

				               "method":"GET",

				               "summary":"Get total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -112,7 +112,7 @@

				            {

				               "method":"GET",

				               "summary":"Get all total outgoing bytes",

				               "type":"int",

				               "type": "long",

				               "nickname":"get_all_total_outgoing_bytes",

				               "produces":[

				                  "application/json"

				@@ -154,7 +154,7 @@

				               "description":"The peer"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "connecting":{

				@@ -211,7 +211,7 @@

				               "description":"The ID"

				            },

				            "files":{

				               "type":"int",

				               "type": "long",

				               "description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."

				            },

				            "total_size":{

				@@ -242,7 +242,7 @@

				               "description":"The peer address"

				            },

				            "session_index":{

				               "type":"int",

				               "type": "long",

				               "description":"The session index"

				            },

				            "file_name":{

									
										15

api/api-doc/system.json
									
												
				@@ -52,6 +52,21 @@

				            }

				         ]

				      },

				      {

				         "path":"/system/uptime_ms",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get system uptime, in milliseconds",

				               "type":"long",

				               "nickname":"get_system_uptime",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/logger/{name}",

				         "operations":[

									
										53

api/api.cc
									
												
				@@ -36,6 +36,7 @@

				#include "endpoint_snitch.hh"

				#include "compaction_manager.hh"

				#include "hinted_handoff.hh"

				#include "error_injection.hh"

				#include <seastar/http/exception.hh>

				#include "stream_manager.hh"

				#include "system.hh"

				@@ -68,13 +69,19 @@ future<> set_server_init(http_context& ctx) {

				        rb->set_api_doc(r);

				        rb02->set_api_doc(r);

				        rb02->register_api_file(r, "swagger20_header");

				        set_config(rb02, ctx, r);

				        rb->register_function(r, "system",

				                "The system related API");

				        set_system(ctx, r);

				    });

				}

				future<> set_server_config(http_context& ctx) {

				    auto rb02 = std::make_shared < api_registry_builder20 > (ctx.api_doc, "/v2");

				    return ctx.http_server.set_routes([&ctx, rb02](routes& r) {

				        set_config(rb02, ctx, r);

				    });

				}

				static future<> register_api(http_context& ctx, const sstring& api_name,

				        const sstring api_desc,

				        std::function<void(http_context& ctx, routes& r)> f) {

				@@ -86,10 +93,42 @@ static future<> register_api(http_context& ctx, const sstring& api_name,

				    });

				}

				future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl) {

				    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_transport_controller(ctx, r, ctl); });

				}

				future<> unset_transport_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });

				}

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {

				    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });

				}

				future<> unset_rpc_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });

				}

				future<> set_server_storage_service(http_context& ctx) {

				    return register_api(ctx, "storage_service", "The storage service API", set_storage_service);

				}

				future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms) {

				    return ctx.http_server.set_routes([&ctx, &ms] (routes& r) { set_repair(ctx, r, ms); });

				}

				future<> unset_server_repair(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_repair(ctx, r); });

				}

				future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl) {

				    return ctx.http_server.set_routes([&ctx, &snap_ctl] (routes& r) { set_snapshot(ctx, r, snap_ctl); });

				}

				future<> unset_server_snapshot(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });

				}

				future<> set_server_snitch(http_context& ctx) {

				    return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);

				}

				@@ -104,9 +143,14 @@ future<> set_server_load_sstable(http_context& ctx) {

				                "The column family API", set_column_family);

				}

				future<> set_server_messaging_service(http_context& ctx) {

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {

				    return register_api(ctx, "messaging_service",

				                "The messaging service API", set_messaging_service);

				                "The messaging service API", [&ms] (http_context& ctx, routes& r) {

				                    set_messaging_service(ctx, r, ms);

				                });

				}

				future<> unset_server_messaging_service(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_messaging_service(ctx, r); });

				}

				future<> set_server_storage_proxy(http_context& ctx) {

				@@ -153,6 +197,9 @@ future<> set_server_done(http_context& ctx) {

				        rb->register_function(r, "collectd",

				                "The collectd API");

				        set_collectd(ctx, r);

				        rb->register_function(r, "error_injection",

				                "The error injection API");

				        set_error_injection(ctx, r);

				    });

				}

									
										2

api/api.hh
									
												
				@@ -256,4 +256,6 @@ public:

				    operator T() const { return value; }

				};

				utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);

				}

									
										29

api/api_init.hh
									
												
				@@ -23,6 +23,13 @@

				#include "service/storage_proxy.hh"

				#include <seastar/http/httpd.hh>

				namespace service { class load_meter; }

				namespace locator { class shared_token_metadata; }

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db { class snapshot_ctl; }

				namespace netw { class messaging_service; }

				namespace api {

				struct http_context {

				@@ -31,18 +38,34 @@ struct http_context {

				    httpd::http_server_control http_server;

				    distributed<database>& db;

				    distributed<service::storage_proxy>& sp;

				    service::load_meter& lmeter;

				    const sharded<locator::shared_token_metadata>& shared_token_metadata;

				    http_context(distributed<database>& _db,

				            distributed<service::storage_proxy>& _sp)

				            : db(_db), sp(_sp) {

				            distributed<service::storage_proxy>& _sp,

				            service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)

				            : db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {

				    }

				    const locator::token_metadata& get_token_metadata();

				};

				future<> set_server_init(http_context& ctx);

				future<> set_server_config(http_context& ctx);

				future<> set_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx);

				future<> set_server_repair(http_context& ctx, sharded<netw::messaging_service>& ms);

				future<> unset_server_repair(http_context& ctx);

				future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);

				future<> unset_transport_controller(http_context& ctx);

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);

				future<> unset_rpc_controller(http_context& ctx);

				future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);

				future<> unset_server_snapshot(http_context& ctx);

				future<> set_server_gossip(http_context& ctx);

				future<> set_server_load_sstable(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);

				future<> unset_server_messaging_service(http_context& ctx);

				future<> set_server_storage_proxy(http_context& ctx);

				future<> set_server_stream_manager(http_context& ctx);

				future<> set_server_gossip_settle(http_context& ctx);

									
										24

api/cache_service.cc
									
												
				@@ -208,9 +208,11 @@ void set_cache_service(http_context& ctx, routes& r) {

				    });

				    cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {

				            return cf.get_row_cache().get_cache_tracker().region().occupancy().used_space();

				        }, std::plus<uint64_t>());

				        return ctx.db.map_reduce0([](database& db) -> uint64_t {

				            return db.row_cache_tracker().region().occupancy().used_space();

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -251,15 +253,19 @@ void set_cache_service(http_context& ctx, routes& r) {

				    cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				        // In origin row size is the weighted size.

				        // We currently do not support weights, so we use num entries instead

				        return map_reduce_cf(ctx, 0, [](const column_family& cf) {

				            return cf.get_row_cache().partitions();

				        }, std::plus<uint64_t>());

				        return ctx.db.map_reduce0([](database& db) -> uint64_t {

				            return db.row_cache_tracker().partitions();

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, 0, [](const column_family& cf) {

				            return cf.get_row_cache().partitions();

				        }, std::plus<uint64_t>());

				        return ctx.db.map_reduce0([](database& db) -> uint64_t {

				            return db.row_cache_tracker().partitions();

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cs::get_counter_capacity.set(r, [] (std::unique_ptr<request> req) {

									
										2

api/collectd.cc
									
												
				@@ -64,7 +64,7 @@ static const char* str_to_regex(const sstring& v) {

				void set_collectd(http_context& ctx, routes& r) {

				    cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto id = make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				        auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				                req->get_query_param("instance"), req->get_query_param("type"),

				                req->get_query_param("type_instance"));

									
										103

api/column_family.cc
									
												
				@@ -249,6 +249,12 @@ static future<json::json_return_type> sum_sstable(http_context& ctx, bool total)

				    });

				}

				future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f) {

				    return map_reduce_cf_raw(ctx, name, utils::time_estimated_histogram(), f, utils::time_estimated_histogram_merge).then([](const utils::time_estimated_histogram& res) {

				        return make_ready_future<json::json_return_type>(time_to_json_histogram(res));

				    });

				}

				template <typename T>

				class sum_ratio {

				    uint64_t _n = 0;

				@@ -304,7 +310,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        return res;

				    });

				    cf::get_column_family.set(r, [&ctx] (const_req req){

				    cf::get_column_family.set(r, [&ctx] (std::unique_ptr<request> req){

				            vector<cf::column_family_info> res;

				            for (auto i: ctx.db.local().get_column_families_mapping()) {

				                cf::column_family_info info;

				@@ -313,7 +319,7 @@ void set_column_family(http_context& ctx, routes& r) {

				                info.type = "ColumnFamilies";

				                res.push_back(info);

				            }

				            return res;

				            return make_ready_future<json::json_return_type>(json::stream_object(std::move(res)));

				        });

				    cf::get_column_family_name_keyspace.set(r, [&ctx] (const_req req){

				@@ -325,15 +331,15 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](column_family& cf) {

				            return cf.active_memtable().partition_count();

				        }, std::plus<int>());

				        }, std::plus<>());

				    });

				    cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, 0, [](column_family& cf) {

				        return map_reduce_cf(ctx, uint64_t{0}, [](column_family& cf) {

				            return cf.active_memtable().partition_count();

				        }, std::plus<int>());

				        }, std::plus<>());

				    });

				    cf::get_memtable_on_heap_size.set(r, [] (const_req req) {
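Note on the hunk above: it replaces the literal `0` seed and `std::plus<int>` with `uint64_t{0}` and the transparent `std::plus<>`. The following is a minimal standalone sketch of why the seed and functor types matter. It uses `std::accumulate` in place of Scylla's `map_reduce_cf`, and the names `counts`, `sum_narrow` and `sum_wide` are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the per-table partition counts being reduced.
using counts = std::vector<uint64_t>;

// Old shape: an int seed makes the accumulator an int, and std::plus<int>
// converts both operands to int before adding, so large values are narrowed.
inline uint64_t sum_narrow(const counts& v) {
    return std::accumulate(v.begin(), v.end(), 0, std::plus<int>());
}

// New shape: a uint64_t seed plus the transparent std::plus<> keeps the
// whole reduction in 64-bit unsigned arithmetic.
inline uint64_t sum_wide(const counts& v) {
    return std::accumulate(v.begin(), v.end(), uint64_t{0}, std::plus<>());
}
```

With a count above 2^32 in the input, `sum_wide` returns the exact total, while `sum_narrow` loses the high bits on common compilers.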

				@@ -650,7 +656,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->filter_size();

				                return s + sst->filter_size();

				            });

				        }, std::plus<uint64_t>());

				    });

				@@ -658,7 +664,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->filter_size();

				                return s + sst->filter_size();

				            });

				        }, std::plus<uint64_t>());

				    });

				@@ -666,7 +672,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->filter_memory_size();

				                return s + sst->filter_memory_size();

				            });

				        }, std::plus<uint64_t>());

				    });

				@@ -674,7 +680,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->filter_memory_size();

				                return s + sst->filter_memory_size();

				            });

				        }, std::plus<uint64_t>());

				    });

				@@ -682,7 +688,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->get_summary().memory_footprint();

				                return s + sst->get_summary().memory_footprint();

				            });

				        }, std::plus<uint64_t>());

				    });

				@@ -690,7 +696,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {

				            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return sst->get_summary().memory_footprint();

				                return s + sst->get_summary().memory_footprint();

				            });

				        }, std::plus<uint64_t>());

				    });
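Note on the six hunks above: they all make the same fix. The binary operation passed to `std::accumulate` returned only the current sstable's size instead of adding it to the running total `s`, so the fold yielded the last element's value rather than a sum. A small standalone sketch of the difference follows; `fake_sstable`, `sstable_list` and both function names are hypothetical stand-ins, not Scylla types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an sstable; only filter_size() matters here.
struct fake_sstable {
    uint64_t size;
    uint64_t filter_size() const { return size; }
};

using sstable_list = std::vector<std::shared_ptr<fake_sstable>>;

// Pre-fix shape: the running total is ignored, so the result is the
// filter size of the last element visited.
inline uint64_t last_filter_size(const sstable_list& ssts) {
    return std::accumulate(ssts.begin(), ssts.end(), uint64_t(0),
        [](uint64_t, const auto& sst) { return sst->filter_size(); });
}

// Post-fix shape: each step adds the element to the running total.
inline uint64_t total_filter_size(const sstable_list& ssts) {
    return std::accumulate(ssts.begin(), ssts.end(), uint64_t(0),
        [](uint64_t s, const auto& sst) { return s + sst->filter_size(); });
}
```

For three sstables with filter sizes 10, 20 and 30, the pre-fix fold reports 30 and the post-fix fold reports 60.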

				@@ -796,24 +802,21 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {

				            return cf.get_stats().estimated_cas_prepare;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				        });

				    });

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_propose;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {

				            return cf.get_stats().estimated_cas_accept;

				        });

				    });

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				            return cf.get_stats().estimated_cas_commit;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {

				            return cf.get_stats().estimated_cas_learn;

				        });

				    });

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				@@ -839,15 +842,32 @@ void set_column_family(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				    cf::is_auto_compaction_disabled.set(r, [] (const_req req) {

				        // FIXME

				        // currently auto compaction is disable

				        // it should be changed when it would have an API

				        return true;

				    cf::get_auto_compaction.set(r, [&ctx] (const_req req) {

				        const utils::UUID& uuid = get_uuid(req.param["name"], ctx.db.local());

				        column_family& cf = ctx.db.local().find_column_family(uuid);

				        return !cf.is_auto_compaction_disabled_by_user();

				    });

				    cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {

				            cf.enable_auto_compaction();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {

				            cf.disable_auto_compaction();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);

				        auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);

				        auto&& ks = std::get<0>(ks_cf);

				        auto&& cf_name = std::get<1>(ks_cf);

				        return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {

				            std::set<sstring> vp;

				            for (auto b : vb) {

				@@ -860,7 +880,7 @@ void set_column_family(http_context& ctx, routes& r) {

				            column_family& cf = ctx.db.local().find_column_family(uuid);

				            res.reserve(cf.get_index_manager().list_indexes().size());

				            for (auto&& i : cf.get_index_manager().list_indexes()) {

				                if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {

				                if (!vp.contains(secondary_index::index_table_name(i.metadata().name()))) {

				                    res.emplace_back(i.metadata().name());

				                }

				            }

				@@ -894,17 +914,15 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {

				            return cf.get_stats().estimated_read;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				        });

				    });

				    cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const column_family& cf) {

				            return cf.get_stats().estimated_write;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				        });

				    });

				    cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {

				@@ -973,6 +991,9 @@ void set_column_family(http_context& ctx, routes& r) {

				                        apilog.debug("toppartitions query: processing results");

				                        cf::toppartitions_query_results results;

				                        results.read_cardinality = topk_results.read.size();

				                        results.write_cardinality = topk_results.write.size();

				                        for (auto& d: topk_results.read.top(q.list_size())) {

				                            cf::toppartitions_record r;

				                            r.partition = sstring(d.item);

				@@ -994,5 +1015,15 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        if (req->get_query_param("split_output") != "") {

				            fail(unimplemented::cause::API);

				        }

				        return foreach_column_family(ctx, req->param["name"], [](column_family &cf) {

				            return cf.compact_all_sstables();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				}

				}

									
										2

api/column_family.hh
									
												
				@@ -68,6 +68,8 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n

				    });

				}

				future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const column_family&)> f);

				struct map_reduce_column_families_locally {

				    std::any init;

				    std::function<std::unique_ptr<std::any>(column_family&)> mapper;

									
										2

api/commitlog.cc
									
												
				@@ -20,7 +20,7 @@

				 */

				#include "commitlog.hh"

				#include <db/commitlog/commitlog.hh>

				#include "db/commitlog/commitlog.hh"

				#include "api/api-doc/commitlog.json.hh"

				#include "database.hh"

				#include <vector>

									
										69

api/error_injection.cc
									
										Normal file
									
												
				@@ -0,0 +1,69 @@

				/*

				 * Copyright (C) 2020 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#include "api/api-doc/error_injection.json.hh"

				#include "api/api.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				#include "utils/error_injection.hh"

				#include "seastar/core/future-util.hh"

				namespace api {

				namespace hf = httpd::error_injection_json;

				void set_error_injection(http_context& ctx, routes& r) {

				    hf::enable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        bool one_shot = req->get_query_param("one_shot") == "True";

				        auto& errinj = utils::get_local_injector();

				        return errinj.enable_on_all(injection, one_shot).then([] {

				            return make_ready_future<json::json_return_type>(json::json_void());

				        });

				    });

				    hf::get_enabled_injections_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        auto ret = errinj.enabled_injections_on_all();

				        return make_ready_future<json::json_return_type>(ret);

				    });

				    hf::disable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        auto& errinj = utils::get_local_injector();

				        return errinj.disable_on_all(injection).then([] {

				            return make_ready_future<json::json_return_type>(json::json_void());

				        });

				    });

				    hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        return errinj.disable_on_all().then([] {

				            return make_ready_future<json::json_return_type>(json::json_void());

				        });

				    });

				}

				} // namespace api
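The handlers above expose four operations: enable an injection (optionally one-shot), list the enabled ones, disable one, and disable all. A minimal single-threaded sketch of that behaviour follows. The class `injection_registry`, its method names, and `parse_one_shot` are hypothetical stand-ins for illustration, not the `utils::error_injection` API, and the one-shot semantics (the injection removes itself the first time it fires) are an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical single-threaded registry; NOT the real utils::error_injection class.
class injection_registry {
    std::map<std::string, bool> _enabled; // injection name -> one_shot flag
public:
    void enable(const std::string& name, bool one_shot = false) {
        _enabled[name] = one_shot;
    }
    void disable(const std::string& name) {
        _enabled.erase(name);
    }
    void disable_all() {
        _enabled.clear();
    }
    std::vector<std::string> enabled_injections() const {
        std::vector<std::string> names;
        for (const auto& entry : _enabled) {
            names.push_back(entry.first);
        }
        return names;
    }
    // Returns true if the named injection should fire now.
    // Assumption: a one-shot injection is removed the first time it fires.
    bool should_fire(const std::string& name) {
        auto it = _enabled.find(name);
        if (it == _enabled.end()) {
            return false;
        }
        if (it->second) {
            _enabled.erase(it);
        }
        return true;
    }
};

// Mirrors the comparison in the enable_injection handler:
// only the exact string "True" selects one-shot mode.
inline bool parse_one_shot(const std::string& value) {
    return value == "True";
}
```

Note that, as in the handler, the comparison is case-sensitive, so a query value of `true` would leave the injection in its repeating mode.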

									
										30

api/error_injection.hh
									
										Normal file
									
												
				@@ -0,0 +1,30 @@

				/*

				 * Copyright (C) 2019 ScyllaDB

				 */

				/*

				 * This file is part of Scylla.

				 *

				 * Scylla is free software: you can redistribute it and/or modify

				 * it under the terms of the GNU Affero General Public License as published by

				 * the Free Software Foundation, either version 3 of the License, or

				 * (at your option) any later version.

				 *

				 * Scylla is distributed in the hope that it will be useful,

				 * but WITHOUT ANY WARRANTY; without even the implied warranty of

				 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

				 * GNU General Public License for more details.

				 *

				 * You should have received a copy of the GNU General Public License

				 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.

				 */

				#pragma once

				#include "api.hh"

				namespace api {

				void set_error_injection(http_context& ctx, routes& r);

				}

									
										9

api/gossiper.cc
									
												
				@@ -21,7 +21,7 @@

				#include "gossiper.hh"

				#include "api/api-doc/gossiper.json.hh"

				#include <gms/gossiper.hh>

				#include "gms/gossiper.hh"

				namespace api {

				using namespace json;

				@@ -66,6 +66,13 @@ void set_gossiper(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    httpd::gossiper_json::force_remove_endpoint.set(r, [](std::unique_ptr<request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        return gms::get_local_gossiper().force_remove_endpoint(ep).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				}

				}

									
										49

api/messaging_service.cc
									
												
				@@ -53,8 +53,8 @@ std::vector<message_counter> map_to_message_counters(

				 * according to a function that it gets as a parameter.

				 *

				 */

				future_json_function get_client_getter(std::function<uint64_t(const shard_info&)> f) {

				    return [f](std::unique_ptr<request> req) {

				future_json_function get_client_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const shard_info&)> f) {

				    return [&ms, f](std::unique_ptr<request> req) {

				        using map_type = std::unordered_map<gms::inet_address, uint64_t>;

				        auto get_shard_map = [f](messaging_service& ms) {

				            std::unordered_map<gms::inet_address, unsigned long> map;

				@@ -63,15 +63,15 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)

				            });

				            return map;

				        };

				        return  get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).

				        return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).

				                then([](map_type&& map) {

				            return make_ready_future<json::json_return_type>(map_to_message_counters(map));

				        });

				    };

				}

				future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)> f) {

				    return [f](std::unique_ptr<request> req) {

				future_json_function get_server_getter(sharded<netw::messaging_service>& ms, std::function<uint64_t(const rpc::stats&)> f) {

				    return [&ms, f](std::unique_ptr<request> req) {

				        using map_type = std::unordered_map<gms::inet_address, uint64_t>;

				        auto get_shard_map = [f](messaging_service& ms) {

				            std::unordered_map<gms::inet_address, unsigned long> map;

				@@ -80,53 +80,53 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)

				            });

				            return map;

				        };

				        return  get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).

				        return ms.map_reduce0(get_shard_map, map_type(), map_sum<map_type>).

				                then([](map_type&& map) {

				            return make_ready_future<json::json_return_type>(map_to_message_counters(map));

				        });

				    };

				}

				void set_messaging_service(http_context& ctx, routes& r) {

				    get_timeout_messages.set(r, get_client_getter([](const shard_info& c) {

				void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {

				    get_timeout_messages.set(r, get_client_getter(ms, [](const shard_info& c) {

				        return c.get_stats().timeout;

				    }));

				    get_sent_messages.set(r, get_client_getter([](const shard_info& c) {

				    get_sent_messages.set(r, get_client_getter(ms, [](const shard_info& c) {

				        return c.get_stats().sent_messages;

				    }));

				    get_dropped_messages.set(r, get_client_getter([](const shard_info& c) {

				    get_dropped_messages.set(r, get_client_getter(ms, [](const shard_info& c) {

				        // We don't have the same drop message mechanism

				        // as origin has.

				        // hence we can always return 0

				        return 0;

				    }));

				    get_exception_messages.set(r, get_client_getter([](const shard_info& c) {

				    get_exception_messages.set(r, get_client_getter(ms, [](const shard_info& c) {

				        return c.get_stats().exception_received;

				    }));

				    get_pending_messages.set(r, get_client_getter([](const shard_info& c) {

				    get_pending_messages.set(r, get_client_getter(ms, [](const shard_info& c) {

				        return c.get_stats().pending;

				    }));

				    get_respond_pending_messages.set(r, get_server_getter([](const rpc::stats& c) {

				    get_respond_pending_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {

				        return c.pending;

				    }));

				    get_respond_completed_messages.set(r, get_server_getter([](const rpc::stats& c) {

				    get_respond_completed_messages.set(r, get_server_getter(ms, [](const rpc::stats& c) {

				        return c.sent_messages;

				    }));

				    get_version.set(r, [](const_req req) {

				        return netw::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));

				    get_version.set(r, [&ms](const_req req) {

				        return ms.local().get_raw_version(req.get_query_param("addr"));

				    });

				    get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {

				    get_dropped_messages_by_ver.set(r, [&ms](std::unique_ptr<request> req) {

				        shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);

				        return netw::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {

				        return ms.map_reduce([map](const uint64_t* local_map) mutable {

				            for (auto i = 0; i < num_verb; i++) {

				                (*map)[i]+= local_map[i];

				            }

				@@ -151,5 +151,18 @@ void set_messaging_service(http_context& ctx, routes& r) {

				        });

				    });

				}

				void unset_messaging_service(http_context& ctx, routes& r) {

				    get_timeout_messages.unset(r);

				    get_sent_messages.unset(r);

				    get_dropped_messages.unset(r);

				    get_exception_messages.unset(r);

				    get_pending_messages.unset(r);

				    get_respond_pending_messages.unset(r);

				    get_respond_completed_messages.unset(r);

				    get_version.unset(r);

				    get_dropped_messages_by_ver.unset(r);

				}

				}

									
										5

api/messaging_service.hh
									
												
				@@ -23,8 +23,11 @@

				#include "api.hh"

				namespace netw { class messaging_service; }

				namespace api {

				void set_messaging_service(http_context& ctx, routes& r);

				void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);

				void unset_messaging_service(http_context& ctx, routes& r);

				}
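The change above replaces calls to a global getter with a `sharded<netw::messaging_service>&` passed in by the caller, and adds a matching `unset_messaging_service()`. The pattern can be sketched without Seastar as follows. The types `fake_messaging_service` and `fake_routes` and both function names are hypothetical stand-ins. The motivation suggested in the comments (removing the routes before the service goes away so that no handler holds a dangling reference) is an assumption, since the diff does not state it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins; the real types are seastar's routes and netw::messaging_service.
struct fake_messaging_service {
    uint64_t timeouts = 0;
};

struct fake_routes {
    std::map<std::string, std::function<uint64_t()>> handlers;
};

// Register a handler that captures the service by reference instead of
// reaching for a global. The caller must keep `ms` alive until the
// matching unset function has run.
void set_fake_messaging_service(fake_routes& r, fake_messaging_service& ms) {
    r.handlers["timeout_messages"] = [&ms] { return ms.timeouts; };
}

// Drop the handler so that nothing references `ms` afterwards
// (assumed motivation for pairing set with unset).
void unset_fake_messaging_service(fake_routes& r) {
    r.handlers.erase("timeout_messages");
}
```

Because the handler holds a reference rather than a copy, it observes later changes to the service, and the owner of the service controls when the routes exist.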

									
										233

api/storage_proxy.cc
									
												
				@@ -27,6 +27,7 @@

				#include "db/config.hh"

				#include "utils/histogram.hh"

				#include "database.hh"

				#include "seastar/core/scheduling_specific.hh"

				namespace api {

				@@ -34,12 +35,70 @@ namespace sp = httpd::storage_proxy_json;

				using proxy = service::storage_proxy;

				using namespace json;

				static future<utils::rate_moving_average>  sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				    return d.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average(),

				            std::plus<utils::rate_moving_average>());

				/**

				 * This function implements a two-dimensional map reduce where

				 * the first level is a distributed storage_proxy class and the

				 * second level is the stats per scheduling group class.

				 * @param d -  a reference to the storage_proxy distributed class.

				 * @param mapper -  the internal mapper that is used to map the internal

				 * stat class into a value of type `V`.

				 * @param reducer - the reducer that is used in both outer and inner

				 * aggregations.

				 * @param initial_value - the initial value to use for both aggregations

				 * @return A future that resolves to the result of the aggregation.

				 */

				template<typename V, typename Reducer, typename InnerMapper>

				future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				        InnerMapper mapper, Reducer reducer, V initial_value) {

				    return d.map_reduce0( [mapper, reducer, initial_value] (const service::storage_proxy& sp) {

				        return map_reduce_scheduling_group_specific<service::storage_proxy_stats::stats>(

				                mapper, reducer, initial_value, sp.get_stats_key());

				    }, initial_value, reducer);

				}

				static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				/**

				 * This function implements a two-dimensional map reduce where

				 * the first level is a distributed storage_proxy class and the

				 * second level is the stats per scheduling group class.

				 * @param d -  a reference to the storage_proxy distributed class.

				 * @param f - a field pointer which is the implicit internal mapper.

				 * @param reducer - the reducer that is used in both outer and inner

				 * aggregations.

				 * @param initial_value - the initial value to use for both aggregations

				 * @return A future that resolves to the result of the aggregation.

				 */

				template<typename V, typename Reducer, typename F>

				future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				        V F::*f, Reducer reducer, V initial_value) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) {

				        return stats.*f;

				    }, reducer, initial_value);

				}
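The two overloads above can be illustrated without Seastar. The sketch below assumes each shard is simply a vector of per-scheduling-group stats objects; `group_stats`, `shard`, `two_level_reduce`, and `sum_field` are hypothetical names, and the futures and `map_reduce_scheduling_group_specific` of the real code are replaced by plain loops.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the per-scheduling-group stats object.
struct group_stats {
    uint64_t read_repair_attempts;
};

// Hypothetical stand-in for one shard: one stats object per scheduling group.
using shard = std::vector<group_stats>;

// Inner level: fold the mapped value of every scheduling group on one shard.
// Outer level: fold the per-shard results with the same reducer and initial value.
template <typename V, typename Mapper, typename Reducer>
V two_level_reduce(const std::vector<shard>& shards, Mapper mapper, Reducer reducer, V initial) {
    V total = initial;
    for (const shard& s : shards) {
        V per_shard = initial;
        for (const group_stats& g : s) {
            per_shard = reducer(per_shard, mapper(g));
        }
        total = reducer(total, per_shard);
    }
    return total;
}

// Field-pointer overload: the mapper just reads the member.
// Overload resolution prefers this one when a member pointer is passed,
// because it is the more specialized template.
template <typename V, typename Reducer>
V two_level_reduce(const std::vector<shard>& shards, V group_stats::*field, Reducer reducer, V initial) {
    return two_level_reduce(shards, [field](const group_stats& g) { return g.*field; }, reducer, initial);
}

// Convenience wrapper in the spirit of sum_stats_storage_proxy(): sum one field everywhere.
template <typename V>
V sum_field(const std::vector<shard>& shards, V group_stats::*field) {
    return two_level_reduce(shards, field, std::plus<V>(), V(0));
}
```

Since the same initial value seeds both the inner and the outer fold, it should be an identity element for the reducer (0 for addition, an empty histogram for a merge); otherwise it would be counted once per shard plus once overall.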

				/**

				 * A partial specialization of sum_stats for the storage proxy

				 * case where the get stats function doesn't return a

				 * stats object with fields but a per scheduling group

				 * stats object, the name was also changed since function

				 * partial specialization is not supported in C++.

				 *

				 */

				template<typename V, typename F>

				future<json::json_return_type>  sum_stats_storage_proxy(distributed<proxy>& d, V F::*f) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) { return stats.*f; }, std::plus<V>(), V(0)).then([] (V val) {

				        return make_ready_future<json::json_return_type>(val);

				    });

				}

				static future<utils::rate_moving_average>  sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).rate();

				    }, std::plus<utils::rate_moving_average>(), utils::rate_moving_average());

				}

				static future<json::json_return_type>  sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {

				        httpd::utils_json::rate_moving_average m;

				        m = val;

				@@ -51,29 +110,89 @@ httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average()

				    return timer_to_json(utils::rate_moving_average_and_histogram());

				}

				static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {

				static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {

				    return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {

				        return make_ready_future<json::json_return_type>(val.count);

				    });

				}

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::estimated_histogram proxy::stats::*f) {

				    return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, utils::estimated_histogram(),

				            utils::estimated_histogram_merge).then([](const utils::estimated_histogram& val) {

				utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val) {

				    utils_json::estimated_histogram res;

				    for (size_t i = 0; i < val.size(); i++) {

				        res.buckets.push(val.get(i));

				        res.bucket_offsets.push(val.get_bucket_lower_limit(i));

				    }

				    return res;

				}

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::time_estimated_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, f, utils::time_estimated_histogram_merge,

				            utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {

				        return make_ready_future<json::json_return_type>(time_to_json_histogram(val));

				    });

				}

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::estimated_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, f, utils::estimated_histogram_merge,

				            utils::estimated_histogram()).then([](const utils::estimated_histogram& val) {

				        utils_json::estimated_histogram res;

				        res = val;

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram proxy::stats::*f) {

				    return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).hist.mean * (p.get_stats().*f).hist.count;}, 0.0,

				            std::plus<double>()).then([](double val) {

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {

				            return (stats.*f).hist.mean * (stats.*f).hist.count;

				        }, std::plus<double>(), 0.0).then([](double val) {

				        int64_t res = val;

				        return make_ready_future<json::json_return_type>(res);

				    });

				}

				/**

				 * A partial specialization of sum_histogram_stats

				 * for the storage proxy case where the get stats

				 * function doesn't return a stats object with

				 * fields but a per scheduling group stats object,

				 * the name was also changed since function partial

				 * specialization is not supported in C++.

				 */

				template<typename F>

				future<json::json_return_type>

				sum_histogram_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).hist;

				    }, std::plus<utils::ihistogram>(), utils::ihistogram()).

				            then([](const utils::ihistogram& val) {

				        return make_ready_future<json::json_return_type>(to_json(val));

				    });

				}

				/**

				 * A partial specialization of sum_timer_stats for the

				 * storage proxy case where the get stats function

				 * doesn't return a stats object with fields but a

				 * per scheduling group stats object, the name

				 * was also changed since partial function specialization

				 * is not supported in C++.

				 */

				template<typename F>

				future<json::json_return_type>

				sum_timer_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).rate();

				    }, std::plus<utils::rate_moving_average_and_histogram>(),

				            utils::rate_moving_average_and_histogram()).then([](const utils::rate_moving_average_and_histogram& val) {

				        return make_ready_future<json::json_return_type>(timer_to_json(val));

				    });

				}

				void set_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_total_hints.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				@@ -82,29 +201,39 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    });

				    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req)  {

				        auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();

				        return make_ready_future<json::json_return_type>(enabled);

				        const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();

				        return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());

				    });

				    sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("enable");

				        return make_ready_future<json::json_return_type>(json_void());

				        auto filter = (enable == "true" || enable == "1")

				                ? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})

				                : db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});

				        return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				            return sp.change_hints_host_filter(filter);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				        unimplemented();

				        std::vector<sp::mapper_list> res;

				        std::vector<sstring> res;

				        const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();

				        const auto& dcs = filter.get_dcs();

				        res.reserve(dcs.size());

				        std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));

				        return make_ready_future<json::json_return_type>(res);

				    });

				    sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("dcs");

				        return make_ready_future<json::json_return_type>(json_void());

				        auto dcs = req->get_query_param("dcs");

				        auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));

				        return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				            return sp.change_hints_host_filter(filter);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req)  {

				@@ -223,15 +352,15 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    });

				    sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_attempts);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_attempts);

				    });

				    sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_blocking);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_blocking);

				    });

				    sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req)  {

				        return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_background);

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_background);

				    });

				    sp::get_schema_versions.set(r, [](std::unique_ptr<request> req)  {

				@@ -275,6 +404,10 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);

				    });

				    sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_failed_read_round_optimization);

				    });

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);

				    });

				@@ -284,71 +417,71 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    });

				    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_timeouts);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_unavailables);

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_timeouts);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_unavailables);

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::range);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::write);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_histogram_stats(ctx.sp, &proxy::stats::read);

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::range);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::write);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);

				@@ -367,30 +500,30 @@ void set_storage_proxy(http_context& ctx, routes& r) {

				    });

				    sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::read);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_read);

				    });

				    sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::read);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::estimated_write);

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_write);

				    });

				    sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::write);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::range);

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				        return total_latency(ctx, &proxy::stats::range);

				        return total_latency(ctx, &service::storage_proxy_stats::stats::range);

				    });

				}

									
										608

api/storage_service.cc
									
				@@ -22,11 +22,13 @@

				#include "storage_service.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "db/config.hh"

				#include "db/schema_tables.hh"

				#include <optional>

				#include <time.h>

				#include <boost/range/adaptor/map.hpp>

				#include <boost/range/adaptor/filtered.hpp>

				#include "service/storage_service.hh"

				#include "service/load_meter.hh"

				#include "db/commitlog/commitlog.hh"

				#include "gms/gossiper.hh"

				#include "db/system_keyspace.hh"

				@@ -40,11 +42,17 @@

				#include "sstables/sstables.hh"

				#include "database.hh"

				#include "db/extensions.hh"

				sstables::sstable::version_types get_highest_supported_format();

				#include "db/snapshot-ctl.hh"

				#include "transport/controller.hh"

				#include "thrift/controller.hh"

				#include "locator/token_metadata.hh"

				namespace api {

				const locator::token_metadata& http_context::get_token_metadata() {

				        return *shared_token_metadata.local().get();

				}

				namespace ss = httpd::storage_service_json;

				using namespace json;

				@@ -55,57 +63,213 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {

				    throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");

				}

				static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {

				    std::vector<ss::token_range> res;

				    for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {

				        ss::token_range r;

				        r.start_token = d._start_token;

				        r.end_token = d._end_token;

				        r.endpoints = d._endpoints;

				        r.rpc_endpoints = d._rpc_endpoints;

				        for (auto det : d._endpoint_details) {

				            ss::endpoint_detail ed;

				            ed.host = det._host;

				            ed.datacenter = det._datacenter;

				            if (det._rack != "") {

				                ed.rack = det._rack;

				            }

				            r.endpoint_details.push(ed);

				static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {

				    ss::token_range r;

				    r.start_token = d._start_token;

				    r.end_token = d._end_token;

				    r.endpoints = d._endpoints;

				    r.rpc_endpoints = d._rpc_endpoints;

				    for (auto det : d._endpoint_details) {

				        ss::endpoint_detail ed;

				        ed.host = det._host;

				        ed.datacenter = det._datacenter;

				        if (det._rack != "") {

				            ed.rack = det._rack;

				        }

				        res.push_back(r);

				        r.endpoint_details.push(ed);

				    }

				    return res;

				    return r;

				}

				using ks_cf_func = std::function<future<json::json_return_type>(http_context&, std::unique_ptr<request>, sstring, std::vector<sstring>)>;

				static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {

				    return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_families = split_cf(req->get_query_param("cf"));

				        if (column_families.empty()) {

				            column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				        }

				        return f(ctx, std::move(req), std::move(keyspace), std::move(column_families));

				    };

				}

				future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				    if (tables.empty()) {

				        tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				    }

				    return service::get_local_storage_service().set_tables_autocompaction(keyspace, tables, enabled).then([]{

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				}

				void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {

				    ss::start_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {

				        return ctl.start_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {

				        return ctl.stop_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::is_native_transport_running.set(r, [&ctl] (std::unique_ptr<request> req) {

				        return ctl.is_server_running().then([] (bool running) {

				            return make_ready_future<json::json_return_type>(running);

				        });

				    });

				}

				void unset_transport_controller(http_context& ctx, routes& r) {

				    ss::start_native_transport.unset(r);

				    ss::stop_native_transport.unset(r);

				    ss::is_native_transport_running.unset(r);

				}

				void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {

				    ss::stop_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {

				        return ctl.stop_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::start_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {

				        return ctl.start_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::is_rpc_server_running.set(r, [&ctl] (std::unique_ptr<request> req) {

				        return ctl.is_server_running().then([] (bool running) {

				            return make_ready_future<json::json_return_type>(running);

				        });

				    });

				}

				void unset_rpc_controller(http_context& ctx, routes& r) {

				    ss::stop_rpc_server.unset(r);

				    ss::start_rpc_server.unset(r);

				    ss::is_rpc_server_running.unset(r);

				}

				void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms) {

				    ss::repair_async.set(r, [&ctx, &ms](std::unique_ptr<request> req) {

				        static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",

				                "jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",

				                "startToken", "endToken" };

				        std::unordered_map<sstring, sstring> options_map;

				        for (auto o : options) {

				            auto s = req->get_query_param(o);

				            if (s != "") {

				                options_map[o] = s;

				            }

				        }

				        // The repair process is asynchronous: repair_start only starts it and

				        // returns immediately, not waiting for the repair to finish. The user

				        // then has other mechanisms to track the ongoing repair's progress,

				        // or stop it.

				        return repair_start(ctx.db, ms, validate_keyspace(ctx, req->param),

				                options_map).then([] (int i) {

				                    return make_ready_future<json::json_return_type>(i);

				                });

				    });

				    ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_active_repairs(ctx.db).then([] (std::vector<int> res){

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {

				        return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))

				                .then_wrapped([] (future<repair_status>&& fut) {

				            ss::ns_repair_async_status::return_type_wrapper res;

				            try {

				                res = fut.get0();

				            } catch(std::runtime_error& e) {

				                throw httpd::bad_param_exception(e.what());

				            }

				            return make_ready_future<json::json_return_type>(json::json_return_type(res));

				        });

				    });

				    ss::repair_await_completion.set(r, [&ctx](std::unique_ptr<request> req) {

				        int id;

				        using clock = std::chrono::steady_clock;

				        clock::time_point expire;

				        try {

				            id = boost::lexical_cast<int>(req->get_query_param("id"));

				            // If timeout is not provided, it means no timeout.

				            sstring s = req->get_query_param("timeout");

				            int64_t timeout = s.empty() ? int64_t(-1) : boost::lexical_cast<int64_t>(s);

				            if (timeout < 0 && timeout != -1) {

				                return make_exception_future<json::json_return_type>(

				                        httpd::bad_param_exception("timeout can only be -1 (means no timeout) or non negative integer"));

				            }

				            if (timeout < 0) {

				                expire = clock::time_point::max();

				            } else {

				                expire = clock::now() + std::chrono::seconds(timeout);

				            }

				        } catch (std::exception& e) {

				            return make_exception_future<json::json_return_type>(httpd::bad_param_exception(e.what()));

				        }

				        return repair_await_completion(ctx.db, id, expire)

				                .then_wrapped([] (future<repair_status>&& fut) {

				            ss::ns_repair_async_status::return_type_wrapper res;

				            try {

				                res = fut.get0();

				            } catch (std::exception& e) {

				                return make_exception_future<json::json_return_type>(httpd::bad_param_exception(e.what()));

				            }

				            return make_ready_future<json::json_return_type>(json::json_return_type(res));

				        });

				    });

				    ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {

				        return repair_abort_all(service::get_local_storage_service().db()).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {

				        return repair_abort_all(service::get_local_storage_service().db()).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				}

				void unset_repair(http_context& ctx, routes& r) {

				    ss::repair_async.unset(r);

				    ss::get_active_repair_async.unset(r);

				    ss::repair_async_status.unset(r);

				    ss::repair_await_completion.unset(r);

				    ss::force_terminate_all_repair_sessions.unset(r);

				    ss::force_terminate_all_repair_sessions_new.unset(r);

				}

				void set_storage_service(http_context& ctx, routes& r) {

				    using ks_cf_func = std::function<future<json::json_return_type>(std::unique_ptr<request>, sstring, std::vector<sstring>)>;

				    auto wrap_ks_cf = [&ctx](ks_cf_func f) {

				        return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {

				            auto keyspace = validate_keyspace(ctx, req->param);

				            auto column_families = split_cf(req->get_query_param("cf"));

				            if (column_families.empty()) {

				                column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				            }

				            return f(std::move(req), std::move(keyspace), std::move(column_families));

				        };

				    };

				    ss::local_hostid.set(r, [](std::unique_ptr<request> req) {

				        return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {

				            return make_ready_future<json::json_return_type>(id.to_sstring());

				        });

				    });

				    ss::get_tokens.set(r, [] (std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().sorted_tokens(), [](const dht::token& i) {

				    ss::get_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().sorted_tokens(), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				        }));

				    });

				    ss::get_node_tokens.set(r, [] (std::unique_ptr<request> req) {

				    ss::get_node_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {

				        gms::inet_address addr(req->param["endpoint"]);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().get_tokens(addr), [](const dht::token& i) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().get_tokens(addr), [](const dht::token& i) {

				           return boost::lexical_cast<std::string>(i);

				       }));

				    });

				@@ -123,8 +287,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				        }));

				    });

				    ss::get_leaving_nodes.set(r, [](const_req req) {

				        return container_to_vec(service::get_local_storage_service().get_token_metadata().get_leaving_endpoints());

				    ss::get_leaving_nodes.set(r, [&ctx](const_req req) {

				        return container_to_vec(ctx.get_token_metadata().get_leaving_endpoints());

				    });

				    ss::get_moving_nodes.set(r, [](const_req req) {

				@@ -132,8 +296,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return container_to_vec(addr);

				    });

				    ss::get_joining_nodes.set(r, [](const_req req) {

				        auto points = service::get_local_storage_service().get_token_metadata().get_bootstrap_tokens();

				    ss::get_joining_nodes.set(r, [&ctx](const_req req) {

				        auto points = ctx.get_token_metadata().get_bootstrap_tokens();

				        std::unordered_set<sstring> addr;

				        for (auto i: points) {

				            addr.insert(boost::lexical_cast<std::string>(i.second));

				@@ -161,11 +325,26 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::get_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        std::vector<ss::maplist_mapper> res;

				        return make_ready_future<json::json_return_type>(res);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_range_to_address_map(keyspace),

				                [](const std::pair<dht::token_range, std::vector<gms::inet_address>>& entry){

				            ss::maplist_mapper m;

				            if (entry.first.start()) {

				                m.key.push(entry.first.start().value().value().to_sstring());

				            } else {

				                m.key.push("");

				            }

				            if (entry.first.end()) {

				                m.key.push(entry.first.end().value().value().to_sstring());

				            } else {

				                m.key.push("");

				            }

				            for (const gms::inet_address& address : entry.second) {

				                m.value.push(address.to_sstring());

				            }

				            return m;

				        }));

				    });

				    ss::get_pending_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<request> req) {

				@@ -176,27 +355,26 @@ void set_storage_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				    ss::describe_any_ring.set(r, [&ctx](const_req req) {

				        return describe_ring("");

				    ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));

				    });

				    ss::describe_ring.set(r, [&ctx](const_req req) {

				        auto keyspace = validate_keyspace(ctx, req.param);

				        return describe_ring(keyspace);

				    ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));

				    });

				    ss::get_host_id_map.set(r, [](const_req req) {

				    ss::get_host_id_map.set(r, [&ctx](const_req req) {

				        std::vector<ss::mapper> res;

				        return map_to_key_value(service::get_local_storage_service().

				                get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);

				        return map_to_key_value(ctx.get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);

				    });

				    ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);

				    });

				    ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {

				    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {

				            std::vector<ss::map_string_double> res;

				            for (auto i : load_map) {

				                ss::map_string_double val;

				@@ -221,67 +399,12 @@ void set_storage_service(http_context& ctx, routes& r) {

				                req.get_query_param("key")));

				    });

				    ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().get_snapshot_details().then([] (auto result) {

				            std::vector<ss::snapshots> res;

				            for (auto& map: result) {

				                ss::snapshots all_snapshots;

				                all_snapshots.key = map.first;

				                std::vector<ss::snapshot> snapshot;

				                for (auto& cf: map.second) {

				                    ss::snapshot s;

				                    s.ks = cf.ks;

				                    s.cf = cf.cf;

				                    s.live = cf.live;

				                    s.total = cf.total;

				                    snapshot.push_back(std::move(s));

				                }

				                all_snapshots.value = std::move(snapshot);

				                res.push_back(std::move(all_snapshots));

				            }

				            return make_ready_future<json::json_return_type>(std::move(res));

				        });

				    });

				    ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_family = req->get_query_param("cf");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        auto resp = make_ready_future<>();

				        if (column_family.empty()) {

				            resp = service::get_local_storage_service().take_snapshot(tag, keynames);

				        } else {

				            if (keynames.empty()) {

				                throw httpd::bad_param_exception("The keyspace of column families must be specified");

				            }

				            if (keynames.size() > 1) {

				                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");

				            }

				            resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);

				        }

				        return resp.then([] {

				    ss::cdc_streams_check_and_repair.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().check_and_repair_cdc_streams().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        return service::get_local_storage_service().clear_snapshot(tag, keynames).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {

				            return make_ready_future<json::json_return_type>(size);

				        });

				    });

				    ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_families = split_cf(req->get_query_param("cf"));

				@@ -319,8 +442,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				                for (auto cf : column_families) {

				                    column_families_vec.push_back(&db.find_column_family(keyspace, cf));

				                }

				                return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {

				                    return cm.perform_cleanup(cf);

				                return parallel_for_each(column_families_vec, [&cm, &db] (column_family* cf) {

				                    return cm.perform_cleanup(db, cf);

				                });

				            }).then([]{

				                return make_ready_future<json::json_return_type>(0);

				@@ -328,39 +451,14 @@ void set_storage_service(http_context& ctx, routes& r) {

				        });

				    });

				    ss::scrub.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        // TODO: respect this

				        auto skip_corrupted = req->get_query_param("skip_corrupted");

				        auto f = make_ready_future<>();

				        if (!req_param<bool>(*req, "disable_snapshot", false)) {

				            auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());

				            f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {

				                return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);

				            });

				        }

				        return f.then([&ctx, keyspace, column_families] {

				            return ctx.db.invoke_on_all([=] (database& db) {

				                return do_for_each(column_families, [=, &db](sstring cfname) {

				                    auto& cm = db.get_compaction_manager();

				                    auto& cf = db.find_column_family(keyspace, cfname);

				                    return cm.perform_sstable_scrub(&cf);

				                });

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    }));

				    ss::upgrade_sstables.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				    ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);

				        return ctx.db.invoke_on_all([=] (database& db) {

				            return do_for_each(column_families, [=, &db](sstring cfname) {

				                auto& cm = db.get_compaction_manager();

				                auto& cf = db.find_column_family(keyspace, cfname);

				                return cm.perform_sstable_upgrade(&cf, exclude_current_version);

				                return cm.perform_sstable_upgrade(db, &cf, exclude_current_version);

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				@@ -383,59 +481,6 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {

				        static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",

				                "jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",

				                "startToken", "endToken" };

				        std::unordered_map<sstring, sstring> options_map;

				        for (auto o : options) {

				            auto s = req->get_query_param(o);

				            if (s != "") {

				                options_map[o] = s;

				            }

				        }

				        // The repair process is asynchronous: repair_start only starts it and

				        // returns immediately, not waiting for the repair to finish. The user

				        // then has other mechanisms to track the ongoing repair's progress,

				        // or stop it.

				        return repair_start(ctx.db, validate_keyspace(ctx, req->param),

				                options_map).then([] (int i) {

				                    return make_ready_future<json::json_return_type>(i);

				                });

				    });

				    ss::get_active_repair_async.set(r, [&ctx](std::unique_ptr<request> req) {

				        return get_active_repairs(ctx.db).then([] (std::vector<int> res){

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    ss::repair_async_status.set(r, [&ctx](std::unique_ptr<request> req) {

				        return repair_get_status(ctx.db, boost::lexical_cast<int>( req->get_query_param("id")))

				                .then_wrapped([] (future<repair_status>&& fut) {

				            ss::ns_repair_async_status::return_type_wrapper res;

				            try {

				                res = fut.get0();

				            } catch(std::runtime_error& e) {

				                throw httpd::bad_param_exception(e.what());

				            }

				            return make_ready_future<json::json_return_type>(json::json_return_type(res));

				        });

				    });

				    ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {

				        return repair_abort_all(service::get_local_storage_service().db()).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {

				        return repair_abort_all(service::get_local_storage_service().db()).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::decommission.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().decommission().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				@@ -571,46 +616,8 @@ void set_storage_service(http_context& ctx, routes& r) {

				        });

				    });

				    ss::stop_rpc_server.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().stop_rpc_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::start_rpc_server.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().start_rpc_server().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::is_rpc_server_running.set(r, [] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().is_rpc_server_running().then([] (bool running) {

				            return make_ready_future<json::json_return_type>(running);

				        });

				    });

				    ss::start_native_transport.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().start_native_transport().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::stop_native_transport.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().stop_native_transport().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::is_native_transport_running.set(r, [] (std::unique_ptr<request> req) {

				        return service::get_local_storage_service().is_native_transport_running().then([] (bool running) {

				            return make_ready_future<json::json_return_type>(running);

				        });

				    });

				    ss::join_ring.set(r, [](std::unique_ptr<request> req) {

				        return service::get_local_storage_service().join_ring().then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::is_joined.set(r, [] (std::unique_ptr<request> req) {

				@@ -731,14 +738,17 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::reset_local_schema.set(r, [](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				        // FIXME: We should truncate schema tables if more than one node in the cluster.

				        auto& sp = service::get_storage_proxy();

				        auto& fs = service::get_local_storage_service().features();

				        return db::schema_tables::recalculate_schema_version(sp, fs).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {

				        auto probability = req->get_query_param("probability");

				        return futurize<json::json_return_type>::apply([probability] {

				        return futurize_invoke([probability] {

				            double real_prob = std::stod(probability.c_str());

				            return tracing::tracing::tracing_instance().invoke_on_all([real_prob] (auto& local_tracing) {

				                local_tracing.set_trace_probability(real_prob);

				@@ -793,19 +803,17 @@ void set_storage_service(http_context& ctx, routes& r) {

				    });

				    ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        return make_ready_future<json::json_return_type>(json_void());

				        auto tables = split_cf(req->get_query_param("cf"));

				        return set_tables_autocompaction(ctx, keyspace, tables, true);

				    });

				    ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				        //TBD

				        unimplemented();

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto column_family = req->get_query_param("cf");

				        return make_ready_future<json::json_return_type>(json_void());

				        auto tables = split_cf(req->get_query_param("cf"));

				        return set_tables_autocompaction(ctx, keyspace, tables, false);

				    });

				    ss::deliver_hints.set(r, [](std::unique_ptr<request> req) {

				@@ -983,7 +991,7 @@ void set_storage_service(http_context& ctx, routes& r) {

				                                    e.value = p.second;

				                                    nm.attributes.push(std::move(e));

				                                }

				                                if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {

				                                if (!cp->options().contains(compression_parameters::SSTABLE_COMPRESSION)) {

				                                    ss::mapper e;

				                                    e.key = compression_parameters::SSTABLE_COMPRESSION;

				                                    e.value = cp->name();

				@@ -1041,4 +1049,114 @@ void set_storage_service(http_context& ctx, routes& r) {

				}

				void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {

				    ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<request> req) {

				        return snap_ctl.local().get_snapshot_details().then([] (std::unordered_map<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& result) {

				            std::function<future<>(output_stream<char>&&)> f = [result = std::move(result)](output_stream<char>&& s) {

				                return do_with(output_stream<char>(std::move(s)), true, [&result] (output_stream<char>& s, bool& first){

				                    return s.write("[").then([&s, &first, &result] {

				                        return do_for_each(result, [&s, &first](std::tuple<sstring, std::vector<db::snapshot_ctl::snapshot_details>>&& map){

				                            return do_with(ss::snapshots(), [&s, &first, &map](ss::snapshots& all_snapshots) {

				                                all_snapshots.key = std::get<0>(map);

				                                future<> f = first ? make_ready_future<>() : s.write(", ");

				                                first = false;

				                                std::vector<ss::snapshot> snapshot;

				                                for (auto& cf: std::get<1>(map)) {

				                                    ss::snapshot snp;

				                                    snp.ks = cf.ks;

				                                    snp.cf = cf.cf;

				                                    snp.live = cf.live;

				                                    snp.total = cf.total;

				                                    snapshot.push_back(std::move(snp));

				                                }

				                                all_snapshots.value = std::move(snapshot);

				                                return f.then([&s, &all_snapshots] {

				                                    return all_snapshots.write(s);

				                                });

				                            });

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                    });

				                });

				            };

				            return make_ready_future<json::json_return_type>(std::move(f));

				        });

				    });

				    ss::take_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_families = split(req->get_query_param("cf"), ",");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        auto resp = make_ready_future<>();

				        if (column_families.empty()) {

				            resp = snap_ctl.local().take_snapshot(tag, keynames);

				        } else {

				            if (keynames.empty()) {

				                throw httpd::bad_param_exception("The keyspace of column families must be specified");

				            }

				            if (keynames.size() > 1) {

				                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");

				            }

				            resp = snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag);

				        }

				        return resp.then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::del_snapshot.set(r, [&snap_ctl](std::unique_ptr<request> req) {

				        auto tag = req->get_query_param("tag");

				        auto column_family = req->get_query_param("cf");

				        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

				        return snap_ctl.local().clear_snapshot(tag, keynames, column_family).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::true_snapshots_size.set(r, [&snap_ctl](std::unique_ptr<request> req) {

				        return snap_ctl.local().true_snapshots_size().then([] (int64_t size) {

				            return make_ready_future<json::json_return_type>(size);

				        });

				    });

				    ss::scrub.set(r, wrap_ks_cf(ctx, [&snap_ctl] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {

				        const auto skip_corrupted = req_param<bool>(*req, "skip_corrupted", false);

				        auto f = make_ready_future<>();

				        if (!req_param<bool>(*req, "disable_snapshot", false)) {

				            auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());

				            f = parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {

				                return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag);

				            });

				        }

				        return f.then([&ctx, keyspace, column_families, skip_corrupted] {

				            return ctx.db.invoke_on_all([=] (database& db) {

				                return do_for_each(column_families, [=, &db](sstring cfname) {

				                    auto& cm = db.get_compaction_manager();

				                    auto& cf = db.find_column_family(keyspace, cfname);

				                    return cm.perform_sstable_scrub(&cf, skip_corrupted);

				                });

				            });

				        }).then([]{

				            return make_ready_future<json::json_return_type>(0);

				        });

				    }));

				}

				void unset_snapshot(http_context& ctx, routes& r) {

				    ss::get_snapshot_details.unset(r);

				    ss::take_snapshot.unset(r);

				    ss::del_snapshot.unset(r);

				    ss::true_snapshots_size.unset(r);

				    ss::scrub.unset(r);

				}

				}
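Note on the `get_snapshot_details` handler above: it streams its result as a JSON array and uses a `first` flag so that `", "` is written before every element except the first. The separator pattern can be sketched on its own as follows. This is a minimal illustration, not the patch's code: it writes to a `std::ostringstream` instead of Seastar's `output_stream<char>`, and `write_array` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders already-serialized items as a JSON-style array.
// A separator is written before every element except the first, which mirrors
// the `first` flag used by the handler.
inline std::string write_array(const std::vector<std::string>& items) {
    std::ostringstream out;
    bool first = true;
    out << "[";
    for (const auto& item : items) {
        if (!first) {
            out << ", ";   // separator goes before the 2nd, 3rd, ... element
        }
        first = false;
        out << item;
    }
    out << "]";
    return out.str();
}
```

Writing the separator before each element, rather than after it, means nothing has to be trimmed at the end and an empty input still yields a well-formed `[]`.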

									
										14

api/storage_service.hh
									
												
				@@ -21,10 +21,24 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				#include "api.hh"

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db { class snapshot_ctl; }

				namespace netw { class messaging_service; }

				namespace api {

				void set_storage_service(http_context& ctx, routes& r);

				void set_repair(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);

				void unset_repair(http_context& ctx, routes& r);

				void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);

				void unset_transport_controller(http_context& ctx, routes& r);

				void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);

				void unset_rpc_controller(http_context& ctx, routes& r);

				void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_snapshot(http_context& ctx, routes& r);

				}

									
										5

api/system.cc
									
												
				@@ -22,6 +22,7 @@

				#include "api/api-doc/system.json.hh"

				#include "api/api.hh"

				#include <seastar/core/reactor.hh>

				#include <seastar/http/exception.hh>

				#include "log.hh"

				@@ -30,6 +31,10 @@ namespace api {

				namespace hs = httpd::system_json;

				void set_system(http_context& ctx, routes& r) {

				    hs::get_system_uptime.set(r, [](const_req req) {

				        return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();

				    });

				    hs::get_all_logger_names.set(r, [](const_req req) {

				        return logging::logger_registry().get_all_logger_names();

				    });

									
										62

atomic_cell.cc
									
												
				@@ -21,6 +21,7 @@

				#include "atomic_cell.hh"

				#include "atomic_cell_or_collection.hh"

				#include "counters.hh"

				#include "types.hh"

				/// LSA migrator for cells with irrelevant type

				@@ -207,13 +208,68 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)

				            external_value_size = cell_view.value_size();

				        }

				        // Add overhead of chunk headers. The last one is a special case.

				        external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;

				        external_value_size += (external_value_size - 1) / data::cell::effective_external_chunk_length * data::cell::external_chunk_overhead;

				        external_value_size += data::cell::external_last_chunk_overhead;

				    }

				    return data::cell::structure::serialized_object_size(_data.get(), ctx)

				        + imr_object_type::size_overhead + external_value_size;

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view& acv) {

				    if (acv.is_live()) {

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            acv.is_counter_update()

				                    ? "counter_update_value=" + to_sstring(acv.counter_update_value())

				                    : to_hex(acv.value().linearize()),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell& ac) {

				    return os << atomic_cell_view(ac);

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {

				    auto& type = acvp._type;

				    auto& acv = acvp._cell;

				    if (acv.is_live()) {

				        std::ostringstream cell_value_string_builder;

				        if (type.is_counter()) {

				            if (acv.is_counter_update()) {

				                cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();

				            } else {

				                cell_value_string_builder << "shards: ";

				                counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {

				                    cell_value_string_builder << ::join(", ", ccv.shards());

				                });

				            }

				        } else {

				            cell_value_string_builder << type.to_string(acv.value().linearize());

				        }

				        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",

				            cell_value_string_builder.str(),

				            acv.timestamp(),

				            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,

				            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);

				    } else {

				        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",

				            acv.timestamp(), acv.deletion_time().time_since_epoch().count());

				    }

				}

				std::ostream&

				operator<<(std::ostream& os, const atomic_cell::printer& acp) {

				    return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));

				}

				std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {

				    if (!p._cell._data.get()) {

				        return os << "{ null atomic_cell_or_collection }";

				@@ -223,9 +279,9 @@ std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::prin

				    if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {

				        os << "collection ";

				        auto cmv = p._cell.as_collection_mutation();

				        os << to_hex(cmv.data.linearize());

				        os << collection_mutation_view::printer(*p._cdef.type, cmv);

				    } else {

				        os << p._cell.as_atomic_cell(p._cdef);

				        os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));

				    }

				    return os << " }";

				}
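Note on the `external_memory_usage` hunk above: it charges one `external_chunk_overhead` for every chunk except the last, plus `external_last_chunk_overhead` for the last one. The sketch below reproduces that arithmetic in isolation. The `chunk_layout` struct, the function name, and the numbers in the usage note are assumptions for illustration; the real constants live in `data::cell` and are not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the data::cell constants used by the hunk.
struct chunk_layout {
    std::size_t chunk_length;        // payload bytes per non-final chunk
    std::size_t chunk_overhead;      // header bytes per non-final chunk
    std::size_t last_chunk_overhead; // header bytes for the final chunk
};

// Size of an externally stored value including chunk headers.
// (value_size - 1) / chunk_length counts the chunks that precede the final one.
inline std::size_t external_size_with_overhead(std::size_t value_size, const chunk_layout& l) {
    // value_size must be non-zero, otherwise the unsigned subtraction wraps.
    assert(value_size > 0 && l.chunk_length > 0);
    const std::size_t non_final_chunks = (value_size - 1) / l.chunk_length;
    return value_size + non_final_chunks * l.chunk_overhead + l.last_chunk_overhead;
}
```

With `chunk_layout{4, 2, 1}`, a 10-byte value splits into chunks of 4, 4 and 2 bytes, so the result is 10 + 2 * 2 + 1 = 15. An 8-byte value has one non-final chunk and gives 8 + 2 + 1 = 11.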

									
										16

atomic_cell.hh
									
												
				@@ -29,7 +29,6 @@

				#include <seastar/net//byteorder.hh>

				#include <cstdint>

				#include <iosfwd>

				#include <seastar/util/gcc6-concepts.hh>

				#include "data/cell.hh"

				#include "data/schema_info.hh"

				#include "imr/utils.hh"

				@@ -39,6 +38,7 @@

				class abstract_type;

				class collection_type_impl;

				class atomic_cell_or_collection;

				using atomic_cell_value_view = data::value_view;

				using atomic_cell_value_mutable_view = data::value_mutable_view;

				@@ -153,6 +153,14 @@ public:

				    }

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);

				    class printer {

				        const abstract_type& _type;

				        const atomic_cell_view& _cell;

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {

				@@ -219,6 +227,12 @@ public:

				    static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);

				    friend class atomic_cell_or_collection;

				    friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);

				    class printer : atomic_cell_view::printer {

				    public:

				        printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}

				        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);

				    };

				};

				class column_definition;

									
										5

auth/allow_all_authenticator.cc
									
												
				@@ -26,10 +26,7 @@

				namespace auth {

				const sstring& allow_all_authenticator_name() {

				    static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthenticator";

				    return name;

				}

				constexpr std::string_view allow_all_authenticator_name("org.apache.cassandra.auth.AllowAllAuthenticator");

				// To ensure correct initialization order, we unfortunately need to use a string literal.

				static const class_registrator<

									
										6

auth/allow_all_authenticator.hh
									
												
				@@ -37,7 +37,7 @@ class migration_manager;

				namespace auth {

				const sstring& allow_all_authenticator_name();

				extern const std::string_view allow_all_authenticator_name;

				class allow_all_authenticator final : public authenticator {

				public:

				@@ -52,8 +52,8 @@ public:

				        return make_ready_future<>();

				    }

				    virtual const sstring& qualified_java_name() const override {

				        return allow_all_authenticator_name();

				    virtual std::string_view qualified_java_name() const override {

				        return allow_all_authenticator_name;

				    }

				    virtual bool require_authentication() const override {

									
										5

auth/allow_all_authorizer.cc
									
												
				@@ -26,10 +26,7 @@

				namespace auth {

				const sstring& allow_all_authorizer_name() {

				    static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";

				    return name;

				}

				constexpr std::string_view allow_all_authorizer_name("org.apache.cassandra.auth.AllowAllAuthorizer");

				// To ensure correct initialization order, we unfortunately need to use a string literal.

				static const class_registrator<

									
										6

auth/allow_all_authorizer.hh
									
												
				@@ -34,7 +34,7 @@ class migration_manager;

				namespace auth {

				const sstring& allow_all_authorizer_name();

				extern const std::string_view allow_all_authorizer_name;

				class allow_all_authorizer final  : public authorizer {

				public:

				@@ -49,8 +49,8 @@ public:

				        return make_ready_future<>();

				    }

				    virtual const sstring& qualified_java_name() const override {

				        return allow_all_authorizer_name();

				    virtual std::string_view qualified_java_name() const override {

				        return allow_all_authorizer_name;

				    }

				    virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override {

									
										2

auth/authenticator.hh
									
												
				@@ -96,7 +96,7 @@ public:

				    ///

				    /// A fully-qualified (class with package) Java-like name for this implementation.

				    ///

				    virtual const sstring& qualified_java_name() const = 0;

				    virtual std::string_view qualified_java_name() const = 0;

				    virtual bool require_authentication() const = 0;

									
										2

auth/authorizer.hh
									
												
				@@ -100,7 +100,7 @@ public:

				    ///

				    /// A fully-qualified (class with package) Java-like name for this implementation.

				    ///

				    virtual const sstring& qualified_java_name() const = 0;

				    virtual std::string_view qualified_java_name() const = 0;

				    ///

				    /// Query for the permissions granted directly to a role for a particular \ref resource (and not any of its

									
										33

auth/common.cc
									
												
				@@ -34,10 +34,9 @@ namespace auth {

				namespace meta {

				const sstring DEFAULT_SUPERUSER_NAME("cassandra");

				const sstring AUTH_KS("system_auth");

				const sstring USERS_CF("users");

				const sstring AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");

				constexpr std::string_view AUTH_KS("system_auth");

				constexpr std::string_view USERS_CF("users");

				constexpr std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");

				}

				@@ -59,22 +58,22 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				    }).discard_result();

				}

				future<> create_metadata_table_if_missing(

				static future<> create_metadata_table_if_missing_impl(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        std::string_view cql,

				        ::service::migration_manager& mm) {

				    static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {

				        return futurize_apply(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });

				        return futurize_invoke(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });

				    };

				    auto& db = qp.db();

				    auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(

				            cql3::query_processor::parse_statement(cql));

				    auto parsed_statement = cql3::query_processor::parse_statement(cql);

				    auto& parsed_cf_statement = static_cast<cql3::statements::raw::cf_statement&>(*parsed_statement);

				    parsed_statement->prepare_keyspace(meta::AUTH_KS);

				    parsed_cf_statement.prepare_keyspace(meta::AUTH_KS);

				    auto statement = static_pointer_cast<cql3::statements::create_table_statement>(

				            parsed_statement->prepare(db, qp.get_cql_stats())->statement);

				            parsed_cf_statement.prepare(db, qp.get_cql_stats())->statement);

				    const auto schema = statement->get_cf_meta_data(qp.db());

				    const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());

				@@ -83,9 +82,16 @@ future<> create_metadata_table_if_missing(

				    b.set_uuid(uuid);

				    schema_ptr table = b.build();

				    return ignore_existing([&mm, table = std::move(table)] () {

				        return mm.announce_new_column_family(table, false);

				        return mm.announce_new_column_family(table);

				    });

				}

				future<> create_metadata_table_if_missing(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        std::string_view cql,

				        ::service::migration_manager& mm) noexcept {

				    return futurize_invoke(create_metadata_table_if_missing_impl, table_name, qp, cql, mm);

				}

				future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {

				@@ -103,7 +109,12 @@ future<> wait_for_schema_agreement(::service::migration_manager& mm, const datab

				}

				const timeout_config& internal_distributed_timeout_config() noexcept {

				#ifdef DEBUG

				    // Give the much slower debug tests more headroom for completing auth queries.

				    static const auto t = 30s;

				#else

				    static const auto t = 5s;

				#endif

				    static const timeout_config tc{t, t, t, t, t, t, t};

				    return tc;

				}

									
										15

auth/common.hh
									
												
				@@ -27,9 +27,10 @@

				#include <seastar/core/future.hh>

				#include <seastar/core/abort_source.hh>

				#include <seastar/util/noncopyable_function.hh>

				#include <seastar/core/reactor.hh>

				#include <seastar/core/seastar.hh>

				#include <seastar/core/resource.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/core/smp.hh>

				#include "log.hh"

				#include "seastarx.hh"

				@@ -52,16 +53,16 @@ namespace auth {

				namespace meta {

				extern const sstring DEFAULT_SUPERUSER_NAME;

				extern const sstring AUTH_KS;

				extern const sstring USERS_CF;

				extern const sstring AUTH_PACKAGE_NAME;

				constexpr std::string_view DEFAULT_SUPERUSER_NAME("cassandra");

				extern const std::string_view AUTH_KS;

				extern const std::string_view USERS_CF;

				extern const std::string_view AUTH_PACKAGE_NAME;

				}

				template <class Task>

				future<> once_among_shards(Task&& f) {

				    if (engine().cpu_id() == 0u) {

				    if (this_shard_id() == 0u) {

				        return f();

				    }

				@@ -79,7 +80,7 @@ future<> create_metadata_table_if_missing(

				        std::string_view table_name,

				        cql3::query_processor&,

				        std::string_view cql,

				        ::service::migration_manager&);

				        ::service::migration_manager&) noexcept;

				future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
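Note on the auth hunks in this diff: they replace `sstring` globals and function-local statics with `constexpr std::string_view` constants and plain string literals, and the comment in `allow_all_authenticator.cc` names initialization order as the concern. One plausible reading, sketched below under the assumption of C++17, is that a `constexpr std::string_view` is constant-initialized and can therefore be read safely from any other static initializer. The `sketch` namespace and the `qualified_name` helper are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace sketch {

// Constant-initialized: no constructor runs at program start-up, so there is
// no ordering dependency on other translation units.
constexpr std::string_view auth_package_name = "org.apache.cassandra.auth.";

// Compile-time checks only work because the value is a constant expression.
static_assert(auth_package_name.size() == 26);
static_assert(auth_package_name.back() == '.');

// Hypothetical helper: builds a fully qualified Java-like class name.
inline std::string qualified_name(std::string_view class_name) {
    std::string out(auth_package_name);
    out += class_name;
    return out;
}

} // namespace sketch
```

The trade-off is that `std::string_view` has no `operator+`, which may be why the patch spells the full qualified names out as literals instead of concatenating `AUTH_PACKAGE_NAME` at start-up.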

									
										31

auth/default_authorizer.cc
									
												
				@@ -51,7 +51,7 @@ extern "C" {

				#include <boost/algorithm/string/join.hpp>

				#include <boost/range.hpp>

				#include <seastar/core/reactor.hh>

				#include <seastar/core/seastar.hh>

				#include "auth/authenticated_user.hh"

				#include "auth/common.hh"

				@@ -65,15 +65,14 @@ extern "C" {

				namespace auth {

				const sstring& default_authorizer_name() {

				    static const sstring name = meta::AUTH_PACKAGE_NAME + "CassandraAuthorizer";

				    return name;

				std::string_view default_authorizer::qualified_java_name() const {

				    return "org.apache.cassandra.auth.CassandraAuthorizer";

				}

				static const sstring ROLE_NAME = "role";

				static const sstring RESOURCE_NAME = "resource";

				static const sstring PERMISSIONS_NAME = "permissions";

				static const sstring PERMISSIONS_CF = "role_permissions";

				static constexpr std::string_view ROLE_NAME = "role";

				static constexpr std::string_view RESOURCE_NAME = "resource";

				static constexpr std::string_view PERMISSIONS_NAME = "permissions";

				static constexpr std::string_view PERMISSIONS_CF = "role_permissions";

				static logging::logger alogger("default_authorizer");

				@@ -101,7 +100,7 @@ bool default_authorizer::legacy_metadata_exists() const {

				future<bool> default_authorizer::any_granted() const {

				    static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -115,7 +114,7 @@ future<> default_authorizer::migrate_legacy_metadata() const {

				    alogger.info("Starting migration of legacy permissions metadata.");

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				@@ -195,7 +194,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc

				            ROLE_NAME,

				            RESOURCE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -224,7 +223,7 @@ default_authorizer::modify(

				                    ROLE_NAME,

				                    RESOURCE_NAME),

				            [this, &role_name, set, &resource](const auto& query) {

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                db::consistency_level::ONE,

				                internal_distributed_timeout_config(),

				@@ -249,7 +248,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {

				            meta::AUTH_KS,

				            PERMISSIONS_CF);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::ONE,

				            internal_distributed_timeout_config(),

				@@ -276,7 +275,7 @@ future<> default_authorizer::revoke_all(std::string_view role_name) const {

				            PERMISSIONS_CF,

				            ROLE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::ONE,

				            internal_distributed_timeout_config(),

				@@ -296,7 +295,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {

				            PERMISSIONS_CF,

				            RESOURCE_NAME);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::LOCAL_ONE,

				            infinite_timeout_config,

				@@ -313,7 +312,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {

				                        ROLE_NAME,

				                        RESOURCE_NAME);

				                return _qp.process(

				                return _qp.execute_internal(

				                        query,

				                        db::consistency_level::LOCAL_ONE,

				                        infinite_timeout_config,

									
auth/default_authorizer.hh (6 lines changed)
				@@ -51,8 +51,6 @@

				namespace auth {

				const sstring& default_authorizer_name();

				class default_authorizer : public authorizer {

				    cql3::query_processor& _qp;

				@@ -71,9 +69,7 @@ public:

				    virtual future<> stop() override;

				    virtual const sstring& qualified_java_name() const override {

				        return default_authorizer_name();

				    }

				    virtual std::string_view qualified_java_name() const override;

				    virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;

									
auth/password_authenticator.cc (60 lines changed)
				@@ -48,7 +48,7 @@

				#include <optional>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <seastar/core/reactor.hh>

				#include <seastar/core/seastar.hh>

				#include "auth/authenticated_user.hh"

				#include "auth/common.hh"

				@@ -62,15 +62,12 @@

				namespace auth {

				const sstring& password_authenticator_name() {

				    static const sstring name = meta::AUTH_PACKAGE_NAME + "PasswordAuthenticator";

				    return name;

				}

				constexpr std::string_view password_authenticator_name("org.apache.cassandra.auth.PasswordAuthenticator");

				// name of the hash column.

				static const sstring SALTED_HASH = "salted_hash";

				static const sstring DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;

				static const sstring DEFAULT_USER_PASSWORD = meta::DEFAULT_SUPERUSER_NAME;

				static constexpr std::string_view SALTED_HASH = "salted_hash";

				static constexpr std::string_view DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;

				static const sstring DEFAULT_USER_PASSWORD = sstring(meta::DEFAULT_SUPERUSER_NAME);

				static logging::logger plogger("password_authenticator");

				@@ -96,10 +93,13 @@ static bool has_salted_hash(const cql3::untyped_result_set_row& row) {

				    return !row.get_or<sstring>(SALTED_HASH, "").empty();

				}

				static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				        meta::roles_table::qualified_name(),

				        SALTED_HASH,

				        meta::roles_table::role_col_name);

				static const sstring& update_row_query() {

				    static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				            meta::roles_table::qualified_name,

				            SALTED_HASH,

				            meta::roles_table::role_col_name);

				    return update_row_query;

				}

				static const sstring legacy_table_name{"credentials"};

				@@ -111,7 +111,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				    plogger.info("Starting migration of legacy authentication metadata.");

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				@@ -119,8 +119,8 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				            auto username = row.get_as<sstring>("username");

				            auto salted_hash = row.get_as<sstring>(SALTED_HASH);

				            return _qp.process(

				                    update_row_query,

				            return _qp.execute_internal(

				                    update_row_query(),

				                    consistency_for_user(username),

				                    internal_distributed_timeout_config(),

				                    {std::move(salted_hash), username}).discard_result();

				@@ -136,8 +136,8 @@ future<> password_authenticator::migrate_legacy_metadata() const {

				future<> password_authenticator::create_default_if_missing() const {

				    return default_role_row_satisfies(_qp, &has_salted_hash).then([this](bool exists) {

				        if (!exists) {

				            return _qp.process(

				                    update_row_query,

				            return _qp.execute_internal(

				                    update_row_query(),

				                    db::consistency_level::QUORUM,

				                    internal_distributed_timeout_config(),

				                    {passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {

				@@ -194,8 +194,8 @@ db::consistency_level password_authenticator::consistency_for_user(std::string_v

				    return db::consistency_level::LOCAL_ONE;

				}

				const sstring& password_authenticator::qualified_java_name() const {

				    return password_authenticator_name();

				std::string_view password_authenticator::qualified_java_name() const {

				    return password_authenticator_name;

				}

				bool password_authenticator::require_authentication() const {

				@@ -212,10 +212,10 @@ authentication_option_set password_authenticator::alterable_options() const {

				future<authenticated_user> password_authenticator::authenticate(

				                const credentials_map& credentials) const {

				    if (!credentials.count(USERNAME_KEY)) {

				    if (!credentials.contains(USERNAME_KEY)) {

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));

				    }

				    if (!credentials.count(PASSWORD_KEY)) {

				    if (!credentials.contains(PASSWORD_KEY)) {

				        throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));

				    }

				@@ -227,13 +227,13 @@ future<authenticated_user> password_authenticator::authenticate(

				    // obsolete prepared statements pretty quickly.

				    // Rely on query processing caching statements instead, and let's assume

				    // that a map lookup string->statement is not gonna kill us much.

				    return futurize_apply([this, username, password] {

				    return futurize_invoke([this, username, password] {

				        static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",

				                SALTED_HASH,

				                meta::roles_table::qualified_name(),

				                meta::roles_table::qualified_name,

				                meta::roles_table::role_col_name);

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                consistency_for_user(username),

				                internal_distributed_timeout_config(),

				@@ -267,8 +267,8 @@ future<> password_authenticator::create(std::string_view role_name, const authen

				        return make_ready_future<>();

				    }

				    return _qp.process(

				            update_row_query,

				    return _qp.execute_internal(

				            update_row_query(),

				            consistency_for_user(role_name),

				            internal_distributed_timeout_config(),

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();

				@@ -280,11 +280,11 @@ future<> password_authenticator::alter(std::string_view role_name, const authent

				    }

				    static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::qualified_name,

				            SALTED_HASH,

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            consistency_for_user(role_name),

				            internal_distributed_timeout_config(),

				@@ -294,10 +294,10 @@ future<> password_authenticator::alter(std::string_view role_name, const authent

				future<> password_authenticator::drop(std::string_view name) const {

				    static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",

				            SALTED_HASH,

				            meta::roles_table::qualified_name(),

				            meta::roles_table::qualified_name,

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query, consistency_for_user(name),

				            internal_distributed_timeout_config(),

				            {sstring(name)}).discard_result();
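Note on the `update_row_query()` change above: the diff replaces a namespace-scope `static const sstring` with a function that returns a function-local static. The diff does not state the motivation; one common reason for this pattern is that a function-local static is built on first use, so it cannot be read before the values it is formatted from are ready. A minimal sketch of the pattern, using `std::string` instead of `sstring` and plain concatenation instead of `format()` (both substitutions, and the `sketch` namespace, are assumptions made for illustration):

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace sketch {

// Compile-time constants, mirroring the constexpr string_views in the diff.
constexpr std::string_view qualified_name("system_auth.roles");
constexpr std::string_view salted_hash("salted_hash");
constexpr std::string_view role_col_name("role");

// The query is assembled the first time this function is called and then
// reused; initialization of a function-local static is thread-safe since C++11.
const std::string& update_row_query() {
    static const std::string query = [] {
        std::string q = "UPDATE ";
        q += qualified_name;
        q += " SET ";
        q += salted_hash;
        q += " = ? WHERE ";
        q += role_col_name;
        q += " = ?";
        return q;
    }();
    return query;
}

} // namespace sketch
```

Callers then write `update_row_query()` where they previously named the variable, which matches the call-site changes in this file.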

									
auth/password_authenticator.hh (4 lines changed)
				@@ -52,7 +52,7 @@ class migration_manager;

				namespace auth {

				const sstring& password_authenticator_name();

				extern const std::string_view password_authenticator_name;

				class password_authenticator : public authenticator {

				    cql3::query_processor& _qp;

				@@ -71,7 +71,7 @@ public:

				    virtual future<> stop() override;

				    virtual const sstring& qualified_java_name() const override;

				    virtual std::string_view qualified_java_name() const override;

				    virtual bool require_authentication() const override;

									
auth/role_manager.hh (5 lines changed)
				@@ -33,6 +33,7 @@

				#include "auth/resource.hh"

				#include "seastarx.hh"

				#include "exceptions/exceptions.hh"

				namespace auth {

				@@ -52,9 +53,9 @@ struct role_config_update final {

				///

				/// A logical argument error for a role-management operation.

				///

				class roles_argument_exception : public std::invalid_argument {

				class roles_argument_exception : public exceptions::invalid_request_exception {

				public:

				    using std::invalid_argument::invalid_argument;

				    using exceptions::invalid_request_exception::invalid_request_exception;

				};

				class role_already_exists : public roles_argument_exception {

									
auth/roles-metadata.cc (17 lines changed)
				@@ -45,16 +45,13 @@ std::string_view creation_query() {

				            "  member_of set<text>,"

				            "  salted_hash text"

				            ")",

				            qualified_name(),

				            qualified_name,

				            role_col_name);

				    return instance;

				}

				std::string_view qualified_name() noexcept {

				    static const sstring instance = AUTH_KS + "." + sstring(name);

				    return instance;

				}

				constexpr std::string_view qualified_name("system_auth.roles");

				}

				@@ -64,18 +61,18 @@ future<bool> default_role_row_satisfies(

				        cql3::query_processor& qp,

				        std::function<bool(const cql3::untyped_result_set_row&)> p) {

				    static const sstring query = format("SELECT * FROM {} WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::qualified_name,

				            meta::roles_table::role_col_name);

				    return do_with(std::move(p), [&qp](const auto& p) {

				        return qp.process(

				        return qp.execute_internal(

				                query,

				                db::consistency_level::ONE,

				                infinite_timeout_config,

				                {meta::DEFAULT_SUPERUSER_NAME},

				                true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {

				            if (results->empty()) {

				                return qp.process(

				                return qp.execute_internal(

				                        query,

				                        db::consistency_level::QUORUM,

				                        internal_distributed_timeout_config(),

				@@ -97,10 +94,10 @@ future<bool> default_role_row_satisfies(

				future<bool> any_nondefault_role_row_satisfies(

				        cql3::query_processor& qp,

				        std::function<bool(const cql3::untyped_result_set_row&)> p) {

				    static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name());

				    static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name);

				    return do_with(std::move(p), [&qp](const auto& p) {

				        return qp.process(

				        return qp.execute_internal(

				                query,

				                db::consistency_level::QUORUM,

				                internal_distributed_timeout_config()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {

									
auth/roles-metadata.hh (2 lines changed)
				@@ -43,7 +43,7 @@ std::string_view creation_query();

				constexpr std::string_view name{"roles", 5};

				std::string_view qualified_name() noexcept;

				extern const std::string_view qualified_name;

				constexpr std::string_view role_col_name{"role", 4};

									
auth/service.cc (114 lines changed)
				@@ -31,15 +31,13 @@

				#include "auth/allow_all_authenticator.hh"

				#include "auth/allow_all_authorizer.hh"

				#include "auth/common.hh"

				#include "auth/password_authenticator.hh"

				#include "auth/role_or_anonymous.hh"

				#include "auth/standard_role_manager.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				#include "db/consistency_level_type.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "service/migration_listener.hh"

				#include "service/migration_manager.hh"

				#include "utils/class_registrator.hh"

				#include "database.hh"

				@@ -114,45 +112,35 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n

				service::service(

				        permissions_cache_config c,

				        cql3::query_processor& qp,

				        ::service::migration_manager& mm,

				        ::service::migration_notifier& mn,

				        std::unique_ptr<authorizer> z,

				        std::unique_ptr<authenticator> a,

				        std::unique_ptr<role_manager> r)

				            : _permissions_cache_config(std::move(c))

				            , _permissions_cache(nullptr)

				            , _qp(qp)

				            , _migration_manager(mm)

				            , _mnotifier(mn)

				            , _authorizer(std::move(z))

				            , _authenticator(std::move(a))

				            , _role_manager(std::move(r))

				            , _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {

				    // The password authenticator requires that the `standard_role_manager` is running so that the roles metadata table

				    // it manages is created and updated. This cross-module dependency is rather gross, but we have to maintain it for

				    // the sake of compatibility with Apache Cassandra and its choice of auth. schema.

				    if ((_authenticator->qualified_java_name() == password_authenticator_name())

				            && (_role_manager->qualified_java_name() != standard_role_manager_name())) {

				        throw incompatible_module_combination(

				                format("The {} authenticator must be loaded alongside the {} role-manager.",

				                        password_authenticator_name(),

				                        standard_role_manager_name()));

				    }

				}

				            , _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {}

				service::service(

				        permissions_cache_config c,

				        cql3::query_processor& qp,

				        ::service::migration_notifier& mn,

				        ::service::migration_manager& mm,

				        const service_config& sc)

				            : service(

				                      std::move(c),

				                      qp,

				                      mm,

				                      mn,

				                      create_object<authorizer>(sc.authorizer_java_name, qp, mm),

				                      create_object<authenticator>(sc.authenticator_java_name, qp, mm),

				                      create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {

				}

				future<> service::create_keyspace_if_missing() const {

				future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {

				    auto& db = _qp.db();

				    if (!db.has_keyspace(meta::AUTH_KS)) {

				@@ -166,24 +154,24 @@ future<> service::create_keyspace_if_missing() const {

				        // We use min_timestamp so that default keyspace metadata will lose with any manual adjustments.

				        // See issue #2129.

				        return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);

				        return mm.announce_new_keyspace(ksm, api::min_timestamp);

				    }

				    return make_ready_future<>();

				}

				future<> service::start() {

				    return once_among_shards([this] {

				        return create_keyspace_if_missing();

				future<> service::start(::service::migration_manager& mm) {

				    return once_among_shards([this, &mm] {

				        return create_keyspace_if_missing(mm);

				    }).then([this] {

				        return _role_manager->start().then([this] {

				            return when_all_succeed(_authorizer->start(), _authenticator->start());

				            return when_all_succeed(_authorizer->start(), _authenticator->start()).discard_result();

				        });

				    }).then([this] {

				        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);

				    }).then([this] {

				        return once_among_shards([this] {

				            _migration_manager.register_listener(_migration_listener.get());

				            _mnotifier.register_listener(_migration_listener.get());

				            return make_ready_future<>();

				        });

				    });

				@@ -192,10 +180,13 @@ future<> service::start() {

				future<> service::stop() {

				    // Only one of the shards has the listener registered, but let's try to

				    // unregister on each one just to make sure.

				    _migration_manager.unregister_listener(_migration_listener.get());

				    return _permissions_cache->stop().then([this] {

				        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());

				    return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {

				        if (_permissions_cache) {

				            return _permissions_cache->stop();

				        }

				        return make_ready_future<>();

				    }).then([this] {

				        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop()).discard_result();

				    });

				}

				@@ -216,7 +207,7 @@ future<bool> service::has_existing_legacy_users() const {

				    // This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we

				    // can potentially avoid doing a range query with a high consistency level.

				    return _qp.process(

				    return _qp.execute_internal(

				            default_user_query,

				            db::consistency_level::ONE,

				            infinite_timeout_config,

				@@ -226,7 +217,7 @@ future<bool> service::has_existing_legacy_users() const {

				            return make_ready_future<bool>(true);

				        }

				        return _qp.process(

				        return _qp.execute_internal(

				                default_user_query,

				                db::consistency_level::QUORUM,

				                infinite_timeout_config,

				@@ -236,7 +227,7 @@ future<bool> service::has_existing_legacy_users() const {

				                return make_ready_future<bool>(true);

				            }

				            return _qp.process(

				            return _qp.execute_internal(

				                    all_users_query,

				                    db::consistency_level::QUORUM,

				                    infinite_timeout_config).then([](auto results) {

				@@ -372,25 +363,28 @@ future<permission_set> get_permissions(const service& ser, const authenticated_u

				}

				bool is_enforcing(const service& ser)  {

				    const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name();

				    const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name;

				    const bool enforcing_authenticator = ser.underlying_authenticator().qualified_java_name()

				            != allow_all_authenticator_name();

				            != allow_all_authenticator_name;

				    return enforcing_authorizer || enforcing_authenticator;

				}

				bool is_protected(const service& ser, const resource& r) noexcept {

				    return ser.underlying_role_manager().protected_resources().count(r)

				            || ser.underlying_authenticator().protected_resources().count(r)

				            || ser.underlying_authorizer().protected_resources().count(r);

				bool is_protected(const service& ser, command_desc cmd) noexcept {

				    if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {

				        return false; // Table attributes are OK to modify; see #7057.

				    }

				    return ser.underlying_role_manager().protected_resources().contains(cmd.resource)

				            || ser.underlying_authenticator().protected_resources().contains(cmd.resource)

				            || ser.underlying_authorizer().protected_resources().contains(cmd.resource);

				}

				static void validate_authentication_options_are_supported(

				        const authentication_options& options,

				        const authentication_option_set& supported) {

				    const auto check = [&supported](authentication_option k) {

				        if (supported.count(k) == 0) {

				        if (!supported.contains(k)) {

				            throw unsupported_authentication_option(k);

				        }

				    };

				@@ -415,7 +409,7 @@ future<> create_role(

				            return make_ready_future<>();

				        }

				        return futurize_apply(

				        return futurize_invoke(

				                &validate_authentication_options_are_supported,

				                options,

				                ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {

				@@ -439,7 +433,7 @@ future<> alter_role(

				            return make_ready_future<>();

				        }

				        return futurize_apply(

				        return futurize_invoke(

				                &validate_authentication_options_are_supported,

				                options,

				                ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {

				@@ -454,7 +448,9 @@ future<> drop_role(const service& ser, std::string_view name) {

				        return when_all_succeed(

				                a.revoke_all(name),

				                a.revoke_all(r)).handle_exception_type([](const unsupported_authorization_operation&) {

				                a.revoke_all(r))

				                    .discard_result()

				                    .handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        });

				    }).then([&ser, name] {

				@@ -467,8 +463,8 @@ future<> drop_role(const service& ser, std::string_view name) {

				future<bool> has_role(const service& ser, std::string_view grantee, std::string_view name) {

				    return when_all_succeed(

				            validate_role_exists(ser, name),

				            ser.get_roles(grantee)).then([name](role_set all_roles) {

				        return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);

				            ser.get_roles(grantee)).then_unpack([name](role_set all_roles) {

				        return make_ready_future<bool>(all_roles.contains(sstring(name)));

				    });

				}

				future<bool> has_role(const service& ser, const authenticated_user& u, std::string_view name) {

				@@ -525,14 +521,9 @@ future<std::vector<permission_details>> list_filtered_permissions(

				                    ? auth::expand_resource_family(r)

				                    : auth::resource_set{r};

				            all_details.erase(

				                    std::remove_if(

				                            all_details.begin(),

				                            all_details.end(),

				                            [&resources](const permission_details& pd) {

				                        return resources.count(pd.resource) == 0;

				                    }),

				                    all_details.end());

				            std::erase_if(all_details, [&resources](const permission_details& pd) {

				                return !resources.contains(pd.resource);

				            });

				        }

				        std::transform(

				@@ -545,11 +536,9 @@ future<std::vector<permission_details>> list_filtered_permissions(

				                });

				        // Eliminate rows with an empty permission set.

				        all_details.erase(

				                std::remove_if(all_details.begin(), all_details.end(), [](const permission_details& pd) {

				                    return pd.permissions.mask() == 0;

				                }),

				                all_details.end());

				        std::erase_if(all_details, [](const permission_details& pd) {

				            return pd.permissions.mask() == 0;

				        });

				        if (!role_name) {

				            return make_ready_future<std::vector<permission_details>>(std::move(all_details));

				@@ -561,14 +550,9 @@ future<std::vector<permission_details>> list_filtered_permissions(

				        return do_with(std::move(all_details), [&ser, role_name](auto& all_details) {

				            return ser.get_roles(*role_name).then([&all_details](role_set all_roles) {

				                all_details.erase(

				                        std::remove_if(

				                                all_details.begin(),

				                                all_details.end(),

				                                [&all_roles](const permission_details& pd) {

				                            return all_roles.count(pd.role_name) == 0;

				                        }),

				                        all_details.end());

				                std::erase_if(all_details, [&all_roles](const permission_details& pd) {

				                    return !all_roles.contains(pd.role_name);

				                });

				                return make_ready_future<std::vector<permission_details>>(std::move(all_details));

				            });
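Note on the filtering changes in `list_filtered_permissions()` above: the diff swaps the erase-remove idiom with `count(...) == 0` for `std::erase_if` with `!contains(...)`. The two forms should behave identically. A minimal sketch comparing them follows; `permission_details` here is a hypothetical one-field stand-in for the real struct, and the preprocessor guard is only an assumption added so the sketch also builds on pre-C++20 toolchains:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for auth::permission_details; only the field used here.
struct permission_details {
    std::string role_name;
};

// Old form: erase-remove idiom, keeping entries whose role is in `roles`.
void keep_roles_erase_remove(std::vector<permission_details>& all,
                             const std::set<std::string>& roles) {
    all.erase(
        std::remove_if(all.begin(), all.end(), [&roles](const permission_details& pd) {
            return roles.count(pd.role_name) == 0;
        }),
        all.end());
}

// New form: std::erase_if and contains() when the C++20 library is present,
// otherwise fall back to the old form so the sketch still compiles.
void keep_roles_erase_if(std::vector<permission_details>& all,
                         const std::set<std::string>& roles) {
#if defined(__cpp_lib_erase_if)
    std::erase_if(all, [&roles](const permission_details& pd) {
        return !roles.contains(pd.role_name);
    });
#else
    keep_roles_erase_remove(all, roles);
#endif
}
```

Both functions preserve the relative order of the surviving elements, so the rewrite in the diff is a readability change rather than a behavioral one.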

									
auth/service.hh (28 lines changed)
				@@ -28,6 +28,7 @@

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/util/bool_class.hh>

				#include <seastar/core/sharded.hh>

				#include "auth/authenticator.hh"

				#include "auth/authorizer.hh"

				@@ -42,6 +43,7 @@ class query_processor;

				namespace service {

				class migration_manager;

				class migration_notifier;

				class migration_listener;

				}

				@@ -76,13 +78,15 @@ public:

				///

				/// All state associated with access-control is stored externally to any particular instance of this class.

				///

				class service final {

				/// peering_sharded_service inheritance is needed to be able to access shard local authentication service

				/// given an object from another shard. Used for bouncing lwt requests to correct shard.

				class service final : public seastar::peering_sharded_service<service> {

				    permissions_cache_config _permissions_cache_config;

				    std::unique_ptr<permissions_cache> _permissions_cache;

				    cql3::query_processor& _qp;

				    ::service::migration_manager& _migration_manager;

				    ::service::migration_notifier& _mnotifier;

				    std::unique_ptr<authorizer> _authorizer;

				@@ -97,7 +101,7 @@ public:

				    service(

				            permissions_cache_config,

				            cql3::query_processor&,

				            ::service::migration_manager&,

				            ::service::migration_notifier&,

				            std::unique_ptr<authorizer>,

				            std::unique_ptr<authenticator>,

				            std::unique_ptr<role_manager>);

				@@ -110,10 +114,11 @@ public:

				    service(

				            permissions_cache_config,

				            cql3::query_processor&,

				            ::service::migration_notifier&,

				            ::service::migration_manager&,

				            const service_config&);

				    future<> start();

				    future<> start(::service::migration_manager&);

				    future<> stop();

				@@ -159,7 +164,7 @@ public:

				private:

				    future<bool> has_existing_legacy_users() const;

				    future<> create_keyspace_if_missing() const;

				    future<> create_keyspace_if_missing(::service::migration_manager& mm) const;

				};

				future<bool> has_superuser(const service&, const authenticated_user&);

				@@ -176,10 +181,21 @@ future<permission_set> get_permissions(const service&, const authenticated_user&

				///

				bool is_enforcing(const service&);

				/// A description of a CQL command from which auth::service can tell whether or not this command could endanger

				/// internal data on which auth::service depends.

				struct command_desc {

				    auth::permission permission; ///< Nature of the command's alteration.

				    const ::auth::resource& resource; ///< Resource impacted by this command.

				    enum class type {

				        ALTER_WITH_OPTS, ///< Command is ALTER ... WITH ...

				        OTHER

				    } type_ = type::OTHER;

				};

				///

				/// Protected resources cannot be modified even if the performer has permissions to do so.

				///

				bool is_protected(const service&, const resource&) noexcept;

				bool is_protected(const service&, command_desc) noexcept;

				///

				/// Create a role with optional authentication information.
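Note on the new `is_protected(const service&, command_desc)` overload above: per the diff, a command of type `ALTER_WITH_OPTS` is never treated as touching a protected resource, while any other command is checked against the protected sets. A minimal sketch of that decision follows. It is not the real implementation: `resource` is reduced to a string, the three protected sets are collapsed into one parameter, the member names `perm` and `res` are shortened, and the resource names used in any caller are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

namespace sketch {

// Hypothetical stand-ins for auth::resource and auth::permission.
using resource = std::string;
enum class permission { ALTER, DROP };

// Simplified counterpart of auth::command_desc from the diff.
struct command_desc {
    permission perm;        // Nature of the command's alteration.
    const resource& res;    // Resource impacted by this command.
    enum class type {
        ALTER_WITH_OPTS,    // Command is ALTER ... WITH ...
        OTHER
    } type_ = type::OTHER;
};

// ALTER ... WITH ... only changes table attributes, so it is allowed even on
// protected resources; everything else is checked against the set.
bool is_protected(const std::set<resource>& protected_resources, command_desc cmd) {
    if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {
        return false;
    }
    return protected_resources.count(cmd.res) != 0;
}

} // namespace sketch
```

Because `command_desc` stores the resource by reference, a caller has to keep the resource object alive for as long as the descriptor is in use.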

									
auth/standard_role_manager.cc (72 lines changed)
				@@ -35,6 +35,7 @@

				#include "auth/common.hh"

				#include "auth/roles-metadata.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				#include "db/consistency_level_type.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				@@ -48,11 +49,7 @@ namespace meta {

				namespace role_members_table {

				constexpr std::string_view name{"role_members" , 12};

				static std::string_view qualified_name() noexcept {

				    static const sstring instance = AUTH_KS + "." + sstring(name);

				    return instance;

				}

				constexpr std::string_view qualified_name("system_auth.role_members");

				}

				@@ -83,10 +80,10 @@ static db::consistency_level consistency_for_role(std::string_view role_name) no

				static future<std::optional<record>> find_record(cql3::query_processor& qp, std::string_view role_name) {

				    static const sstring query = format("SELECT * FROM {} WHERE {} = ?",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::qualified_name,

				            meta::roles_table::role_col_name);

				    return qp.process(

				    return qp.execute_internal(

				            query,

				            consistency_for_role(role_name),

				            internal_distributed_timeout_config(),

				@@ -123,13 +120,8 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {

				    return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());

				}

				std::string_view standard_role_manager_name() noexcept {

				    static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";

				    return instance;

				}

				std::string_view standard_role_manager::qualified_java_name() const noexcept {

				    return standard_role_manager_name();

				    return "org.apache.cassandra.auth.CassandraRoleManager";

				}

				const resource_set& standard_role_manager::protected_resources() const {

				@@ -147,7 +139,7 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {

				            "  member text,"

				            "  PRIMARY KEY (role, member)"

				            ")",

				            meta::role_members_table::qualified_name());

				            meta::role_members_table::qualified_name);

				    return when_all_succeed(

				@@ -160,17 +152,17 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {

				                    meta::role_members_table::name,

				                    _qp,

				                    create_role_members_query,

				                    _migration_manager));

				                    _migration_manager)).discard_result();

				}

				future<> standard_role_manager::create_default_role_if_missing() const {

				    return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {

				        if (!exists) {

				            static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, true, true)",

				                    meta::roles_table::qualified_name(),

				                    meta::roles_table::qualified_name,

				                    meta::roles_table::role_col_name);

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    db::consistency_level::QUORUM,

				                    internal_distributed_timeout_config(),

				@@ -197,7 +189,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {

				    log.info("Starting migration of legacy user metadata.");

				    static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				@@ -255,10 +247,10 @@ future<> standard_role_manager::stop() {

				future<> standard_role_manager::create_or_replace(std::string_view role_name, const role_config& c) const {

				    static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, ?, ?)",

				            meta::roles_table::qualified_name(),

				            meta::roles_table::qualified_name,

				            meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            consistency_for_role(role_name),

				            internal_distributed_timeout_config(),

				@@ -298,9 +290,9 @@ standard_role_manager::alter(std::string_view role_name, const role_config_updat

				            return make_ready_future<>();

				        }

				        return _qp.process(

				        return _qp.execute_internal(

				                format("UPDATE {} SET {} WHERE {} = ?",

				                        meta::roles_table::qualified_name(),

				                        meta::roles_table::qualified_name,

				                        build_column_assignments(u),

				                        meta::roles_table::role_col_name),

				                consistency_for_role(role_name),

				@@ -318,9 +310,9 @@ future<> standard_role_manager::drop(std::string_view role_name) const {

				        // First, revoke this role from all roles that are members of it.

				        const auto revoke_from_members = [this, role_name] {

				            static const sstring query = format("SELECT member FROM {} WHERE role = ?",

				                    meta::role_members_table::qualified_name());

				                    meta::role_members_table::qualified_name);

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    consistency_for_role(role_name),

				                    internal_distributed_timeout_config(),

				@@ -356,17 +348,17 @@ future<> standard_role_manager::drop(std::string_view role_name) const {

				        // Finally, delete the role itself.

				        auto delete_role = [this, role_name] {

				            static const sstring query = format("DELETE FROM {} WHERE {} = ?",

				                    meta::roles_table::qualified_name(),

				                    meta::roles_table::qualified_name,

				                    meta::roles_table::role_col_name);

				            return _qp.process(

				            return _qp.execute_internal(

				                    query,

				                    consistency_for_role(role_name),

				                    internal_distributed_timeout_config(),

				                    {sstring(role_name)}).discard_result();

				        };

				        return when_all_succeed(revoke_from_members(), revoke_members_of()).then([delete_role = std::move(delete_role)] {

				        return when_all_succeed(revoke_from_members(), revoke_members_of()).then_unpack([delete_role = std::move(delete_role)] {

				            return delete_role();

				        });

				    });

				@@ -382,11 +374,11 @@ standard_role_manager::modify_membership(

				    const auto modify_roles = [this, role_name, grantee_name, ch] {

				        const auto query = format(

				                "UPDATE {} SET member_of = member_of {} ? WHERE {} = ?",

				                meta::roles_table::qualified_name(),

				                meta::roles_table::qualified_name,

				                (ch == membership_change::add ? '+' : '-'),

				                meta::roles_table::role_col_name);

				        return _qp.process(

				        return _qp.execute_internal(

				                query,

				                consistency_for_role(grantee_name),

				                internal_distributed_timeout_config(),

				@@ -396,17 +388,17 @@ standard_role_manager::modify_membership(

				    const auto modify_role_members = [this, role_name, grantee_name, ch] {

				        switch (ch) {

				            case membership_change::add:

				                return _qp.process(

				                return _qp.execute_internal(

				                        format("INSERT INTO {} (role, member) VALUES (?, ?)",

				                                meta::role_members_table::qualified_name()),

				                                meta::role_members_table::qualified_name),

				                        consistency_for_role(role_name),

				                        internal_distributed_timeout_config(),

				                        {sstring(role_name), sstring(grantee_name)}).discard_result();

				            case membership_change::remove:

				                return _qp.process(

				                return _qp.execute_internal(

				                        format("DELETE FROM {} WHERE role = ? AND member = ?",

				                                meta::role_members_table::qualified_name()),

				                                meta::role_members_table::qualified_name),

				                        consistency_for_role(role_name),

				                        internal_distributed_timeout_config(),

				                        {sstring(role_name), sstring(grantee_name)}).discard_result();

				@@ -415,7 +407,7 @@ standard_role_manager::modify_membership(

				        return make_ready_future<>();

				    };

				    return when_all_succeed(modify_roles(), modify_role_members());

				    return when_all_succeed(modify_roles(), modify_role_members).discard_result();

				}

				future<>

				@@ -424,7 +416,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol

				        return this->query_granted(

				                grantee_name,

				                recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {

				            if (roles.count(sstring(role_name)) != 0) {

				            if (roles.contains(sstring(role_name))) {

				                throw role_already_included(grantee_name, role_name);

				            }

				@@ -436,7 +428,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol

				        return this->query_granted(

				                role_name,

				                recursive_role_query::yes).then([role_name, grantee_name](role_set roles) {

				            if (roles.count(sstring(grantee_name)) != 0) {

				            if (roles.contains(sstring(grantee_name))) {

				                throw role_already_included(role_name, grantee_name);

				            }

				@@ -444,7 +436,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol

				        });

				    };

				   return when_all_succeed(check_redundant(), check_cycle()).then([this, role_name, grantee_name] {

				   return when_all_succeed(check_redundant(), check_cycle()).then_unpack([this, role_name, grantee_name] {

				       return this->modify_membership(grantee_name, role_name, membership_change::add);

				   });

				}

				@@ -459,7 +451,7 @@ standard_role_manager::revoke(std::string_view revokee_name, std::string_view ro

				        return this->query_granted(

				                revokee_name,

				                recursive_role_query::no).then([revokee_name, role_name](role_set roles) {

				            if (roles.count(sstring(role_name)) == 0) {

				            if (!roles.contains(sstring(role_name))) {

				                throw revoke_ungranted_role(revokee_name, role_name);

				            }

				@@ -503,12 +495,12 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n

				future<role_set> standard_role_manager::query_all() const {

				    static const sstring query = format("SELECT {} FROM {}",

				            meta::roles_table::role_col_name,

				            meta::roles_table::qualified_name());

				            meta::roles_table::qualified_name);

				    // To avoid many copies of a view.

				    static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);

				    return _qp.process(

				    return _qp.execute_internal(

				            query,

				            db::consistency_level::QUORUM,

				            internal_distributed_timeout_config()).then([](::shared_ptr<cql3::untyped_result_set> results) {

									
										2

auth/standard_role_manager.hh
									
												
				@@ -42,8 +42,6 @@ class migration_manager;

				namespace auth {

				std::string_view standard_role_manager_name() noexcept;

				class standard_role_manager final : public role_manager {

				    cql3::query_processor& _qp;

				    ::service::migration_manager& _migration_manager;

									
										8

auth/transitional.cc
									
												
				@@ -82,7 +82,7 @@ public:

				        return _authenticator->stop();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return transitional_authenticator_name();

				    }

				@@ -101,7 +101,7 @@ public:

				    virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override {

				        auto i = credentials.find(authenticator::USERNAME_KEY);

				        if ((i == credentials.end() || i->second.empty())

				                && (!credentials.count(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {

				                && (!credentials.contains(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {

				            // return anon user

				            return make_ready_future<authenticated_user>(anonymous_user());

				        }

				@@ -158,7 +158,7 @@ public:

				            }

				            virtual future<authenticated_user> get_authenticated_user() const {

				                return futurize_apply([this] {

				                return futurize_invoke([this] {

				                    return _sasl->get_authenticated_user().handle_exception([](auto ep) {

				                        try {

				                            std::rethrow_exception(ep);

				@@ -201,7 +201,7 @@ public:

				        return _authorizer->stop();

				    }

				    virtual const sstring& qualified_java_name() const override {

				    virtual std::string_view qualified_java_name() const override {

				        return transitional_authorizer_name();

				    }

									
										4

backlog_controller.hh
									
												
				@@ -23,7 +23,11 @@

				#include <seastar/core/scheduling.hh>

				#include <seastar/core/timer.hh>

				#include <seastar/core/gate.hh>

				#include <seastar/core/file.hh>

				#include <chrono>

				#include <cmath>

				#include "seastarx.hh"

				// Simple proportional controller to adjust shares for processes for which a backlog can be clearly

				// defined.

									
										6

bytes.cc
									
												
				@@ -64,7 +64,7 @@ bytes from_hex(sstring_view s) {

				sstring to_hex(bytes_view b) {

				    static char digits[] = "0123456789abcdef";

				    sstring out(sstring::initialized_later(), b.size() * 2);

				    sstring out = uninitialized_string(b.size() * 2);

				    unsigned end = b.size();

				    for (unsigned i = 0; i != end; ++i) {

				        uint8_t x = b[i];

				@@ -100,3 +100,7 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b) {

				}

				}

				std::ostream& operator<<(std::ostream& os, const fmt_hex& b) {

				    return os << to_hex(b.v);

				}

									
										48

bytes.hh
									
												
				@@ -28,6 +28,7 @@

				#include <iosfwd>

				#include <functional>

				#include "utils/mutable_view.hh"

				#include <xxhash.h>

				using bytes = basic_sstring<int8_t, uint32_t, 31, false>;

				using bytes_view = std::basic_string_view<int8_t>;

				@@ -35,20 +36,24 @@ using bytes_mutable_view = basic_mutable_view<bytes_view::value_type>;

				using bytes_opt = std::optional<bytes>;

				using sstring_view = std::string_view;

				inline bytes to_bytes(bytes&& b) {

				    return std::move(b);

				}

				inline sstring_view to_sstring_view(bytes_view view) {

				    return {reinterpret_cast<const char*>(view.data()), view.size()};

				}

				namespace std {

				inline bytes_view to_bytes_view(sstring_view view) {

				    return {reinterpret_cast<const int8_t*>(view.data()), view.size()};

				}

				template <>

				struct hash<bytes_view> {

				    size_t operator()(bytes_view v) const {

				        return hash<sstring_view>()({reinterpret_cast<const char*>(v.begin()), v.size()});

				    }

				struct fmt_hex {

				    bytes_view& v;

				    fmt_hex(bytes_view& v) noexcept : v(v) {}

				};

				}

				std::ostream& operator<<(std::ostream& os, const fmt_hex& hex);

				bytes from_hex(sstring_view s);

				sstring to_hex(bytes_view b);

				@@ -83,10 +88,37 @@ struct appending_hash<bytes_view> {

				    }

				};

				struct bytes_view_hasher : public hasher {

				    XXH64_state_t _state;

				    bytes_view_hasher(uint64_t seed = 0) noexcept {

				        XXH64_reset(&_state, seed);

				    }

				    void update(const char* ptr, size_t length) noexcept {

				        XXH64_update(&_state, ptr, length);

				    }

				    size_t finalize() {

				        return static_cast<size_t>(XXH64_digest(&_state));

				    }

				};

				namespace std {

				template <>

				struct hash<bytes_view> {

				    size_t operator()(bytes_view v) const {

				        bytes_view_hasher h;

				        appending_hash<bytes_view>{}(h, v);

				        return h.finalize();

				    }

				};

				} // namespace std

				inline int32_t compare_unsigned(bytes_view v1, bytes_view v2) {

				    auto n = memcmp(v1.begin(), v2.begin(), std::min(v1.size(), v2.size()));

				  auto size = std::min(v1.size(), v2.size());

				  if (size) {

				    auto n = memcmp(v1.begin(), v2.begin(), size);

				    if (n) {

				        return n;

				    }

				  }

				    return (int32_t) (v1.size() - v2.size());

				}
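Note on the compare_unsigned() hunk in bytes.hh: the new version calls memcmp() only when the common prefix length is non-zero. An empty view may carry a null data pointer, and passing a null pointer to memcmp() is undefined behavior even when the length is zero. The following is a minimal standalone sketch of that guard, not the ScyllaDB code itself. It assumes std::string_view as a stand-in for bytes_view, and the names compare_unsigned_sketch and compare_unsigned_sketch_checks are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for compare_unsigned(); std::string_view replaces
// bytes_view so the sketch compiles without any ScyllaDB headers.
inline int32_t compare_unsigned_sketch(std::string_view v1, std::string_view v2) {
    auto size = std::min(v1.size(), v2.size());
    // Skip memcmp() entirely when there is no common prefix, so a null
    // data() pointer from an empty view is never passed to it.
    if (size) {
        auto n = std::memcmp(v1.data(), v2.data(), size);
        if (n) {
            return n;
        }
    }
    // Equal prefixes: the shorter view orders first.
    return static_cast<int32_t>(v1.size() - v2.size());
}

// Small self-check showing the expected ordering.
inline void compare_unsigned_sketch_checks() {
    assert(compare_unsigned_sketch("", "") == 0);
    assert(compare_unsigned_sketch(std::string_view{}, "a") < 0);
    assert(compare_unsigned_sketch("ab", "a") > 0);
    assert(compare_unsigned_sketch("a", "b") < 0);
}
```

The guard changes no results for non-empty inputs; it only removes the zero-length memcmp() call.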

									
										50

bytes_ostream.hh
									
												
				@@ -38,7 +38,8 @@ class bytes_ostream {

				public:

				    using size_type = bytes::size_type;

				    using value_type = bytes::value_type;

				    static constexpr size_type max_chunk_size() { return 128 * 1024; }

				    using fragment_type = bytes_view;

				    static constexpr size_type max_chunk_size() { return max_alloc_size() - sizeof(chunk); }

				private:

				    static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");

				    struct chunk {

				@@ -58,13 +59,21 @@ private:

				        void operator delete(void* ptr) { free(ptr); }

				    };

				    static constexpr size_type default_chunk_size{512};

				    static constexpr size_type max_alloc_size() { return 128 * 1024; }

				private:

				    std::unique_ptr<chunk> _begin;

				    chunk* _current;

				    size_type _size;

				    size_type _initial_chunk_size = default_chunk_size;

				public:

				    class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {

				    class fragment_iterator {

				    public:

				        using iterator_category = std::input_iterator_tag;

				        using value_type = bytes_view;

				        using difference_type = std::ptrdiff_t;

				        using pointer = bytes_view*;

				        using reference = bytes_view&;

				    private:

				        chunk* _current = nullptr;

				    public:

				        fragment_iterator() = default;

				@@ -93,6 +102,29 @@ public:

				            return _current != other._current;

				        }

				    };

				    using const_iterator = fragment_iterator;

				    class output_iterator {

				    public:

				        using iterator_category = std::output_iterator_tag;

				        using difference_type = std::ptrdiff_t;

				        using value_type = bytes_ostream::value_type;

				        using pointer = bytes_ostream::value_type*;

				        using reference = bytes_ostream::value_type&;

				        friend class bytes_ostream;

				    private:

				        bytes_ostream* _ostream = nullptr;

				    private:

				        explicit output_iterator(bytes_ostream& os) : _ostream(&os) { }

				    public:

				        reference operator*() const { return *_ostream->write_place_holder(1); }

				        output_iterator& operator++() { return *this; }

				        output_iterator operator++(int) { return *this; }

				    };

				private:

				    inline size_type current_space_left() const {

				        if (!_current) {

				@@ -101,16 +133,15 @@ private:

				        return _current->size - _current->offset;

				    }

				    // Figure out next chunk size.

				    //   - must be enough for data_size

				    //   - must be enough for data_size + sizeof(chunk)

				    //   - must be at least _initial_chunk_size

				    //   - try to double each time to prevent too many allocations

				    //   - do not exceed max_chunk_size

				    //   - should not exceed max_alloc_size, unless data_size requires so

				    size_type next_alloc_size(size_t data_size) const {

				        auto next_size = _current

				                ? _current->size * 2

				                : _initial_chunk_size;

				        next_size = std::min(next_size, max_chunk_size());

				        // FIXME: check for overflow?

				        next_size = std::min(next_size, max_alloc_size());

				        return std::max<size_type>(next_size, data_size + sizeof(chunk));

				    }

				    // Makes room for a contiguous region of given size.

				@@ -289,6 +320,11 @@ public:

				        return _size;

				    }

				    // For the FragmentRange concept

				    size_type size_bytes() const {

				        return _size;

				    }

				    bool empty() const {

				        return _size == 0;

				    }

				@@ -326,6 +362,8 @@ public:

				    fragment_iterator begin() const { return { _begin.get() }; }

				    fragment_iterator end() const { return { nullptr }; }

				    output_iterator write_begin() { return output_iterator(*this); }

				    boost::iterator_range<fragment_iterator> fragments() const {

				        return { begin(), end() };

				    }
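Note on the next_alloc_size() hunk in bytes_ostream.hh: the updated comments describe a policy of doubling the previous chunk size, capping it at max_alloc_size(), and exceeding that cap only when the requested data plus the chunk header would not fit otherwise. The sketch below illustrates that policy in isolation. It is a simplification under stated assumptions: chunk_header_size is a hypothetical constant standing in for sizeof(chunk), the previous chunk size is passed in as a plain argument (0 meaning no chunk has been allocated yet), and the name next_alloc_size_sketch is invented for illustration.

```cpp
#include <algorithm>
#include <cstdint>

using size_type = uint32_t;

// Hypothetical stand-in for sizeof(chunk); the real header size is not
// shown in the diff.
constexpr size_type chunk_header_size = 16;
constexpr size_type max_alloc_size = 128 * 1024;
constexpr size_type initial_chunk_size = 512;

// current_size == 0 models "no chunk allocated yet".
constexpr size_type next_alloc_size_sketch(size_type current_size, size_type data_size) {
    // Try to double each time to limit the number of allocations.
    size_type next = current_size ? current_size * 2 : initial_chunk_size;
    // Do not grow past the allocation cap on our own...
    next = std::min(next, max_alloc_size);
    // ...but always leave room for the payload plus the chunk header,
    // even if that exceeds the cap.
    return std::max<size_type>(next, data_size + chunk_header_size);
}

static_assert(next_alloc_size_sketch(0, 10) == 512);
static_assert(next_alloc_size_sketch(512, 10) == 1024);
static_assert(next_alloc_size_sketch(100000, 10) == 128 * 1024);
static_assert(next_alloc_size_sketch(0, 200000) == 200016);
```

Like the original, the sketch does not check data_size + chunk_header_size for overflow; the FIXME in the hunk flags the same gap.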

									
										72

cache_flat_mutation_reader.hh
									
												
				@@ -28,7 +28,6 @@

				#include "partition_version.hh"

				#include "utils/logalloc.hh"

				#include "query-request.hh"

				#include "partition_snapshot_reader.hh"

				#include "partition_snapshot_row_cursor.hh"

				#include "read_context.hh"

				#include "flat_mutation_reader.hh"

				@@ -134,7 +133,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {

				    void maybe_add_to_cache(const static_row& sr);

				    void maybe_set_static_row_continuous();

				    void finish_reader() {

				        push_mutation_fragment(partition_end());

				        push_mutation_fragment(*_schema, _permit, partition_end());

				        _end_of_stream = true;

				        _state = state::end_of_stream;

				    }

				@@ -146,7 +145,7 @@ public:

				                               lw_shared_ptr<read_context> ctx,

				                               partition_snapshot_ptr snp,

				                               row_cache& cache)

				        : flat_mutation_reader::impl(std::move(s))

				        : flat_mutation_reader::impl(std::move(s), ctx->permit())

				        , _snp(std::move(snp))

				        , _position_cmp(*_schema)

				        , _ck_ranges(std::move(crr))

				@@ -158,8 +157,8 @@ public:

				        , _read_context(std::move(ctx))

				        , _next_row(*_schema, *_snp)

				    {

				        clogger.trace("csm {}: table={}.{}", this, _schema->ks_name(), _schema->cf_name());

				        push_mutation_fragment(partition_start(std::move(dk), _snp->partition_tombstone()));

				        clogger.trace("csm {}: table={}.{}", fmt::ptr(this), _schema->ks_name(), _schema->cf_name());

				        push_mutation_fragment(*_schema, _permit, partition_start(std::move(dk), _snp->partition_tombstone()));

				    }

				    cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;

				    cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;

				@@ -176,7 +175,7 @@ public:

				        return make_ready_future<>();

				    }

				    virtual future<> fast_forward_to(position_range pr, db::timeout_clock::time_point timeout) override {

				        throw std::bad_function_call();

				        return make_exception_future<>(make_backtraced_exception_ptr<std::bad_function_call>());

				    }

				};

				@@ -188,7 +187,7 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_

				            return _snp->static_row(_read_context->digest_requested());

				        });

				        if (!sr.empty()) {

				            push_mutation_fragment(mutation_fragment(std::move(sr)));

				            push_mutation_fragment(mutation_fragment(*_schema, _permit, std::move(sr)));

				        }

				        return make_ready_future<>();

				    } else {

				@@ -232,7 +231,7 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t

				            return after_static_row();

				        }

				    }

				    clogger.trace("csm {}: fill_buffer(), range={}, lb={}", this, *_ck_ranges_curr, _lower_bound);

				    clogger.trace("csm {}: fill_buffer(), range={}, lb={}", fmt::ptr(this), *_ck_ranges_curr, _lower_bound);

				    return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this, timeout] {

				        return do_fill_buffer(timeout);

				    });

				@@ -265,6 +264,9 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin

				        }

				        _state = state::reading_from_underlying;

				        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);

				        if (!_read_context->partition_exists()) {

				            return read_from_underlying(timeout);

				        }

				        auto end = _next_row_in_range ? position_in_partition(_next_row.position())

				                                      : position_in_partition(_upper_bound);

				        return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {

				@@ -277,7 +279,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin

				    // assert(_state == state::reading_from_cache)

				    return _lsa_manager.run_in_read_section([this] {

				        auto next_valid = _next_row.iterators_valid();

				        clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", this, _lower_bound,

				        clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", fmt::ptr(this), _lower_bound,

				            _upper_bound, _next_row.position(), next_valid);

				        // We assume that if there was eviction, and thus the range may

				        // no longer be continuous, the cursor was invalidated.

				@@ -291,7 +293,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin

				            }

				        }

				        _next_row.maybe_refresh();

				        clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());

				        clogger.trace("csm {}: next={}, cont={}", fmt::ptr(this), _next_row.position(), _next_row.continuous());

				        _lower_bound_changed = false;

				        while (_state == state::reading_from_cache) {

				            copy_from_cache_to_buffer();

				@@ -357,7 +359,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim

				                                    e.release();

				                                    auto next = std::next(it);

				                                    it->set_continuous(next->continuous());

				                                    clogger.trace("csm {}: inserted dummy at {}, cont={}", this, it->position(), it->continuous());

				                                    clogger.trace("csm {}: inserted dummy at {}, cont={}", fmt::ptr(this), it->position(), it->continuous());

				                                }

				                            });

				                        } else if (ensure_population_lower_bound()) {

				@@ -368,11 +370,11 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim

				                                auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);

				                                auto inserted = insert_result.second;

				                                if (inserted) {

				                                    clogger.trace("csm {}: inserted dummy at {}", this, _upper_bound);

				                                    clogger.trace("csm {}: inserted dummy at {}", fmt::ptr(this), _upper_bound);

				                                    _snp->tracker()->insert(*e);

				                                    e.release();

				                                } else {

				                                    clogger.trace("csm {}: mark {} as continuous", this, insert_result.first->position());

				                                    clogger.trace("csm {}: mark {} as continuous", fmt::ptr(this), insert_result.first->position());

				                                    insert_result.first->set_continuous(true);

				                                }

				                            });

				@@ -413,7 +415,7 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {

				            auto insert_result = rows.insert_check(rows.end(), *e, less);

				            auto inserted = insert_result.second;

				            if (inserted) {

				                clogger.trace("csm {}: inserted lower bound dummy at {}", this, e->position());

				                clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), e->position());

				                _snp->tracker()->insert(*e);

				                e.release();

				            }

				@@ -453,7 +455,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				        _read_context->cache().on_mispopulate();

				        return;

				    }

				    clogger.trace("csm {}: populate({})", this, clustering_row::printer(*_schema, cr));

				    clogger.trace("csm {}: populate({})", fmt::ptr(this), clustering_row::printer(*_schema, cr));

				    _lsa_manager.run_in_update_section_with_allocator([this, &cr] {

				        mutation_partition& mp = _snp->version()->partition();

				        rows_entry::compare less(*_schema);

				@@ -462,7 +464,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				            cr.cells().prepare_hash(*_schema, column_kind::regular_column);

				        }

				        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(

				            current_allocator().construct<rows_entry>(*_schema, cr.key(), cr.tomb(), cr.marker(), cr.cells()));

				            current_allocator().construct<rows_entry>(*_schema, cr.key(), cr.as_deletable_row()));

				        new_entry->set_continuous(false);

				        auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()

				                                              : mp.clustered_rows().lower_bound(cr.key(), less);

				@@ -475,7 +477,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				        rows_entry& e = *it;

				        if (ensure_population_lower_bound()) {

				            clogger.trace("csm {}: set_continuous({})", this, e.position());

				            clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());

				            e.set_continuous(true);

				        } else {

				            _read_context->cache().on_mispopulate();

				@@ -494,14 +496,14 @@ bool cache_flat_mutation_reader::after_current_range(position_in_partition_view

				inline

				void cache_flat_mutation_reader::start_reading_from_underlying() {

				    clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", this, _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);

				    clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", fmt::ptr(this), _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);

				    _state = state::move_to_underlying;

				    _next_row.touch();

				}

				inline

				void cache_flat_mutation_reader::copy_from_cache_to_buffer() {

				    clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", this, _next_row.position(), _next_row_in_range);

				    clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", fmt::ptr(this), _next_row.position(), _next_row_in_range);

				    _next_row.touch();

				    position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());

				    for (auto &&rts : _snp->range_tombstones(_lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {

				@@ -509,7 +511,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {

				        // This guarantees that rts starts after any emitted clustering_row

				        // and not before any emitted range tombstone.

				        if (!less(_lower_bound, rts.position())) {

				-            rts.set_start(*_schema, _lower_bound);
				+            rts.set_start(_lower_bound);

				        } else {

				            _lower_bound = position_in_partition(rts.position());

				            _lower_bound_changed = true;

				@@ -517,7 +519,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {

				                return;

				            }

				        }

				-        push_mutation_fragment(std::move(rts));
				+        push_mutation_fragment(*_schema, _permit, std::move(rts));

				    }

				    // We add the row to the buffer even when it's full.

				    // This simplifies the code. For more info see #3139.

				@@ -533,7 +535,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {

				inline

				void cache_flat_mutation_reader::move_to_end() {

				    finish_reader();

				-    clogger.trace("csm {}: eos", this);
				+    clogger.trace("csm {}: eos", fmt::ptr(this));

				}

				inline

				@@ -558,7 +560,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con

				    _ck_ranges_curr = next_it;

				    auto adjacent = _next_row.advance_to(_lower_bound);

				    _next_row_in_range = !after_current_range(_next_row.position());

				-    clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", this, *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
				+    clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", fmt::ptr(this), *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());

				    if (!adjacent && !_next_row.continuous()) {

				        // FIXME: We don't insert a dummy for singular range to avoid allocating 3 entries

				        // for a hit (before, at and after). If we supported the concept of an incomplete row,

				@@ -568,7 +570,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con

				            // Insert dummy for lower bound

				            if (can_populate()) {

				                // FIXME: _lower_bound could be adjacent to the previous row, in which case we could skip this

				-                clogger.trace("csm {}: insert dummy at {}", this, _lower_bound);
				+                clogger.trace("csm {}: insert dummy at {}", fmt::ptr(this), _lower_bound);

				                auto it = with_allocator(_lsa_manager.region().allocator(), [&] {

				                    auto& rows = _snp->version()->partition().clustered_rows();

				                    auto new_entry = current_allocator().construct<rows_entry>(*_schema, _lower_bound, is_dummy::yes, is_continuous::no);

				@@ -587,7 +589,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con

				// _next_row must be inside the range.

				inline

				void cache_flat_mutation_reader::move_to_next_entry() {

				-    clogger.trace("csm {}: move_to_next_entry(), curr={}", this, _next_row.position());
				+    clogger.trace("csm {}: move_to_next_entry(), curr={}", fmt::ptr(this), _next_row.position());

				    if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {

				        move_to_next_range();

				    } else {

				@@ -596,7 +598,7 @@ void cache_flat_mutation_reader::move_to_next_entry() {

				            return;

				        }

				        _next_row_in_range = !after_current_range(_next_row.position());

				-        clogger.trace("csm {}: next={}, cont={}, in_range={}", this, _next_row.position(), _next_row.continuous(), _next_row_in_range);
				+        clogger.trace("csm {}: next={}, cont={}, in_range={}", fmt::ptr(this), _next_row.position(), _next_row.continuous(), _next_row_in_range);

				        if (!_next_row.continuous()) {

				            start_reading_from_underlying();

				        }

				@@ -605,7 +607,7 @@ void cache_flat_mutation_reader::move_to_next_entry() {

				inline

				void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {

				-    clogger.trace("csm {}: add_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
				+    clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), mutation_fragment::printer(*_schema, mf));

				    if (mf.is_clustering_row()) {

				        add_clustering_row_to_buffer(std::move(mf));

				    } else {

				@@ -618,7 +620,7 @@ inline

				void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {

				    if (!row.dummy()) {

				        _read_context->cache().on_row_hit();

				-        add_clustering_row_to_buffer(row.row(_read_context->digest_requested()));
				+        add_clustering_row_to_buffer(mutation_fragment(*_schema, _permit, row.row(_read_context->digest_requested())));

				    }

				}

				@@ -627,7 +629,7 @@ void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_curs

				//   (2) If _lower_bound > mf.position(), mf was emitted

				inline

				void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {

				-    clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
				+    clogger.trace("csm {}: add_clustering_row_to_buffer({})", fmt::ptr(this), mutation_fragment::printer(*_schema, mf));

				    auto& row = mf.as_clustering_row();

				    auto new_lower_bound = position_in_partition::after_key(row.key());

				    push_mutation_fragment(std::move(mf));

				@@ -637,7 +639,7 @@ void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&

				inline

				void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {

				-    clogger.trace("csm {}: add_to_buffer({})", this, rt);
				+    clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rt);

				    // This guarantees that rt starts after any emitted clustering_row

				    // and not before any emitted range tombstone.

				    position_in_partition::less_compare less(*_schema);

				@@ -645,18 +647,18 @@ void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {

				        return;

				    }

				    if (!less(_lower_bound, rt.position())) {

				-        rt.set_start(*_schema, _lower_bound);
				+        rt.set_start(_lower_bound);

				    } else {

				        _lower_bound = position_in_partition(rt.position());

				        _lower_bound_changed = true;

				    }

				-    push_mutation_fragment(std::move(rt));
				+    push_mutation_fragment(*_schema, _permit, std::move(rt));

				}

				inline

				void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {

				    if (can_populate()) {

				-        clogger.trace("csm {}: maybe_add_to_cache({})", this, rt);
				+        clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rt);

				        _lsa_manager.run_in_update_section_with_allocator([&] {

				            _snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);

				        });

				@@ -668,7 +670,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {

				inline

				void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {

				    if (can_populate()) {

				-        clogger.trace("csm {}: populate({})", this, static_row::printer(*_schema, sr));
				+        clogger.trace("csm {}: populate({})", fmt::ptr(this), static_row::printer(*_schema, sr));

				        _read_context->cache().on_static_row_insert();

				        _lsa_manager.run_in_update_section_with_allocator([&] {

				            if (_read_context->digest_requested()) {

				@@ -684,7 +686,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {

				inline

				void cache_flat_mutation_reader::maybe_set_static_row_continuous() {

				    if (can_populate()) {

				-        clogger.trace("csm {}: set static row continuous", this);
				+        clogger.trace("csm {}: set static row continuous", fmt::ptr(this));

				        _snp->version()->partition().set_static_row_continuous(true);

				    } else {

				        _read_context->cache().on_mispopulate();

									
37 caching_options.hh
												
				@@ -23,7 +23,7 @@

				#include <seastar/core/sstring.hh>

				#include <boost/lexical_cast.hpp>

				#include "exceptions/exceptions.hh"

				-#include "json.hh"
				+#include "utils/rjson.hh"

				#include "seastarx.hh"

				class schema;

				@@ -39,7 +39,10 @@ class caching_options {

				    sstring _key_cache;

				    sstring _row_cache;

				-    caching_options(sstring k, sstring r) : _key_cache(k), _row_cache(r) {
				+    bool _enabled = true;
				+    caching_options(sstring k, sstring r, bool enabled)
				+        : _key_cache(k), _row_cache(r), _enabled(enabled)
				+    {

				        if ((k != "ALL") && (k != "NONE")) {

				            throw exceptions::configuration_exception("Invalid key value: " + k); 

				        }

				@@ -59,36 +62,54 @@ class caching_options {

				    caching_options() : _key_cache(default_key), _row_cache(default_row) {}

				public:

				+    bool enabled() const {
				+        return _enabled;
				+    }
				     std::map<sstring, sstring> to_map() const {
				-        return {{ "keys", _key_cache }, { "rows_per_partition", _row_cache }};
				+        std::map<sstring, sstring> res = {{ "keys", _key_cache },
				+                { "rows_per_partition", _row_cache }};
				+        if (!_enabled) {
				+            res.insert({"enabled", "false"});
				+        }
				+        return res;
				     }
				     sstring to_sstring() const {
				-        return json::to_json(to_map());
				+        return rjson::print(rjson::from_string_map(to_map()));
				     }
				+    static caching_options get_disabled_caching_options() {
				+        return caching_options("NONE", "NONE", false);
				+    }

				    template<typename Map>

				    static caching_options from_map(const Map & map) {

				        sstring k = default_key;

				        sstring r = default_row;

				+        bool e = true;
				         for (auto& p : map) {
				             if (p.first == "keys") {
				                 k = p.second;
				             } else if (p.first == "rows_per_partition") {
				                 r = p.second;
				+            } else if (p.first == "enabled") {
				+                e = p.second == "true";
				             } else {
				-                throw exceptions::configuration_exception("Invalid caching option: " + p.first);
				+                throw exceptions::configuration_exception(format("Invalid caching option: {}", p.first));
				             }
				         }
				-        return caching_options(k, r);
				+        return caching_options(k, r, e);

				    }

				    static caching_options from_sstring(const sstring& str) {

				-        return from_map(json::to_map(str));
				+        return from_map(rjson::parse_to_map<std::map<sstring, sstring>>(str));
				     }
				     bool operator==(const caching_options& other) const {
				-        return _key_cache == other._key_cache && _row_cache == other._row_cache;
				+        return _key_cache == other._key_cache && _row_cache == other._row_cache
				+            && _enabled == other._enabled;

				    }

				    bool operator!=(const caching_options& other) const {

				        return !(*this == other);
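The `caching_options.hh` hunks above thread an `enabled` flag through `to_map()`, `from_map()` and `operator==`, and emit the `"enabled"` key only when caching is disabled. The following is a minimal standalone sketch of that round trip, not the Scylla class itself. It assumes `std::string` in place of `sstring`, `std::invalid_argument` in place of `exceptions::configuration_exception`, and `"ALL"` as the default for both fields (the real `default_key` and `default_row` values are not shown in this diff). The name `caching_options_sketch` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for caching_options (assumptions: std::string, "ALL" defaults).
struct caching_options_sketch {
    std::string keys = "ALL";
    std::string rows = "ALL";
    bool enabled = true;

    // "enabled" is written only when false, so a map produced for the
    // default (enabled) case contains just the two original keys.
    std::map<std::string, std::string> to_map() const {
        std::map<std::string, std::string> res = {{"keys", keys},
                                                  {"rows_per_partition", rows}};
        if (!enabled) {
            res.insert({"enabled", "false"});
        }
        return res;
    }

    // A missing "enabled" key means enabled; unknown keys are rejected.
    static caching_options_sketch from_map(const std::map<std::string, std::string>& m) {
        caching_options_sketch o;
        for (const auto& p : m) {
            if (p.first == "keys") {
                o.keys = p.second;
            } else if (p.first == "rows_per_partition") {
                o.rows = p.second;
            } else if (p.first == "enabled") {
                o.enabled = (p.second == "true");
            } else {
                throw std::invalid_argument("Invalid caching option: " + p.first);
            }
        }
        return o;
    }

    bool operator==(const caching_options_sketch& other) const {
        return keys == other.keys && rows == other.rows && enabled == other.enabled;
    }
};
```

Because the key is omitted in the enabled case, a serialized map for a table with default caching looks the same as it did before the flag was introduced, which is presumably why `to_map()` is written this way.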

									
79 canonical_mutation.cc
												
				@@ -35,6 +35,7 @@

				#include "idl/uuid.dist.impl.hh"

				#include "idl/keys.dist.impl.hh"

				#include "idl/mutation.dist.impl.hh"

				+#include <iostream>

				canonical_mutation::canonical_mutation(bytes data)

				        : _data(std::move(data))

				@@ -89,3 +90,81 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {

				    }

				    return m;

				}

				+static sstring bytes_to_text(bytes_view bv) {
				+    sstring ret = uninitialized_string(bv.size());
				+    std::copy_n(reinterpret_cast<const char*>(bv.data()), bv.size(), ret.data());
				+    return ret;
				+}
				+std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
				+    auto in = ser::as_input_stream(cm._data);
				+    auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
				+    column_mapping mapping = mv.mapping();
				+    auto partition_view = mutation_partition_view::from_view(mv.partition());
				+    fmt::print(os, "{{canonical_mutation: ");
				+    fmt::print(os, "table_id {} schema_version {} ", mv.table_id(), mv.schema_version());
				+    fmt::print(os, "partition_key {} ", mv.key());
				+    class printing_visitor : public mutation_partition_view_virtual_visitor {
				+        std::ostream& _os;
				+        const column_mapping& _cm;
				+        bool _first = true;
				+        bool _in_row = false;
				+    private:
				+        void print_separator() {
				+            if (!_first) {
				+                fmt::print(_os, ", ");
				+            }
				+            _first = false;
				+        }
				+    public:
				+        printing_visitor(std::ostream& os, const column_mapping& cm) : _os(os), _cm(cm) {}
				+        virtual void accept_partition_tombstone(tombstone t) override {
				+            print_separator();
				+            fmt::print(_os, "partition_tombstone {}", t);
				+        }
				+        virtual void accept_static_cell(column_id id, atomic_cell ac) override {
				+            print_separator();
				+            auto&& entry = _cm.static_column_at(id);
				+            fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
				+        }
				+        virtual void accept_static_cell(column_id id, collection_mutation_view cmv) override {
				+            print_separator();
				+            auto&& entry = _cm.static_column_at(id);
				+            fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
				+        }
				+        virtual void accept_row_tombstone(range_tombstone rt) override {
				+            print_separator();
				+            fmt::print(_os, "row tombstone {}", rt);
				+        }
				+        virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
				+            if (_in_row) {
				+                fmt::print(_os, "}}, ");
				+            }
				+            fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
				+            _in_row = true;
				+            _first = false;
				+        }
				+        virtual void accept_row_cell(column_id id, atomic_cell ac) override {
				+            print_separator();
				+            auto&& entry = _cm.regular_column_at(id);
				+            fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
				+        }
				+        virtual void accept_row_cell(column_id id, collection_mutation_view cmv) override {
				+            print_separator();
				+            auto&& entry = _cm.regular_column_at(id);
				+            fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
				+        }
				+        void finalize() {
				+            if (_in_row) {
				+                fmt::print(_os, "}}");
				+            }
				+        }
				+    };
				+    printing_visitor pv(os, mapping);
				+    partition_view.accept(mapping, pv);
				+    pv.finalize();
				+    fmt::print(os, "}}");
				+    return os;
				+}
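The `printing_visitor` added above relies on a `_first` flag so that `print_separator()` emits `", "` before every item except the first. The sketch below isolates that pattern using only the standard library. It assumes plain `std::ostream` insertion instead of `fmt::print`, and the names `item_printer` and `render` are hypothetical, not part of the Scylla code.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reduction of the visitor: one accept() method, one separator flag.
class item_printer {
    std::ostream& _os;
    bool _first = true;

    // Emits ", " before every item except the first one seen.
    void print_separator() {
        if (!_first) {
            _os << ", ";
        }
        _first = false;
    }
public:
    explicit item_printer(std::ostream& os) : _os(os) {}

    void accept(const std::string& item) {
        print_separator();
        _os << item;
    }
};

// Wraps the items in braces, mirroring the "{...}" framing of the printer above.
inline std::string render(const std::vector<std::string>& items) {
    std::ostringstream os;
    os << "{";
    item_printer p(os);
    for (const auto& i : items) {
        p.accept(i);
    }
    os << "}";
    return os.str();
}
```

Keeping the flag inside the visitor means each `accept_*` callback can stay unaware of whether anything was printed before it.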

									
4 canonical_mutation.hh
												
				@@ -22,10 +22,11 @@

				#pragma once

				#include "bytes.hh"

				-#include "schema.hh"
				+#include "schema_fwd.hh"

				#include "database_fwd.hh"

				#include "mutation_partition_visitor.hh"

				#include "mutation_partition_serializer.hh"

				+#include <iosfwd>

				// Immutable mutation form which can be read using any schema version of the same table.

				// Safe to access from other shards via const&.

				@@ -52,4 +53,5 @@ public:

				    const bytes& representation() const { return _data; }

				+    friend std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm);

				};

									
9 cartesian_product.hh
												
				@@ -22,6 +22,9 @@

				#pragma once

				+#include <vector>
				+#include <sys/types.h>

				// Single-pass range over cartesian product of vectors.

				// Note:

				@@ -30,9 +33,13 @@ template<typename T>

				struct cartesian_product {

				    const std::vector<std::vector<T>>& _vec_of_vecs;

				public:

				-    class iterator : public std::iterator<std::forward_iterator_tag, std::vector<T>> {
				+    class iterator {
				+    public:
				+        using iterator_category = std::forward_iterator_tag;
				+        using value_type = std::vector<T>;
				+        using difference_type = std::ptrdiff_t;
				+        using pointer = std::vector<T>*;
				+        using reference = std::vector<T>&;

				    private:

				        size_t _pos;

				        const std::vector<std::vector<T>>* _vec_of_vecs;
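The `cartesian_product.hh` hunk replaces inheritance from `std::iterator` (deprecated since C++17) with five explicit member aliases. As a hedged illustration of why the aliases are sufficient, the sketch below defines a small forward iterator that declares them directly and checks that `std::iterator_traits` and standard algorithms pick them up. `index_iterator` and `sum_all` are hypothetical names for this example, and the iterator walks a single `std::vector<int>` rather than a cartesian product.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical forward iterator over a vector, addressed by position.
class index_iterator {
public:
    // The five aliases that std::iterator used to provide via inheritance.
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;
private:
    std::size_t _pos = 0;
    const std::vector<int>* _vec = nullptr;
public:
    index_iterator() = default;
    index_iterator(std::size_t pos, const std::vector<int>* vec) : _pos(pos), _vec(vec) {}

    reference operator*() const { return (*_vec)[_pos]; }
    pointer operator->() const { return &(*_vec)[_pos]; }
    index_iterator& operator++() { ++_pos; return *this; }
    index_iterator operator++(int) { index_iterator tmp = *this; ++_pos; return tmp; }
    bool operator==(const index_iterator& o) const { return _pos == o._pos && _vec == o._vec; }
    bool operator!=(const index_iterator& o) const { return !(*this == o); }
};

// std::iterator_traits reads the member aliases; no base class is needed.
static_assert(std::is_same<std::iterator_traits<index_iterator>::iterator_category,
                           std::forward_iterator_tag>::value,
              "iterator_category should be visible through iterator_traits");

// Sums the vector through the custom iterator.
inline int sum_all(const std::vector<int>& v) {
    return std::accumulate(index_iterator(0, &v), index_iterator(v.size(), &v), 0);
}
```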

Compare commits

4932 Commits next-3.2 ... next-4.4

1 .dockerignore
87 .github/CODEOWNERS vendored Normal file
33 .github/workflows/pages.yml vendored Normal file
5 .gitignore vendored
18 .gitmodules vendored
852 CMakeLists.txt
10 CONTRIBUTING.md
32 HACKING.md
131 MAINTAINERS
4 NOTICE.txt
151 README.md
10 SCYLLA-VERSION-GEN
1 abseil Submodule
26 absl-flat_hash_map.cc Normal file
47 absl-flat_hash_map.hh Normal file
40 alternator-test/test_condition_expression.py
358 alternator-test/test_query.py
21 alternator/auth.cc
42 alternator/base64.cc
6 alternator/base64.hh
625 alternator/conditions.cc
18 alternator/conditions.hh
53 alternator/error.hh
3801 alternator/executor.cc
191 alternator/executor.hh
677 alternator/expressions.cc
69 alternator/expressions.g
61 alternator/expressions.hh
122 alternator/expressions_types.hh
120 alternator/rjson.cc
159 alternator/rjson.hh
128 alternator/rmw_operation.hh Normal file
199 alternator/serialization.cc
31 alternator/serialization.hh
388 alternator/server.cc
48 alternator/server.hh
16 alternator/stats.cc
16 alternator/stats.hh
1116 alternator/streams.cc Normal file
53 alternator/tags_extension.hh Normal file
30 api/api-doc/cache_service.json
128 api/api-doc/column_family.json
6 api/api-doc/compaction_manager.json
90 api/api-doc/error_injection.json Normal file
12 api/api-doc/failure_detector.json
28 api/api-doc/gossiper.json
4 api/api-doc/hinted_handoff.json
4 api/api-doc/messaging_service.json
51 api/api-doc/storage_proxy.json
116 api/api-doc/storage_service.json
16 api/api-doc/stream_manager.json
15 api/api-doc/system.json
53 api/api.cc
2 api/api.hh
29 api/api_init.hh
24 api/cache_service.cc
2 api/collectd.cc
103 api/column_family.cc
2 api/column_family.hh
2 api/commitlog.cc
69 api/error_injection.cc Normal file
30 api/error_injection.hh Normal file
9 api/gossiper.cc
49 api/messaging_service.cc
5 api/messaging_service.hh
233 api/storage_proxy.cc
608 api/storage_service.cc
14 api/storage_service.hh
5 api/system.cc
62 atomic_cell.cc
16 atomic_cell.hh
5 auth/allow_all_authenticator.cc
6 auth/allow_all_authenticator.hh
5 auth/allow_all_authorizer.cc
6 auth/allow_all_authorizer.hh
2 auth/authenticator.hh
2 auth/authorizer.hh
33 auth/common.cc


15 auth/common.hh