scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Nadav Har'El	ef745e1ce7	alternator: fix support for bytes type in Query's KeyConditions Our parsing of values in a KeyConditions parameter of Query was done naively. As a result, we got bizarre error messages "condition not met: false" when these values had incorrect type (this is issue #6490). Worse - the naive conversion did not decode base64-encoded bytes value as needed, so KeyConditions on bytes-typed keys did not work at all. This patch fixes these bugs by using our existing utility function get_key_from_typed_value(), which takes care of throwing sensible errors when types don't match, and decoding base64 as needed. Unfortunately, we didn't have test coverage for many of the KeyConditions features including bytes keys, which is why this issue escaped detection. A patch will follow with much more comprehensive tests for KeyConditions, which also reproduce this issue and verify that it is fixed. Refs #6490 Fixes #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-1-nyh@scylladb.com> (cherry picked from commit `6b38126a8f`)	2020-05-31 14:02:18 +03:00
Calle Wilund	ae32aa970a	commitlog::read_log_file: Preserve subscription across reading Fixes #6265 Return type for read_log_file was previously changed from subscription to future<>, returning the previously returned subscriptions result of done(). But it did not preserve the subscription itself, which in turn will cause us to (in work::stream), call back into a deleted object. Message-Id: <20200422090856.5218-1-calle@scylladb.com> (cherry picked from commit `525b283326`)	2020-05-25 13:07:33 +03:00
Eliran Sinvani	a3eb12c5f1	Auth: return correct error code when role is not found Scylla returns the wrong error code (0000 - server internal error) in response to trying to do authentication/authorization operations that involves a non-existing role. This commit changes those cases to return error code 2200 (invalid query) which is the correct one and also the one that Cassandra returns. Tests: Unit tests (Dev) All auth and auth_role dtests (cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214) Fixes #6363.	2020-05-25 12:58:09 +03:00
Amnon Heiman	b5cedfc177	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour: when given an empty keyspace, it uses the first non-system keyspace (first here is basically just a keyspace). The current implementation has two issues. First, it uses a reference to a string that is held on the stack of another function. In other words, there's a use after free, and it is not clear why we never hit it. Second, it calls get_non_system_keyspaces twice. Though this is not a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues. First, it changes the implementation to hold a string instead of a reference to a string. Second, it stores the results from get_non_system_keyspaces and reuses them, which is more efficient and holds the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `69a46d4179`)	2020-05-25 12:48:26 +03:00
Hagit Segev	8d9bc57aca	release: prepare for 4.0.1 scylla-4.0.1	2020-05-24 21:39:44 +03:00
Tomasz Grabiec	1cbda629a2	sstables: index_reader: Fix overflow when calculating promoted index end When index file is larger than 4GB, offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com> (cherry picked from commit `a6c87a7b9e`)	2020-05-24 09:45:55 +03:00
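The overflow described in the commit above can be modelled with a small sketch. The function names and the offset/size split are hypothetical, chosen only for illustration; the actual fix is in the C++ index reader and is not reproduced here. Python integers do not wrap, so the 32-bit behaviour is emulated with a mask.

```python
U32_MASK = 0xFFFFFFFF  # emulates wrap-around of a uint32_t

def end_offset_u32(entry_offset: int, entry_size: int) -> int:
    """End offset computed as if stored in a 32-bit unsigned integer."""
    return (entry_offset + entry_size) & U32_MASK

def end_offset_u64(entry_offset: int, entry_size: int) -> int:
    """End offset computed with enough width (no wrap for realistic files)."""
    return entry_offset + entry_size

# Hypothetical entry located 5 GiB into an index file.
offset = 5 * 1024**3
size = 4096

wrapped = end_offset_u32(offset, size)
correct = end_offset_u64(offset, size)

# The wrapped end lies *before* the start offset, so any size derived
# from it would be wrong, which is the failure mode the commit describes.
print(wrapped < offset, correct > offset)
```

Once the file exceeds 4 GB, the 32-bit result silently drops the high bits, so any value derived from it is computed from a position that is too small.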
Rafael Ávila de Espíndola	baf0201a6e	repair: Make sure sinks are always closed In a recent next failure I got the following backtrace function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101 at ./seastar/include/seastar/core/shared_ptr.hh:463 at repair/row_level.cc:2059 This patch changes a few functions to use finally to make sure the sink is always closed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200515202803.60020-1-espindola@scylladb.com> (cherry picked from commit `311fbe2f0a`) Ref #6414	2020-05-20 09:00:44 +03:00
Asias He	7dcffb963c	repair: Fix race between write_end_of_stream and apply_rows Consider: n1, n2, n1 is the repair master, n2 is the repair follower. === Case 1 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after row r1 is written. data: partition_start, r1 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream() data: partition_start, r1, partition_end 5) Step 2 resumes to apply the rows. data: partition_start, r1, partition_end, partition_end, partition_start, r2 === Case 2 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after partition_start for r2 is written but before _partition_opened is set to true. data: partition_start, r1, partition_end, partition_start 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream(). Since _partition_opened[node_idx] is false, partition_end is skipped, end_of_stream is written. data: partition_start, r1, partition_end, partition_start, end_of_stream This causes unbalanced partition_start and partition_end in the stream written to sstables. To fix, serialize the write_end_of_stream and apply_rows with a semaphore. Fixes: #6394 Fixes: #6296 Fixes: #6414 (cherry picked from commit `b2c4d9fdbc`)	2020-05-20 08:08:11 +03:00
Piotr Dulikowski	dcfaf4d035	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are being currently sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we will choose position of the earliest hint that failed to be sent, and will retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence calculation _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422 (cherry picked from commit `85d5c3d5ee`)	2020-05-20 08:06:04 +03:00
Piotr Dulikowski	f974a54cbd	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433 (cherry picked from commit `0c5ac0da98`)	2020-05-20 08:03:44 +03:00
Piotr Sarna	30a96cc592	db, view: remove duplicate entries from pending endpoints When generating view updates, an endpoint can appear both as a primary paired endpoint for the view update, and as a pending endpoint (due to range movements). In order not to generate the same update twice for the same endpoint, the paired endpoint is removed from the list of pending endpoints if present. Fixes #5459 Tests: unit(dev), dtest(TestMaterializedViews.add_dc_during_mv_insert_test) (cherry picked from commit `86b0dd81e3`)	2020-05-17 19:09:58 +02:00
Avi Kivity	faf300382a	Update seastar submodule * seastar 8bc24f486a...447aad8d78 (1): > timer: add scheduling_group awareness Fixes #6170.	2020-05-10 18:12:32 +03:00
Gleb Natapov	55400598ff	storage_proxy: limit read repair only to replicas that answered during speculative reads Speculative reader has more targets than needed for CL. In case there is a digest mismatch the repair runs between all of them, but that violates provided CL. The patch makes it so that repair runs only between replicas that answered (there will be CL of them). Fixes #6123 Reviewed-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200402132245.GA21956@scylladb.com> (cherry picked from commit `36a24bbb70`)	2020-05-07 19:48:24 +03:00
Mike Goltsov	c177295bce	fix error in fstrim service (scylla_util.py) On Centos 7 machine: fstrim.timer not enabled, only unmasked due scylla_fstrim_setup on installation When trying run scylla-fstrim service manually you get error: Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module> main() File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main cfg = parse_scylla_dirs_with_default(conf=args.config) File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default if key not in y or not y[k]: NameError: name 'k' is not defined It caused by error in scylla_util.py Fixes #6294. (cherry picked from commit `068bb3a5bf`)	2020-05-07 19:45:35 +03:00
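The traceback in the commit above comes from a loop variable being referenced under the wrong name. A minimal sketch of that kind of fix follows; the helper name `fill_missing_dirs` and its arguments are hypothetical, and the real body of `parse_scylla_dirs_with_default` is not shown in the commit message.

```python
def fill_missing_dirs(y: dict, defaults: dict) -> dict:
    """Fill in default values for keys that are absent or empty in y."""
    for key, value in defaults.items():
        # The reported bug was a condition of the form
        #     if key not in y or not y[k]:
        # where `k` is undefined. The NameError only fires when `key`
        # IS present in y, because `or` short-circuits otherwise.
        if key not in y or not y[key]:
            y[key] = value
    return y

# A present-but-empty key exercises the branch that used to raise.
cfg = fill_missing_dirs(
    {"data_file_directories": None},
    {"data_file_directories": ["/var/lib/scylla/data"]},
)
print(cfg)
```

The short-circuit explains why such a bug can go unnoticed: configurations that omit the key entirely never evaluate the broken subscript.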
Hagit Segev	d95aa77b62	release: prepare for 4.0.0 scylla-4.0.0	2020-05-05 18:58:39 +03:00
Pekka Enberg	fe54009855	scripts/jobs: Keep memory reserve when calculating parallelism The "jobs" script is used to determine the amount of compilation parallelism on a machine. It attempts to ensure each GCC process has at least 4 GB of memory per core. However, in the worst case scenario, we could end up having the GCC processes take up all the system memory, forcing swapping or OOM killer to kick in. For example, on a 4 core machine with 16 GB of memory, this worst case scenario seems easy to trigger in practice. Fix up the problem by keeping a 1 GB of memory reserve for other processes and calculating parallelism based on that. Message-Id: <20200423082753.31162-1-penberg@scylladb.com> (cherry picked from commit `7304a795e5`)	2020-05-04 19:01:14 +03:00
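The arithmetic behind the commit above can be sketched as follows. This is an assumed formula built from the numbers in the message (4 GB per compiler process, 1 GB reserve); the real "jobs" script may compute it differently, and the function name `compile_jobs` is hypothetical.

```python
def compile_jobs(cores: int, mem_gb: int,
                 per_job_gb: int = 4, reserve_gb: int = 1) -> int:
    """Parallelism bounded by core count and by memory left after a reserve."""
    usable_gb = max(mem_gb - reserve_gb, 0)
    by_memory = usable_gb // per_job_gb
    # Always allow at least one job, never more than the core count.
    return max(1, min(cores, by_memory))

# The example machine from the commit message: 4 cores, 16 GB.
with_reserve = compile_jobs(4, 16)                   # (16 - 1) // 4 = 3
without_reserve = compile_jobs(4, 16, reserve_gb=0)  # 16 // 4 = 4
print(with_reserve, without_reserve)
```

Under these assumptions the reserve drops the example machine from four jobs to three, leaving memory for processes other than the compiler.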
Piotr Sarna	bbe82236be	clocks-impl: switch to thread-safe time conversion std::gmtime() has a sad property of using a global static buffer for returning its value. This is not thread-safe, so its usage is replaced with gmtime_r, which can accept a local buffer. While no regressions were observed in this particular area of code, a similar bug caused failures in alternator, so it's better to simply replace all std::gmtime calls with their thread-safe counterpart. Message-Id: <39e91c74de95f8313e6bb0b12114bf12c0e79519.1588589151.git.sarna@scylladb.com> (cherry picked from commit `05ec95134a`)	2020-05-04 17:14:28 +03:00
Piotr Sarna	abd73cab78	alternator: fix signature timestamps Generating timestamps for auth signatures used a non-thread-safe ::gmtime function instead of thread-safe ::gmtime_r. Tests: unit(dev) Fixes #6345 (cherry picked from commit `fb7fa7f442`)	2020-05-04 17:05:39 +03:00
Nadav Har'El	8fd7cf5cd1	alternator test: drastically reduce time to boot Scylla The alternator test, test/alternator/run, runs Scylla and runs the various tests against it. Before this patch, just booting Scylla took about 26 seconds (for a dev build, on my laptop). This patch reduces this delay to less than one second! It turns out that almost the entire delay was artificial, two periods of 12 seconds "waiting for the gossip to settle", which are completely unnecessary in the one-node cluster used in the Alternator test. So a simple "--skip-wait-for-gossip-to-settle 0" parameter eliminates these long delays completely. Amusingly, the Scylla boot is now so fast, that I had to change a "sleep 2" in the test script to "sleep 1", because 2 seconds is now much more than it takes to boot Scylla :-) Fixes #6310. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200428145035.22894-1-nyh@scylladb.com> (cherry picked from commit `ff5615d59d`)	2020-05-04 16:10:27 +03:00
Alejo Sanchez	dd88b2dd18	utils: error injection allocate string for remote invoke Allocate string before sending to other shards. Reported by Pavel Solodovnikov. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200328204454.1326514-2-alejo.sanchez@scylladb.com> (cherry picked from commit `e5a2ba32b9`) Ref #6342.	2020-05-03 19:33:34 +03:00
Hagit Segev	eee4c00e29	release: prepare for 4.0.rc3 scylla-4.0.rc3	2020-05-01 00:46:40 +03:00
Avi Kivity	85071ceeb1	Merge 'Fix hang in multishard_writer' from Asias " This series fix hang in multishard_writer when error happens. It contains - multishard_writer: Abort the queue attached to consumers when producer fails - repair: Fix hang when the writer is dead Fixes #6241 Refs: #6248 " * asias-stream_fix_multishard_writer_hang: repair: Fix hang when the writer is dead mutation_writer_test: Add test_multishard_writer_producer_aborts multishard_writer: Abort the queue attached to consumers when producer fails (cherry picked from commit `8925e00e96`)	2020-04-30 19:32:12 +03:00
Asias He	4cf201fc24	config: Do not enable repair based node operations by default Give it some more time to mature. Use the old stream plan based node operations by default. Fixes: #6305 Backports: 4.0 (cherry picked from commit `b8ac10c451`)	2020-04-30 17:57:55 +03:00
Raphael S. Carvalho	c6ad5cf556	api/service: fix segfault when taking a snapshot without keyspace specified If no keyspace is specified when taking snapshot, there will be a segfault because keynames is unconditionally dereferenced. Let's return an error because a keyspace must be specified when column families are specified. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com> (cherry picked from commit `02e046608f`) Fixes #6336.	2020-04-30 12:49:13 +03:00
Piotr Sarna	51e3e6c655	Update seastar submodule * seastar 251bc8f2...8bc24f48 (1): > http: make headers case-insensitive Fixes #6319	2020-04-30 08:18:01 +02:00
Nadav Har'El	8ac6579b30	test.py: run Alternator test with the correct Scylla binary The Alternator test's run script, test/alternator/run, runs Scylla. By default, it chooses the last built Scylla executable build/*/scylla. However, test.py has a "mode" option, that should be able to choose which build mode to run. Before this patch, this mode option wasn't honored by the Alternator test, so a "test.py alternator/run" would run the same Scylla binary (the one last built) three times, instead of running each of the three build modes. We fix this in this patch: test.py now passes the "SCYLLA" environment variable to the test/alternator/run script, indicating the location of the Scylla binary with the appropriate build mode. The script already supported this environment variable to override its default choice of Scylla binary. In test.py, we add to the run_test() function an optional "env" parameter which can be used to pass additional environment variables to the test. Fixes #6286 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200427131958.28248-1-nyh@scylladb.com> (cherry picked from commit `858a12755b`)	2020-04-28 16:19:07 +03:00
Piotr Sarna	3744e66244	alternator: fix integer overflow warning in token generation When generating tokens for parallel scan, debug mode undefined behavior sanitizer complained that integer overflow sometimes happens when multiplying two big values - delta and segment number. In order to mitigate this warning, the multiplication is now split into two smaller ones, and the generated machine code remains identical (verified on gcc and clang via compiler explorer). Fixes #6280 Tests: unit(dev) (cherry picked from commit `e17c237feb`)	2020-04-28 16:15:31 +03:00
Piotr Sarna	d3bf349484	alternator: allow parallel scan Parallel scans can be performed by providing Segment and TotalSegments attributes to Scan request, which can be used to split the work among many workers. This test makes the parallel scan test succeed, so the xfail is removed. Fixes #5059 (cherry picked from commit `dbb9574aa2`)	2020-04-28 16:07:43 +03:00
Nadav Har'El	3e6a8ba5bd	test/alternator: increase timeout on Scylla boot The Alternator test boots Scylla to test against it. We set an arbitrary timeout for this boot to succeed: 100 seconds. This 100 seconds is significantly more than the 25 seconds it takes on my laptop, and I thought we'd never reach it. But it turns out that in some setups - running the very slow debug build on slow and overcommitted nodes - 100 seconds is not enough. So this patch doubles the timeout to 200 seconds. Note that this "200 seconds" is just a timeout, and doesn't affect normal runs: Both a successful boot and a failed boot are recognized as soon as they happen, and we never unnecessarily wait the entire 200 seconds. Fixes #6271. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422193920.17079-1-nyh@scylladb.com> (cherry picked from commit `92e36c5df5`)	2020-04-28 16:04:12 +03:00
Nadav Har'El	5f1785b9cf	alternator: use RF=3 even if some nodes are temporarily down Alternator is supposed to use RF=3 for new tables. Only when the cluster is smaller than 3 nodes do we use RF=1 (and warn about it) - this is useful for testing. However, our implementation incorrectly tested the number of live nodes in the cluster instead of the total number of nodes. As a result, if a 3-node cluster had one node down, and a new table was created, it was created with RF=1, and immediately could not be written because when RF=1, any node down means part of the data is unavailable. This patch fixes this: The total number of nodes in the cluster - not the number of live nodes - is consulted. The three-node-cluster-with-a-dead-node setup above creates the table with RF=3, and it can be written because two living nodes out of three are enough when RF=3 and we do quorum writes and reads. We have a dtest to reproduce this bug (and its fix), and it's also easy to reproduce manually by starting a 3-node cluster, killing one of the nodes, and then running "pytests". Before this patch, the tests can create tables but then fail to write to them. After this patch, the test succeed on the same cluster with the dead node. Fixes #6267 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200422182035.15106-2-nyh@scylladb.com> (cherry picked from commit `1f75efb556`)	2020-04-28 15:52:06 +03:00
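The reasoning in the commit above can be checked with a small sketch. The function names are hypothetical; only the rule itself (RF=3 unless the cluster has fewer than 3 nodes, quorum reads and writes) comes from the commit message.

```python
def choose_rf(node_count: int) -> int:
    """RF=3 for clusters of three or more nodes, RF=1 for smaller test setups."""
    return 3 if node_count >= 3 else 1

def quorum(rf: int) -> int:
    """Number of replicas that must respond for a quorum operation."""
    return rf // 2 + 1

total_nodes, live_nodes = 3, 2   # three-node cluster with one node down

buggy_rf = choose_rf(live_nodes)   # consulted live nodes: yields 1
fixed_rf = choose_rf(total_nodes)  # consults total nodes: yields 3

# With RF=3, quorum is 2, which the two live nodes can satisfy.
print(buggy_rf, fixed_rf, quorum(fixed_rf) <= live_nodes)
```

With the fixed input the table gets RF=3 and the two live replicas still form a quorum, whereas RF=1 leaves the data owned by the dead node unavailable.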
Nadav Har'El	e1fd6cf989	gossiper: add convenience function for getting number of nodes The gossiper has the convenience functions get_up_endpoint_count() and get_down_endpoint_count(), but strangely no function to get the total number. Even though it's easy to calculate the total by summing up their results, it is inefficient and also inconvenient because each of these functions returns a future. So let's add another function, get_all_endpoint_count(), to get the total number of nodes. We will use this function in the next patch. Signed-off-by: Nadav Har'El <n...@scylladb.com> Message-Id: <20200422182035.15106-1-nyh@scylladb.com> (cherry picked from commit `08c39bde1a`)	2020-04-28 15:51:37 +03:00
Piotr Sarna	b7328ff1e4	alternator: implement ScanIndexForward The ScanIndexForward parameter is now fully implemented and can accept ScanIndexForward=false in order to query the partitions in reverse clustering order. Note that reading partition slices in reverse order is less efficient than forward scans and may put a strain on memory usage, especially for large partitions, since the whole partition is currently fetched in order to be reversed. Fixes #5153 (cherry picked from commit `09e4f3b917`)	2020-04-28 15:30:01 +03:00
Avi Kivity	602ed43ac7	Update seastar submodule * seastar 76260705ef...251bc8f25d (1): > http server: fix "Date" header format Fixes #6253.	2020-04-26 19:30:08 +03:00
Tomasz Grabiec	c42c91c5bb	Merge "Drop only learnt value on PRUNE" from Gleb It is unsafe to remove entire row, so only drop learn value from system.paxos table. Fixes: #6154 (cherry picked from commit `e648e314e5`)	2020-04-21 18:30:12 +03:00
Avi Kivity	cf017b320a	test: alternator: configure scylla for test environment in terms of cpu and disk Currently, the alternator tests configure scylla to use all the logical cores in the host system, but only 1GB of RAM. This can lead to a small amount of memory per core. It also uses the default disk configuration, which is safe, but can be very slow on mechanical or non-enterprise disks. Change to use a fixed --smp 2 configuration, and add --overprovisioned for maximum flexibility (no spinning). Use --unsafe-bypass-fsync for faster performance on non-enterprise or mechanical disks, assuming that the test data is not important. Fixes #6251. Message-Id: <20200420154112.123386-1-avi@scylladb.com> (cherry picked from commit `2482e53de9`)	2020-04-21 18:25:28 +03:00
Hagit Segev	89e79023ae	release: prepare for 4.0.rc2 scylla-4.0.rc2	2020-04-21 16:26:09 +03:00
Nadav Har'El	bc67da1a21	alternator-test: comment out an error-path test that doesn't work on newer boto3 Unfortunately, the boto3 library doesn't allow us to check some of the input error cases because it unnecessarily tests its input instead of just passing it to Alternator and allowing Alternator to report the error. In this patch we comment out a test case which used to work fine - i.e., the error was reported by Alternator - until recent changes to boto3 made it catch the problem without passing it to Alternator :-( Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200330190521.19526-2-nyh@scylladb.com> (cherry picked from commit `fe6cecb26d`)	2020-04-21 07:19:54 +02:00
Botond Dénes	0c7643f1fe	schema: schema(): use std::stable_sort() to sort key columns When multiple key columns (clustering or partition) are passed to the schema constructor, all having the same column id, the expectation is that these columns will retain the order in which they were passed to `schema_builder::with_column()`. Currently however this is not guaranteed as the schema constructor sort key columns by column id with `std::sort()`, which doesn't guarantee that equally comparing elements retain their order. This can be an issue for indexes, the schemas of which are built independently on each node. If there is any room for variance between for the key column order, this can result in different nodes having incompatible schemas for the same index. The fix is to use `std::stable_sort()` which guarantees that the order of equally comparing elements won't change. This is a suspected cause of #5856, although we don't have hard proof. Fixes: #5856 Signed-off-by: Botond Dénes <bdenes@scylladb.com> [avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes unstable at 17 elements, and the failing schema had a clustering key with 23 elements] Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com> (cherry picked from commit `a4aa753f0f`)	2020-04-19 18:18:45 +03:00
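The property the commit above relies on, that a stable sort keeps equally comparing elements in their original order, can be illustrated in Python, whose built-in `sorted()` is documented as stable. This is only an analogue of `std::stable_sort()`; the column names and the tuple layout are hypothetical, and the unstable behaviour of `std::sort()` cannot be reproduced here.

```python
# (name, column_id) pairs in the order they were added; the key
# columns deliberately share the same id, as in the commit message.
columns = [("ck_c", 0), ("ck_a", 0), ("ck_b", 0), ("v", 1)]

# Sort by id only. Because sorted() is stable, ties keep insertion
# order, so every node building this list gets the same result.
by_id = sorted(columns, key=lambda col: col[1])

print([name for name, _ in by_id])
```

An unstable sort is free to emit the three tied columns in any order, which is how independently built schemas could end up disagreeing.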
Rafael Ávila de Espíndola	c563234f40	dht: Use get_random_number<uint64_t> instead of int64_t in token::get_random_token I bisect the opposite change in `9c202b52da` as the cause of issue 6193. I don't know why. Maybe get_random_number<signed_type> is buggy? In any case, reverting to uint64_t solves the issue. Fixes #6193 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200418001611.440733-1-espindola@scylladb.com> (cherry picked from commit `f3fd466156`)	2020-04-19 16:20:40 +03:00
Nadav Har'El	77b7a48a02	alternator: remove mentions of experimental status of LWT Since commit `9948f548a5`, the LWT no longer requires an "experimental" flag, so Alternator documents and scripts which referred to the need for enabling experimental LWT, are fixed here to no longer do that. Fixes #6118. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200405143237.12693-1-nyh@scylladb.com> (cherry picked from commit `d9d50362af`)	2020-04-19 15:10:32 +03:00
Piotr Sarna	b2b1bfb159	alternator: fix failure on incorrect table name with no indexes If a table name is not found, it may still exist as a local index, but the check tried to fetch a local index name regardless if it was present in the request, which was a nullptr dereference bug. Fixes #6161 Tests: alternator-test(local, remote) Message-Id: <428c21e94f6c9e450b1766943677613bd46cbc68.1586347130.git.sarna@scylladb.com> (cherry picked from commit `123edfc10c`)	2020-04-19 15:07:25 +03:00
Nadav Har'El	d72cbe37aa	docs/alternator/alternator.md: fix typos Fix a couple of typos in the Alternator documentation. Fixes scylladb/scylla-doc-issues#280 Fixes scylladb/scylla-doc-issues#281 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200419091900.23030-1-nyh@scylladb.com> (cherry picked from commit `7e7c688946`)	2020-04-19 15:03:22 +03:00
Nadav Har'El	9f7b560771	docs, alternator: alternator.md cleanup Clean up the alternator.md document, by: * Updating out-of-date information that outstayed its welcome. * When Scylla does have a feature but it's just not supported via the DynamoDB API (e.g., CDC and on-demand backups) mention that. * Remove mention of Alternator being experimental and users should not store important data on it :-) * Miscellaneous cleanups. Fixes #6179. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200412094641.27186-1-nyh@scylladb.com> (cherry picked from commit `606ae0744c`)	2020-04-19 15:00:53 +03:00
Nadav Har'El	06af9c028c	alternator-test: make Alternator tests runnable from test.py To make the tests in alternator-test runnable by test.py, we need to move the directory alternator-test/ to test/alternator, because test.py only looks for tests in subdirectories of test/. Then, we need to create a test/alternator/suite.yaml saying that this test directory is of type "Run", i.e., it has a single run script "run" which runs all its tests. The "run" script had to be slightly modified to be aware of its new location relative to the source directory. To run the Alternator tests from test.py, do: ./test.py --mode dev alternator Note that in this version, the "--mode" has no effect - test/alternator/run always runs the latest compiled Scylla, regardless of the chosen mode. The Alternator tests can still be run manually and individually against a running Scylla or DynamoDB as before - just go to the test/alternator directory (instead of alternator-test previously) and run "pytest" with the desired parameters. Fixes #6046 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `4e2bf28b84`)	2020-04-19 11:19:15 +03:00
Nadav Har'El	c74ab3ae80	test.py: add xunit XML output file for "Run" tests Assumes that "Run" tests can take the --junit-xml=<path> option, and pass it to ask the test to generate an XML summary of the run to a file like testlog/dev/xml/run.1.xunit.xml. This option is honored by the Alternator tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0cccb5a630`)	2020-04-19 11:19:06 +03:00
Nadav Har'El	32cd3a070a	test.py: add new test type "Run" This patch adds a new test type, "Run". A test subdirectory of type "Run" has a script called "run" which is expected to run all the tests in that directory. This will be used, in the next patch, by the Alternator functional tests. These tests indeed have a "run" script, which runs Scylla and then runs all of Alternator's tests, finishing fairly quickly (in less than a minute). All of that will become one test.py test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0ae3136900`)	2020-04-19 11:18:01 +03:00
Nadav Har'El	bb1554f09e	test.py: flag for aborting tests with SIGTERM, not SIGKILL Today, if test.py is interrupted with SIGINT or SIGTERM, the ongoing test is killed with SIGKILL. Some types of tests - such as Alternator's test - may depend on being killed politely (e.g., with SIGTERM) to clean up files. We cannot yet change the signal to SIGTERM for all tests, because Seastar tests often don't deal well with signals, but we can at least add a flag that certain test types - that know they can be killed gently - will use. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `36e44972f1`)	2020-04-19 11:17:51 +03:00
Nadav Har'El	2037d7550e	alternator-test: change "run" script to pick random IP address Before this patch, the Alternator tests "run" script ran Scylla on a fixed listening address, 127.0.0.1. There is a problem that there might be other concurrent runs of Scylla using the same IP address - e.g., CCM (used by dtest) uses exactly this IP address for its first node. Luckily, Linux's loopback device actually allows us to pick any of over a million addresses in 127.0.0.0/8 to listen on - we don't need to use 127.0.0.1 specifically. So the code in this patch picks an address in 127.1.., so it cannot collide with CCM (which uses 127.0.0.* for up to 255 nodes). Moreover, the last two bytes of the listen address are picked based on the process ID of the run script; This allows multiple copies of this script to run concurrently - in case anybody wishes to do that. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `24fcc0c0ff`)	2020-04-19 11:17:39 +03:00
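The address selection described in the commit above might look roughly like this. The exact mapping from process ID to the last two bytes is an assumption (the message only says they are "picked based on the process ID"), and `pick_loopback_address` is a hypothetical name.

```python
import os

def pick_loopback_address(pid: int) -> str:
    """Map a process ID onto an address in 127.1.0.0/16."""
    third = (pid >> 8) & 0xFF   # second-lowest byte of the PID
    fourth = pid & 0xFF         # lowest byte of the PID
    return f"127.1.{third}.{fourth}"

# Concurrent runs have different PIDs and so usually different addresses;
# none of them fall in 127.0.0.*, the range the commit says CCM uses.
print(pick_loopback_address(os.getpid()))
```

Only the low 16 bits of the PID are used in this sketch, so two processes whose PIDs differ by a multiple of 65536 would still collide.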
Nadav Har'El	c320c3f6da	install-dependencies.sh: add dependencies for Alternator tests To run Alternator tests, only two additional dependencies need to be added to install-dependencies.sh: pytest, and python3-boto3. We also need python3-cassandra-driver, but this dependency is already listed. This patch only updates the dependencies for Fedora, which is what we need for dbuild and our Jenkins setups. Tested by building a new dbuild docker image and verifying that the Alternator tests pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> [avi: update toolchain image; note this upgrades gcc to 9.3.1] Message-Id: <20200330181128.18582-1-nyh@scylladb.com> (cherry picked from commit `8627ae42a6`)	2020-04-19 11:17:07 +03:00
Nadav Har'El	0ed70944aa	alternator-test: run: use the Python driver, not cqlsh The "run" script for the Alternator tests needs to set a system table for authentication credentials, so we can test this feature. So far we did this with cqlsh, but cqlsh isn't always installed on build machines. But install-dependencies.sh already installs the Cassandra driver for Python, so it makes more sense to use that, so this patch switches to use it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200331131522.28056-1-nyh@scylladb.com> (cherry picked from commit `55f02c00f2`)	2020-04-19 11:16:54 +03:00

21731 Commits