Commit Graph

40 Commits

Author SHA1 Message Date
Nadav Har'El
a896e2dbb9 alternator: add optional support for writing to system table
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.

This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, were we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did do in some cql-pytest tests.

For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" turned on. This option is
turned off by default.

This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).

Fixes: #12348

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-06 10:00:04 +03:00
Nadav Har'El
e7257b1393 test/alternator: make "run" script use only_rmw_uses_lwt
Originally (since commit c3da9f2), Alternator's functional test suite
(test/alternator) ran "always_use_lwt" write isolation mode. The original
thinking was that we need to exercise this more difficult mode and it's
the most important mode. This mode was originally chosen in
test/alternator/run.

However, starting with commit 76a766c (a year ago), test.py no longer
runs test/alternator/run. Instead, it runs Scylla itself, and the options
for running Scylla appear in test/alternator/suite.yaml, and accidentally
the write isolation mode only_rmw_uses_lwt was chosen there.

The purpose of this patch is to reconcile this difference and use the
same mode in test.py (which CI is using) and test/alternator/run (which
is only used by some developers, during development).

I decided to have this patch change test/alternator/run to use
only_rmw_uses_lwt. As noted above, this is anyway how all Alternator
tests have been running in CI in the past year (through test.py).
Also, the mode only_rmw_uses_lwt makes running the Alternator test
suite slightly faster (52 seconds instead of 58 seconds, on my laptop)
which is always nice for developers.

This patch changes nothing for testing in CI - only manual runs through
test/alternator/run are affected.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-29 13:58:58 +03:00
Evgeniy Naydanov
b7b68355ef test.py: remove --omit-scylla-output from pytest argparser
Remove `--omit-scylla-output` CLI option from pytest argparser.
Instead, remove it from `sys.argv` in `cqlpy/run.py`.  Also, no need
to check this option in `alternator/run`.
2025-05-29 12:15:28 +00:00
Nadav Har'El
e919794db8 test/alternator: fix mistakes introduced with test_service_levels.py
This patch undoes multiple mistakes done when introducing the test
for service levels in pull request #22031:

1. The PR introduced in test/alternator/run and test/alternator/suite.yaml
   a permanent role and service level that the service-level test is
   supposed to use. This was a mistake - the test can create the service
   level for its own use, using CQL, it does not need to assume such a
   service level already exists.
   It's important to fix this to allow the service level test to run
   against an installation of Scylla not set up by our own scripts.
   Moreover, while the code in suite.yaml was correct, the code in
   "run" was incorrect (used an outdated keyspace name). This patch
   removes that incorrect code.

2. The PR introduced a duplicate "cql" fixture, copied verbatim
   from test_cql_rbac.py (including a comment that was correct only
   in the latter file :-)). Let's de-duplicate it, using the fixture
   that I moved to conftest.py in the previous patch.

3. The PR used temporary_grant(). This needelessly complicated the test
   and added even more duplicate code, and this patch removes all that
   stuff. This test is about service levels, not RBAC and "grant".
   This test should just use a superuser role that has the permissions
   to do everything, and don't need to be granted specific permissions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-01-05 19:40:14 +02:00
Piotr Dulikowski
b23bc3a5d5 alternator: execute under scheduling group for service level
Now, the Alternator API requests are executed under the correct
scheduling group of the service level assigned to the currently logged
in user.
2025-01-02 07:13:34 +01:00
Nadav Har'El
3fda9651cc test/alternator: option to run alternator tests against specific release
We recently added a "--release <version>" option to test/cql-pytest/run
to run a cql-pytest test against a released version of Scylla, downloaded
automatically from ScyllaDB's precompiled binary repository. This patch
adds the same capability also to test/alternator/run - allowing to run
a current test/alternator test on older releases of Scylla. The
implementation in this patch reuses the same implementation from the
cql-pytest patch.

Here is an example use case: the pull request #19941 claimed that
a certain bug fix was backported to release 6.0. Was it? Let's run
the test reproducing that bug on two releases:

test/alternator/run --release 6.0 test_streams.py::test_stream_list_tables
test/alternator/run --release 6.1 test_streams.py::test_stream_list_tables

It shows that the test passes on 6.1 (so the bug is fixed there) but the
test fails 6.0. It turns out that although the fix was backported to
branch-6.0, this happened shortly after 6.0.4 was released and no later
6.0 minor release came afterwards! So the bug wasn't actually fixed
on any official release of 6.0.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21343
2024-11-13 09:38:09 +02:00
Nadav Har'El
8c215141a1 test: rename "cql-pytest" to "cqlpy"
Python and Python developers don't like directory names to include a
minus sign, like "cql-pytest". In this patch we rename test/cql-pytest
to test/cqlpy, and also change a few references in other code (e.g., code
that used test/cql-pytest/run.py) and also references to this test suite
in documentation and comments.

Arguably, the word "test" was always redundant in test/cql-pytest, and
I want to leave the "py" in test/cqlpy to emphasize that it's Python-based
tests, contrasting with test/cql which are CQL-request-only approval
tests.

Fixes #20846

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-11-06 16:48:36 +02:00
Botond Dénes
6d8e9645ce test/*/run: restore --vnodes into working order
This option was silently broken when --enable-tablet's default changed
from false to true. The reason is that when --vnodes is passed, run only
removes --enable-tablets=true from scylla's command line. With the new
default this is not enough, we need to explicitely disable tablets to
override the default.

Closes scylladb/scylladb#20462
2024-09-13 17:10:09 +03:00
Pavel Emelyanov
83d491af02 config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18898
2024-05-30 18:03:51 +03:00
Nadav Har'El
dc80b5dafe test/alternator: do not write to auth tables
As part of the Alternator test suite, we check Alternator's support for
authentication. Alternator maps Scylla's existing CQL roles to AWS's
authentication:
  * AWS's access_key_id     <- the name of the CQL role
  * AWS's secret_access_key <- the salted hash of the password of the CQL role

Before this patch, the Alternator test suite created a new role with a
preset salted hash (role "alternator", salted hash "secret_pass")
and than used that in the tests. However, with the advent of Raft-based
metadata it is wrong to write directly to the roles table, and starting
with #17952 such writes will be outright forbidden.

But we don't actually need to create a new CQL role! We already have
a perfectly good CQL role called "cassandra", and our tests already use
it. So what this patch does is to have the Alternator tests (conftest.py)
read from the roles system-table the salted hash of the "cassandra" role,
and then use that - instead of the hard-coded pair alternator/secret_pass -
in the tests.

A couple more tests assumed that the role name that was used was
"alternator", but now it was changed to "cassandra" so those tests
needed minor fixes as well.

After this patch, the Alternator tests no longer *write* to the roles
system table. Moreover, after this patch, test/alternator/run and
test/alternator/suite.yaml (used when testing with test.py) no longer
need to do extra ugly CQL setup before starting the Alternator tests.

Fixes #18744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18771
2024-05-22 11:00:15 +03:00
Marcin Maliszkiewicz
9cb1f111d5 alternator: add support for auth-v2
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.

Docs will be updated later, when auth-v2
is enabled as default.
2024-03-01 16:25:14 +01:00
Nadav Har'El
4d6b286345 test/alternator: add "--vnodes" option to run script
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.

This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.

So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Marcin Maliszkiewicz
81be3e0935 test/alternator/run: port -h and --omit-scylla-output options from cql-pytest
Closes scylladb/scylladb#16171
2023-11-26 13:52:01 +02:00
Nadav Har'El
7dc54771e1 test/cql-pytest: allow "run-cassandra" without building Scylla
Before this patch, all scripts which use test/cql-pytest/run.py
looked for the Scylla executable as their first step. This is usually
the right thing to do, except in two cases where Scylla is *not* needed:

1. The script test/cql-pytest/run-cassandra.
2. The script test/alternator/run with the "--aws" option.

So in this patch we change run.py to only look for Scylla when actually
needed (the find_scylla() function is called). In both cases mentioned
above, find_scylla() will never get called and the script can work even
if Scylla was never built.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13010
2023-03-01 07:54:19 +02:00
Gleb Natapov
1688163233 raft: replace experimental raft option with dedicated flag
Unlike other experimental feature we want to raft to be optional even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
2023-01-03 11:15:11 +02:00
Nadav Har'El
2dedb5ea75 alternator: make Alternator TTL feature no longer "experimental"
Until now, the Alternator TTL feature was considered "experimental",
and had to be manually enabled on all nodes of the cluster to be usable.

This patch removes this requirement and in essence GAs this feature.

Even after this patch, Alternator TTL is still a "cluster feature",
i.e., for this feature to be usable every node in the cluster needs
to support it. If any of the nodes is old and does not yet support this
feature, the UpdateTimeToLive request will not be accepted, so although
the expiration-scanning threads may exist on the newer nodes, they will
not do anything because none of the tables can be marked as having
expiration enabled.

This patch does not contain documentation fixes - the documentation
still suggests that the Alternator TTL feature is experimental.
The documentation patch will come separately.

Fixes #12037

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12049
2022-11-24 17:21:39 +02:00
Nadav Har'El
e7e9adc519 alternator,ttl: allow sub-second TTL scanning period, for tests
Alternator has the "alternator_ttl_period_in_seconds" parameter for
controlling how often the expiration thread looks for expired items to
delete. It is usually a very large number of seconds, but for tests
to finish quickly, we set it to 1 second.
With 1 second expiration latency, test/alternator/test_ttl.py took 5
seconds to run.

In this patch, we change the parameter to allow a  floating-point number
of seconds instead of just an integer. Then, this allows us to halve the
TTL period used by tests to 0.5 seconds, and as a result, the run time of
test_ttl.py halves to 2.5 seconds. I think this is fast enough for now.

I verified that even if I change the period to 0.1, there is no noticable
slowdown to other Alternator tests, so 0.5 is definitely safe.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-12 10:32:56 +03:00
Nadav Har'El
49a8164fb7 alternator: add configurable scan period to TTL expiration
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.

In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.

The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.

Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.

Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
4937270803 test/alternator: add option to run with Raft-based schema changes
This patch adds a "--raft" option to test/alternator/run to enable the
experimental Raft-based schema changes ("--experimental-features=raft")
when running Scylla for the tests. This is the same option we added to
test/cql-pytest/run in a previous patch.

Note that we still don't have any Alternator tests that pass or fail
differently in these two modes - these will probably come later as we
fix issues #9868 and #6391. But in order to work on fixing those issues
we need to be able to run the tests in Raft mode.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209123144.321344-1-nyh@scylladb.com>
2022-02-10 09:43:10 +02:00
Nadav Har'El
86e8979ff2 test/alternator, test/cql-pytest: enable specific experimental features
Issue #9467 deprecated the blanket "--experimental" option which we
used to enable all experimental Scylla features for testing, and
suggests that individual experimental features should be enabled
instead.

So this is what we do in this patch for the Scylla-running scripts
in test/alternator and test/cql-pytest: We need to enable UDF for
the CQL tests, and to enable Alternator Streams and Alternator TTL
for the Alternator tests.

Refs #9467

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-2-nyh@scylladb.com>
2021-10-15 16:36:35 +03:00
Nadav Har'El
1d4474d543 test/alternator/run: don't run Scylla if "--aws" option
The test/alternator/run script runs Scylla and then runs pytest against
it. But when passing the "--aws" option, the intention is that these
tests be run against AWS DynamoDB, not a local Scylla, so there is no
point in starting Scylla at all - so this is what we do in this patch.

This doesn't really add a new feature - "test/alternator/run --aws"
will now be nothing more than "cd test/alternator; pytest --aws".
But it adds the convenience that you can run the same tests on Scylla
and AWS with exactly the same "run" command, just adding the "--aws"
option, and don't need to sometimes use "run" and sometimes "pytest".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210912133239.75463-1-nyh@scylladb.com>
2021-09-12 16:50:38 +03:00
Nadav Har'El
0bb2e010f5 test/alternator: rewrite test/alternator/run script in Python
We already wrote the test/cql-pytest/run script in Python in a way
it can be reusable for the other test/*/run scripts.

So this patch replaces the test/alternator/run shell script with Python
code which does the same thing (safely runs Scylla with Alternator and
pytest on it in a temporary directory and IP address), but sharing most
of the code that cql-pytest uses.

The benefit of reusing the test/cql-pytest/run.py library goes beyond
shorter code - the main benefit will be that we can't forget to fix one
of the test/*/run scripts (e.g., add more command line options or fix a
bug) when fixing another one.

To make the test/cql-pytest/run.py library reusable for running
Alternator, I needed to generalize a few things in this patch (e.g.,
the way we check and wait for Scylla to boot with the different APIs we
intend to check). There is also one bug-fix on how interrupts are
handled (they are now better guaranteed to kill pytest) - and now fixing
this bug benefits all runners using run.py (cql-pytest/run,
cql-pytest/run-cassandra and alternator/run).

In the future, we can port the runners which are still duplicate shell
scripts - test/redis/run and test/scylla-gdb/run - to Python in a
similar manner to what we did here for test/alternator/run.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-06-03 11:23:00 +03:00
Avi Kivity
8785dd62cb tests: use kernel page cache
Tests are short-lived and use a small amount of data. They
are also often run repeatly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.

Acknowledge this by using the new --kernel-page-cache option for
tests.

This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.

Closes #8347
2021-03-30 12:04:55 +02:00
Nadav Har'El
781f9d9aca alternator: make default timeout configurable
Whereas in CQL the client can pass a timeout parameter to the server, in
the DynamoDB API there is no such feature; The server needs to choose
reasonable timeouts for its own internal operations - e.g., writes to disk,
querying other replicas, etc.

Until now, Alternator had a fixed timeout of 10 seconds for its
requests. This choice was reasonable - it is much higher than we expect
during normal operations, and still lower than the client-side timeouts
that some DynamoDB libraries have (boto3 has a one-minute timeout).
However, there's nothing holy about this number of 10 seconds, some
installations might want to change this default.

So this patch adds a configuration option, "--alternator-timeout-in-ms",
to choose this timeout. As before, it defaults to 10 seconds (10,000ms).

In particular, some test runs are unusually slow - consider for example
testing a debug build (which is already very slow) in an extremely
over-comitted test host. In some cases (see issue #7706) we noticed
the 10 second timeout was not enough. So in this patch we increase the
default timeout chosen in the "test/alternator/run" script to 30 seconds.

Please note that as the code is structured today, this timeout only
applies to some operations, such as GetItem, UpdateItem or Scan, but
does not apply to CreateTable, for example. This is a pre-existing
issue that this patch does not change.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201207122758.2570332-1-nyh@scylladb.com>
2020-12-09 14:30:43 +01:00
Piotr Jastrzebski
d2897d8f8b alternator: guard streams with an experimental flag
Add new alternator-streams experimental flag for
alternator streams control.

CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:16 +01:00
Nadav Har'El
509a41db04 alternator: change name of Alternator's SSL options
When Alternator is enabled over HTTPS - by setting the
"alternator_https_port" option - it needs to know some SSL-related options,
most importantly where to pick up the certificate and key.

Before this patch, we used the "server_encryption_options" option for that.
However, this was a mistake: Although it sounds like these are the "server's
options", in fact prior to Alternator this option was only used when
communicating with other servers - i.e., connections between Scylla nodes.
For CQL connections with the client, we used a different option -
"client_encryption_options".

This patch introduces a third option "alternator_encryption_options", which
controls only Alternator's HTTPS server. Making it separate from the
existing CQL "client_encryption_options" allows both Alternator and CQL to
be active at the same time but with different certificates (if the user
so wishes).

For backward compatibility, we temporarily continue to allow
server_encryption_options to control the Alternator HTTPS server if
alternator_encryption_options is not specified. However, this generates
a warning in the log, urging the user to switch. This temporary workaround
should be removed in a future version.

This patch also:
1. fixes the test run code (which has an "--https" option to test over
   https) to use the new name of the option.
2. Adds documentation of the new option in alternator.md and protocols.md -
   previously the information on how to control the location of the
   certificate was missing from these documents.

Fixes #7204.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200930123027.213587-1-nyh@scylladb.com>
2020-10-14 18:13:57 +03:00
Nadav Har'El
8f12ef3628 alternator test: faster stream tests by reducing number of vnodes
The single-node test Scylla run by test/alternator/run uses, as the
default, 256 vnodes. When we have 256 vnodes and two shards, our CDC
implementation produces 512 separate "streams" (called "shards" in
DynamoDB lingo). This causes each of the tests in test_streams.py
which need to read data from the stream to need to do 1024 (!) API
requests (512 calls to GetShardIterator and 512 calls to GetRecords)
which takes significant time - about a second per test.

In this patch, we reduce the number of vnodes to 16. We still have
a non-negligible number of stream "shards" (32) so this part of the
CDC code is still exercised. Moreover, to ensure we still routinely
test the paging feature of DescribeStream (whose default page size
is 100), the patch changes the request to use a Limit of 10, so
paging will still be used to retrieve the list of 32 shards.

The time to run the 27 tests in test_streams.py, on my laptop:
Before this patch: 26 seconds
After this patch:   6 seconds.

Fixes #6979
Message-Id: <20200805093418.1490305-1-nyh@scylladb.com>
2020-08-05 19:34:18 +03:00
Nadav Har'El
ae25661d9c alternator test: set streams time window to zero
Alternator Streams have a "alternator_streams_time_window_s" parameter which
is used to allow for correct ordering in the stream in the face of clock
differences between Scylla nodes and possibly network delays. This parameter
currently defaults to 10 seconds, and there is a discussion on issue #6929
on whether it is perhaps too high. But in any case, for tests running on a
single node there is no reason not to set this parameter to zero.

Setting this parameter to zero greatly speeds up the Alternator Streams
tests which use ReadRecords to read from the stream. Previously each such
test took at least 10 seconds, because the data was only readable after a
10 second delay. With alternator_streams_time_window_s=0,  these tests can
finish in less than a second. Unfortunately they are still relatively slow
because our Streams implementation has 512 shards, and thus we need over a
thousand (!) API calls to read from the stream).

Running "test/alternator/run test_streams.py" with 25 tests took before
this patch 114 seconds, after this patch, it is down to 18 seconds.

Refs #6929

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Calle Wilund <calle@scylladb.com>
Message-Id: <20200728184612.1253178-1-nyh@scylladb.com>
2020-08-03 09:19:57 +03:00
Nadav Har'El
665b78253a alternator test: reduce amount of Scylla logs saved
The test/alternator/run script follows the pytest log with a full log of
Scylla. This saved log can be useful in diagnosing problems, but most of
it is filled with non-useful "INFO"-level messages. The two biggest
offenders are compaction - which logs every single compaction happening,
and the migration manager, which is just a second (and very long) message
about schema change operations (e.g., table creations). Neither of these
are interesting for Alternator's tests, which shouldn't care exactly when
compaction of which sstable is happening. These two components alone
are reponsible for 80% of the log lines, and 90% of the log bytes!

In this patch we increase the log level of just these two components -
compaction and migration_manager - to WARN, which reduces the log
by the same percentages (80% by lines, 90% by bytes).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200728191420.1254961-1-nyh@scylladb.com>
2020-07-29 14:17:12 +03:00
Nadav Har'El
e0693f19d0 alternator test: produce newer xunit format for test results
test.py passes the "--junit-xml" option to test/alternator/run, which passes
this option to pytest to get an xunit-format summary of the test results.
However, unfortunately until very recent versions (which aren't yet in Linux
distributions), pytest defaulted to a non-standard xunit format which tools
like Jenkins couldn't properly parse. The more standard format can be chosen
by passing the option "-o junit_family=xunit2", so this is what we do in
this patch.

Fixes #6767 (hopefully).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200719203414.985340-1-nyh@scylladb.com>
2020-07-20 09:24:50 +03:00
Nadav Har'El
c4497bf770 alternator test: enable experimental CDC
In the script test/alternator/run, which runs Scylla for the Alternator
tests, add the "--experimental-features=cdc" option, to allow us testing
the streams API whose implementation requires the experimenal CDC feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-07-16 08:18:09 +03:00
Nadav Har'El
8e3be5e7d6 alternator test: configurable temporary directory
The test/alternator/run script creates a temporary directory for the Scylla
database in /tmp. The assumption was that this is the fastest disk (usually
even a ramdisk) on the test machine, and we didn't need anything else from
it.

But it turns out that on some systems, /tmp is actually a slow disk, so
this patch adds a way to configure the temporary directory - if the TMPDIR
environment variable exists, it is used instead of /tmp. As before this
patch, a temporary subdirectry is created in $TMPDIR, and this subdirectory
is automatically deleted when the test ends.

The test.py script already passes an appropriate TMPDIR (testlog/$mode),
which after this patch the Alternator test will use instead of /tmp.

Fixes #6750

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200713193023.788634-1-nyh@scylladb.com>
2020-07-14 08:52:22 +03:00
Nadav Har'El
c3da9f2bd4 alternator: add mandatory configurable write isolation mode
Alternator supports four ways in which write operations can use quorum
writes or LWT or both, which we called "write isolation policies".

Until this patch, Alternator defaulted to the most generally safe policy,
"always_use_lwt". This default could have been overriden for each table
separately, but there was no way to change this default for all tables.
This patch adds a "--alternator-write-isolation" configuration option which
allows changing the default.

Moreover, @dorlaor asked that users must *explicitly* choose this default
mode, and not get "always_use_lwt" without noticing. The previous default,
"always_use_lwt" supports any workload correctly but because it uses LWT
for all writes it may be disappointingly slow for users who run write-only
workloads (including most benchmarks) - such users might find the slow
writes so disappointing that they will drop Scylla. Conversely, a default
of "forbid_rmw" will be faster and still correct, but will fail on workloads
which need read-modify-write operations - and suprise users that need these
operations. So Dor asked that that *none* of the write modes be made the
default, and users must make an informed choice between the different write
modes, rather than being disappointed by a default choice they weren't
aware of.

So after this patch, Scylla refuses to boot if Alternator is enabled but
a "--alternator-write-isolation" option is missing.

The patch also modifies the relevant documentation, adds the same option to
our docker image, and the modifies the test-running script
test/alternator/run to run Scylla with the old default mode (always_use_lwt),
which we need because we want to test RMW operations as well.

Fixes #6452

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200524160338.108417-1-nyh@scylladb.com>
2020-05-27 08:40:05 +03:00
Nadav Har'El
16b0680c40 alternator test: run Scylla with a different executable name
The Alternator test (test/alternator/run) runs the real Scylla executable
to test it. Users sometimes want to run Scylla manually in parallel (on
different IP addresses, of course) and sometimes use commands like
"killall scylla" to stop it, may be surprised that this command will also
unintentionally kill a running test.

So what this patch does is to name the Scylla process used for the test
with the name "test_scylla". It will be visible as "test_scylla" in top,
and a "killall scylla" will not touch it. You can, of course, kill it with
a "killall test_scylla" if you wish.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200519071604.19161-1-nyh@scylladb.com>
2020-05-19 18:23:15 +03:00
Nadav Har'El
1b807a5018 alternator test: better recognition that Alternator failed to boot
The test/alternator/run script starts Scylla to be tested. It waits until
CQL is responsive and if Scylla dies earlier, recognizes the failure
immediately. This is useful so we see boot errors immediately instead of
waiting for the first test to timeout and fail.

However, Scylla starts the Alternator service after CQL. So it is possible
that after the "run" script found CQL to be up, Alternator couldn't start
(e.g., bad configuration parameters) and Scylla is shut down, and instead
of recognizing this situation, we start the actual test.

The fix is simple: don't start the tests until verifying that Alternator
is up. We verify this using the trivial healthcheck request (which is
nothing more than an HTTP GET request).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200517125851.8484-1-nyh@scylladb.com>
2020-05-17 18:33:27 +02:00
Nadav Har'El
ff5615d59d alternator test: drastically reduce time to boot Scylla
The alternator test, test/alternator/run, runs Scylla and runs the
various tests against it. Before this patch, just booting Scylla took
about 26 seconds (for a dev build, on my laptop). This patch reduces
this delay to less than one second!

It turns out that almost the entire delay was artificial, two periods
of 12 seconds "waiting for the gossip to settle", which are completely
unnecessary in the one-node cluster used in the Alternator test.
So a simple "--skip-wait-for-gossip-to-settle 0" parameter eliminates
these long delays completely.

Amusingly, the Scylla boot is now so fast, that I had to change a "sleep 2"
in the test script to "sleep 1", because 2 seconds is now much more than
it takes to boot Scylla :-)

Fixes #6310.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200428145035.22894-1-nyh@scylladb.com>
2020-04-29 07:55:03 +02:00
Nadav Har'El
92e36c5df5 test/alternator: increase timeout on Scylla boot
The Alternator test boots Scylla to test against it. We set an arbitrary
timeout for this boot to succeed: 100 seconds. This 100 seconds is
significantly more than 25 seconds it takes on my laptop, and I though
we'll never reach it. But it turns out that in some setups - running the
very slow debug build on slow and overcommitted nodes - 100 seconds is
not enough.

So this patch doubles the timeout to 200 seconds.

Note that this "200 seconds" is just a timeout, and doesn't affect normal
runs: Both a successful boot and a failed boot are recognized as soon as
they happen, and we never unnecessarily wait the entire 200 seconds.

Fixes #6271.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200422193920.17079-1-nyh@scylladb.com>
2020-04-23 07:47:21 +02:00
Avi Kivity
2482e53de9 test: alternator: configure scylla for test environment in terms of cpu and disk
Currently, the alternator tests configure scylla to use all the
logical cores in the host system, but only 1GB of RAM. This can lead
to a small amount of memory per core.

It also uses the default disk configuration, which is safe, but can be
very slow on mechanical or non-enterprise disks.

Change to use a fixed --smp 2 configuration, and add --overprovisioned
for maximum flexibility (no spinning). Use --unsafe-bypass-fsync
for faster performance on non-enterprise or mechanical disks, assuming
that the test data is not important.

Fixes #6251.
Message-Id: <20200420154112.123386-1-avi@scylladb.com>
2020-04-20 18:50:46 +03:00
Nadav Har'El
4e2bf28b84 alternator-test: make Alternator tests runnable from test.py
To make the tests in alternator-test runnable by test.py, we need to
move the directory alternator-test/ to test/alternator, because test.py
only looks for tests in subdirectories of test/. Then, we need to create
a test/alternator/suite.yaml saying that this test directory is of type
"Run", i.e., it has a single run script "run" which runs all its tests.

The "run" script had to be slightly modified to be aware of its new
location relative to the source directory.

To run the Alternator tests from test.py, do:

	./test.py --mode dev alternator

Note that in this version, the "--mode" has no effect - test/alternator/run
always runs the latest compiled Scylla, regardless of the chosen mode.

The Alternator tests can still be run manually and individually against
a running Scylla or DynamoDB as before - just go to the test/alternator
directory (instead of alternator-test previously) and run "pytest" with
the desired parameters.

Fixes #6046

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-04-12 16:27:45 +03:00