Commit Graph

3669 Commits

Author SHA1 Message Date
Avi Kivity
cf3830a249 Merge 'Add support for TRUNCATE USING TIMEOUT' from Benny Halevy
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter is no longer derived from raw::cf_statement,
and just stores a schema_ptr to get to the keyspace and column_family.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Also, update the truncate-statement section of docs/cql/ddl.rst accordingly.

Closes #11409

* github.com:scylladb/scylladb:
  docs: cql-extensions: add TRUNCATE to USING TIMEOUT section.
  docs: cql: ddl: add support for TRUNCATE USING TIMEOUT
  cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
  cql3: selectStatement: restrict to USING TIMEOUT in grammar
  cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
2022-09-28 18:19:03 +03:00
Nadav Har'El
de1bc147bc Merge 'test.py: cleanups in topology test suites' from Kamil Braun
Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory`

Closes #11590

* github.com:scylladb/scylladb:
  test.py: replace `topology` with `cluster_size` in Topology tests
  test.py: rename `topology_for_class` to `get_cluster_factory`
  test/pylib: ScyllaCluster: fix create_server parameter type
2022-09-28 15:19:54 +03:00
Kamil Braun
1bcc28b48b test/topology_raft_disabled: reenable test_raft_upgrade
The test was disabled due to a bug in the Python driver which caused the
driver not to reconnect after a node was restarted (see
scylladb/python-driver#170).

Introduce a workaround for that bug: we simply create a new driver
session after restarting the nodes. Reenable the test.

Closes #11641
2022-09-28 15:13:42 +03:00
Mikołaj Grzebieluch
be8fcba8c1 raft: broadcast_tables: add support for bind variables
Extended the queries language to support bind variables which are bound in the
execution stage, before creating a raft command.

Adjusted `test_broadcast_tables.py` to prepare statements at the beginning of the test.

Fixed a small bug in `strongly_consistent_modification_statement::check_access`.

Closes #11525
2022-09-28 09:54:59 +03:00
Alejo Sanchez
02933c9b82 test.py: close aiohttp session for topology tests
Close the aiohttp ClientSession after pytest session finishes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11648
2022-09-27 18:09:08 +02:00
Kamil Braun
82481ae31b Merge 'raft server, log size limit in bytes' from Gusev Petr
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.

snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is greater than
max_log_size - (size of trailing entries).

Closes #11397

* github.com:scylladb/scylladb:
  raft replication_test, make backpressure test to do actual backpressure
  raft server, shrink_to_fit on log truncation
  raft server, release memory if add_entry throws
  raft server, log size limit in bytes
2022-09-27 14:25:08 +02:00
Kamil Braun
ed67f0e267 Merge 'test.py: fix topology init error handling' from Alecco
When there are errors starting the first cluster(s), the server logs are needed. So move `.start()` into the `try` block in `test.py` (out of `asynccontextmanager`).

While there, make `ScyllaClusterManager.start()` idempotent.

Closes #11594

* github.com:scylladb/scylladb:
  test.py: fix ScyllaClusterManager start/stop
  test.py: fix topology init error handling
2022-09-27 11:36:07 +02:00
Petr Gusev
bc50b7407f raft replication_test, make backpressure test to do actual backpressure
Before this patch this test didn't actually experience any backpressure since all the commands were executed sequentially.
2022-09-27 12:04:14 +04:00
Petr Gusev
b34dfed307 raft server, release memory if add_entry throws
We consume memory from semaphore in add_entry_on_leader, but never release it if add_entry throws.
2022-09-27 12:02:34 +04:00
Benny Halevy
64140ccf05 cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter, statements::truncate_statement, is no longer derived
from raw::cf_statement, and just stores a schema_ptr to get to the
keyspace and column_family names.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
27d3e48005 cql3: selectStatement: restrict to USING TIMEOUT in grammar
It is preferred to reject USING TTL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted accordingly to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that cql3::statements::raw::select_statement validate_attrs
now asserts that the ttl or the timestamp attributes aren't set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
0728d33d5f cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
It is preferred to reject USING TTL at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted accordingly to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that now delete_statement ctor asserts that the ttl
attribute is not set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Kamil Braun
696bdb2de7 test.py: replace topology with cluster_size in Topology tests
First, a reminder of a few basic concepts in Scylla:
- "topology" is a mapping: for each node, its DC and Rack.
- "replication strategy" is a method of calculating replica sets in
  a cluster. It is not a cluster-global property; each keyspace can have
  a different replication strategy. A cluster may have multiple
  keyspaces.
- "cluster size" is the number of nodes in a cluster.

Replication strategy is orthogonal to topology. Cluster size can be
derived from topology and is also orthogonal to replication strategy.
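
The distinction between the three concepts can be sketched in a few lines of Python. This is only an illustration: the node names, keyspace names and plain dicts below are hypothetical and are not the data structures used by test.py.

```python
from typing import Dict, Tuple

# Hypothetical example data: node name -> (datacenter, rack).
Topology = Dict[str, Tuple[str, str]]

topology: Topology = {
    "node1": ("dc1", "rack1"),
    "node2": ("dc1", "rack2"),
    "node3": ("dc2", "rack1"),
}


def cluster_size(topo: Topology) -> int:
    """Cluster size is simply the number of nodes in the topology mapping."""
    return len(topo)


# Replication strategy is chosen per keyspace, independently of the topology.
# The keyspace names and options here are illustrative only.
keyspace_replication = {
    "ks_simple": {"class": "SimpleStrategy", "replication_factor": 3},
    "ks_nts": {"class": "NetworkTopologyStrategy", "dc1": 2, "dc2": 1},
}
```

A suite that only needs to say "start three nodes" therefore needs the integer returned by `cluster_size`, not a replication strategy.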

test.py was confusing the three concepts together. For some reason,
Topology suites were specifying a "topology" parameter which contained
replication strategy details - having nothing to do with topology. Also
it's unclear why a test suite would specify anything to do with
replication strategies - after all, a test may create keyspaces with
different replication strategies, and a suite may contain multiple
different tests.

Get rid of the "topology" parameter, replace it with a simple
"cluster_size". In the future we may re-introduce it when we actually
implement the possibility to start clusters with custom topologies
(which involves configuring the snitch etc.) Simplify the test.py code.
2022-09-26 15:17:50 +02:00
Kamil Braun
06cc4f9259 test/pylib: ScyllaCluster: fix create_server parameter type
The only usage of `ScyllaCluster` constructor passed a `create_server`
function which expected a `List[str]` for the second parameter, while
the constructor specified that the function should expect an
`Optional[List[str]]`. There was no reason for the latter, we can easily
fix this type error.

Also give a type hint for `create_cluster` function in
`PythonTestSuite.topology_for_class`. This is actually what caught the
type error.
2022-09-26 11:45:44 +02:00
Petr Gusev
27e60ecbf4 raft server, log size limit in bytes
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.

snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
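
The progress condition can be illustrated with plain arithmetic. The sketch below is not the raft server code; the function names are hypothetical and sizes are plain byte counts.

```python
def max_admissible_entry(max_log_size: int, trailing_size: int) -> int:
    """Largest entry (in bytes) that still fits next to the trailing entries."""
    return max_log_size - trailing_size


def can_admit(entry_size: int, max_log_size: int, trailing_size: int) -> bool:
    """True if an entry of entry_size bytes fits without exceeding max_log_size."""
    return entry_size <= max_admissible_entry(max_log_size, trailing_size)
```

With `max_log_size` of 1000 bytes and 600 bytes of trailing entries, a 500-byte entry can never be admitted; capping the trailing entries at 200 bytes guarantees room for any entry of up to 800 bytes.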
2022-09-26 13:10:10 +04:00
Nadav Har'El
868a884b79 test/cql-pytest: add reproducer for ignored IS NOT NULL
This test reproduces issue #10365: It shows that although "IS NOT NULL" is
not allowed in regular SELECT filters, in a materialized view it is allowed,
even for non-key columns - but then outright ignored and does not actually
filter out anything - a fact which already surprised several users.

The test also fails on Cassandra - it also wrongly allows IS NOT NULL
on the non-key columns but then ignores this in the filter. So the test
is marked with both xfail (known to fail on Scylla) and cassandra_bug
(fails on Cassandra because of what we consider to be a Cassandra bug).

Refs #10365
Refs #11606

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11615
2022-09-26 09:02:08 +03:00
Piotr Sarna
481240b8b4 Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).

In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.

I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.

Fixes #11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783

Closes #11384

* github.com:scylladb/scylladb:
  test/alternator: insert test names into Scylla logs
  rest api: add a new /system/log operation
  alternator ttl: log warning if scan took too long.
  alternator,ttl: allow sub-second TTL scanning period, for tests
  test/alternator: skip fewer Alternator TTL tests
  test/alternator: test Alternator TTL metrics
2022-09-22 09:47:50 +02:00
Petr Gusev
210d9dd026 raft: fix snapshots leak
applier_fiber could create multiple snapshots between
io_fiber run. The fsm_output.snp variable was
overwritten by applier_fiber and io_fiber didn't drop
the previous snapshot.

In this patch we introduce the variable
fsm_output.snps_to_drop, store in it
the current snapshot id before applying
a new one, and then sequentially drop them in
io_fiber after storing the last snapshot_descriptor.
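
The bookkeeping described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual C++ code: `FsmOutput`, `apply_snapshot` and `io_fiber_step` are hypothetical stand-ins, snapshot ids are plain integers, and persisting the snapshot descriptor is elided.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FsmOutput:
    snp: Optional[int] = None                               # latest snapshot id
    snps_to_drop: List[int] = field(default_factory=list)  # superseded ids


def apply_snapshot(output: FsmOutput, new_id: int) -> None:
    """Remember the current snapshot id before overwriting it with a new one."""
    if output.snp is not None:
        output.snps_to_drop.append(output.snp)
    output.snp = new_id


def io_fiber_step(output: FsmOutput, dropped: List[int]) -> None:
    """After storing the latest snapshot (elided), drop superseded ones in order."""
    for snp_id in output.snps_to_drop:
        dropped.append(snp_id)  # stands in for the real drop operation
    output.snps_to_drop.clear()
```

If three snapshots are applied between two io_fiber runs, the first two ids end up in `snps_to_drop` and are dropped, instead of being silently overwritten.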

_sm_events.signal() is added to fsm::apply_snapshot,
since this method mutates the _output and thus gives a
reason to run io_fiber.

The new test test_frequent_snapshotting demonstrates
the problem by causing frequent snapshots and
setting the applier queue size to one.

Closes #11530
2022-09-21 12:46:26 +02:00
Kamil Braun
3b096b71c1 test/topology_raft_disabled: disable test_raft_upgrade
For some reason, the test is currently flaky on Jenkins. Apparently the
Python driver does not reconnect to the cluster after the cluster
restarts (well it does, but then it disconnects from one of the nodes
and never reconnects again). This causes the test to hang on "waiting
until driver reconnects to every server" until it times out.

Disable it for now so it doesn't block next promotion.
2022-09-21 12:32:40 +02:00
Alejo Sanchez
510215d79a test.py: fix ScyllaClusterManager start/stop
Check existing is_running member to avoid re-starting.

While there, set it to false after stopping.
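
A minimal sketch of the idempotent-start pattern, assuming a hypothetical `Manager` class rather than the real `ScyllaClusterManager` (the `start_count` counter exists only to make the effect observable):

```python
import asyncio


class Manager:
    """Hypothetical stand-in showing an idempotent start()."""

    def __init__(self) -> None:
        self.is_running = False
        self.start_count = 0  # illustration only: counts real start-ups

    async def start(self) -> None:
        if self.is_running:
            return  # already started, do nothing
        self.start_count += 1
        self.is_running = True

    async def stop(self) -> None:
        # Reset the flag so that a later start() works again.
        self.is_running = False


async def demo() -> int:
    m = Manager()
    await m.start()
    await m.start()  # second call is a no-op
    await m.stop()
    return m.start_count
```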

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-21 11:42:02 +02:00
Alejo Sanchez
933d93d052 test.py: fix topology init error handling
Start ScyllaClusterManager within error handling so the ScyllaCluster
logs are available in case of error starting up.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-21 09:15:25 +02:00
Avi Kivity
2cec417426 Merge 'tools: use the standard allocator' from Botond Dénes
Tools should disrupt the environment they run in as little as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, or indeed any fixed amount; instead they want to consume only as much as they need, on demand.
To achieve this, tools want to use the standard allocator. That requires a seastar option to instruct seastar *not* to configure and use the seastar allocator, and it requires LSA to cooperate with the standard allocator.
The former is provided by https://github.com/scylladb/seastar/pull/1211.
The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator.
Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend.

Refs: https://github.com/scylladb/scylladb/issues/9882
This is the last major code piece in scylla for making tools production ready.

Closes #11510

* github.com:scylladb/scylladb:
  test/boost: add alternative variant of logalloc test
  tools: use standard allocator
  utils/logalloc: add use_standard_allocator_segment_pool_backend()
  utils/logalloc: introduce segment store backend for standard allocator
  utils/logalloc: rebase release segment-store on segment-store-backend
  utils/logalloc: introduce segment_store_backend
  utils/logalloc: push segment alloc/dealloc to segment_store
  test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe
2022-09-20 12:59:34 +03:00
Nadav Har'El
4c93a694b7 cql: validate bloom_filter_fp_chance up-front
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.

Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.
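
The idea of validating up-front can be sketched as below. This is an illustration, not Scylla's C++ code: the helper name is hypothetical, the lower bound is the figure quoted above, and the upper bound of 1.0 is an assumption.

```python
MIN_SUPPORTED_FP_CHANCE = 6.71e-5  # minimum quoted in the commit message


def validate_bloom_filter_fp_chance(fp_chance: float) -> None:
    """Reject an unsupported value at table-creation time (hypothetical helper)."""
    # The upper bound of 1.0 is an assumption made for this sketch.
    if not (MIN_SUPPORTED_FP_CHANCE <= fp_chance <= 1.0):
        raise ValueError(
            f"bloom_filter_fp_chance must be between "
            f"{MIN_SUPPORTED_FP_CHANCE} and 1.0, got {fp_chance}"
        )
```

Calling such a check while handling CREATE TABLE turns a crash at flush time into an ordinary request error.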

This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).

Fixes #11524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11576
2022-09-20 06:18:51 +03:00
Botond Dénes
60991358e8 Merge 'Improvements to test/lib/sstable_utils.hh' from Raphael "Raph" Carvalho
Changes done to avoid pitfalls and fix issues of sstable-related unit tests

Closes #11578

* github.com:scylladb/scylladb:
  test: Make fake sstables implicitly belong to current shard
  test: Make it clearer that sstables::test::set_values() modify data size
2022-09-20 06:14:07 +03:00
Nadav Har'El
a1ff865c77 Merge 'test/topology_raft_disabled: write basic raft upgrade test' from Kamil Braun
The test changes the servers' configuration to include `raft`
in the `experimental-features` list, then restarts them.

It waits until driver reconnects to every server after restarting.
Then it checks that upgrade eventually finishes on every server by
querying `group0_upgrade_state` key in `system.scylla_local`. Finally,
it performs a schema change and verifies that a corresponding entry has
appeared in `system.group0_history`.

The commit also increases the number of clusters in the suite cluster
pool. Since the suite contains only one test at this time this only has
an effect if we run the test multiple times (using `--repeat`).

Closes #11563

* github.com:scylladb/scylladb:
  test/topology_raft_disabled: write basic raft upgrade test
  test: setup logging in topology suites
2022-09-19 20:27:08 +03:00
Alejo Sanchez
087ae521c5 test.py: make client fail if before test check fails
Check if request to server side (test.py) failed and raise if so.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11575
2022-09-19 18:04:07 +02:00
Raphael S. Carvalho
2f52698a26 test: Make fake sstables implicitly belong to current shard
Fake SSTables will be implicitly owned by the shard that created them,
allowing them to be passed to procedures that assert the SSTables
are owned by the current shard, such as the table procedure that
rebuilds the sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-19 12:05:24 -03:00
Raphael S. Carvalho
697f200319 test: Make it clearer that sstables::test::set_values() modify data size
By adding a param with default value, we make it clear in the interface
that the procedure modifies sstable data size.

Otherwise a caller may invoke this function without noticing that it
overrides the data size previously set using a different function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-19 12:01:24 -03:00
Kamil Braun
b770443300 test/topology_raft_disabled: write basic raft upgrade test
The test changes the servers' configuration to include `raft`
in the `experimental-features` list, then restarts them.

It waits until driver reconnects to every server after restarting.
Then it checks that upgrade eventually finishes on every server by
querying `group0_upgrade_state` key in `system.scylla_local`. Finally,
it performs a schema change and verifies that a corresponding entry has
appeared in `system.group0_history`.

The commit also increases the number of clusters in the suite cluster
pool. Since the suite contains only one test at this time this only has
an effect if we run the test multiple times (using `--repeat`).
2022-09-19 13:29:35 +02:00
Kamil Braun
fd986bfed1 test: setup logging in topology suites
Make it possible to use logging from within tests in the topology
suites. The tests are executed using `pytest`, which uses a `pytest.ini`
file for logging configuration.

Also cleanup the `pytest.ini` files a bit.
2022-09-19 12:23:11 +02:00
Piotr Sarna
5597bc8573 Merge 'Alternator: test and fix crashes and errors...
when using ":attrs" attribute' from Nadav Har'El

This PR improves the testing for issue #5009 and fixes most of it (but
not all - see below). Issue #5009 is about what happens when a user
tries to use the name `:attrs` for an attribute - while Alternator uses
a map column with that name to hold all the schema-less attributes of an
item.  The tests we had for this issue were partial, and missed the
worst cases which could result in Scylla crashing on specially-crafted
PutItem or UpdateItem requests.

What the tests missed were the cases that `:attrs` is used as a
**non-key**. So in this PR we add additional tests for this case,
several of them fail or even crash Scylla, and then we fix all these
cases.

Issue #5009 remains open because using `:attrs` as the name of a **key**
is still not allowed. But because it results in a clean error message
when attempting to create a table with such a key, I consider this
remaining problem very minor.

Refs #5009.

Closes #11572

* github.com:scylladb/scylladb:
  alternator: fix crashes and errors when using ":attrs" attribute
  alternator: improve tests for reserved attribute name ":attrs"
2022-09-19 09:48:06 +02:00
Nadav Har'El
999ca2d588 alternator: fix crashes and errors when using ":attrs" attribute
Alternator uses a single column, a map, with the deliberately strange
name ":attrs", to hold all the schema-less attributes of an item.
The existing code is buggy when the user tries to write to an attribute
with this strange name ":attrs". Although it is extremely unlikely that
any user would happen to choose such a name, it is nevertheless a legal
attribute name in DynamoDB, and should definitely not cause Scylla to crash
as it does in some cases today.

The bug was caused by the code assuming that to check whether an attribute
is stored in its own column in the schema, we just need to check whether
a column with that name exists. This is almost true, except for the name
":attrs" - a column with this name exists, but it is a map - the attribute
with that name should be stored *in* the map, not as the map. The fix
is to modify that check to special-case ":attrs".
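
The corrected check can be sketched in Python as follows. The function name and the set-of-column-names representation are hypothetical; the real code is C++ and works on the table schema.

```python
ATTRS_COLUMN_NAME = ":attrs"  # the map column holding schema-less attributes


def stored_in_own_column(attr_name: str, schema_columns: set) -> bool:
    """True if the attribute has a dedicated column in the schema.

    A column named ":attrs" exists, but it is the map itself, so an
    attribute with that name must be stored inside the map instead.
    """
    return attr_name in schema_columns and attr_name != ATTRS_COLUMN_NAME
```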

This fix makes the relevant tests, which used to crash or fail, now pass.

This fix solves most of #5009, but one point is not yet solved (and
perhaps we don't need to solve): It is still not allowed to use the
name ":attrs" for a **key** attribute. But trying to do that fails cleanly
(during the table creation) with an appropriate error message, so is only
a very minor compatibility issue.

Refs #5009

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-19 10:30:11 +03:00
Nadav Har'El
6f8dca3760 alternator: improve tests for reserved attribute name ":attrs"
As explained in issue #5009, Alternator currently forbids the special
attribute name ":attrs", whereas DynamoDB allows any string of appropriate
length (including the specific string ":attrs") to be used.

We had only a partial test for this incompatibility, and this patch
improves the testing of this issue. In particular, we were missing a
test for the case that the name ":attrs" was used for a non-key
attribute (we only tested the case it was used as a sort key).

It turns out that Alternator crashes on the new test, when the test tries
to write to a non-key attribute called ":attrs", so we needed to mark
the new test with "skip". Moreover, it turns out that different code paths
handle the attribute name ":attrs" differently, and also crash or fail
in other ways - so we added several xfailing and skipped tests
that each fails in a different place (and also a few tests that do pass).

As usual, the new tests were checked to pass on DynamoDB.

Refs #5009

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-19 10:30:06 +03:00
Kamil Braun
348582c4c8 test/pylib: pool: make it possible to free up space
Some tests mark clusters as 'dirty', which makes them non-reusable by
later tests; we don't want to return them to the pool of clusters.

This use-case was covered by the `add_one` function in the `Pool` class.
However, it had the unintended side effect of creating extra clusters
even if there were no more tests that were waiting for new clusters.

Rewrite the implementation of `Pool` so it provides 3 interface
functions:
- `get` borrows an object, building it first if necessary
- `put` returns a borrowed object
- `steal` is called by a borrower to free up space in the pool;
  the borrower is then responsible for cleaning up the object.

Both `put` and `steal` wake up any outstanding `get` calls. Objects are
built only in `get`, so no objects are built if none are needed.
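
A minimal asyncio sketch of such a three-function pool is shown below. It follows the description above but is not the code from test/pylib: the constructor signature, the `free`/`total` fields and the use of `asyncio.Condition` are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    def __init__(self, max_size: int, build: Callable[[], Awaitable[T]]) -> None:
        self.max_size = max_size
        self.build = build
        self.free: List[T] = []  # objects ready to be borrowed
        self.total = 0           # free + borrowed objects
        self.cond = asyncio.Condition()

    async def get(self) -> T:
        """Borrow an object, building one only if there is spare capacity."""
        async with self.cond:
            await self.cond.wait_for(
                lambda: bool(self.free) or self.total < self.max_size)
            if self.free:
                return self.free.pop()
            self.total += 1      # reserve a slot before building
        try:
            return await self.build()
        except BaseException:
            async with self.cond:
                self.total -= 1  # give the slot back if building failed
                self.cond.notify()
            raise

    async def put(self, obj: T) -> None:
        """Return a borrowed object and wake up a waiting get()."""
        async with self.cond:
            self.free.append(obj)
            self.cond.notify()

    async def steal(self) -> None:
        """Free the borrower's slot; the borrower cleans up the object itself."""
        async with self.cond:
            self.total -= 1
            self.cond.notify()


async def demo() -> List[int]:
    built: List[int] = []

    async def build() -> int:
        built.append(len(built))
        return built[-1]

    pool: Pool[int] = Pool(1, build)
    a = await pool.get()   # builds the first object
    await pool.put(a)
    b = await pool.get()   # reuses it; nothing new is built
    await pool.steal()     # b is "dirty": free its slot
    c = await pool.get()   # builds a second object on demand
    return [a, b, c]
```

Because objects are built only inside `get`, stealing a dirty object does not create a replacement unless some caller actually asks for one.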

Closes #11558
2022-09-18 12:05:57 +03:00
Botond Dénes
22128977e4 test/boost: add alternative variant of logalloc test
Which initializes LSA with use_standard_allocator_segment_pool_backend()
running the logalloc_test suite on the standard allocator segment pool
backend. To avoid duplicating the test code, the new test-file pulls in
the test code via #include. I'm not proud of it, but it works and we
test LSA with both the debug and standard memory segment stores without
duplicating code.
2022-09-16 14:57:23 +03:00
Kamil Braun
0a6f601996 Merge 'Raft test topology fix request paths and API response handling' from Alecco
- Raise on response not HTTP 200 for `.get_text()` helper
- Fix API paths
- Close and start a fresh driver when restarting a server and it's the only server in the cluster
- Fix stop/restart response as text instead of inspecting (errors are status 500 and raise exceptions)

Closes #11496

* github.com:scylladb/scylladb:
  test.py: handle duplicate result from driver
  test.py: log server restarts for topology tests
  test.py: log actions for topology tests
  Revert "test.py: restart stopped servers before...
  test.py: ManagerClient API fix return text
  test.py: ManagerClient raise on HTTP != 200
  test.py: ManagerClient fix paths to updated resource
2022-09-16 11:29:10 +02:00
Botond Dénes
e82ea2f3ad test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe
Said test creates two vectors, the vector storage being allocated with
the default allocator, while its content being allocated on LSA. If an
exception is thrown however, both are freed via the default allocator,
triggering an assert in LSA code. Move the cleanup into a `defer()` so
the correct cleanup sequence is executed even on exceptions.
2022-09-16 12:16:57 +03:00
Pavel Emelyanov
fe48b66c0a cross-shard-barrier: Capture shared barrier in complete
When cross-shard barrier is abort()-ed it spawns a background fiber
that will wake-up other shards (if they are sleeping) with exception.

This fiber is implicitly waited by the owning sharded service .stop,
because barrier usage is like this:

    sharded<service> s;
    co_await s.invoke_on_all([] {
        ...
        barrier.abort();
    });
    ...
    co_await s.stop();

If abort happens, the invoke_on_all() will only resolve _after_ it
queues up the waking lambdas into smp queues, thus the subsequent stop
will queue its stopping lambdas after barrier's ones.

However, in debug mode the queue can be shuffled, so the owning service
can suddenly be freed from under the barrier's feet causing use after
free. Fortunately, this can be easily fixed by capturing the shared
pointer on the shared barrier instead of a regular pointer on the
shard-local barrier.

fixes: #11303

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11553
2022-09-16 08:21:02 +03:00
Michał Chojnowski
78850884d2 test: perf: perf_fast_forward: fix an error message
The test is supposed to give a helpful error message when the user forgets to
run --populate before the benchmark. But this must have become broken at some
point, because execute_cql() terminates the program with an unhelpful
("unconfigured table config") message, which doesn't mention --populate.

Fix that by catching the exception and adding the helpful tip.

Closes #11533
2022-09-15 19:30:10 +02:00
Alejo Sanchez
92129f1d47 test.py: handle duplicate result from driver
Sometimes the driver calls the callback twice on a future that is already
done, with a None result. Log it and avoid setting the local future twice.
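
The guard can be sketched with a plain asyncio future. This is only an illustration: `make_callback` and the list used as a log are hypothetical, and the real code bridges a driver future to asyncio rather than calling the callback directly.

```python
import asyncio


def make_callback(local_future: asyncio.Future, log: list):
    """Build a result callback that tolerates being invoked more than once."""
    def on_result(result) -> None:
        if local_future.done():
            log.append("duplicate callback ignored")
            return
        local_future.set_result(result)
    return on_result


async def demo():
    fut = asyncio.get_running_loop().create_future()
    log = []
    callback = make_callback(fut, log)
    callback("rows")
    callback(None)  # without the guard this would raise InvalidStateError
    return await fut, log
```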

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 15:12:50 +02:00
Alejo Sanchez
2da7304696 test.py: log server restarts for topology tests
Add missing logging for server restart.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 15:10:29 +02:00
Alejo Sanchez
61a92afa2d test.py: log actions for topology tests
For debugging, log driver connection, before and after checks, and
topology changes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 15:10:29 +02:00
Botond Dénes
05ef13a627 Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho
Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundary, so a large partition is stored in a single file. But that can cause many problems, like memory pressure (e.g.: https://github.com/scylladb/scylladb/issues/4217), and incremental compaction can also not fulfill its promise as the file storing the large partition can only be released once exhausted.

The first step was to add clustering range metadata for first and last partition keys (retrieved from promoted index), which is crucial to determine disjointness at clustering level, and also the order at which the disjoint files should be opened for incremental reading.

The second step was to extend sstable_run to look at clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run.

The final step was to introduce an option for compaction to split a large partition that is being written once it exceeds the size threshold.

What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can be safely built on the assumption of the disjoint invariant after the second step aforementioned.

Closes #11233

* github.com:scylladb/scylladb:
  test: Add test for large partition splitting on compaction
  compaction: Add support to split large partitions
  sstable: Extend sstable_run to allow disjointness on the clustering level
  sstables: simplify will_introduce_overlapping()
  test: move sstable_run_disjoint_invariant_test into sstable_datafile_test
  test: lib: Fix inefficient merging of mutations in make_sstable_containing()
  sstables: Keep track of first partition's first pos and last partition's last pos
  sstables: Rename min/max position_range to a descriptive name
  sstables_manager: Add sstable metadata reader concurrency semaphore
  sstables: Add ability to find first or last position in a partition
2022-09-15 16:08:56 +03:00
Alejo Sanchez
604f7353ef Revert "test.py: restart stopped servers before...
teardown..."

This reverts commit df1ca57fda.

In order to prevent timeouts on teardown queries, the previous commit
added functionality to restart servers that were down. This issue is
fixed in fc0263fc9b so there's no longer need to restart stopped servers
on test teardown.
2022-09-15 14:47:01 +02:00
Alejo Sanchez
ed81f1a85c test.py: ManagerClient API fix return text
For ManagerClient request API, don't return status, raise an exception.
Server side errors are signaled by status 500, not text body.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Alejo Sanchez
4a5f2418ec test.py: ManagerClient raise on HTTP != 200
Raise an exception if the request result is not HTTP 200 for .get()
helper.
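
The behaviour can be sketched as a small pure function. The names `HTTPStatusError` and `check_response` are hypothetical and the actual HTTP request is left out; this is not the ManagerClient code.

```python
class HTTPStatusError(RuntimeError):
    """Raised when a request does not come back with HTTP 200 (hypothetical)."""


def check_response(status: int, body: str) -> str:
    """Return the response body, or raise if the status is not 200."""
    if status != 200:
        raise HTTPStatusError(f"unexpected HTTP status {status}: {body}")
    return body
```

Raising instead of returning a status means callers cannot accidentally ignore a server-side failure.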

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Alejo Sanchez
a84bde38c0 test.py: ManagerClient fix paths to updated resource
Fix missing path renames for server-side rename
"node" -> "server" API.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-09-15 14:47:01 +02:00
Kamil Braun
728161003a Merge 'raft server, abort on background errors' from Gusev Petr
Halted background fibers render raft server effectively unusable, so
report this explicitly to the clients.

Fix: #11352

Closes #11370

* github.com:scylladb/scylladb:
  raft server, status metric
  raft server, abort group0 server on background errors
  raft server, provide a callback to handle background errors
  raft server, check aborted state on public server public api's
2022-09-15 14:12:11 +02:00
Alejo Sanchez
b8f68729b0 test.py: Pool add fresh when item not returned
Pool.get() might have waiting callers, so if an item is not returned
to the pool after use, tell the pool to add a new one and tell the pool
an entry was taken (used for total running entries, i.e. clusters).

Use it when a ScyllaCluster is dirty and not returned.

While there improve logging and docstrings.

Issue reported by @kbr-.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11546
2022-09-15 13:56:44 +03:00
Alejo Sanchez
7e3389ee43 test.py: schema timeout less than request timeout
When a server is down, the driver expects multiple schema timeouts
within the same request to handle it properly.

Found by @kbr-

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11544
2022-09-15 11:43:52 +03:00