Commit Graph

33189 Commits

Author SHA1 Message Date
Pavel Emelyanov
826244084e table: Rewrap retry loop
The existing loop is very branchy in its attempts to find out whether or
not to abort. The "allowed_retries" count can be a good indicator of the
decision taken. This makes the code notably shorter and easier to extend

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-29 19:14:46 +03:00
Asias He
c194c811df repair: Yield in repair_service::do_decommission_removenode_with_repair
When walking through the ranges, we should yield to prevent stalls. We
do similar yield in other node operations.

Fix a stall in 5.1.dev.20220724.f46b207472a3 with build-id
d947aaccafa94647f71c1c79326eb88840c5b6d2

```
!INFO | scylla[6551]: Reactor stalled for 10 ms on shard 0. Backtrace:
0x4bbb9d2 0x4bba630 0x4bbb8e0 0x7fd365262a1f 0x2face49 0x2f5caff
0x36ca29f 0x36c89c3 0x4e3a0e1
````

Fixes #11146

Closes #11160
2022-09-28 18:21:35 +03:00
Avi Kivity
cf3830a249 Merge 'Add support for TRUNCATE USING TIMEOUT' from Benny Halevy
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter is no longer derived from raw::cf_statement,
and just stores a schema_ptr to get to the keyspace and column_family.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Also, update docs/cql/ddl.rst truncate-statement section respectively.

Closes #11409

* github.com:scylladb/scylladb:
  docs: cql-extensions: add TRUNCATE to USING TIMEOUT section.
  docs: cql: ddl: add support for TRUNCATE USING TIMEOUT
  cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
  cql3: selectStatement: restrict to USING TIMEOUT in grammar
  cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
2022-09-28 18:19:03 +03:00
Avi Kivity
19374779bb Merge 'Fix large data warning and docs' from Benny Halevy
The series contains fixes for system.large_* log warning and respective documentation.
This prepares the way for adding a new system.large_collections table (See #11449):

Fixes #11620
Fixes #11621
Fixes #11622

the respective fixes should be backported to different release branches, based on the respective patches they depend on (mentioned in each issue).

Closes #11623

* github.com:scylladb/scylladb:
  docs: adjust to sstable base name
  docs: large-partition-table: adjust for additional rows column
  docs: debugging-large-partition: update log warning example
  db/large_data_handler: print static cell/collection description in log warning
  db/large_data_handler: separate pk and ck strings in log warning with delimiter
2022-09-28 17:52:23 +03:00
Nadav Har'El
de1bc147bc Merge 'test.py: cleanups in topology test suites' from Kamil Braun
Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory`

Closes #11590

* github.com:scylladb/scylladb:
  test.py: replace `topology` with `cluster_size` in Topology tests
  test.py: rename `topology_for_class` to `get_cluster_factory`
  test/pylib: ScyllaCluster: fix create_server parameter type
2022-09-28 15:19:54 +03:00
Kamil Braun
1bcc28b48b test/topology_raft_disabled: reenable test_raft_upgrade
The test was disabled due to a bug in the Python driver which caused the
driver not to reconnect after a node was restarted (see
scylladb/python-driver#170).

Introduce a workaround for that bug: we simply create a new driver
session after restarting the nodes. Reenable the test.

Closes #11641
2022-09-28 15:13:42 +03:00
Mikołaj Grzebieluch
be8fcba8c1 raft: broadcast_tables: add support for bind variables
Extended the queries language to support bind variables which are bound in the
execution stage, before creating a raft command.

Adjusted `test_broadcast_tables.py` to prepare statements at the beginning of the test.

Fixed a small bug in `strongly_consistent_modification_statement::check_access`.

Closes #11525
2022-09-28 09:54:59 +03:00
Alejo Sanchez
02933c9b82 test.py: close aiohttp session for topology tests
Close the aiohttp ClientSession after pytest session finishes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11648
2022-09-27 18:09:08 +02:00
Kamil Braun
82481ae31b Merge 'raft server, log size limit in bytes' from Gusev Petr
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.

snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is greater than
 max_log_size - (size of trailing entries).

Closes #11397

* github.com:scylladb/scylladb:
  raft replication_test, make backpressure test to do actual backpressure
  raft server, shrink_to_fit on log truncation
  raft server, release memory if add_entry throws
  raft server, log size limit in bytes
2022-09-27 14:25:08 +02:00
Kamil Braun
ed67f0e267 Merge 'test.py: fix topology init error handling' from Alecco
When there are errors starting the first cluster(s) the logs of the server logs are needed. So move `.start()` to the `try` block in `test.py` (out of `asynccontextmanager`).

While there, make `ScyllaClusterManager.start()` idempotent.

Closes #11594

* github.com:scylladb/scylladb:
  test.py: fix ScyllaClusterManager start/stop
  test.py: fix topology init error handling
2022-09-27 11:36:07 +02:00
Petr Gusev
bc50b7407f raft replication_test, make backpressure test to do actual backpressure
Before this patch this test didn't actually experience any backpressure since all the commands were executed sequentially.
2022-09-27 12:04:14 +04:00
Petr Gusev
cbfe033786 raft server, shrink_to_fit on log truncation
We don't want to keep memory we don't use, shrink_to_fit guarantees that.

In fact, boost::deque frees up memory when items are deleted, so this change has little effect at the moment, but it may pay off if we change the container in the future.
2022-09-27 12:02:36 +04:00
Petr Gusev
b34dfed307 raft server, release memory if add_entry throws
We consume memory from semaphore in add_entry_on_leader, but never release it if add_entry throws.
2022-09-27 12:02:34 +04:00
Benny Halevy
b178813cba docs: cql-extensions: add TRUNCATE to USING TIMEOUT section.
List the queries that support the TIMEOUT parameter.

Mention the newly added support for TRUNCATE
USING TIMEOUT.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
b0bad0b153 docs: cql: ddl: add support for TRUNCATE USING TIMEOUT
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
64140ccf05 cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter, statements::truncate_statement, is no longer derived
from raw::cf_statement, and just stores a schema_ptr to get to the
keyspace and column_family names.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
27d3e48005 cql3: selectStatement: restrict to USING TIMEOUT in grammar
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that cql3::statements::raw::select_statement validate_attrs
now asserts that the ttl or the timestamp attributes aren't set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
0728d33d5f cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that now delete_statement ctor asserts that the ttl
attribute is not set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Kamil Braun
696bdb2de7 test.py: replace topology with cluster_size in Topology tests
First, a reminder of a few basic concepts in Scylla:
- "topology" is a mapping: for each node, its DC and Rack.
- "replication strategy" is a method of calculating replica sets in
  a cluster. It is not a cluster-global property; each keyspace can have
  a different replication strategy. A cluster may have multiple
  keyspaces.
- "cluster size" is the number of nodes in a cluster.

Replication strategy is orthogonal to topology. Cluster size can be
derived from topology and is also orthogonal to replication strategy.

test.py was confusing the three concepts together. For some reason,
Topology suites were specifying a "topology" parameter which contained
replication strategy details - having nothing to do with topology. Also
it's unclear why a test suite would specify anything to do with
replication strategies - after all, a test may create keyspaces with
different replication strategies, and a suite may contain multiple
different tests.

Get rid of the "topology" parameter, replace it with a simple
"cluster_size". In the future we may re-introduce it when we actually
implement the possibility to start clusters with custom topologies
(which involves configuring the snitch etc.) Simplify the test.py code.
2022-09-26 15:17:50 +02:00
Kamil Braun
0725ab3a3e test.py: rename topology_for_class to get_cluster_factory
The previous name had nothing to do with what the function calculated
and returned (it returned a `create_cluster` function; the standard name
for a function that constructs objects would be 'factory', so
`get_cluster_factory` is an appropriate name for a function that returns
cluster factories).
2022-09-26 11:45:44 +02:00
Kamil Braun
06cc4f9259 test/pylib: ScyllaCluster: fix create_server parameter type
The only usage of `ScyllaCluster` constructor passed a `create_server`
function which expected a `List[str]` for the second parameter, while
the constructor specified that the function should expect an
`Optional[List[str]]`. There was no reason for the latter, we can easily
fix this type error.

Also give a type hint for `create_cluster` function in
`PythonTestSuite.topology_for_class`. This is actually what catched the
type error.
2022-09-26 11:45:44 +02:00
Petr Gusev
27e60ecbf4 raft server, log size limit in bytes
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.

snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
2022-09-26 13:10:10 +04:00
Nadav Har'El
868a884b79 test/cql-pytest: add reproducer for ignored IS NOT NULL
This test reproduces issue #10365: It shows that although "IS NOT NULL" is
not allowed in regular SELECT filters, in a materialized view it is allowed,
even for non-key columns - but then outright ignored and does not actually
filter out anything - a fact which already surprised several users.

The test also fails on Cassandra - it also wrongly allows IS NOT NULL
on the non-key columns but then ignores this in the filter. So the test
is marked with both xfail (known to fail on Scylla) and cassandra_bug
(fails on Cassandra because of what we consider to be a Cassandra bug).

Refs #10365
Refs #11606

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11615
2022-09-26 09:02:08 +03:00
Anna Stuchlik
c5285bcb14 doc: remove the section about updating OS packages during upgrade from upgrade guides for Ubunut and Debian (from 4.5 to 4.6)
Closes #11629
2022-09-26 08:04:02 +03:00
Avi Kivity
ad2f1dc704 Merge 'Avoid default initialization of token_metadata and topology when not needed' from Pavel Emelyanov
The goal is not to default initialize an object when its fields are about to be immediately overwritten by the consecutive code.

Closes #11619

* github.com:scylladb/scylladb:
  replication_strategy: Construct temp tokens in place
  topology: Define copy-sonctructor with init-lists
2022-09-25 18:08:42 +03:00
Jan Ciolek
ac152af88c expression: Add for_each_boolean factor
boolean_factors is a function that takes an expression
and extracts all children of the top level conjunction.

The problem is that it returns a vector<expression>,
which is inefficent.

Sometimes we would like to iterate over all boolean
factors without allocations. for_each_boolean_factor
is implemented for this purpose.

boolean_factors() can be implemented using
for_each_boolean_factor, so it's done to
reduce code duplication.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-09-25 16:34:22 +03:00
Avi Kivity
2a538f5543 Merge 'Cut one of snitch->gossiper links' from Pavel Emelyanov
Snitch uses gossiper for several reasons, one of is to re-gossip the topology-related app states when property-file snitch config changes. This set cuts this link by moving re-gossiping into the existing storage_service::snitch_reconfigured() subscription. Since initial snitch state gossiping happens in storage service as well, this change is not unreasonable.

Closes #11630

* github.com:scylladb/scylladb:
  storage_service: Re-gossiping snitch data in reconfiguration callback
  storage_service: Coroutinize snitch_reconfigured()
  storage_service: Indentation fix after previous patch
  storage_service: Reshard to shard-0 earlier
  storage_service: Refactor snitch reconfigured kick
2022-09-25 16:08:48 +03:00
Benny Halevy
a1adbf1f59 docs: adjust to sstable base name
Since 244df07771 (scylla 5.1),
only the sstable basename is kept in the large_* system tables.
The base path can be determined from the keyspace and
table name.

Fixes #11621

Adjust the examples in documentation respectively.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:38:13 +03:00
Benny Halevy
33924201cc docs: large-partition-table: adjust for additional rows column
Since a7511cf600 (scylla 5.0),
sstables containing partitions with too many rows are recorded in system.large_partitions.

Adjust the doc respectively.

Fixes #11622

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:38:13 +03:00
Benny Halevy
92ff17c6e3 docs: debugging-large-partition: update log warning example
The log warning format has changed since f3089bf3d1
and was fixed in the previous patch to include
a delimiter between the partition key, clustering key, and
column name.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:38:13 +03:00
Benny Halevy
fcbbc3eb9c db/large_data_handler: print static cell/collection description in log warning
When warning about a large cell/collection in a static row,
print that fact in the log warning to make it clearer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:37:42 +03:00
Benny Halevy
4670829502 db/large_data_handler: separate pk and ck strings in log warning with delimiter
Currently (since f3089bf3d1),
when printing a warning to the log about large rows and/or cells
the clustering key string is concatenated to the partition key string,
rendering the warning confsing and much less useful.

This patch adds a '/' delimiter to separate the fields,
and also uses one to separate the clustering key from the column name
for large cells.  In case of a static cell, the clustering key is null
hence the warning will look like: `pk//column`.

This patch does NOT change anything in the large_* system
table schema or contents.  It changes only the log warning format
that need not be backward compatible.

Fixes #11620

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:36:41 +03:00
Pavel Emelyanov
47958a4b37 storage_service: Re-gossiping snitch data in reconfiguration callback
Nowadays it's done inside snitch, and snitch needs to carry gossiper
refernece for that. There's an ongoing effort in de-globalizing snitch
and fixing its dependencies. This patch cuts this snitch->gossiper link
to facilitate the mentioned effort.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:55 +03:00
Pavel Emelyanov
932566d448 storage_service: Coroutinize snitch_reconfigured()
Next patch will add more sleeping code to it and it's simpler if the new
call is co_await-ed rather than .then()-ed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:55 +03:00
Pavel Emelyanov
7fee98cad0 storage_service: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:55 +03:00
Pavel Emelyanov
3d4ea2c628 storage_service: Reshard to shard-0 earlier
It makes next patch shorter and nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:55 +03:00
Pavel Emelyanov
11b79f9f80 storage_service: Refactor snitch reconfigured kick
The snitch_reconfigured calls update_topology with local node bcast
address argument. Things get simpler if the callee gets the address
itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-23 14:31:43 +03:00
Botond Dénes
b9d55ee02f Merge 'Add cassandra functional - show warn/err when tombstone threshold reached.' from Taras Borodin
Add cassandra functional - show warn/err when tombstone_warn_threshold/tombstone_failure_threshold reached on select, by partitions. Propagate raw query_string from coordinator to replicas.

Closes #11356

* github.com:scylladb/scylladb:
  add utf8:validate to operator<< partition_key with_schema.
  Show warn message if `tombstone_warn_threshold` reached on querier.
2022-09-23 05:53:47 +03:00
Pavel Emelyanov
9e7407ff91 replication_strategy: Construct temp tokens in place
Otherwise, the token_metadata object is default-initialized, then it's
move-assigned from another object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-22 19:19:32 +03:00
Pavel Emelyanov
d540af2cb0 topology: Define copy-sonctructor with init-lists
Otherwise the topology is default-constructed, then its fields
are copy-assigned with the data from the copy-from reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-22 19:18:58 +03:00
Tomasz Grabiec
ccbfe2ef0d Merge 'Fix invalid mutation fragment stream issues' from Botond Dénes
Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own.
The first two of these issues seems benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state.
The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans.

Closes #11604

* github.com:scylladb/scylladb:
  shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
  readers/mutation_readers: compacting_reader: remember injected partition-end
  db/view: view_builder::execute(): only inject partition-start if needed
2022-09-22 17:57:27 +02:00
Taras Borodin
c155ae1182 add utf8:validate to operator<< partition_key with_schema. 2022-09-22 16:42:31 +03:00
TarasBor
1f4a93da78 Show warn message if tombstone_warn_threshold reached on querier.
When querier read page with tombstones more than `tombstone_warn_threshold` limit - warning message appeared in logs.
If `tombstone_warn_threshold:0` feature disabled.

Refs scylladb#11410
2022-09-22 16:42:31 +03:00
Asias He
9ed401c4b2 streaming: Add finished percentage metrics for node ops using streaming
We have added the finished percentage for repair based node operations.

This patch adds the finished percentage for node ops using the old
streaming.

Example output:

scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000

In addition to the metrics, log shows the percentage is added.

[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646

Fixes #11600

Closes #11601
2022-09-22 14:19:34 +03:00
Botond Dénes
0ccb23d02b shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before done and the multishard reader might see the
half-copied buffer and consider the reader done (because
`_end_of_stream` is already set) resulting in the dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.

Fixes: #11561
2022-09-22 13:54:36 +03:00
Botond Dénes
16a0025dc3 readers/mutation_readers: compacting_reader: remember injected partition-end
Currently injecting a partition-end doesn't update
`_last_uncompacted_kind`, which will allow for a subsequent
`next_partition()` call to trigger injecting a partition-end, leading to
an invalid mutation fragment stream (partition-end after partition-end).
Fix by changing `_last_uncompacted_kind` to `partition_end` when
injecting a partition-end, making subsequent injection attempts noop.

Fixes: #11608
2022-09-22 13:54:36 +03:00
Botond Dénes
681e6ae77f db/view: view_builder::execute(): only inject partition-start if needed
When resuming a build-step, the view builder injects the partition-start
fragment of the last processed partition, to bring the consumer
(compactor) into the correct state before it starts to consume the
remainder of the partition content. This results in an invalid fragment
stream when the partition was actually over or there is nothing left for
the build step. Make the inject conditional on when the reader contains
more data for the partition.

Fixes: #11607
2022-09-22 13:54:36 +03:00
Nadav Har'El
517c1529aa docs: update docs/alternator/getting-started.md
Update several aspects of the alternator/getting-started.md which were
not up-to-date:

* When the documented was written, Alternator was moving quickly so we
  recommended running a nightly version. This is no longer the case, so
  we should recommend running the latest stable build.

* The link to the download link is no longer helpful for getting Docker
  instructions (it shows some generic download options). Instead point to
  our dockerhub page.

* Replace mentions of "Scylla" by the new official name, "ScyllaDB".

* Miscelleneous copy-edits.

Fixes #11218

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11605
2022-09-22 11:08:05 +03:00
Piotr Sarna
481240b8b4 Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).

In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.

I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.

Fixes #11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783

Closes #11384

* github.com:scylladb/scylladb:
  test/alternator: insert test names into Scylla logs
  rest api: add a new /system/log operation
  alternator ttl: log warning if scan took too long.
  alternator,ttl: allow sub-second TTL scanning period, for tests
  test/alternator: skip fewer Alternator TTL tests
  test/alternator: test Alternator TTL metrics
2022-09-22 09:47:50 +02:00
Botond Dénes
ef7471c460 readers/mutation_reader: stream validator: fix log level detection logic
The mutation fragment stream validator filter has a detailed debug log
in its constructor. To avoid putting together this message when the log
level is above debug, it is enclosed in an if, activated when log level
is debug or trace... at least that was intended. Actually the if is
activated when the log level is debug or above (info, warn or error) but
is only actually logged if the log level is exactly debug. Fix the logic
to work as intended.

Closes #11603
2022-09-22 09:41:45 +03:00