The existing loop is very branchy in its attempts to find out whether or
not to abort. The "allowed_retries" count can be a good indicator of the
decision taken. This makes the code notably shorter and easier to extend
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When walking through the ranges, we should yield to prevent stalls. We
do similar yield in other node operations.
Fix a stall in 5.1.dev.20220724.f46b207472a3 with build-id
d947aaccafa94647f71c1c79326eb88840c5b6d2
```
!INFO | scylla[6551]: Reactor stalled for 10 ms on shard 0. Backtrace:
0x4bbb9d2 0x4bba630 0x4bbb8e0 0x7fd365262a1f 0x2face49 0x2f5caff
0x36ca29f 0x36c89c3 0x4e3a0e1
````
Fixes#11146Closes#11160
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.
To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.
The latter is no longer derived from raw::cf_statement,
and just stores a schema_ptr to get to the keyspace and column_family.
`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.
Fixes#11408
Also, update docs/cql/ddl.rst truncate-statement section respectively.
Closes#11409
* github.com:scylladb/scylladb:
docs: cql-extensions: add TRUNCATE to USING TIMEOUT section.
docs: cql: ddl: add support for TRUNCATE USING TIMEOUT
cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
cql3: selectStatement: restrict to USING TIMEOUT in grammar
cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
The series contains fixes for system.large_* log warning and respective documentation.
This prepares the way for adding a new system.large_collections table (See #11449):
Fixes#11620Fixes#11621Fixes#11622
the respective fixes should be backported to different release branches, based on the respective patches they depend on (mentioned in each issue).
Closes#11623
* github.com:scylladb/scylladb:
docs: adjust to sstable base name
docs: large-partition-table: adjust for additional rows column
docs: debugging-large-partition: update log warning example
db/large_data_handler: print static cell/collection description in log warning
db/large_data_handler: separate pk and ck strings in log warning with delimiter
Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory`
Closes#11590
* github.com:scylladb/scylladb:
test.py: replace `topology` with `cluster_size` in Topology tests
test.py: rename `topology_for_class` to `get_cluster_factory`
test/pylib: ScyllaCluster: fix create_server parameter type
The test was disabled due to a bug in the Python driver which caused the
driver not to reconnect after a node was restarted (see
scylladb/python-driver#170).
Introduce a workaround for that bug: we simply create a new driver
session after restarting the nodes. Reenable the test.
Closes#11641
Extended the queries language to support bind variables which are bound in the
execution stage, before creating a raft command.
Adjusted `test_broadcast_tables.py` to prepare statements at the beginning of the test.
Fixed a small bug in `strongly_consistent_modification_statement::check_access`.
Closes#11525
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.
snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is greater than
max_log_size - (size of trailing entries).
Closes#11397
* github.com:scylladb/scylladb:
raft replication_test, make backpressure test to do actual backpressure
raft server, shrink_to_fit on log truncation
raft server, release memory if add_entry throws
raft server, log size limit in bytes
When there are errors starting the first cluster(s) the logs of the server logs are needed. So move `.start()` to the `try` block in `test.py` (out of `asynccontextmanager`).
While there, make `ScyllaClusterManager.start()` idempotent.
Closes#11594
* github.com:scylladb/scylladb:
test.py: fix ScyllaClusterManager start/stop
test.py: fix topology init error handling
We don't want to keep memory we don't use, shrink_to_fit guarantees that.
In fact, boost::deque frees up memory when items are deleted, so this change has little effect at the moment, but it may pay off if we change the container in the future.
List the queries that support the TIMEOUT parameter.
Mention the newly added support for TRUNCATE
USING TIMEOUT.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.
To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.
The latter, statements::truncate_statement, is no longer derived
from raw::cf_statement, and just stores a schema_ptr to get to the
keyspace and column_family names.
`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.
Fixes#11408
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.
test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.
Note that cql3::statements::raw::select_statement validate_attrs
now asserts that the ttl or the timestamp attributes aren't set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.
test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.
Note that now delete_statement ctor asserts that the ttl
attribute is not set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
First, a reminder of a few basic concepts in Scylla:
- "topology" is a mapping: for each node, its DC and Rack.
- "replication strategy" is a method of calculating replica sets in
a cluster. It is not a cluster-global property; each keyspace can have
a different replication strategy. A cluster may have multiple
keyspaces.
- "cluster size" is the number of nodes in a cluster.
Replication strategy is orthogonal to topology. Cluster size can be
derived from topology and is also orthogonal to replication strategy.
test.py was confusing the three concepts together. For some reason,
Topology suites were specifying a "topology" parameter which contained
replication strategy details - having nothing to do with topology. Also
it's unclear why a test suite would specify anything to do with
replication strategies - after all, a test may create keyspaces with
different replication strategies, and a suite may contain multiple
different tests.
Get rid of the "topology" parameter, replace it with a simple
"cluster_size". In the future we may re-introduce it when we actually
implement the possibility to start clusters with custom topologies
(which involves configuring the snitch etc.) Simplify the test.py code.
The previous name had nothing to do with what the function calculated
and returned (it returned a `create_cluster` function; the standard name
for a function that constructs objects would be 'factory', so
`get_cluster_factory` is an appropriate name for a function that returns
cluster factories).
The only usage of `ScyllaCluster` constructor passed a `create_server`
function which expected a `List[str]` for the second parameter, while
the constructor specified that the function should expect an
`Optional[List[str]]`. There was no reason for the latter, we can easily
fix this type error.
Also give a type hint for `create_cluster` function in
`PythonTestSuite.topology_for_class`. This is actually what catched the
type error.
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.
snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
This test reproduces issue #10365: It shows that although "IS NOT NULL" is
not allowed in regular SELECT filters, in a materialized view it is allowed,
even for non-key columns - but then outright ignored and does not actually
filter out anything - a fact which already surprised several users.
The test also fails on Cassandra - it also wrongly allows IS NOT NULL
on the non-key columns but then ignores this in the filter. So the test
is marked with both xfail (known to fail on Scylla) and cassandra_bug
(fails on Cassandra because of what we consider to be a Cassandra bug).
Refs #10365
Refs #11606
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11615
The goal is not to default initialize an object when its fields are about to be immediately overwritten by the consecutive code.
Closes#11619
* github.com:scylladb/scylladb:
replication_strategy: Construct temp tokens in place
topology: Define copy-sonctructor with init-lists
boolean_factors is a function that takes an expression
and extracts all children of the top level conjunction.
The problem is that it returns a vector<expression>,
which is inefficent.
Sometimes we would like to iterate over all boolean
factors without allocations. for_each_boolean_factor
is implemented for this purpose.
boolean_factors() can be implemented using
for_each_boolean_factor, so it's done to
reduce code duplication.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Snitch uses gossiper for several reasons, one of is to re-gossip the topology-related app states when property-file snitch config changes. This set cuts this link by moving re-gossiping into the existing storage_service::snitch_reconfigured() subscription. Since initial snitch state gossiping happens in storage service as well, this change is not unreasonable.
Closes#11630
* github.com:scylladb/scylladb:
storage_service: Re-gossiping snitch data in reconfiguration callback
storage_service: Coroutinize snitch_reconfigured()
storage_service: Indentation fix after previous patch
storage_service: Reshard to shard-0 earlier
storage_service: Refactor snitch reconfigured kick
Since 244df07771 (scylla 5.1),
only the sstable basename is kept in the large_* system tables.
The base path can be determined from the keyspace and
table name.
Fixes#11621
Adjust the examples in documentation respectively.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Since a7511cf600 (scylla 5.0),
sstables containing partitions with too many rows are recorded in system.large_partitions.
Adjust the doc respectively.
Fixes#11622
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The log warning format has changed since f3089bf3d1
and was fixed in the previous patch to include
a delimiter between the partition key, clustering key, and
column name.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When warning about a large cell/collection in a static row,
print that fact in the log warning to make it clearer.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently (since f3089bf3d1),
when printing a warning to the log about large rows and/or cells
the clustering key string is concatenated to the partition key string,
rendering the warning confsing and much less useful.
This patch adds a '/' delimiter to separate the fields,
and also uses one to separate the clustering key from the column name
for large cells. In case of a static cell, the clustering key is null
hence the warning will look like: `pk//column`.
This patch does NOT change anything in the large_* system
table schema or contents. It changes only the log warning format
that need not be backward compatible.
Fixes#11620
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Nowadays it's done inside snitch, and snitch needs to carry gossiper
refernece for that. There's an ongoing effort in de-globalizing snitch
and fixing its dependencies. This patch cuts this snitch->gossiper link
to facilitate the mentioned effort.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will add more sleeping code to it and it's simpler if the new
call is co_await-ed rather than .then()-ed
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The snitch_reconfigured calls update_topology with local node bcast
address argument. Things get simpler if the callee gets the address
itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add cassandra functional - show warn/err when tombstone_warn_threshold/tombstone_failure_threshold reached on select, by partitions. Propagate raw query_string from coordinator to replicas.
Closes#11356
* github.com:scylladb/scylladb:
add utf8:validate to operator<< partition_key with_schema.
Show warn message if `tombstone_warn_threshold` reached on querier.
Otherwise, the token_metadata object is default-initialized, then it's
move-assigned from another object.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Otherwise the topology is default-constructed, then its fields
are copy-assigned with the data from the copy-from reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own.
The first two of these issues seems benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state.
The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans.
Closes#11604
* github.com:scylladb/scylladb:
shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
readers/mutation_readers: compacting_reader: remember injected partition-end
db/view: view_builder::execute(): only inject partition-start if needed
When querier read page with tombstones more than `tombstone_warn_threshold` limit - warning message appeared in logs.
If `tombstone_warn_threshold:0` feature disabled.
Refs scylladb#11410
We have added the finished percentage for repair based node operations.
This patch adds the finished percentage for node ops using the old
streaming.
Example output:
scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000
In addition to the metrics, log shows the percentage is added.
[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646
Fixes#11600Closes#11601
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before done and the multishard reader might see the
half-copied buffer and consider the reader done (because
`_end_of_stream` is already set) resulting in the dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.
Fixes: #11561
Currently injecting a partition-end doesn't update
`_last_uncompacted_kind`, which will allow for a subsequent
`next_partition()` call to trigger injecting a partition-end, leading to
an invalid mutation fragment stream (partition-end after partition-end).
Fix by changing `_last_uncompacted_kind` to `partition_end` when
injecting a partition-end, making subsequent injection attempts noop.
Fixes: #11608
When resuming a build-step, the view builder injects the partition-start
fragment of the last processed partition, to bring the consumer
(compactor) into the correct state before it starts to consume the
remainder of the partition content. This results in an invalid fragment
stream when the partition was actually over or there is nothing left for
the build step. Make the inject conditional on when the reader contains
more data for the partition.
Fixes: #11607
Update several aspects of the alternator/getting-started.md which were
not up-to-date:
* When the documented was written, Alternator was moving quickly so we
recommended running a nightly version. This is no longer the case, so
we should recommend running the latest stable build.
* The link to the download link is no longer helpful for getting Docker
instructions (it shows some generic download options). Instead point to
our dockerhub page.
* Replace mentions of "Scylla" by the new official name, "ScyllaDB".
* Miscelleneous copy-edits.
Fixes#11218
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11605
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).
In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.
I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.
Fixes#11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783Closes#11384
* github.com:scylladb/scylladb:
test/alternator: insert test names into Scylla logs
rest api: add a new /system/log operation
alternator ttl: log warning if scan took too long.
alternator,ttl: allow sub-second TTL scanning period, for tests
test/alternator: skip fewer Alternator TTL tests
test/alternator: test Alternator TTL metrics
The mutation fragment stream validator filter has a detailed debug log
in its constructor. To avoid putting together this message when the log
level is above debug, it is enclosed in an if, activated when log level
is debug or trace... at least that was intended. Actually the if is
activated when the log level is debug or above (info, warn or error) but
is only actually logged if the log level is exactly debug. Fix the logic
to work as intended.
Closes#11603