Before changing its type to streaming::plan_id
this patch clarifies that the parameter actually represents
the plan id and not the table id as its name suggests.
For reference, see the call to update_progress in
`stream_transfer_task::execute`, as well as the function
using _stream_bytes which map key is the plan id.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
from Tomasz Grabiec
This series fixes lack of mutation associativity which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
in mutations applied and read.
No known production impact.
Refs https://github.com/scylladb/scylladb/issues/11307Closes#11312
* github.com:scylladb/scylladb:
test: mutation_test: Add explicit test for mutation commutativity
test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
db: mutation_partition: Drop unnecessary maybe_shadow()
db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
mutation_partition: row: make row marker shadowing symmetric
- Remove `ScyllaCluster.__getitem__()` (pending request by @kbr- in a previous pull request), for this remove all direct access to servers from caller code
- Increase Python driver timeouts (req by @nyh)
- Improve `ManagerClient` API requests: use `http+unix://<sockname>/<resource>` instead of `http://localhost/<resource>` and callers of the helper method only pass the resource
- Improve lint and type hints
Closes#11305
* github.com:scylladb/scylladb:
test.py: remove ScyllaCluster.__getitem__()
test.py: ScyllaCluster check kesypace with any server
test.py: ScyllaCluster server error log method
test.py: ScyllaCluster read_server_log()
test.py: save log point for all running servers
test.py: ScyllaCluster provide endpoint
test.py: build host param after before_test
test.py: manager client disable lint warnings
test.py: scylla cluster lint and type hint fixes
test.py: increase more timeouts
test.py: ManagerClient improve API HTTP requests
Dtest fails if it sees an unknown errors in the logs. This series
reduces severity of some errors (since they are actually expected during
shutdown) and removes some others that duplicate already existing errors
that dtest knows how to deal with. Also fix one case of unhandled
exception in schema management code.
* 'dtest-fixes-v1' of github.com:gleb-cloudius/scylla:
raft: getting abort_requested_exception exception from a sm::apply is not a critical error
schema_registry: fix abandoned feature warning
service: raft: silence rpc::closed_errors in raft_rpc
Given 3 row mutations:
m1 = {
marker: {row_marker: dead timestamp=-9223372036854775803},
tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}}
}
m2 = {
marker: {row_marker: timestamp=-9223372036854775805}
}
m3 = {
tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}}
}
We get different shadowable tombstones depending on the order of merging:
(m1 + m2) + m3 = {
marker: {row_marker: dead timestamp=-9223372036854775803},
tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}}
m1 + (m2 + m3) = {
marker: {row_marker: dead timestamp=-9223372036854775803},
tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}}
}
The reason is that in the second case the shadowable tombstone in m3
is shadwed by the row marker in m2. In the first case, the marker in
m2 is cancelled by the dead marker in m1, so shadowable tombstone in
m3 is not cancelled (the marker in m1 does not cancel because it's
dead).
This wouldn't happen if the dead marker in m1 was accompanied by a
hard tombstone of the same timestamp, which would effectively make the
difference in shadowable tombstones irrelevant.
Found by row_cache_test.cc::test_concurrent_reads_and_eviction.
I'm not sure if this situation can be reached in practice (dead marker
in mv table but no row tombstone).
Work it around for tests by producing a row tombstone if there is a
dead marker.
Refs #11307
When the row has a live row marker which shadows the shadowable
tombstone, the shadowable tombstone should not be effective. The code
assumes that _shadowable always reflects the current tombstone, so
maybe_shadow() needs to be called whenever marker or regular tombstone
changes. This was not ensured by row::apply(tombstone).
This causes problems in tests which use random_mutation_generator,
which generates mutations which would violate this invariant, and as a
result, mutation commutativity would be violated.
I am not aware of problems in production code.
Currently row marker shadowing the shadowable tombstone is only checked
in `apply(row_marker)`. This means that shadowing will only be checked
if the shadowable tombstone and row marker are set in the correct order.
This at the very least can cause flakyness in tests when a mutation
produced just the right way has a shadowable tombstone that can be
eliminated when the mutation is reconstructed in a different way,
leading to artificial differences when comparing those mutations.
This patch fixes this by checking shadowing in
`apply(shadowable_tombstone)` too, making the shadowing check symmetric.
There is still one vulnerability left: `row_marker& row_marker()`, which
allow overwriting the marker without triggering the corresponding
checks. We cannot remove this overload as it is used by compaction so we
just add a comment to it warning that `maybe_shadow()` has to be manually
invoked if it is used to mutate the marker (compaction takes care of
that). A caller which didn't do the manual check is
mutation_source_test: this patch updates it to use `apply(row_marker)`
instead.
Fixes: #9483
Tests: unit(dev)
Closes#9519
Provide server error logs to caller (test.py).
Avoids direct access to list of servers.
To be done later: pick the failed server. For now it just provides the
log of one server.
While there, fix type hints.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Instead of accessing the first server, now test.py asks ScyllaCluster
for the server log.
In a later commit, ScyllaCluster will pick the appropriate server.
Also removes another direct access to the list of servers we want to get
rid of.
For error reporting, before a test a mark of the log point in time is
saved. Previously, only the log of the first server was saved. Now it's
done for all running servers.
While there, remove direct access to servers on test.py.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
If no server started, there is no server in the cluster list. So only
build the pytest --host param after before_test check is done.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Increase Python driver connection timeouts to deal with extreme cases
for slow debug builds in slow machines as done (and explained) in
95bd02246a.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Use the AF Unix socket name as host name instead of localhost and avoid
repeating the full URL for callers of _request() for the Manager API
requests from the client.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
In commit 7eda6b1e90, we increased the
request_timeout parameter used by cql-pytest tests from the default of
10 seconds to 120 seconds. 10 seconds was usually more than enough for
finishing any Scylla request, but it turned out that in some extreme
cases of a debug build running on an extremely over-committed machine,
the default timeout was not enough.
Recently, in issue #11289 we saw additional cases of timeouts which
the request_timeout setting did *not* solve. It turns out that the Python
CQL driver has two additional timeout settings - connect_timeout and
control_connection_timeout, which default to 5 seconds and 2 seconds
respectively. I believe that most of the timeouts in issue #11289
come from the control_connection_timeout setting - by changing it
to a tiny number (e.g., 0.0001) I got the same error messages as those
reported in #11289. The default of that timeout - 2 seconds - is
certainly low enough to be reached on an extremely over-committed
machine.
So this patch significantly increases both connect_timeout and
control_connection_timeout to 60 seconds. We don't care that this timeout
is ridiculously large - under normal operations it will never be reached.
There is no code which loops for this amount of time, for example.
Refs #11289 (perhaps even Fixes, we'll need to see that the test errors
go away).
NOTE: This patch only changes test/cql-pytest/util.py, which is only
used by the cql-pytest test suite. We have multiple other test suites which
copied this code, and those test suites might need fixing separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11295
Right now, if there's a node for which we don't know the features
supported by this node (they are neither persisted locally, nor gossiped
by that node), we would skip this node in calculating the set
of enabled features and potentially enable a feature which shouldn't be
enabled - because that node may not know it. We should only enable a
feature when we know that all nodes have upgraded and know the feature.
This bug caused us problems when we tried to move RAFT out of
experimental. There are dtests such as `partitioner_tests.py` in which
nodes would enable features prematurely, which caused the Raft upgrade
procedure to break (the procedure starts only when all nodes upgrade
and announce that they know the SUPPORTS_RAFT cluster feature).
Closes#11225
This pull request introduces global secondary-indexing for non-frozen collections.
The intent is to enable such queries:
```
CREATE TABLE test(int id, somemap map<int, int>, somelist<int>, someset<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));
-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);
SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```
We use here all-familiar materialized views (MVs). Scylla treats all the
collections the same way - they're a list of pairs (key, value). In case
of sets, the value type is dummy one. In case of lists, the key type is
TIMEUUID. When describing the design, I will forget that there is more
than one collection type. Suppose that the columns in the base table
were as follows:
```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```
The MV schema is as follows (the names of columns which are not the same
as in base might be different). All the columns here form the primary
key.
```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```
The reason for the last additional column is that the values from a collection might not be unique.
Fixes#2962Fixes#8745Fixes#10707
This patch does not implement **local** secondary indexes for collection columns: Refs #10713.
Closes#10841
* github.com:scylladb/scylladb:
test/cql-pytest: un-xfail yet another passing collection-indexing test
secondary index: fix paging in map value indexing
test/cql-pytest: test for paging with collection values index
cql, view: rename and explain bytes_with_action
cql, index: make collection indexing a cluster feature
test/cql-pytest: failing tests for oversized key values in MV and SI
cql: fix secondary index "target" when column name has special characters
cql, index: improve error messages
cql, index: fix default index name for collection index
test/cql-pytest: un-xfail several collecting indexing tests
test/cql-pytest/test_secondary_index: verify that local index on collection fails.
docs/design-notes/secondary_index: add `VALUES` to index target list
test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
test/boost/secondary_index_test: update for non-frozen indexes on collections
test/cql-pytest: Uncomment collection indexes tests that should be working now
cql, index: don't use IS NOT NULL on collection column
cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections
cql3, index: Use entries() indexes on collections for queries
cql3, index: Use keys() and values() indexes on collections for queries.
types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
db/view/view.cc: compute view_updates for views over collections
view info: has_computed_column_depending_on_base_non_primary_key
column_computation: depends_on_non_primary_key_column
schema, index/secondary_index_manager: make schema for index-induced mv
index/secondary_index_manager: extract keys, values, entries types from collection
cql3/statements/: validate CREATE INDEX for index over a collection
cql3/statements/create_index_statement,index_target: rewrite index target for collection
column_computation.hh, schema.cc: collection_column_computation
column_computation.hh, schema.cc: compute_value interface refactor
Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
Commit 23acc2e848 broke the "--ssl" option of test/cql-pytest/run
(which makes Scylla - and cqlpytest - use SSL-encrypted CQL).
The problem was that there was a confusion between the "ssl" module
(Python's SSL support) and a new "ssl" variable. A rename and a missing
"import" solves the breakage.
We never noticed this because Jenkins does *not* run cql-pytest/run
with --ssl (actually, it no longer runs cql-pytest/run at all).
It is still a useful option for checking SSL-related problems in Scylla
and Seastar.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11292
Test schema changes when there was an underlying topology change.
- per test case checks of cluster health and cycling
- helper class to do cluster manager API requests
- tests can perform topology changes: stop/start/restart servers
- modified clusters are marked dirty and discarded after the test case
- cql connection is updated per topology change and per cluster change
Closes#11266
* github.com:scylladb/scylladb:
test.py: test topology and schema changes
test.py: ClusterManager API mark cluster dirty
test.py: call before/after_test for each test case
test.py: handle driver connection in ManagerClient
test.py: ClusterManager API and ManagerClient
test.py: improve topology docstring
Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later.
Fixes: https://github.com/scylladb/scylladb/issues/11264Closes#11273
* github.com:scylladb/scylladb:
querier: querier_cache: remove now unused evict_all_for_table()
database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
reader_concurrency_semaphore: add evict_inactive_reads_for_table()
It should have had one, derived instances are stored and destroyed via
the base-class. The only reason this haven't caused bugs yet is that
derived instances happen to not have any non-trivial members yet.
Closes#11293
A mixed bag of improvements developed as part of another PR (https://github.com/scylladb/scylladb/pull/10736). Said PR was closed so I'm submitting these improvements separately.
Closes#11294
* github.com:scylladb/scylladb:
test/lib: move convenience table config factory to sstable_test_env
test/lib/sstable_test_env: move members to impl struct
test/lib/sstable_utils: use test_env::do_with_async()
Instead of querier_cache::evict_all_for_table(). The new method cover
all queriers and in addition any other inactive reads registered on the
semaphore. In theory by the time we detach a table, no regular inactive
reads should be in the semaphore anymore, but if there is any still, we
better evict them before the table is destroyed, they might attempt to
access it in when destroyed later.
All users of `column_family_test_config()`, get the semaphore parameter
for it from `sstable_test_env`. It is clear that the latter serves as
the storage space for stable objects required by the table config. This
patch just enshrines this fact by moving the config factory method to
`sstable_test_env`, so it can just get what it needs from members.
All present members of sstable_test_env are std::unique_ptr<>:s because
they require stable addresses. This makes their handling somewhat
awkward. Move all of them into an internal `struct impl` and make that
member a unique ptr.
Fixes#11184Fixes#11237
In prev (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things.
Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.
Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved.
To further emphasize this, don't store segments found on init scan in all shard instances,
instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at last don't store
pointless string lists in shards 1 to X, and also we can potentially ask for the list twice.
More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is
considered candidate for recycling, and included in footprint.
Closes#11251
* github.com:scylladb/scylladb:
commitlog: Make get_segments_to_replay on-demand
commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fix https://github.com/scylladb/scylladb/issues/11197
This PR adds a new page where specifying workload attributes with service levels is described and adds it to the menu.
Also, I had to fix some links because of the warnings.
Closes#11209
* github.com:scylladb/scylladb:
doc: remove the reduntant space from index
doc: update the syntax for defining service level attributes
doc: rewording
doc: update the links to fix the warnings
doc: add the new page to the toctree
doc: add the descrption of specifying workload attributes with service levels
doc: add the definition of workloads to the glossary
In preparation for effective_replication_map hygiene, convert
some counter functions to coroutines to simplify the changes.
Closes#11291
* github.com:scylladb/scylladb:
storage_proxy: mutate_counters_on_leader: coroutinize
storage_proxy: mutate_counters: coroutinize
storage_proxy: mutate_counters: reorganize error handling
Simplify ahead of refactoring for consistent effective_replication_map.
This is probably a pessimization of the error case, but the error case
will be terrible in any case unless we resultify it.
Move the error handling function where it's used so the code
is more straightforward.
Due to some std::move()s later, we must still capture the schema early.
Move the termination condition to the front of the loop so it's
clear why we're looping and when we stop.
It's less than perfectly clean since we widen the scope of some variables
(from loop-internal to loop-carried), but IMO it's clearer.
It's much easier to maintain this way. Since it uses ranges_to_vnodes,
it interacts with topology and needs integration into
effective_replication_map management.
The patch leaves bad indentation and an infinite-looking loop in
the interest of minimization, but that will be corrected later.
Note, the test for `!r.has_value()` was eliminated since it was
short-circuited by the test for `!rqr.has_value()` returning from
the coroutine rather than propagating an error.
We use result_wrap() in two places, but that makes coroutinizing the
containing function a little harder, since it's composed of more lambdas.
Remove the wrappers, gaining a bit of performance in the error case.