Commit Graph

3559 Commits

Author SHA1 Message Date
Kefu Chai
a5e696fab8 storage_service, test: drop unused storage_service_config
this setting was removed back in
dcdd207349, so even though we still
pass `storage_service_config` to the ctor of `storage_service`,
`storage_service::storage_service()` just drops it on the floor.

in this change, the `storage_service_config` class is removed, and all
places referencing it are updated accordingly.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>

Closes #11415
2022-08-31 19:49:13 +03:00
Avi Kivity
421557b40a Merge "Provide DC/RACK when populating topology" from Pavel E
"
The topology object maintains all sorts of node/DC/RACK mappings on
board. When new entries are added to it the DC and RACK are taken
from the global snitch instance which, in turn, checks gossiper,
system keyspace and its local caches.

This set makes the topology population API require DC and RACK via the
call arguments. In most cases the populating code is the
storage service, which knows exactly where to get those from.

After this set it will be possible to remove the dependency knot
consisting of snitch, gossiper, system keyspace and messaging.
"

* 'br-topology-dc-rack-info' of https://github.com/xemul/scylla:
  topology: Use the provided dc/rack info
  test: Provide testing dc/rack infos
  storage_service: Provide dc/rack for snitch reconfiguration
  storage_service: Provide dc/rack from system ks on start
  storage_service: Provide dc/rack from gossiper for replacement
  storage_service: Provide dc/rack from gossiper for remotes
  storage_service,dht,repair: Provide local dc/rack from system ks
  system_keyspace: Cache local dc-rack on .start()
  topology: Some renames after previous patch
  topology: Require entry in the map for update_normal_tokens()
  topology: Make update_endpoint() accept dc-rack info
  replication_strategy: Accept dc-rack as get_pending_address_ranges argument
  dht: Carry dc-rack over boot_strapper and range_streamer
  storage_service: Make replacement info a real struct
2022-08-31 12:53:06 +03:00
Nadav Har'El
a797512148 Merge 'Raft test topology start stopped servers' from Alecco
Test teardown involves dropping the test keyspace. If there are stopped servers, we would occasionally see timeouts.

Start stopped servers after a test is finished (and passed).

Revert the previous commit, making teardown async again.

Closes #11412

* github.com:scylladb/scylladb:
  test.py: restart stopped servers before teardown...
  Revert "test.py: random tables make DDL queries async"
2022-08-30 22:48:47 +03:00
Pavel Emelyanov
e5e75ba43c Merge 'scylla-gdb.py: bring scylla reads-stats up-to-date' from Botond Dénes
Said command has been broken since 4.6, as the type of `reader_concurrency_semaphore::_permit_list` was changed without an accompanying update to this command. This series updates said command and adds it to the list of tested commands so we notice if it breaks in the future.

Closes #11389

* github.com:scylladb/scylladb:
  test/scylla-gdb: test scylla read-stats
  scylla-gdb.py: read_stats: update w.r.t. post 4.5 code
  scylla-gdb.py: improve string_view_printer implementation
2022-08-30 20:24:02 +03:00
Alejo Sanchez
df1ca57fda test.py: restart stopped servers before teardown...
for topology tests

Test teardown involves dropping the test keyspace. If there are stopped
servers, we would occasionally see timeouts.

Start stopped servers after a test is finished.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-30 11:40:40 +02:00
Alejo Sanchez
e5eac22a37 Revert "test.py: random tables make DDL queries async"
This reverts commit 67c91e8bcd.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-30 10:54:33 +02:00
Nadav Har'El
eed65dfc2d Merge 'db: schema_tables: Make table creation shadow earlier concurrent changes' from Tomasz Grabiec
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:

scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.

The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.

If the schemas were the same, or not conflicting in a way which leads
to abort, the current behavior would be to drop the older table as if
the last CREATE TABLE was preceded by a DROP TABLE.

The proposed fix is to make the CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different: the older table will be dropped.

Fixes #11396

Closes #11398

* github.com:scylladb/scylladb:
  db: schema_tables: Make table creation shadow earlier concurrent changes
  db: schema_tables: Fix formatting
  db: schema_mutations: Make operator<<() print all mutations
  schema_mutations: Make it a monoid by defining appropriate += operator
2022-08-29 14:21:07 +03:00
Tomasz Grabiec
ae8d2a550d db: schema_tables: Make table creation shadow earlier concurrent changes
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:

scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.

The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.

If the schemas were the same, or not conflicting in a way which leads
to abort, the current behavior would be to drop the older table as if
the last CREATE TABLE was preceded by a DROP TABLE.

The proposed fix is to make the CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different: the older table will be dropped.
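
For illustration only, a toy model of the shadowing idea (not the real schema_tables code); the `table_schema_mutation` type and the timestamps below are invented for this sketch:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Simplified model for this sketch only; not ScyllaDB's schema mutation code.
struct table_schema_mutation {
    int64_t tombstone_ts = INT64_MIN;          // shadows anything written at ts <= tombstone_ts
    std::map<std::string, int64_t> columns;    // column name -> write timestamp
};

// Merge m2 into m1 the way schema mutations are merged: the tombstone suppresses
// older concurrent writes, so two conflicting CREATE TABLEs cannot leave two
// column entries for the same partition-key position.
void apply(table_schema_mutation& m1, const table_schema_mutation& m2) {
    m1.tombstone_ts = std::max(m1.tombstone_ts, m2.tombstone_ts);
    for (auto& [name, ts] : m2.columns) {
        auto& cur = m1.columns[name];
        cur = std::max(cur, ts);
    }
    // Drop columns shadowed by the tombstone.
    std::erase_if(m1.columns, [&](auto& kv) { return kv.second <= m1.tombstone_ts; });
}

int main() {
    table_schema_mutation merged;
    table_schema_mutation create1;   // first CREATE TABLE, pk column named "a"
    create1.tombstone_ts = 9;        // tombstone covering all older changes of this table
    create1.columns = {{"a", 10}};
    table_schema_mutation create2;   // concurrent CREATE TABLE, pk column named "b"
    create2.tombstone_ts = 19;
    create2.columns = {{"b", 20}};
    apply(merged, create1);
    apply(merged, create2);          // create2's tombstone shadows create1's "a"
    for (auto& [name, ts] : merged.columns) {
        std::cout << name << " @ " << ts << "\n";   // only "b" survives
    }
}
```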

Fixes #11396
2022-08-29 12:06:02 +02:00
Alejo Sanchez
67c91e8bcd test.py: random tables make DDL queries async
There are async timeouts for ALTER queries. Seems related to other issues
with the driver and async.

Make these queries synchronous for now.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11394
2022-08-28 10:38:39 +03:00
Pavel Emelyanov
10e8804417 test: Provide testing dc/rack infos
There's a test that's sensitive to correct dc/rack info for testing
entries. To populate them it uses the global rack-inferring snitch instance
or a special "testing" snitch. To keep it working, add a helper
that populates the topology properly (spoiler: the next branch will
replace it with an explicitly populated topology object).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 10:00:04 +03:00
Pavel Emelyanov
4cbe6ee9f4 topology: Require entry in the map for update_normal_tokens()
The method in question tries to be on the safe side and adds the
endpoint for which it updates the tokens into the topology. From now on
it's up to the caller to put the endpoint into the topology in advance.

So most of what this patch does is place topology.update_endpoint()
calls in the relevant places in the code.
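
A minimal sketch of the new contract, with made-up stand-ins for the real topology/token_metadata types: `update_normal_tokens()` expects the endpoint to already be known, so callers invoke `update_endpoint()` first:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Illustrative stand-ins; not the real locator::topology / token_metadata types.
using endpoint = std::string;
using token = long;

class topology {
    std::set<endpoint> _known;
    std::map<endpoint, std::set<token>> _normal_tokens;
public:
    void update_endpoint(const endpoint& ep) { _known.insert(ep); }

    // New contract: the endpoint must already be in the topology; this method
    // no longer adds it "just in case" on the caller's behalf.
    void update_normal_tokens(const endpoint& ep, std::set<token> tokens) {
        assert(_known.contains(ep) && "call update_endpoint() first");
        _normal_tokens[ep] = std::move(tokens);
    }
};

int main() {
    topology t;
    t.update_endpoint("10.0.0.1");                 // caller registers the endpoint first
    t.update_normal_tokens("10.0.0.1", {1, 2, 3}); // then updates its tokens
}
```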

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:44:08 +03:00
Botond Dénes
4d33812a77 test/scylla-gdb: test scylla read-stats
This command was not run before, allowing it to silently break.
2022-08-26 08:08:28 +03:00
Wojciech Mitros
49dba4f0c1 functions: fix dropping of a keyspace with an aggregate in it
Currently, if a keyspace has an aggregate and the keyspace
is dropped, the keyspace becomes corrupted and another keyspace
with the same name cannot be created again.

This is caused by the fact that when removing an aggregate, we
call create_aggregate() to get values for its name and signature.
In create_aggregate(), we check whether the row and final
functions for the aggregate exist.
Normally, that's not an issue, because when dropping an existing
aggregate alone, we know that its UDFs also exist. But when dropping
an entire keyspace, we first drop the UDFs, making us unable to drop
the aggregate afterwards.

This patch fixes this behavior by removing the create_aggregate()
from the aggregate dropping implementation and replacing it with
specific calls for getting the aggregate name and signature.

Additionally, a test that would previously fail is added to
cql-pytest/test_uda.py where we drop a keyspace with an aggregate.

Fixes #11327

Closes #11375
2022-08-25 16:28:57 +02:00
Tomasz Grabiec
83850e247a Merge 'raft: server: handle aborts when waiting for config entry to commit' from Kamil Braun
Changing configuration involves two entries in the log: a 'joint
configuration entry' and a 'non-joint configuration entry'. We use
`wait_for_entry` to wait on the joint one. To wait on the non-joint one,
we use a separate promise field in `server`. This promise wasn't
connected to the `abort_source` passed into `set_configuration`.

The call could get stuck if the server got removed from the
configuration and lost leadership after committing the joint entry but
before committing the non-joint one, waiting on the promise. Aborting
wouldn't help. Fix this by subscribing to the `abort_source` and
resolving the promise exceptionally.

Furthermore, make sure that two `set_configuration` calls don't step on
each other's toes by one setting the other's promise. To do that, reset
the promise field at the end of `set_configuration` and check that it's
not engaged at the beginning.
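
A self-contained sketch of the pattern using a toy abort source and `std::promise` (Seastar's `abort_source` and the raft server's internals differ in detail): the waiter's promise is broken when an abort is requested, and the promise field is checked and reset so concurrent `set_configuration` calls cannot touch each other's promise:

```cpp
#include <functional>
#include <future>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <vector>

// Toy abort source for this sketch; Seastar's abort_source has a similar
// subscribe() idea but a different interface.
struct abort_source {
    std::vector<std::function<void()>> subs;
    void subscribe(std::function<void()> f) { subs.push_back(std::move(f)); }
    void request_abort() { for (auto& f : subs) f(); }
};

struct server {
    // Promise used to wait for the non-joint configuration entry to commit.
    std::optional<std::promise<void>> non_joint_conf_commit;

    std::future<void> set_configuration(abort_source& as) {
        // Check that no other set_configuration() is in flight (it would step on our promise).
        if (non_joint_conf_commit) {
            throw std::logic_error("configuration change already in progress");
        }
        non_joint_conf_commit.emplace();
        auto f = non_joint_conf_commit->get_future();
        // Connect the waiter to the abort source so the call cannot get stuck
        // if we lose leadership / get removed before the entry commits.
        as.subscribe([this] {
            if (non_joint_conf_commit) {
                non_joint_conf_commit->set_exception(
                    std::make_exception_ptr(std::runtime_error("aborted")));
                non_joint_conf_commit.reset();   // reset so a later call starts clean
            }
        });
        return f;
    }
};

int main() {
    abort_source as;
    server s;
    auto fut = s.set_configuration(as);
    as.request_abort();                          // simulate the caller aborting
    try { fut.get(); } catch (const std::exception& e) {
        std::cout << "set_configuration failed: " << e.what() << "\n";
    }
}
```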

Fixes #11288.

Closes #11325

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: additional logging
  raft: server: handle aborts when waiting for config entry to commit
2022-08-25 12:49:09 +02:00
Avi Kivity
df87949241 Merge "Remove batch tokens update helper" from Pavel E
"
On token_metadata there are two update_normal_tokens() overloads --
one updates tokens for a single endpoint, another one -- for a set
(well -- std::map) of them. Besides updating the tokens, both
methods may also add an endpoint to the t.m.'s topology object.

There's an ongoing effort to move the dc/rack information from
snitch to topology, and one of the changes made in it is that when
adding an entry to topology, the dc/rack info should be provided
by the caller (which in 99% of cases is the storage service).
The batched tokens update is extremely unfriendly to the latter
change. Fortunately, this helper is only used by tests; the core
code always uses fine-grained tokens updating.
"

* 'br-tokens-update-relax' of https://github.com/xemul/scylla:
  token_metadata: Indentation fix after previous patch
  token_metadata: Remove excessive empty tokens check
  token_metadata: Remove batch tokens updating method
  tests: Use one-by-one tokens updating method
2022-08-25 12:01:58 +02:00
Wojciech Mitros
9e6e8de38f tests: prevent test_wasm from occasional failing
Some cases in test_wasm.py assumed that all cases
are run in the same order every time and depended
on values that should have been added to tables in
previous cases. Because of that, they were sometimes
failing. This patch removes this assumption by
adding the missing inserts to the affected cases.

Additionally, an assert that confirms a low miss
rate of UDFs is made more precise, and a comment is added
to explain it clearly.

Closes #11367
2022-08-25 11:32:06 +03:00
Kamil Braun
90233551be test: raft: randomized_nemesis_test: don't access failure detector service after it's stopped
It could happen that we accessed the failure detector service after it was
stopped if a reconfiguration happened at the 'right' moment. This would
result in an assertion failure. Fix this.

Closes #11326
2022-08-25 11:32:06 +03:00
Tomasz Grabiec
1d0264e1a9 Merge 'Implement Raft upgrade procedure' from Kamil Braun
Start with a cluster with Raft disabled, end up with a cluster that performs
schema operations using group 0.

Design doc:
https://docs.google.com/document/d/1PvZ4NzK3S0ohMhyVNZZ-kCxjkK5URmz1VP65rrkTOCQ/
(TODO: replace this with .md file - we can do it as a follow-up)

The procedure, on a high level, works as follows:
- join group 0
- wait until every peer joined group 0 (peers are taken from `system.peers`
  table)
- enter `synchronize` upgrade state, in which group 0 operations are disabled
- wait until all members of group 0 entered `synchronize` state or some member
  entered the final state
- synchronize schema by comparing versions and pulling if necessary
- enter the final state (`use_new_procedures`), in which group 0 is used for
  schema operations.

With the procedure comes a recovery mode in case the upgrade procedure gets
stuck (and it may if we lose a node during recovery - the procedure, to
correctly establish a single group 0 cluster, requires contacting every node).

This recovery mode can also be used to recover clusters with group 0 already
established if they permanently lose a majority of nodes - killing two birds with
one stone. Details in the last commit message.

Read the design doc, then read the commits in topological order
for best reviewing experience.

---

I did some manual tests: upgrading a cluster, using the cluster to add nodes,
remove nodes (both with `decommission` and `removenode`), replacing nodes.
Performing recovery.

As a follow-up, we'll need to implement tests using the new framework (after
it's ready). It will be easy to test upgrades and recovery even with a single
Scylla version - we start with a cluster with the RAFT flag disabled, then
rolling-restart while enabling the flag (and recovery is done through simple
CQL statements).

Closes #10835

* github.com:scylladb/scylladb:
  service/raft: raft_group0: implement upgrade procedure
  service/raft: raft_group0: extract `tracker` from `persistent_discovery::run`
  service/raft: raft_group0: introduce local loggers for group 0 and upgrade
  service/raft: raft_group0: introduce GET_GROUP0_UPGRADE_STATE verb
  service/raft: raft_group0_client: prepare for upgrade procedure
  service/raft: introduce `group0_upgrade_state`
  db: system_keyspace: introduce `load_peers`
  idl-compiler: introduce cancellable verbs
  message: messaging_service: cancellable version of `send_schema_check`
2022-08-25 11:32:06 +03:00
Pavel Emelyanov
1d437302a8 tests: Use one-by-one tokens updating method
Tests are the only users of the batch tokens updating "sugar", which
actually makes things more complicated.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-24 08:24:21 +03:00
Avi Kivity
6ce5e9079c Merge 'utils/logalloc: consolidate lsa state in shard tracker' from Botond Dénes
Currently the state of LSA is scattered across a handful of global variables. This series consolidates all these into a single one: the shard tracker. Beyond reducing the number of globals (the fewer globals, the better), this paves the way for a planned de-globalization of the shard tracker itself.
There is one separate global left, the static migrators registry. This is left as-is for now.

Closes #11284

* github.com:scylladb/scylladb:
  utils/logalloc: remove reclaim_timer:: globals
  utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker
  utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg
  utils/logalloc: move global stat accessors to tracker
  utils/logalloc: allocating_section: don't use the global tracker
  utils/logalloc: pass down tracker::impl reference to segment_pool
  utils/logalloc: move segment pool into tracker
  utils/logalloc: add tracker member to basic_region_impl
  utils/logalloc: make segment independent of segment pool
2022-08-23 18:51:14 +02:00
Tomasz Grabiec
9c4e32d2e2 Merge 'raft: server: drop waiters in applier_fiber instead of io_fiber' from Kamil Braun
When `io_fiber` fetched a batch with a configuration that does not
contain this node, it would send the entries committed in this batch to
`applier_fiber` and then proceed to drop any remaining waiters (if
the node was no longer a leader).

If there were waiters for entries committed in this batch, it could
either happen that `applier_fiber` received and processed those entries
first, notifying the waiters that the entries were committed and/or
applied, or it could happen that `io_fiber` reaches the dropping waiters
code first, causing the waiters to be resolved with
`commit_status_unknown`.

The second scenario is undesirable. For example, when a follower tries
to remove the current leader from the configuration using
`modify_config`, if the second scenario happens, the follower will get
`commit_status_unknown` - this can happen even though there are no node
or network failures. In particular, this caused
`randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail
from time to time.

Fix it by serializing the notifying and dropping of waiters in a single
fiber - `applier_fiber`. We decided to move all management of waiters
into `applier_fiber`, because most of that management was already there
(there was already one `drop_waiters` call, and two `notify_waiters`
calls). Now, when `io_fiber` observes that we've been removed from the
config and are no longer a leader, instead of dropping waiters, it sends a
message to `applier_fiber`. `applier_fiber` will drop waiters when
receiving that message.
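
A rough sketch of the ordering idea with a toy message queue standing in for the real fibers: the I/O side only enqueues, so the applier always processes commit notifications before it drops any remaining waiters:

```cpp
#include <deque>
#include <iostream>
#include <variant>
#include <vector>

// Toy messages for this sketch; the real raft::server uses its own types and fibers.
struct committed_entries { std::vector<int> indexes; };
struct removed_from_config {};
using applier_message = std::variant<committed_entries, removed_from_config>;

struct server {
    std::deque<applier_message> to_applier;      // io_fiber -> applier_fiber channel
    std::vector<int> waiters;                    // entry indexes someone is waiting on

    // io_fiber side: never drops waiters itself; it only enqueues messages,
    // so commit notifications are always processed before the drop.
    void io_fiber_step(committed_entries batch, bool removed_from_cfg) {
        to_applier.push_back(std::move(batch));
        if (removed_from_cfg) {
            to_applier.push_back(removed_from_config{});
        }
    }

    // applier_fiber side: the single place that notifies and drops waiters.
    void applier_fiber_drain() {
        while (!to_applier.empty()) {
            auto msg = std::move(to_applier.front());
            to_applier.pop_front();
            if (auto* c = std::get_if<committed_entries>(&msg)) {
                for (int idx : c->indexes) {
                    std::erase(waiters, idx);               // notify: entry committed
                    std::cout << "entry " << idx << " committed\n";
                }
            } else {
                for (int idx : waiters) {
                    std::cout << "waiter on " << idx << ": commit_status_unknown\n";
                }
                waiters.clear();                            // drop remaining waiters
            }
        }
    }
};

int main() {
    server s;
    s.waiters = {5};
    s.io_fiber_step(committed_entries{{5}}, /*removed_from_cfg=*/true);
    s.applier_fiber_drain();   // the waiter on 5 sees "committed", not "unknown"
}
```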

Improve an existing test to reproduce this scenario more frequently.

Fixes #11235.

Closes #11308

* github.com:scylladb/scylladb:
  test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes`
  raft: server: drop waiters in `applier_fiber` instead of `io_fiber`
  raft: server: use `visit` instead of `holds_alternative`+`get`
2022-08-23 17:19:44 +02:00
Kamil Braun
e350e37605 service/raft: raft_group0: implement upgrade procedure
A listener is created inside `raft_group0` that acts when the
SUPPORTS_RAFT feature is enabled. The listener is established after the
node enters NORMAL status (in `raft_group0::finish_setup_after_join()`,
called at the end of `storage_service::join_cluster()`).

The listener starts the `upgrade_to_group0` procedure.

The procedure, on a high level, works as follows:
- join group 0
- wait until every peer joined group 0 (peers are taken from
  `system.peers` table)
- enter `synchronize` upgrade state, in which group 0 operations are
  disabled (see earlier commit which implemented this logic)
- wait until all members of group 0 entered `synchronize` state or some
  member entered the final state
- synchronize schema by comparing versions and pulling if necessary
- enter the final state (`use_new_procedures`), in which group 0 is used
  for schema operations (only those for now).

The devil lies in the details, and the implementation is ugly compared
to this nice description; for example there are many retry loops for
handling intermittent network failures. Read the code.
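
For orientation, a hedged sketch of the upgrade states as a plain enum with the forward-only transitions described above; the enumerator names follow this description and may not match the identifiers in the tree exactly:

```cpp
#include <iostream>

// Illustrative only; the names mirror this description, not necessarily the code.
enum class group0_upgrade_state {
    use_pre_raft_procedures,   // group 0 is not yet used for schema changes
    synchronize,               // group 0 operations disabled, schema being synced
    use_new_procedures,        // final state: schema changes go through group 0
};

// The upgrade only ever moves forward through the states.
group0_upgrade_state next(group0_upgrade_state s) {
    switch (s) {
    case group0_upgrade_state::use_pre_raft_procedures:
        return group0_upgrade_state::synchronize;
    case group0_upgrade_state::synchronize:
        return group0_upgrade_state::use_new_procedures;
    case group0_upgrade_state::use_new_procedures:
        return group0_upgrade_state::use_new_procedures;  // final state, stays put
    }
    return s;  // unreachable
}

int main() {
    auto s = group0_upgrade_state::use_pre_raft_procedures;
    s = next(s);   // synchronize: wait for all members, then sync schema
    s = next(s);   // use_new_procedures: group 0 now drives schema changes
    std::cout << "upgraded: " << (s == group0_upgrade_state::use_new_procedures) << "\n";
}
```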

`leave_group0` and `remove_group0` were adjusted to handle the upgrade
procedure being run correctly; if necessary, they will wait for the
procedure to finish.

If the upgrade procedure gets stuck (and it may, since it requires all
nodes to be available so they can be contacted to correctly establish a
single group 0 raft cluster), or if a running cluster permanently loses a
majority of nodes, causing group 0 unavailability, the cluster admin
is not left without help.

We introduce a recovery mode, which allows the admin to
completely get rid of traces of existing group 0 and restart the
upgrade procedure - which will establish a new group 0. This works even
in clusters that never upgraded but were bootstrapped using group 0 from
scratch.

To do that, the admin does the following on every node:
- writes 'recovery' under 'group0_upgrade_state' key
  in `system.scylla_local` table,
- truncates the `system.discovery` table,
- truncates the `system.group0_history` table,
- deletes group 0 ID and group 0 server ID from `system.scylla_local`
  (the keys are `raft_group0_id` and `raft_server_id`).
Then the admin performs a rolling restart of their cluster. The nodes
restart in a "group 0 recovery mode", which simply means that the nodes
won't try to perform any group 0 operations. Then the admin calls
`removenode` to remove the nodes that are down. Finally, the admin
removes the `group0_upgrade_state` key from `system.scylla_local`,
rolling-restarts the cluster, and the cluster should establish group 0
anew.

Note that this recovery procedure will have to be extended when new
stuff is added to group 0 - like topology change state. Indeed, observe
that a minority of nodes aren't able to receive committed entries from a
leader, so they may end up in inconsistent group 0 states. It wouldn't
be safe to simply create group 0 on those nodes without first ensuring
that they have the same state from which group 0 will start.
Right now the state only consists of schema tables, and the upgrade
procedure makes sure to synchronize them, so even if the nodes started in
inconsistent schema states, group 0 will correctly be established.
(TODO: create a tracking issue? something needs to remind us of this
 whenever we extend group 0 with new stuff...)
2022-08-23 13:51:01 +02:00
Kamil Braun
b42dfbc0aa test: raft: randomized_nemesis_test: additional logging
Add some more logging to `randomized_nemesis_test` such as logging the
start and end of a reconfiguration operation in a way that makes it easy
to find one given the other in the logs.
2022-08-23 13:14:30 +02:00
Tomasz Grabiec
0e5b86d3da Merge 'Optimize mutation consume of range tombstones in reverse' from Benny Halevy
Reversing the whole range_tombstone_list
into reversed_range_tombstones is inefficient
and can lead to reactor stalls with a large number of
range tombstones.

Instead, iterate over the range_tombstone_list in reverse
direction and reverse each range_tombstone as we go,
keeping the result in the optional cookie.reversed_rt member.
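
A generic sketch of the approach using standard containers (not the actual range_tombstone_list code): walk the list with reverse iterators and reverse each element lazily into a reusable slot, instead of materializing a fully reversed copy:

```cpp
#include <iostream>
#include <optional>
#include <vector>

// Stand-in for a range tombstone: a pair of clustering bounds (start, end).
struct interval { int start; int end; };

// Reversing a tombstone swaps its bounds (the real code does this with respect
// to the reversed schema's ordering).
interval reversed(interval i) { return {i.end, i.start}; }

struct consume_cookie {
    std::optional<interval> reversed_rt;   // holds one reversed element at a time
};

int main() {
    std::vector<interval> range_tombstones = {{1, 3}, {5, 8}, {10, 12}};
    consume_cookie cookie;
    // Iterate in reverse and reverse each tombstone as we go, instead of first
    // building a whole reversed list (which could stall the reactor when large).
    for (auto it = range_tombstones.rbegin(); it != range_tombstones.rend(); ++it) {
        cookie.reversed_rt = reversed(*it);
        std::cout << "consume tombstone with bounds " << cookie.reversed_rt->start
                  << ", " << cookie.reversed_rt->end << "\n";
    }
}
```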

While at it, this series contains some other cleanups on this path
to improve code readability and maybe make the compiler's life
easier when optimizing the cleaned-up code.

Closes #11271

* github.com:scylladb/scylladb:
  mutation: consume_clustering_fragments: get rid of reversed_range_tombstones;
  mutation: consume_clustering_fragments: reindent
  mutation: consume_clustering_fragments: shuffle emit_rt logic around
  mutation: consume, consume_gently: simplify partition_start logic
  mutation: consume_clustering_fragments: pass iterators to mutation_consume_cookie ctor
  mutation: consume_clustering_fragments: keep the reversed schema in cookie
  mutation: clustering_iterators: get rid of current_rt
  mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently
2022-08-23 10:05:39 +02:00
Botond Dénes
7d17d675af utils/logalloc: move global stat accessors to tracker
These are pretend free functions that access globals in the background;
make them members of the tracker instead, which has everything needed
locally to compute them. Callers still have to access these stats
through the global tracker instance, but this can be changed to happen
through a local instance. Soon....
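
A generic before/after sketch of this refactoring pattern (simplified, not the logalloc code): the accessor becomes a member of the object that owns the data, and the global survives only as the instance callers currently go through:

```cpp
#include <cstddef>
#include <iostream>

// After the refactoring: the stats live in the tracker and the accessor is a member.
class tracker {
    size_t _total_segments = 0;
public:
    void on_segment_allocated() { ++_total_segments; }
    size_t segment_count() const { return _total_segments; }   // was a free function reading a global
};

// Callers still reach the stats through a global instance for now; later this
// can become a per-shard or locally injected tracker.
tracker& shard_tracker() {
    static tracker t;
    return t;
}

int main() {
    shard_tracker().on_segment_allocated();
    std::cout << shard_tracker().segment_count() << "\n";
}
```
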
2022-08-23 10:38:58 +03:00
Nadav Har'El
9c15659194 Merge 'test.py: bump timeout of async requests for topology' from Alecco
Topology tests do async requests using the Python driver. The driver's
API for async doesn't use the session timeout.

Pass a 60-second timeout (the default is 10) to match the session's.

Fixes https://github.com/scylladb/scylladb/issues/11289

Closes #11348

* github.com:scylladb/scylladb:
  test.py: bump schema agreement timeout for topology tests
  test.py: bump timeout of async requests for topology
  test.py: fix bad indent
2022-08-23 10:30:59 +03:00
Botond Dénes
331033adae Merge 'Fix frozen mutation consume ordering' from Benny Halevy
Currently, frozen_mutation is not consumed in position_in_partition
order as all range tombstones are consumed before all rows.

This violates the range_tombstone_generator invariants
as its lower_bound needs to be monotonically increasing.

Fix this by adding mutation_partition_view::accept_ordered
and rewriting do_accept_gently to do the same,
both making sure to consume the range tombstones
and clustering rows in position_in_partition order,
similar to the mutation consume_clustering_fragments function.

Add a unit test that verifies that.

Fixes #11198

Closes #11269

* github.com:scylladb/scylladb:
  mutation_partition_view: make mutation_partition_view_virtual_visitor stoppable
  frozen_mutation: consume and consume_gently in-order
  frozen_mutation: frozen_mutation_consumer_adaptor: rename rt to rtc
  frozen_mutation: frozen_mutation_consumer_adaptor: return early when flush returns stop_iteration::yes
  frozen_mutation: frozen_mutation_consumer_adaptor: consume static row unconditionally
  frozen_mutation: frozen_mutation_consumer_adaptor: flush current_row before rt_gen
2022-08-23 06:37:04 +03:00
Alejo Sanchez
01cac33472 test.py: bump schema agreement timeout for topology tests
Increase the schema agreement timeout to match other timeouts.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-22 21:07:55 +02:00
Alejo Sanchez
f9d31112cf test.py: bump timeout of async requests for topology
Topology tests do async requests using the Python driver. The driver's
API for async doesn't use the session timeout.

Pass a 60-second timeout (the default is 10) to match the session's.

This will hopefully fix timeout failures in debug mode.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-22 21:07:03 +02:00
Mikołaj Sielużycki
b5380baf8a frozen_mutation: consume and consume_gently in-order
Currently, frozen_mutation is not consumed in position_in_partition
order as all range tombstones are consumed before all rows.

This violates the range_tombstone_generator invariants
as its lower_bound needs to be monotonically increasing.

Fix this by adding mutation_partition_view::accept_ordered
and rewriting do_accept_gently to do the same,
both making sure to consume the range tombstones
and clustering rows in position_in_partition order,
similar to the mutation consume_clustering_fragments function.

Add a unit test that verifies that.

Fixes #11198

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-22 20:12:20 +03:00
Kamil Braun
e0c6153adf test: raft: randomized_nemesis_test: more chaos in remove_leader_with_forwarding_finishes
Improve the randomness of this test, making it a bit easier to
reproduce the scenarios that the test aims to catch.

Increase timeouts a bit to account for this additional randomness.
2022-08-22 18:53:48 +02:00
Alejo Sanchez
87c233b36b test.py: fix bad indent
Fix leftover bad indent

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-22 14:29:54 +02:00
Nadav Har'El
941c719a23 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some DynamoDB documentation but is explicitly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Also, empirically, DynamoDB returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The missing ProvisionedThroughput structure was a problem for
applications like DynamoDB connectors for Spark, which implicitly
assume that ProvisionedThroughput is returned by DescribeTable and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298
2022-08-22 09:58:09 +02:00
Piotr Sarna
484004e766 Merge 'Fix mutation commutativity with shadowable tombstone'
from Tomasz Grabiec

This series fixes a lack of mutation associativity, which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
between the mutations applied and those read.

No known production impact.

Refs https://github.com/scylladb/scylladb/issues/11307

Closes #11312

* github.com:scylladb/scylladb:
  test: mutation_test: Add explicit test for mutation commutativity
  test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
  db: mutation_partition: Drop unnecessary maybe_shadow()
  db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
  mutation_partition: row: make row marker shadowing symmetric
2022-08-20 16:46:32 +02:00
Kamil Braun
43687be1f1 service/raft: raft_group0_client: prepare for upgrade procedure
Now, whether a 'group 0 operation' (today that means a schema change) is
performed using the old or the new method doesn't depend on the local RAFT
feature being enabled, but on the state of the upgrade procedure.

In this commit the state of the upgrade is always
`use_pre_raft_procedures` because the upgrade procedure is not
implemented yet. But stay tuned.

The upgrade procedure will need certain guarantees: at some point it
switches from `use_pre_raft_procedures` to `synchronize` state. During
`synchronize` schema changes must be disabled, so the procedure can
ensure that schema is in sync across the entire cluster before
establishing group 0. Thus, when the switch happens, no schema change
can be in progress.

To handle all this weirdness we introduce `_upgrade_lock` and
`get_group0_upgrade_state` which takes this lock whenever it returns
`use_pre_raft_procedures`. Creating a `group0_guard` - which happens at
the start of every group 0 operation - will take this lock, and the lock
holder shall be stored inside the guard (note: the holder only holds the
lock if `use_pre_raft_procedures` was returned, no need to hold it for
other cases). Because `group0_guard` is held for the entire duration of
a group 0 operation, and because the upgrade procedure will also have to
take this lock whenever it wants to change the upgrade state (it's an
rwlock), this ensures that no group 0 operation that uses the old ways
is happening when we change the state.
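
An illustrative sketch of the locking scheme, with `std::shared_mutex` standing in for the rwlock and invented names (the real code uses Seastar primitives): old-style operations hold the read side for their whole duration only while the state is still `use_pre_raft_procedures`, and the upgrade takes the write side to change the state, so no old-style operation can straddle the switch:

```cpp
#include <iostream>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <utility>

// Simplified model of the scheme; not ScyllaDB's raft_group0_client.
enum class upgrade_state { use_pre_raft_procedures, synchronize, use_new_procedures };

struct group0_client {
    std::shared_mutex upgrade_lock;             // the "_upgrade_lock" rwlock
    upgrade_state state = upgrade_state::use_pre_raft_procedures;

    // Returns the state and, if old procedures are still in use, a held read lock
    // that the caller keeps inside its guard for the whole operation.
    std::pair<upgrade_state, std::optional<std::shared_lock<std::shared_mutex>>>
    get_group0_upgrade_state() {
        std::shared_lock lk(upgrade_lock);
        if (state == upgrade_state::use_pre_raft_procedures) {
            return {state, std::move(lk)};      // keep holding while the op runs
        }
        return {state, std::nullopt};           // no need to hold it in other states
    }

    // The upgrade procedure takes the write side, so it only proceeds once no
    // old-style operation is still holding a read lock.
    void enter_synchronize() {
        std::unique_lock lk(upgrade_lock);
        state = upgrade_state::synchronize;
    }
};

int main() {
    group0_client c;
    {
        auto [st, holder] = c.get_group0_upgrade_state();
        std::cout << "old-style schema change allowed: "
                  << (st == upgrade_state::use_pre_raft_procedures) << "\n";
    }   // read lock released here
    c.enter_synchronize();
}
```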

We also implement `wait_until_group0_upgraded` using a condition
variable. It will be used by certain methods during upgrade (later
commits; stay tuned).

Some additional comments were written.
2022-08-19 19:15:19 +02:00
Nadav Har'El
516089beb0 Merge 'Raft test topology II part 1' from Alecco
- Remove `ScyllaCluster.__getitem__()` (a pending request by @kbr- in a previous pull request); for this, remove all direct access to servers from caller code
- Increase Python driver timeouts (req by @nyh)
- Improve `ManagerClient` API requests: use `http+unix://<sockname>/<resource>` instead of `http://localhost/<resource>`, and have callers of the helper method pass only the resource
- Improve lint and type hints

Closes #11305

* github.com:scylladb/scylladb:
  test.py: remove ScyllaCluster.__getitem__()
  test.py: ScyllaCluster check keyspace with any server
  test.py: ScyllaCluster server error log method
  test.py: ScyllaCluster read_server_log()
  test.py: save log point for all running servers
  test.py: ScyllaCluster provide endpoint
  test.py: build host param after before_test
  test.py: manager client disable lint warnings
  test.py: scylla cluster lint and type hint fixes
  test.py: increase more timeouts
  test.py: ManagerClient improve API HTTP requests
2022-08-18 20:27:50 +03:00
Alejo Sanchez
fe07f9ceed test.py: make topology conftest module paths work when imported
To allow other suites to use topology suite conftest, add pylib to the
module lookup path.

Closes #11313
2022-08-18 20:22:35 +03:00
Benny Halevy
7747b8fa33 sstables: define run_identifier as a strong tagged_uuid type
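
For context, a minimal sketch of the strong-typedef pattern the subject refers to; the types below are illustrative stand-ins, not the tree's `utils::tagged_uuid`:

```cpp
#include <cstdint>
#include <iostream>

// Toy UUID for the sketch; the real code wraps a proper utils::UUID.
struct uuid { uint64_t hi = 0, lo = 0; };

// Strong typedef: each Tag produces a distinct, non-interchangeable type.
template <typename Tag>
struct tagged_uuid {
    uuid id;
    explicit tagged_uuid(uuid u) : id(u) {}
};

struct run_identifier_tag {};
using run_identifier = tagged_uuid<run_identifier_tag>;

struct table_identifier_tag {};
using table_identifier = tagged_uuid<table_identifier_tag>;

void start_run(run_identifier run) {
    std::cout << "run " << run.id.hi << ":" << run.id.lo << "\n";
}

int main() {
    run_identifier run{uuid{1, 2}};
    start_run(run);
    // table_identifier table{uuid{3, 4}};
    // start_run(table);   // would not compile: the tag makes the types distinct
}
```
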
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11321
2022-08-18 19:03:10 +03:00
Tomasz Grabiec
5a9df433c6 test: mutation_test: Add explicit test for mutation commutativity 2022-08-17 17:39:54 +02:00
Tomasz Grabiec
3d9efee3bf test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
Given 3 row mutations:

m1 = {
      marker: {row_marker: dead timestamp=-9223372036854775803},
      tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}}
}

m2 = {
      marker: {row_marker: timestamp=-9223372036854775805}
}

m3 = {
      tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}}
}

We get different shadowable tombstones depending on the order of merging:

(m1 + m2) + m3 = {
       marker: {row_marker: dead timestamp=-9223372036854775803},
       tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}}
}

m1 + (m2 + m3) = {
       marker: {row_marker: dead timestamp=-9223372036854775803},
       tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}}
}

The reason is that in the second case the shadowable tombstone in m3
is shadowed by the row marker in m2. In the first case, the marker in
m2 is cancelled by the dead marker in m1, so the shadowable tombstone in
m3 is not cancelled (the marker in m1 does not cancel it because it's
dead).

This wouldn't happen if the dead marker in m1 was accompanied by a
hard tombstone of the same timestamp, which would effectively make the
difference in shadowable tombstones irrelevant.

Found by row_cache_test.cc::test_concurrent_reads_and_eviction.

I'm not sure if this situation can be reached in practice (dead marker
in mv table but no row tombstone).

Work around it for tests by producing a row tombstone if there is a
dead marker.

Refs #11307
2022-08-17 17:39:54 +02:00
Benny Halevy
017f9b4131 mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-17 14:43:52 +03:00
Alejo Sanchez
d732d776ed test.py: remove ScyllaCluster.__getitem__()
Users of ScyllaCluster should not directly manage its ScyllaServers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
729f8e2834 test.py: ScyllaCluster check keyspace with any server
Directly pick any server instead of calling self[0].

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
7ad7a5e718 test.py: ScyllaCluster server error log method
Provide server error logs to caller (test.py).

Avoids direct access to list of servers.

To be done later: pick the failed server. For now it just provides the
log of one server.

While there, fix type hints.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
e755207fcc test.py: ScyllaCluster read_server_log()
Instead of accessing the first server, now test.py asks ScyllaCluster
for the server log.

In a later commit, ScyllaCluster will pick the appropriate server.

Also removes another direct access to the list of servers we want to get
rid of.
2022-08-17 10:24:48 +02:00
Alejo Sanchez
f141ab95f9 test.py: save log point for all running servers
For error reporting, a mark of the current log position is saved before
a test. Previously, this was only done for the first server's log. Now it's
done for all running servers.

While there, remove direct access to servers on test.py.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
8fff636776 test.py: ScyllaCluster provide endpoint
For pytest CQL driver connections, a host id (IP) is used. Provide it
with a method.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
30c8e961ba test.py: manager client disable lint warnings
Disable noisy lint warnings.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
2b4c7fbb8a test.py: scylla cluster lint and type hint fixes
Add missing docstrings, reorder imports, add type hints, improve
formatting, fix variable names, fix line lengths, iterate over dicts not
keys, and disable noisy lint warnings.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00
Alejo Sanchez
566a4ebf4e test.py: increase more timeouts
Increase Python driver connection timeouts to deal with extreme cases
for slow debug builds on slow machines, as done (and explained) in
95bd02246a.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-17 10:24:48 +02:00