Commit Graph

7190 Commits

Author SHA1 Message Date
Asias He
9fdc596ff7 test: Add test_node_ops_metrics.py
It tests that the node_ops_metrics_done metric reaches 100% when a node
operation is done.

Refs: #21174
(cherry picked from commit 9868ccbac0)
2024-10-28 09:54:30 +00:00
Lakshmi Narayanan Sreethar
91c693bf93 [Backport 6.1] replica/table: check memtable before discarding tombstone during read
On the read path, the compacting reader is applied only to the sstable
reader. This can cause an expired tombstone from an sstable to be purged
from the request before it has a chance to merge with deleted data in
the memtable, leading to data resurrection.

Fix this by checking the memtables before deciding to purge tombstones
from the request on the read path. A tombstone will not be purged if a
key exists in any of the table's memtables with a minimum live timestamp
that is lower than the maximum purgeable timestamp.
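A minimal C++ sketch of the purge decision described above. It assumes the usual rule that an expired tombstone is a purge candidate only if its timestamp is below the maximum purgeable timestamp, and it reduces each memtable to a map from partition key to the minimum live timestamp held for that key. `memtable_index` and `can_purge_tombstone` are hypothetical stand-ins, not ScyllaDB's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a memtable: partition key -> minimum live
// timestamp of the data held for that key.
using memtable_index = std::map<std::string, int64_t>;

// Returns true if an expired tombstone for `key` may be dropped on the read path.
bool can_purge_tombstone(int64_t tombstone_ts,
                         bool tombstone_expired,
                         int64_t max_purgeable_ts,
                         const std::string& key,
                         const std::vector<memtable_index>& memtables) {
    // Assumed baseline rule: only expired tombstones older than the maximum
    // purgeable timestamp are candidates for purging.
    if (!tombstone_expired || tombstone_ts >= max_purgeable_ts) {
        return false;
    }
    // The added check: keep the tombstone if any memtable holds the key with a
    // minimum live timestamp lower than the maximum purgeable timestamp, since
    // that memtable data has not been merged with the tombstone yet.
    for (const auto& mt : memtables) {
        auto it = mt.find(key);
        if (it != mt.end() && it->second < max_purgeable_ts) {
            return false;
        }
    }
    return true;
}
```

For example, with a memtable holding `k1` at timestamp 5 and a maximum purgeable timestamp of 20, an expired tombstone for `k1` at timestamp 10 is kept, while one for a key absent from all memtables may be purged.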

Fixes #20916

`perf-simple-query` stats before and after this fix:

`build/Dev/scylla perf-simple-query --smp=1 --flush` :
```
// Before this Fix
// ---------------
94941.79 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59393 insns/op,   24029 cycles/op,        0 errors)
97551.14 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59376 insns/op,   23966 cycles/op,        0 errors)
96599.92 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59367 insns/op,   23998 cycles/op,        0 errors)
97774.91 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59370 insns/op,   23968 cycles/op,        0 errors)
97796.13 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59368 insns/op,   23947 cycles/op,        0 errors)

         throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79
instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02
  cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19

// After this Fix
// --------------
95313.53 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59392 insns/op,   24058 cycles/op,        0 errors)
97311.48 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59375 insns/op,   24005 cycles/op,        0 errors)
98043.10 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59381 insns/op,   23941 cycles/op,        0 errors)
96750.31 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59396 insns/op,   24025 cycles/op,        0 errors)
93381.21 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59390 insns/op,   24097 cycles/op,        0 errors)

         throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21
instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73
  cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22
```
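For reference, the relative changes in the reported means, computed only from the numbers above:

$$
\frac{96159.93 - 96932.78}{96932.78} \approx -0.80\%\ \text{(throughput)},\qquad
\frac{59386.60 - 59374.78}{59374.78} \approx +0.02\%\ \text{(instructions/op)},\qquad
\frac{24025.13 - 23981.67}{23981.67} \approx +0.18\%\ \text{(cycles/op)}
$$

The 772.85 tps drop in mean throughput is smaller than the standard deviation of either series (1215.71 before, 1847.88 after).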

This PR fixes a regression introduced in ce96b472d3 and should be backported to older versions.

Closes scylladb/scylladb#20985

* github.com:scylladb/scylladb:
  topology-custom: add test to verify tombstone gc in read path
  replica/table: check memtable before discarding tombstone during read
  compaction_group: track maximum timestamp across all sstables

(cherry picked from commit 519e167611)

Backported from #20985 to 6.1.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#21250
2024-10-25 11:13:54 +03:00
Tomasz Grabiec
39c1a448f6 Merge '[Backport 6.1] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.

Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.

Then split starts:

1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split

tombstone GC logic today only looks at underlying group, so compaction in step
2 will disregard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.

To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.

Fixes https://github.com/scylladb/scylladb/issues/20044.

Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed.

(cherry picked from commit bcd358595f)

(cherry picked from commit 93815e0649)

Refs https://github.com/scylladb/scylladb/pull/20939

Closes scylladb/scylladb#21205

* github.com:scylladb/scylladb:
  replica: Fix tombstone GC during tablet split preparation
  service: Improve error handling for split
2024-10-23 11:41:36 +02:00
Botond Dénes
03f370e971 Merge '[Backport 6.1] Check system.tablets update before putting it into the table' from ScyllaDB
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load.

fixes #20043

(cherry picked from commit f09fe4f351)

(cherry picked from commit e5bf376cbc)

(cherry picked from commit 1863ccd900)

 Refs #21020

Closes scylladb/scylladb#21110

* github.com:scylladb/scylladb:
  tablets: Validate system.tablets update
  group0_client: Introduce change validation
  group0_client: Add shared_token_metadata dependency
  replica/tablets: Add to_tablet_metadata_(row_)?key helpers
  replica/tablets: extract tablet_replica_set_from_cell()
2024-10-23 10:02:13 +03:00
Pavel Emelyanov
881ec8600f group0_client: Add shared_token_metadata dependency
It will be needed later to get tablet_metadata from.
The dependency is "OK": shared_token_metadata is a low-level sharded
service. The client already references db::system_keyspace, which in turn
references replica::database, which, finally, references token_metadata.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-22 13:16:52 +03:00
Botond Dénes
0d41447e1a Merge '[Backport 6.1] atomic_delete: allow deletion of sstables from several prefixes' from ScyllaDB
Allow create_pending_deletion_log to delete a bunch of sstables that
potentially reside in different prefixes (e.g. in the base directory
and under staging/).

The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups.  Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories.

Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it wouldn't happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit.

Fixes scylladb/scylladb#18862

Needs backport to 6.0 since tablets require this capability

(cherry picked from commit a7b92d7b6f)

(cherry picked from commit 027e64876a)

(cherry picked from commit 44bd183187)

(cherry picked from commit f47b5e60bc)

 Refs #19555

Closes scylladb/scylladb#20644

* github.com:scylladb/scylladb:
  sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
  sstables: storage: keep base directory in base class
  sstables: storage: define opened_directory in header file
  sstable_directory: use only dirlog
2024-10-22 09:17:26 +03:00
Raphael S. Carvalho
5106d40577 replica: Fix tombstone GC during tablet split preparation
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.

Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.

Then split starts:
1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split

tombstone GC logic today only looks at underlying group, so compaction in step
2 will disregard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.

To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.
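A simplified C++ sketch of why the set of sstables matters. It assumes each sstable is reduced to its minimum write timestamp and that a tombstone is purgeable only if its timestamp is below the minimum timestamp of every uncompacting sstable considered; key-range filtering is ignored. The types and function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model: an sstable reduced to the minimum write timestamp it holds.
struct sstable_model {
    int64_t min_timestamp;
};
using compaction_group_model = std::vector<sstable_model>;

// A tombstone whose timestamp is below the returned value cannot shadow any
// data in the uncompacting sstables passed in, so it is safe to purge.
int64_t max_purgeable_timestamp(const std::vector<compaction_group_model>& groups) {
    int64_t result = std::numeric_limits<int64_t>::max();
    for (const auto& group : groups) {
        for (const auto& sst : group) {
            result = std::min(result, sst.min_timestamp);
        }
    }
    return result;
}
```

Considering only the split-ready group leaves the bound unconstrained, so a tombstone at timestamp 10 looks purgeable; including the unsplit group that still holds sstable A (data at timestamp 5) keeps it.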

Fixes #20044.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 93815e0649)
2024-10-20 20:44:44 -03:00
Benny Halevy
a8e472178f sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
To be able to atomically delete sstables both in
base table directory and in its sub-directories,
like `staging/`, use a shared pending_delete_dir
under the base directory.

Note that this requires loading and processing
the base directory first.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f47b5e60bc)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

# Conflicts:
#	sstables/sstable_directory.hh
2024-10-20 09:10:47 +03:00
Botond Dénes
7367544ea2 Merge '[Backport 6.1] tablet: Fix single-sstable split when attaching new unsplit sstables' from ScyllaDB
To fix a race between split and repair in c1de4859d8, a new sstable
generated during streaming can be split before being attached to the sstable
set. That's to prevent an unsplit sstable from reaching the set after the
tablet map is resized.

So we can think of this split as an extension of the sstable writer. A failure
during split means the new sstable won't be added. Also, the duration of split
adds to the time the erm is held. For example, the repair writer will only
release its erm once the split sstable is added into the set.

This single-sstable split goes through run_custom_job(), which serializes
with other maintenance tasks. That was a terrible decision, since the split may
have to wait for an ongoing maintenance task to finish, which means holding the
erm for longer. Additionally, if the split monitor decides to run split on the
entire compaction group, it can cause the single-sstable split to be aborted,
since the former wants to select all sstables, propagating a failure to the
streaming writer.
That results in the new sstable being leaked and may cause problems on restart,
since the underlying tablet may have moved elsewhere or multiple splits may
have happened. We have some fragility today in cleaning up leaked sstables on
streaming failure, but this single-sstable split made it worse since the
failure can happen during normal operation, e.g. when there's no I/O error.

It makes sense to kill run_custom_job() usage, since the single-sstable split
is offline and an extension of sstable writing, therefore it makes no sense to
serialize with maintenance tasks. It must also inherit the sched group of the
process writing the new sstable. The inheritance happens today, but is fragile.

Fixes #20626.

(cherry picked from commit 999f1f1318)

(cherry picked from commit 38ce2c605d)

 Refs #20737

Closes scylladb/scylladb#20802

* github.com:scylladb/scylladb:
  tablet: Fix single-sstable split when attaching new unsplit sstables
  replica: Fix tablet split execute after restart
2024-10-17 19:36:47 +03:00
Piotr Smaron
f8d6215242 test: fix flaky test_multidc_alter_tablets_rf
The testcase is flaky due to a known python driver issue:
https://github.com/scylladb/python-driver/issues/317.
This issue causes the `CREATE KEYSPACE` statement to be sometimes
executed twice in a row, and the 2nd CREATE statement causes the test to
fail.
In order to work around it, it's enough to add `if not exists` when
creating a ks.

Fixes: #21034

Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch.

(cherry picked from commit 3969ffb39f)

Closes scylladb/scylladb#21106
2024-10-17 10:59:52 +03:00
Piotr Smaron
750ff26371 cql/tablets: handle MVs in ALTER tablets KEYSPACE
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.

Fixes: #20240
(cherry picked from commit 5ac16e29e6)

Closes scylladb/scylladb#21023
2024-10-16 10:39:07 +03:00
Piotr Smaron
08165851fb cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
Tablets load balancer is unable to process more than a single pending
replica, thus ALTER tablets KS cannot accept an ALTER statement which
would result in creating 2+ pending replicas, hence it has to validate
that the sum of absolute differences of RFs specified in the statement is
not greater than 1.
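A C++ sketch of this validation. It assumes `new_rf` is the full effective per-DC mapping after the statement is applied, and treats a DC missing from either map as RF 0, so a DC that had no replicas before is covered too. `rf_map`, `rf_change_distance` and `is_valid_tablets_rf_change` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <set>
#include <string>

// Replication factor per datacenter, e.g. {"dc1": 3, "dc2": 2}.
using rf_map = std::map<std::string, int>;

// RF for `dc`, treating a DC that is absent from the map as RF 0.
int rf_of(const rf_map& rfs, const std::string& dc) {
    auto it = rfs.find(dc);
    return it == rfs.end() ? 0 : it->second;
}

// Sum of |new RF - old RF| over every DC that appears in either mapping.
int rf_change_distance(const rf_map& old_rf, const rf_map& new_rf) {
    std::set<std::string> dcs;
    for (const auto& entry : old_rf) dcs.insert(entry.first);
    for (const auto& entry : new_rf) dcs.insert(entry.first);
    int total = 0;
    for (const auto& dc : dcs) {
        total += std::abs(rf_of(new_rf, dc) - rf_of(old_rf, dc));
    }
    return total;
}

// At most one pending replica may be created, so the total change must be <= 1.
bool is_valid_tablets_rf_change(const rf_map& old_rf, const rf_map& new_rf) {
    return rf_change_distance(old_rf, new_rf) <= 1;
}
```

For example, going from `{dc1: 3}` to `{dc1: 3, dc2: 2}` has distance 2 and is rejected, while adding `dc2` with RF 1 is accepted.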

(cherry picked from commit ee56bbfe61)
2024-10-08 18:06:54 +00:00
Piotr Smaron
1f6befe16d cql: join new and old KS options in ALTER tablets KS
A bug has been discovered while trying to ALTER tablets KS and
specifying only 1 out of 2 DCs - the not specified DC's RF has been
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was omitted, it was assigned a value of RF=0.
This commit fixes that plus additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case, which requires a special handling
in the source code, as we cannot simply update old initial_tablet's
settings with the new ones, because if only ` and TABLETS = {'enabled':
true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but
rather keep the old value - this is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above mentioned testcase started to fail with
these changes, and it appeared to be an issue with the test not waiting
until ALTER is completed, and thus reading the old value, hence the
test's body has been modified to wait for ALTER to complete before
performing validation.
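A C++ sketch of the merge semantics, assuming keyspace options are a flat string-to-string map (the hypothetical `options_map`); the `initial_tablets` special case mentioned above is not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>

using options_map = std::map<std::string, std::string>;

// Start from the keyspace's current options and overwrite only the keys that
// the ALTER statement specifies; anything it omits keeps its old value.
options_map merge_ks_options(const options_map& old_opts, const options_map& altered) {
    options_map merged = old_opts;
    for (const auto& [key, value] : altered) {
        merged[key] = value;
    }
    return merged;
}
```

With old options `{dc1: 3, dc2: 2}` and an ALTER that only mentions `dc1: 4`, `dc2` keeps RF 2 instead of being zeroed.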

(cherry picked from commit 2aabe7f09c)
2024-10-08 18:06:53 +00:00
Piotr Smaron
0e0fe4d756 cql: validate RF change for new DCs in ALTER tablets KS
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds multi-DC testcases to check that adding
replicas to a new DC and removing replicas from a DC honor the RF
change constraints.

Refs: #20039
(cherry picked from commit 47acdc1f98)
2024-10-08 18:06:46 +00:00
Piotr Smaron
78bf036419 cql: extend test_alter_tablet_keyspace_rf
Added cases to also test decreasing RF and setting the same RF.
Also added extra explanatory comments.

(cherry picked from commit 9c5950533f)
2024-10-08 18:06:45 +00:00
Piotr Smaron
4fc45b6fa6 cql: refactor test_tablets::test_alter_tablet_keyspace
1. Renamed the testcase to emphasize that it only focuses on testing
   changing RF - there are other tests that test ALTER tablets KS
in general.
2. Fixed whitespaces according to PEP8

(cherry picked from commit adf453af3f)
2024-10-08 18:06:44 +00:00
Piotr Smaron
dbb912c8dd cql: remove unused helper function from test_tablets
`change_default_rf` is not used anywhere, moreover it uses
`replication_factor` tag, which is forbidden in ALTER tablets KS
statement.

(cherry picked from commit 042825247f)
2024-10-08 18:06:42 +00:00
Pavel Emelyanov
190385ee2b cql: Check that CREATEing tablets/vnodes is consistent with the CLI
There are two bits that control whether the replication strategy for a
keyspace will use tablets or not -- the configuration option and CQL
parameter. This patch tunes its parsing to implement the logic shown
below:

    if (strategy.supports_tablets) {
        if (cql.with_tablets == on) {
            if (cfg.enable_tablets) {
                return create_keyspace_with_tablets();
            } else {
                throw "tablets are not enabled";
            }
        } else if (cql.with_tablets == off) {
            return create_keyspace_without_tablets();
        } else { // cql.with_tablets is not specified
            if (cfg.enable_tablets) {
                return create_keyspace_with_tablets();
            } else {
                return create_keyspace_without_tablets();
            }
        }
    } else { // strategy doesn't support tablets
        if (cql.with_tablets == on) {
            throw "invalid cql parameter";
        } else if (cql.with_tablets == off) {
            return create_keyspace_without_tablets();
        } else { // cql.with_tablets is not specified
            return create_keyspace_without_tablets();
        }
    }
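The decision table above can be written as a compilable C++ sketch. It assumes the CQL tablets option is modeled as `std::optional<bool>` (empty when the statement does not mention it) and reports errors as exceptions; `keyspace_kind` and `choose_keyspace_kind` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Which kind of keyspace CREATE KEYSPACE should produce.
enum class keyspace_kind { with_tablets, without_tablets };

// cql_tablets: std::nullopt when the statement does not mention tablets,
// otherwise the on/off value given in the statement.
keyspace_kind choose_keyspace_kind(bool strategy_supports_tablets,
                                   std::optional<bool> cql_tablets,
                                   bool cfg_enable_tablets) {
    if (strategy_supports_tablets) {
        if (!cql_tablets.has_value()) {
            // Not specified: follow the configuration option.
            return cfg_enable_tablets ? keyspace_kind::with_tablets
                                      : keyspace_kind::without_tablets;
        }
        if (*cql_tablets) {
            if (!cfg_enable_tablets) {
                throw std::invalid_argument("tablets are not enabled");
            }
            return keyspace_kind::with_tablets;
        }
        return keyspace_kind::without_tablets;
    }
    // The strategy doesn't support tablets: only an explicit "on" is an error.
    if (cql_tablets.value_or(false)) {
        throw std::invalid_argument("invalid cql parameter");
    }
    return keyspace_kind::without_tablets;
}
```

Collapsing the "off" and "not specified" branches for strategies without tablet support is safe because both lead to a keyspace without tablets.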

closes: #20088

In order to enable tablets "by default" for NetworkTopologyStrategy
there's explicit check near ks_prop_defs::get_initial_tablets(), that's
not very nice. It needs more care to fix it, e.g. provide feature
service reference to abstract_replication_strategy constructor. But
since ks_prop_defs code already hijacks options specifically for that
strategy type (see prepare_options() helper), it's OK for now.

There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20928
2024-10-03 17:09:21 +03:00
Calle Wilund
4a1e83d6be commitlog: Fix buffer_list_bytes not updated correctly
Fixes #20862

With the change in 60af2f3cb2, the bookkeeping
for buffer memory changed subtly. The problem is that we would
shrink the buffer size before using said buffer's size, after flush, to
decrement the buffer_list_bytes value, which had previously been
incremented by the full allocated size. I.e. we would slowly grow this
value instead of adjusting it properly to the actual used bytes.
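A hypothetical C++ model of the accounting mismatch: the counter is increased by a buffer's full size, so if the buffer later shrinks without the counter being adjusted, releasing it by its new size leaves bytes behind. `buffer_model` and `buffer_accounting` are illustrative, not the commitlog's actual types, and the balanced variant shows one way to track actual used bytes, not necessarily the committed fix.

```cpp
#include <cassert>
#include <cstddef>

struct buffer_model {
    std::size_t size;  // bytes currently accounted for this buffer
};

struct buffer_accounting {
    std::size_t buffer_list_bytes = 0;
    bool adjust_on_shrink = true;  // false reproduces the drift described above

    void add(const buffer_model& b) { buffer_list_bytes += b.size; }

    void shrink(buffer_model& b, std::size_t new_size) {
        assert(new_size <= b.size);
        if (adjust_on_shrink) {
            // Keep the counter in line with the bytes actually used.
            buffer_list_bytes -= b.size - new_size;
        }
        b.size = new_size;
    }

    // Subtracts the buffer's current size.
    void release(const buffer_model& b) { buffer_list_bytes -= b.size; }
};
```

Without the adjustment, a 4096-byte buffer shrunk to 1000 bytes leaves 3096 bytes in `buffer_list_bytes` after release, and the value keeps growing with every such buffer.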

Test included.

(cherry picked from commit ee5e71172f)

Closes scylladb/scylladb#20914
2024-10-03 09:11:40 +03:00
Kamil Braun
a96654bea3 Merge '[Backport 6.1] Populate raft address map from gossiper on raft configuration change' from ScyllaDB
For each new node added to the raft config, populate its ID to IP mapping in the raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper.

Fixes scylladb/scylladb#20600

Backport to all supported versions since the bug may cause bootstrapping failure.

(cherry picked from commit bddaf498df)

(cherry picked from commit 9e4cd32096)

 Refs #20601

Closes scylladb/scylladb#20848

* github.com:scylladb/scylladb:
  test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
  group0: make sure that address map has an entry for each new node in the raft configuration
2024-09-30 17:03:03 +02:00
Kamil Braun
79119f58e8 Merge '[Backport 6.1] mark node as being replaced earlier' from Gleb Natapov
Before 17f4a151ce, the node was marked as
being replaced in the join_group0 state, before it actually joined group0,
so by the time it actually joined and started transferring the snapshot/log,
no traffic was sent to it. That commit changed this to mark the node as
being replaced after the snapshot/log had already been transferred, so the
node could get traffic while it still had not caught up with the
leader, and this may cause problems since its state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later, as the commit above intended.

Fixes: https://github.com/scylladb/scylladb/issues/20629

Needs to be backported since this is a regression.

(cherry picked from commit 644e7a2012)

(cherry picked from commit c0939d86f9)

(cherry picked from commit 1b4c255ffd)

Closes scylladb/scylladb#20834

* github.com:scylladb/scylladb:
  test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
  topology coordinator:: mark node as being replaced earlier
  topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
2024-09-27 16:10:07 +02:00
Andrei Chekun
392d95d2cd test.py: Increase workers for cluster cleaning
Increase the number of workers used by async_rmtree(), which is used for
cleaning directories. This should help to reduce flakiness.
Increasing the worker count was introduced in f54b7f5427,
but there is no need to backport the whole commit.

Closes scylladb/scylladb#20795
2024-09-27 14:47:08 +02:00
Gleb Natapov
39a8203160 test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
(cherry picked from commit 9e4cd32096)
2024-09-26 21:13:39 +00:00
Gleb Natapov
c7be05cc50 test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
(cherry picked from commit 1b4c255ffd)
2024-09-26 12:34:18 +03:00
Lakshmi Narayanan Sreethar
58da8fdbbc [Backport 6.1]: database::get_all_tables_flushed_at: fix return value
The `database::get_all_tables_flushed_at` method returns a variable
without setting the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables`, to flush all the tables every time
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.
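A C++ sketch of the intended behavior, under two assumptions not stated in the commit: that the all-tables flush time is the oldest per-table flush time, and that flushing is skipped while that point is newer than the threshold set by `compaction_flush_all_tables_before_major_seconds`. The functions are hypothetical stand-ins, not the `database` methods themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Oldest per-table flush time: every table has been flushed at least this recently.
clock_type::time_point all_tables_flushed_at(const std::vector<clock_type::time_point>& per_table) {
    auto result = clock_type::time_point::max();
    for (auto t : per_table) {
        result = std::min(result, t);
    }
    // Return the computed value; the bug described above returned a variable
    // that never received it.
    return result;
}

// Flush everything only if the last all-tables flush is older than the threshold.
bool should_flush_all_tables(clock_type::time_point now,
                             clock_type::time_point flushed_at,
                             std::chrono::seconds threshold) {
    return now - flushed_at >= threshold;
}
```

If the returned value is never set to the computed minimum, the elapsed-time check has nothing meaningful to compare against, which matches the "flush every time" symptom.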

Fixes #20301

Closes scylladb/scylladb#20471

* github.com:scylladb/scylladb:
  cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config
  database::get_all_tables_flushed_at: fix return value

(cherry picked from commit 0e5b444777)

Backported from #20471 to 6.1.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20581
2024-09-26 10:40:48 +03:00
Kamil Braun
92156e7930 test: fix topology_custom/test_raft_recovery_stuck flakiness
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However, the driver might route the
changes to different servers, and we don't have group 0 to guarantee
linearizability. We must rely on the coordinator of the first change to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.

Fixes scylladb/scylladb#20791

Should be backported to all branches containing this test to reduce
flakiness.

(cherry picked from commit f390d4020a)

Closes scylladb/scylladb#20809
2024-09-25 15:11:50 +02:00
Abhinav
33b50a9d3a raft topology: add error for removal of non-normal nodes
In the current scenario, we check if a node being removed is normal
on the node initiating the removenode request. However, we don't have a
similar check on the topology coordinator. The node being removed could be
normal when we initiate the request, but it doesn't have to be normal when
the topology coordinator starts handling the request.
For example, the topology coordinator could have removed this node while handling
another removenode request that was added to the request queue earlier.

This commit intends to fix this issue by adding more checks in the enqueuing phase
and returning errors for duplicate requests for node removal.
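A hypothetical C++ model of the enqueue-time checks: a removal request is rejected if the node is not currently normal, or if a removal for it is already queued. `topology_model` and its members are illustrative only and do not reflect the actual topology state machine.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

enum class node_state { normal, left };

struct topology_model {
    std::map<std::string, node_state> nodes;
    std::set<std::string> queued_removals;

    void enqueue_removenode(const std::string& id) {
        auto it = nodes.find(id);
        if (it == nodes.end() || it->second != node_state::normal) {
            throw std::runtime_error("removenode: node " + id + " is not normal");
        }
        // insert() reports false if a removal for this node is already queued.
        if (!queued_removals.insert(id).second) {
            throw std::runtime_error("removenode: duplicate request for " + id);
        }
    }

    // The coordinator completes a queued removal.
    void complete_removenode(const std::string& id) {
        queued_removals.erase(id);
        nodes.at(id) = node_state::left;
    }
};
```

A second request for the same node is refused while the first is queued, and a request arriving after the node has already left is refused as well.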

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#20271
(cherry picked from commit b25b8dccbd)

Closes scylladb/scylladb#20800
2024-09-25 11:35:27 +02:00
Raphael S. Carvalho
c0b2e89d35 replica: Fix tablet split execute after restart
Let's assume there are 2 nodes, n1 and n2. n1 is the coordinator.

1) n1 emits split
2) n1 and n2 complete split work
3) n1 becomes aware all replicas are ready for split
4) n2 restarts, but places split sstable into main group[1]
5) n1 executes split
6) n2 handles split completion, but sees the main group is not empty

[1]: During split, main group should only contain unsplit sstables.
If all sstables are split, main must be empty.

This is a result of replica not setting storage group to split mode on restart
(using tablet map) and therefore sstables are incorrectly placed on main group.

The fix is about looking at tablet map and setting group to split mode before
sstables are populated into it.

Refs #20626.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 999f1f1318)
2024-09-25 02:13:42 +00:00
Gleb Natapov
43f9b3b997 test: skip test_lwt_semaphore::test_cas_semaphore in aarch64 debug mode
The test configures the write timeout to a much smaller value to make the test
run faster, since for some writes a sleep is inserted to hit the timeout,
but this makes aarch64 debug flaky, since the timeout can fire when it should
not because of the build's natural slowness.

(cherry picked from commit 71a5b1c6dd)

Closes scylladb/scylladb#20777
2024-09-24 15:20:09 +02:00
Botond Dénes
7ed2f87414 Merge '[Backport 6.1] cql3: add option to not unify bind variables with the same' from Avi Kivity
Bind variables in CQL have two formats: positional (?) where a variable is referred to by its relative position in the statement, and named (:var), where the user is expected to supply a name->value mapping.

In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.
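A C++ sketch of the two behaviors, assuming the statement's named markers are given in order of appearance (for a hypothetical `SELECT * FROM t WHERE a = :x AND b = :y AND c = :x`, that is `x, y, x`). `bind_metadata` is a hypothetical helper, not the cql3 API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Builds the bind-variable metadata from the named markers of a statement.
// With unification, a repeated name maps to a single metadata entry; without
// it, every marker gets its own positional slot.
std::vector<std::string> bind_metadata(const std::vector<std::string>& markers,
                                       bool unify_same_name) {
    std::vector<std::string> meta;
    for (const auto& name : markers) {
        if (unify_same_name && std::find(meta.begin(), meta.end(), name) != meta.end()) {
            continue;  // reuse the existing entry for this name
        }
        meta.push_back(name);
    }
    return meta;
}
```

With unification a driver using named variables sees two entries (`x`, `y`); without it there are three positional slots, which is what a caller using the positional API expects.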

Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the dialect and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.

A unit test is added.

Fixes https://github.com/scylladb/scylladb/issues/15559

This may be useful to users transitioning from Cassandra, so merits a backport.

(cherry picked from commit f9322799af)

(cherry picked from commit d69bf4f010)

(cherry picked from commit ea8441dfa3)

Refs https://github.com/scylladb/scylladb/pull/19493

Closes scylladb/scylladb#20590

* github.com:scylladb/scylladb:
  cql3: add option to not unify bind variables with the same name
  cql3: introduce dialect infrastructure
  cql3: prepared_statement_cache: drop cache key default constructor
  Merge 'config: round-trip boolean configuration variables' from Avi Kivity
2024-09-24 15:15:05 +03:00
Piotr Dulikowski
bf6dd16071 Merge '[Backport 6.1] message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak
In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature.
This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if the user has created the maximum number of service levels.

This patch adds a cluster feature to guard adding the new tenant. It's done in a way that handles two upgrade scenarios:

1) version without $maintenance tenant -> version with $maintenance tenant guarded by a feature
2) version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature

The PR adds an `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.
The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled.
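A hypothetical C++ model of the `enabled` flag semantics: a disabled tenant is skipped when opening a connection but still matched when accepting one, and becomes usable once the feature is enabled. `tenant_model` is a simplified stand-in and `default_tenant` is an illustrative name, not a real tenant.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for a statement tenant with the new `enabled` flag.
struct tenant_model {
    std::string name;
    bool enabled;
};

// Outgoing connections may only use enabled tenants.
const tenant_model& tenant_for_connect(const std::vector<tenant_model>& tenants,
                                       const std::string& name) {
    for (const auto& t : tenants) {
        if (t.name == name && t.enabled) {
            return t;
        }
    }
    throw std::runtime_error("no enabled tenant named " + name);
}

// Incoming connections are matched regardless of the flag, so a peer that
// already uses the tenant can still connect.
bool accepts_incoming(const std::vector<tenant_model>& tenants, const std::string& name) {
    for (const auto& t : tenants) {
        if (t.name == name) {
            return true;
        }
    }
    return false;
}

// Invoked once the guarding cluster feature becomes enabled.
void enable_tenant(std::vector<tenant_model>& tenants, const std::string& name) {
    for (auto& t : tenants) {
        if (t.name == name) {
            t.enabled = true;
        }
    }
}
```

Keeping incoming matching independent of the flag is what lets both upgrade scenarios above interoperate during a rolling upgrade.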

Fixes https://github.com/scylladb/scylladb/issues/20070
Refs https://github.com/scylladb/scylla-enterprise/issues/4403

(cherry picked from commit d44844241d)

(cherry picked from commit 71a03ef6b0)

(cherry picked from commit b4b91ca364)

Refs https://github.com/scylladb/scylladb/pull/19802

Closes scylladb/scylladb#20674

* github.com:scylladb/scylladb:
  message/messaging_service: guard adding maintenance tenant under cluster feature
  message/messaging_service: add feature_service dependency
  message/messaging_service: add `enabled` flag to statement tenants
2024-09-23 13:18:45 +02:00
Botond Dénes
f987afb2e1 Merge '[Manual Backport 6.1] generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek
This is a manual backport of #20212 to 6.1, superseding #20345 (which had run into conflicts).

Please see the individual commit messages for backport notes.

Fixes #10305

Closes scylladb/scylladb#20355

* github.com:scylladb/scylladb:
  generic_server: make server::stop() idempotent
  generic_server: coroutinize server::shutdown()
  generic_server: make server::shutdown() idempotent
  test/generic_server: add test case
  configure, cmake: sort the lists of boost unit tests
  generic_server: convert connection tracking to seastar::gate
2024-09-18 15:52:32 +03:00
Michał Jadwiszczak
d11df0fcbc message/messaging_service: add feature_service dependency
(cherry-picked from 71a03ef6b0)
2024-09-18 11:26:56 +02:00
Tomasz Grabiec
edea822bd7 Merge '[Backport 6.1] tablets: Fix race between repair and split' from Raphael "Raph" Carvalho
Consider the following:

```
T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes
```
If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen that the source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if the underlying table is in split mode. That's implemented in replica::table for correctness, so if the node crashes, the new sstable missing split is still split before being added to the set.

Fixes https://github.com/scylladb/scylladb/issues/19378.
Fixes https://github.com/scylladb/scylladb/issues/19416.

(cherry picked from commit 239344ab55)

(cherry picked from commit 74612ad358)

Refs https://github.com/scylladb/scylladb/pull/19427

Closes scylladb/scylladb#20595

* github.com:scylladb/scylladb:
  tablets: Fix race between repair and split
  compaction: Allow "offline" sstable to be split
2024-09-17 13:25:03 +02:00
Aleksandra Martyniuk
032c9146d5 test: check if cleanup of deallocated sg is ignored
(cherry picked from commit 2c4b1d6b45)
2024-09-16 16:22:29 +02:00
Raphael S. Carvalho
fe56fa39c0 tablets: Fix race between repair and split
Consider the following:

T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes

If repair produces an sstable after the split prepare phase, the
replica will not split that sstable later, as the prepare phase is
considered completed already. That causes split execution to fail,
as the replicas weren't really prepared. This can also be triggered
with load-and-stream, which shares the same write (consumer) path.

The approach to fix this is the same one employed to prevent a race
between split and migration. If migration happens during the prepare
phase, the source may miss the split request, but the tablet will
still be split on the destination (if needed). Similarly, the repair
writer becomes responsible for splitting the data if the underlying
table is in split mode. That's implemented in replica::table for
correctness, so if the node crashes, a new sstable that missed the
split is still split before being added to the set.

Fixes #19378.
Fixes #19416.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 74612ad358)
2024-09-13 21:32:01 -03:00
Avi Kivity
8ddfd0d70d cql3: add option to not unify bind variables with the same name
Bind variables in CQL have two formats: positional (`?`), where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.

In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.

Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.
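As a rough model of what the option changes, the Python sketch below counts the bind-variable slots a statement's metadata would expose. It uses a hypothetical `bind_variable_slots` helper and a regex instead of the real CQL parser. With unification, `:x ... :x` yields a single slot, so a caller using the driver's positional API and passing two values supplies one too many; with unification disabled, each occurrence gets its own slot.

```python
import re
from typing import List, Optional

# Matches a positional `?` marker or a named `:name` marker. Simplified:
# string literals, comments and other uses of `:` are not handled.
_BIND_MARKER = re.compile(r"\?|:([A-Za-z_][A-Za-z0-9_]*)")


def bind_variable_slots(query: str, unify_same_name: bool = True) -> List[Optional[str]]:
    """Return one entry per bind-variable slot in the statement metadata.

    Positional markers yield None; named markers yield their name. With
    unify_same_name=True, repeated names collapse into their first slot.
    """
    slots: List[Optional[str]] = []
    for match in _BIND_MARKER.finditer(query):
        name = match.group(1)
        if name is not None and unify_same_name and name in slots:
            continue  # reuse the slot created by the first occurrence
        slots.append(name)
    return slots
```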

A unit test is added.

Fixes #15559

(cherry picked from commit ea8441dfa3)
(cherry picked from commit edb3068ecf)
2024-09-13 18:17:15 +03:00
Avi Kivity
92dd47c6d6 cql3: introduce dialect infrastructure
A dialect is a different way to interpret the same CQL statement.

Examples:
 - how duplicate bind variable names are handled (later in this series)
 - whether `column = NULL` in LWT can return true (as is now) or
   whether it always returns NULL (as in SQL)

Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.

The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.

(cherry picked from commit d69bf4f010)
2024-09-13 18:11:11 +03:00
Nadav Har'El
d9ba5423bb Merge 'config: round-trip boolean configuration variables' from Avi Kivity
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.
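The intended round trip can be sketched in Python with hypothetical `parse_bool`/`format_bool` helpers. The actual change specializes the C++ from-string conversion for bool rather than relying on `boost::lexical_cast<>`, which accepts only 1/0; case-insensitive matching below is an assumption.

```python
def parse_bool(text: str) -> bool:
    """Accept both the numeric form (1/0) and the textual form (true/false)."""
    value = text.strip().lower()  # case-insensitivity is an assumption
    if value in ("1", "true"):
        return True
    if value in ("0", "false"):
        return False
    raise ValueError(f"invalid boolean: {text!r}")


def format_bool(value: bool) -> str:
    """The form a SELECT from system.config shows."""
    return "true" if value else "false"
```

With both forms accepted, whatever SELECT returns can be fed straight back into UPDATE.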

Not a regression, so a backport isn't strictly necessary.

Closes scylladb/scylladb#19792

* github.com:scylladb/scylladb:
  config: specialize from-string conversion for bool
  config: wrap boost::lexical_cast<> when converting from strings

(cherry picked from commit 9eb47b3ef0)
2024-09-13 17:54:37 +03:00
Piotr Smaron
b60f9ef4c2 cql: fix exception when validating KS in CREATE TABLE
c70f321c6f added an extra check for whether the keyspace exists. This
check can throw a `data_dictionary::no_such_keyspace` exception, which
is supposed to be caught and replaced with a more user-friendly
exception.
This commit fixes the above problem and adds a test case to ensure it
does not reappear.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.
Additionally, I added a comment to both the `no_such_keyspace` and
`no_such_column_family` exceptions explaining that they should not be
returned directly to the caller: they lack an error code, so they may
not trigger the correct exception handling mechanisms on the driver side.
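The pattern can be sketched in Python. All names here are hypothetical stand-ins: keyspaces are modeled as a dict from name to known type names, and the error message is illustrative. The keyspace is checked once, before the per-column loop, and the internal lookup error is translated into a user-facing error that carries the CQL native protocol "Invalid" code (0x2200).

```python
from typing import Dict, Set


class NoSuchKeyspace(Exception):
    """Internal lookup error: carries no CQL error code."""


class InvalidRequest(Exception):
    """User-facing error carrying a CQL native protocol error code."""
    code = 0x2200  # "Invalid" in the CQL native protocol


def lookup_keyspace(keyspaces: Dict[str, Set[str]], name: str) -> Set[str]:
    try:
        return keyspaces[name]
    except KeyError:
        raise NoSuchKeyspace(name) from None


def validate_create_table(keyspaces: Dict[str, Set[str]], ks_name: str,
                          column_types: Dict[str, str]) -> None:
    # Check the keyspace once, outside the loop, and never let the
    # internal exception escape to the client.
    try:
        known_types = lookup_keyspace(keyspaces, ks_name)
    except NoSuchKeyspace:
        raise InvalidRequest(f"keyspace {ks_name} does not exist") from None
    for column, type_name in column_types.items():
        if type_name not in known_types:
            raise InvalidRequest(f"unknown type {type_name} for column {column}")
```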

Fixes: #20097
(cherry picked from commit f1e8976fbe)

Closes scylladb/scylladb#20553
2024-09-13 11:36:51 +03:00
Kamil Braun
e4a18b0858 test: test_raft_no_quorum: increase raft timeout in debug mode
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quickly as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).

Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.

Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.

Fixes: scylladb/scylladb#20111
(cherry picked from commit 52fdf5b4c9)

Closes scylladb/scylladb#20477
2024-09-10 08:48:06 +03:00
Botond Dénes
c64ae3f839 Merge '[Backport 6.1] repair: throw if batchlog manager isn't initialized' from ScyllaDB
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.

Throw if batchlog manager isn't initialized.
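The guard amounts to the following Python sketch, with hypothetical `RepairService` and `BatchlogManager` classes standing in for repair_service and its handler: if a request arrives before the dependency is set, the handler raises instead of touching an uninitialized object.

```python
from typing import Optional


class BatchlogManager:
    def flush(self) -> str:
        return "flushed"


class RepairService:
    def __init__(self) -> None:
        # Set later during startup; may still be None when a request arrives.
        self._bm: Optional[BatchlogManager] = None

    def set_batchlog_manager(self, bm: BatchlogManager) -> None:
        self._bm = bm

    def flush_hints_batchlog(self) -> str:
        if self._bm is None:
            # Fail the request cleanly instead of accessing an uninitialized object.
            raise RuntimeError("batchlog manager is not initialized")
        return self._bm.flush()
```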

Fixes: #20236.

Needs backport to 6.0 and 6.1, as they suffer from the uninitialized batchlog manager access.

(cherry picked from commit d8e4393418)

(cherry picked from commit f38bb6483a)

Refs #20251

Closes scylladb/scylladb#20351

* github.com:scylladb/scylladb:
  test: add test to ensure repair won't fail with uninitialized bm
  repair: throw if batchlog manager isn't initialized
2024-09-04 07:02:18 +03:00
Gleb Natapov
9db819763b topology coordinator: do not add replacing node without a ring to topology
When only inter-dc encryption is enabled, a non-encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If the node that initiates the connection knows that dst is in the
same dc and hence uses a non-encrypted connection, but dst does not
yet know the topology of src, the connection will not be allowed,
since dst cannot guarantee that src is in the same dc.

Currently, when the topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to
group0. The coordinator will try to send a raft message to the new node
and (assuming only inter-dc encryption is enabled and the replacing node
and the coordinator are in the same dc) it will try to open a regular,
non-encrypted connection to it. But the replacing node will not have
the coordinator in its topology yet (it needs to sync the raft state
for that), so it will reject such a connection.

To solve the problem, the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens are assigned to it. At this point, the replacing node will
already have made sure that its topology state is up to date (since it
executes a raft barrier in the join_node_response_params handler) and
it knows the coordinator's topology. This aligns replace behaviour with
bootstrap, since bootstrap also does not add a node without a ring to
the topology.
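The receiving side's rule under inter-dc-only encryption can be modeled in Python with a hypothetical `accept_connection` function, where `topology` maps host to dc as currently known by the receiver. A peer missing from that map cannot be proven to be in the same dc, so plaintext from it is refused, which is exactly the replacing node's situation before it syncs raft state.

```python
from typing import Dict


def accept_connection(local_dc: str, topology: Dict[str, str],
                      peer: str, encrypted: bool) -> bool:
    """Decide whether to accept an incoming connection (inter-dc encryption only)."""
    if encrypted:
        return True
    peer_dc = topology.get(peer)
    # Plaintext is allowed only when the peer is known to be in our dc.
    return peer_dc is not None and peer_dc == local_dc
```

For a replacing node with an empty topology, `accept_connection("dc1", {}, "coordinator", encrypted=False)` is `False`; once the coordinator appears as `{"coordinator": "dc1"}`, it becomes `True`.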

The patch effectively reverts b8ee8911ca

Fixes: scylladb/scylladb#19025
(cherry picked from commit 17f4a151ce)
2024-09-01 11:57:25 +03:00
Gleb Natapov
4769e694d1 test: add test for replace in clusters with encryption enabled
(cherry picked from commit 2f1b1fd45e)
2024-09-01 11:56:37 +03:00
Gleb Natapov
74012c562a test.py: add server encryption support to cluster manager
(cherry picked from commit b98282a976)
2024-09-01 11:56:25 +03:00
Laszlo Ersek
16321fc243 test/generic_server: add test case
Check whether we can stop a generic server without first asking it to
listen.

The test fails currently; the failure mode is a hang, which triggers the
5-minute timeout set in the test:

> unknown location(0): fatal error: in "stop_without_listening":
> seastar::timed_out_error: timedout
> seastar/src/testing/seastar_test.cc(43): last checkpoint
> test/boost/generic_server_test.cc(34): Leaving test case
> "stop_without_listening"; testing time: 300097447us
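Tracking connections with a gate can avoid this hang, as a small asyncio model suggests. The `Gate` and `Server` classes below are hypothetical illustrations, not Seastar's API. Closing a gate waits only for current holders, so a server that never listened (and thus holds nothing) stops at once, and an idempotent `stop()` makes a second call a no-op.

```python
import asyncio


class Gate:
    """Minimal asyncio analogue of a gate (illustration only)."""

    def __init__(self) -> None:
        self._count = 0
        self._closed = False
        self._idle = asyncio.Event()
        self._idle.set()  # nothing holds the gate yet

    def enter(self) -> None:
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1
        self._idle.clear()

    def leave(self) -> None:
        self._count -= 1
        if self._count == 0:
            self._idle.set()

    async def close(self) -> None:
        # Refuse new entries, then wait for current holders to leave.
        self._closed = True
        await self._idle.wait()


class Server:
    def __init__(self) -> None:
        self._connections = Gate()
        self._stopped = False

    async def stop(self) -> None:
        if self._stopped:  # idempotent: a second stop() is a no-op
            return
        self._stopped = True
        # With no tracked connections (listen() never called), this returns
        # immediately instead of hanging.
        await self._connections.close()
```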

Backport notes for 6.1:

- Replace

    #include "utils/assert.hh"
    SCYLLA_ASSERT(false);

  with

    #include <cassert>
    assert(false);

  due to 6.1 lacking commit aa1270a00c ("treewide: change assert() to
  SCYLLA_ASSERT()", 2024-08-05). The header file "utils/assert.hh"
  wouldn't be difficult to backport, but separating it from the treewide
  changes in commit aa1270a00c might not be the best idea.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit dbc0ca6354)
2024-08-30 16:17:44 +02:00
Laszlo Ersek
8f0f362a30 configure, cmake: sort the lists of boost unit tests
Both lists were obviously meant to be sorted originally, but by today
we've introduced many instances of disorder -- thus, inserting a new test
in the proper place leaves the developer scratching their head. Sort both
lists.

Backport notes for 6.1:

- Conflicts in "configure.py", unsurprisingly. For the backport, I sorted
  the boost unit test list manually, from scratch.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 931f2f8d73)
2024-08-30 16:16:53 +02:00
Aleksandra Martyniuk
93fbe3af12 test: add test to ensure repair won't fail with uninitialized bm
(cherry picked from commit f38bb6483a)
2024-08-30 13:55:48 +00:00
Botond Dénes
e6d2d29dd1 Merge '[Backport 6.1] repair: do_rebuild_replace_with_repair: use source_dc only when safe' from ScyllaDB
It is unsafe to restrict the sync nodes for repair to the source data center if it has too low a replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored.

Also, this change restricts the usage of source_dc to the `network_topology` and `everywhere_topology` strategies, since with the simple replication strategy there is no guarantee that there would be any more replicas in that data center.
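The decision can be sketched in Python as a hypothetical `may_use_source_dc` predicate. The concrete threshold (at least two replicas in the DC, none ignored) and the scope of the force option (overriding only that threshold, not the strategy check) are assumptions for illustration; the commit does not spell them out.

```python
SAFE_STRATEGIES = {"network_topology", "everywhere_topology"}


def may_use_source_dc(strategy: str, rf_in_source_dc: int,
                      ignored_nodes_in_source_dc: int, force: bool = False) -> bool:
    """Decide whether repair may restrict its sync nodes to source_dc."""
    if strategy not in SAFE_STRATEGIES:
        # e.g. simple strategy: no guarantee of further replicas in that DC.
        return False
    if force:
        # Assumption: the force option overrides only the check below.
        return True
    # Assumed threshold: at least two replicas in the DC and none ignored.
    return rf_in_source_dc >= 2 and ignored_nodes_in_source_dc == 0
```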

Fixes #16826

Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865
It fails without this fix and passes with it.

* Requires backport to live versions. Issue hit in the field with 2022.2.14.

(cherry picked from commit 8b1877f3ca)

(cherry picked from commit 0419b1d522)

(cherry picked from commit b5d0ab092c)

(cherry picked from commit 9729dd21c3)

(cherry picked from commit 8665eef98c)

(cherry picked from commit 5f655e41e3)

Refs #16827

Closes scylladb/scylladb#20228

* github.com:scylladb/scylladb:
  raft_rebuild: propagate source_dc force option to rebuild_option
  repair: do_rebuild_replace_with_repair: use source_dc only when safe
  repair: replace_with_repair: pass the replace_node downstream
  repair: replace_with_repair: pass ignore_nodes as a set of host_id:s
  repair: replace_rebuild_with_repair: pass ks_erms from caller
  nodetool: rebuild: add force option
  Add and use utils::optional_param to pass source_dc
2024-08-29 07:35:05 +03:00
Lakshmi Narayanan Sreethar
01661e1eaa test/pylib: fix keyspace_compaction method
The `keyspace_compaction` method incorrectly appends the column family
parameter to the URL using a regular string, `"?cf={table}"`, instead of
an f-string, `f"?cf={table}"`. As a result, the column family name is
sent as `{table}` to the server, causing the compaction request to fail.
Fix this issue by passing the parameter to the POST request using a
dictionary instead of appending it to the URL.
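The bug and the fix can be reproduced without a server, as in this Python sketch with hypothetical `buggy_query`/`fixed_query` helpers: a plain string literal leaves `{table}` untouched, while encoding a parameter dict produces the intended query string and also escapes unusual names.

```python
from urllib.parse import urlencode


def buggy_query(table: str) -> str:
    # Plain string literal: "{table}" is sent verbatim, never interpolated.
    return "?cf={table}"


def fixed_query(table: str) -> str:
    # Encode a params dict, as an HTTP client does with a params argument.
    return "?" + urlencode({"cf": table})
```

With an aiohttp client session, for instance, the equivalent is passing `params={"cf": table}` to `post()`; which HTTP client test/pylib uses is not stated in the commit.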

Fixes #20264

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit dc5c45e803)

Closes scylladb/scylladb#20273
2024-08-28 20:08:58 +03:00