Commit Graph

43683 Commits

Author SHA1 Message Date
Kefu Chai
ef32ba704d docs: explain precedence of configure options
to explain for instance which setting takes effect if both
command line options and `scylla.yaml` configures the same parameter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 1aa030a8cd)

Closes scylladb/scylladb#20775
2024-09-26 10:47:42 +03:00
Anna Stuchlik
10d71d2f4b doc: update the unified installer instructions
This commit updates the unified installer instructions to avoid specifying a given version.
At the moment, we're technically unable to use variables in URLs, so we need to update
the page each release.

Fixes https://github.com/scylladb/scylladb/issues/20677

(cherry picked from commit 400a14eefa)

Closes scylladb/scylladb#20709
2024-09-26 10:45:53 +03:00
Anna Stuchlik
9afb3daf98 doc: fix a broken link
This commit fixes a link to the Manager by adding a missing underscore
to the external link.

(cherry picked from commit aa0c95c95c)

Closes scylladb/scylladb#20707
2024-09-26 10:45:17 +03:00
Tzach Livyatan
82e7cb5bf5 Update client-node-encryption: OpsnSSL is FIPS *enabled*
(cherry picked from commit cb864b11d8)

Closes scylladb/scylladb#20651
2024-09-26 10:42:12 +03:00
Lakshmi Narayanan Sreethar
58da8fdbbc [Backport 6.1]: database::get_all_tables_flushed_at: fix return value
The `database::get_all_tables_flushed_at` method returns a variable
without setting the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables` to flush all the tables everytime
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.

Fixes #20301

Closes scylladb/scylladb#20471

* github.com:scylladb/scylladb:
  cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config
  database::get_all_tables_flushed_at: fix return value

(cherry picked from commit 0e5b444777)

Backported from #20471 to 6.1.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20581
2024-09-26 10:40:48 +03:00
Kamil Braun
92156e7930 test: fix topology_custom/test_raft_recovery_stuck flakiness
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However the driver might route the
changes to different servers and we don't have group 0 to guarantee
linearizability. We must rely on the first change coordinator to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.

Fixes scylladb/scylladb#20791

Should be backported to all branches containing this test to reduce
flakiness.

(cherry picked from commit f390d4020a)

Closes scylladb/scylladb#20809
2024-09-25 15:11:50 +02:00
Abhinav
33b50a9d3a raft topology: add error for removal of non-normal nodes
In the current scenario, We check if a node being removed is normal
on the node initiating the removenode request. However, we don't have a
similar check on the topology coordinator. The node being removed could be
normal when we initiate the request, but it doesn't have to be normal when
the topology coordinator starts handling the request.
For example, the topology coordinator could have removed this node while handling
another removenode request that was added to the request queue earlier.

This commit intends to fix this issue by adding more checks in the enqueuing phase
and return errors for duplicate requests for node removal.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#20271
(cherry picked from commit b25b8dccbd)

Closes scylladb/scylladb#20800
2024-09-25 11:35:27 +02:00
Gleb Natapov
43f9b3b997 test: skip test_lwt_semaphore::test_cas_semaphore in aarch64 debug mode
The test configures write timeout to much smaller value to make the test
run faster since for some writes sleep is inserted to hit the timeout,
but it makes aarch64 debug flaky since timeout happens when it should
not because of a natural slowness.

(cherry picked from commit 71a5b1c6dd)

Closes scylladb/scylladb#20777
2024-09-24 15:20:09 +02:00
Botond Dénes
7ed2f87414 Merge '[Backport 6.1] cql3: add option to not unify bind variables with the same' from Avi Kivity
Bind variables in CQL have two formats: positional (?) where a variable is referred to by its relative position in the statement, and named (:var), where the user is expected to supply a name->value mapping.

In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.

Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the dialect and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.

A unit test is added.

Fixes https://github.com/scylladb/scylladb/issues/15559

This may be useful to users transitioning from Cassandra, so merits a backport.

(cherry picked from commit f9322799af)

(cherry picked from commit d69bf4f010)

(cherry picked from commit ea8441dfa3)

Refs https://github.com/scylladb/scylladb/pull/19493

Closes scylladb/scylladb#20590

* github.com:scylladb/scylladb:
  cql3: add option to not unify bind variables with the same name
  cql3: introduce dialect infrastructure
  cql3: prepared_statement_cache: drop cache key default constructor
  Merge 'config: round-trip boolean configuration variables' from Avi Kivity
2024-09-24 15:15:05 +03:00
Jenkins Promoter
f4ad3436cb Update ScyllaDB version to: 6.1.3 2024-09-24 15:07:23 +03:00
Benny Halevy
d13c77e1eb time_window_compaction_strategy: get_reshaping_job: restrict sort of multi_window vector to its size
Currently the function calls boost::partial_sort with a middle
iterator that might be out of bound and cause undefined behavior.

Check the vector size, and do a partial sort only if its longer
than `max_sstables`, otherwise sort the whole vector.

Fixes scylladb/scylladb#20608

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 39ce358d82)

Closes scylladb/scylladb#20663
2024-09-23 15:38:35 +03:00
Piotr Dulikowski
bf6dd16071 Merge '[Backport 6.1] message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak
In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature.
This wasn't a problem for OSS, since unknown isolation cookie just uses default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if user create maximum number of service levels.

This patch adds a cluster feature to guard adding the new tenant. It's done in the way to handle two upgrade scenarios:

version without $maintenance tenant -> version with $maintenance tenant guarded by a feature
version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature
The PR adds enabled flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.
The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled.

Fixes https://github.com/scylladb/scylladb/issues/20070
Refs https://github.com/scylladb/scylla-enterprise/issues/4403

(cherry picked from commit d44844241d)

(cherry picked from commit 71a03ef6b0)

(cherry picked from commit b4b91ca364)

Refs https://github.com/scylladb/scylladb/pull/19802

Closes scylladb/scylladb#20674

* github.com:scylladb/scylladb:
  message/messaging_service: guard adding maintenance tenant under cluster feature
  message/messaging_service: add feature_service dependency
  message/messaging_service: add `enabled` flag to statement tenants
2024-09-23 13:18:45 +02:00
Botond Dénes
f987afb2e1 Merge '[Manual Backport 6.1] generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek
This is a manual backport of #20212 to 6.1, superseding #20345 (which had run into conflicts).

Please see the individual commit messages for backport notes.

Fixes #10305

Closes scylladb/scylladb#20355

* github.com:scylladb/scylladb:
  generic_server: make server::stop() idempotent
  generic_server: coroutinize server::shutdown()
  generic_server: make server::shutdown() idempotent
  test/generic_server: add test case
  configure, cmake: sort the lists of boost unit tests
  generic_server: convert connection tracking to seastar::gate
2024-09-18 15:52:32 +03:00
Michał Jadwiszczak
7e14df5ba7 message/messaging_service: guard adding maintenance tenant under cluster feature
Set `enabled` flag for `$maintenance` tenant to false and
enable it when `MAINTENANCE_TENANT` feature is enabled.

(cherry-picked from b4b91ca364)
2024-09-18 11:31:26 +02:00
Michał Jadwiszczak
d11df0fcbc message/messaging_service: add feature_service dependency
(cherry-picked from 71a03ef6b0)
2024-09-18 11:26:56 +02:00
Michał Jadwiszczak
f928bb7967 message/messaging_service: add enabled flag to statement tenants
Adding a new tenant needs to be done under cluster feature protection.
However it wasn't the case for adding `$maintenance` statement tenant
and to fix it we need to support an upgrade from node which doesn't
know about maintenance tenant at all and from one which uses it without
any cluster feature protection.

This commit adds `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create
a connection, but it can be used to accept an incoming connection.

(cherry-picked from d44844241d)
2024-09-18 11:23:02 +02:00
Tomasz Grabiec
edea822bd7 Merge '[Backport 6.1] tablets: Fix race between repair and split' from Raphael "Raph" Carvalho
Consider the following:

```
T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes
```
If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set.

Fixes https://github.com/scylladb/scylladb/issues/19378.
Fixes https://github.com/scylladb/scylladb/issues/19416.

Please replace this line with justification for the backport/* labels added to this PR

(cherry picked from commit 239344ab55)

(cherry picked from commit 74612ad358)

Refs https://github.com/scylladb/scylladb/pull/19427

Closes scylladb/scylladb#20595

* github.com:scylladb/scylladb:
  tablets: Fix race between repair and split
  compaction: Allow "offline" sstable to be split
2024-09-17 13:25:03 +02:00
Avi Kivity
fb98d6f832 Merge '[Backport 6.1] replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk
Cleanup of a deallocated tablet throws an exception.
Since failed cleanup is retried, we end up in an infinite loop.

Ignore cleanup of deallocated storage groups.

Fixes: https://github.com/scylladb/scylladb/issues/19752.

Needs to be backported to all branches with tablets (6.0 and later)

(cherry picked from commit 20d6cf55f2)

(cherry picked from commit 2c4b1d6b45)

Refs https://github.com/scylladb/scylladb/pull/20584

Closes scylladb/scylladb#20627

* github.com:scylladb/scylladb:
  test: check if cleanup of deallocated sg is ignored
  replica: ignore cleanup of deallocated storage group
2024-09-17 12:22:00 +03:00
Gleb Natapov
d2e9007442 paxos_state: release semaphore units before checking if a semaphore can be dropped
To drop a semaphore it should not be held by anyone, so we need to
release out units before checking if a semaphore can be dropped.

Fixes: scylladb/scylladb#20602
(cherry picked from commit 9cc54932ae)

Closes scylladb/scylladb#20621
2024-09-16 22:08:45 +03:00
Aleksandra Martyniuk
032c9146d5 test: check if cleanup of deallocated sg is ignored
(cherry picked from commit 2c4b1d6b45)
2024-09-16 16:22:29 +02:00
Aleksandra Martyniuk
120ff5aeb8 replica: ignore cleanup of deallocated storage group
Currently, attempt to cleanup deallocated storage group throws
an exception. Failed tablet cleanup is retried, stucking
in an endless loop.

Ignore cleanup of deallocated storage group.

(cherry picked from commit 20d6cf55f2)
2024-09-16 12:44:36 +00:00
Raphael S. Carvalho
fe56fa39c0 tablets: Fix race between repair and split
Consider the following:

T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes

If repair produces sstable after split prepare phase, the replica
will not split that sstable later, as prepare phase is considered
completed already. That causes split execution to fail as replicas
weren't really prepared. This also can be triggered with
load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race
between split and migration. If migration happens during prepare
phase, it can happen source misses the split request, but the
tablet will still be split on the destination (if needed).
Similarly, the repair writer becomes responsible for splitting
the data if underlying table is in split mode. That's implemented
in replica::table for correctness, so if node crashes, the new
sstable missing split is still split before added to the set.

Fixes #19378.
Fixes #19416.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 74612ad358)
2024-09-13 21:32:01 -03:00
Avi Kivity
8ddfd0d70d cql3: add option to not unify bind variables with the same name
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.

In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.

Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.

A unit test is added.

Fixes #15559

(cherry picked from commit ea8441dfa3)
(cherry picked from commit edb3068ecf)
2024-09-13 18:17:15 +03:00
Avi Kivity
92dd47c6d6 cql3: introduce dialect infrastructure
A dialect is a different way to interpret the same CQL statement.

Examples:
 - how duplicate bind variable names are handled (later in this series)
 - whether `column = NULL` in LWT can return true (as is now) or
   whether it always returns NULL (as in SQL)

Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.

The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.

(cherry picked from commit d69bf4f010)
2024-09-13 18:11:11 +03:00
Avi Kivity
4bf81f54b4 cql3: prepared_statement_cache: drop cache key default constructor
It's unnecessary, and interferes with the following patch where
we change the cache key type.

(cherry picked from commit f9322799af)
2024-09-13 17:56:06 +03:00
Nadav Har'El
d9ba5423bb Merge 'config: round-trip boolean configuration variables' from Avi Kivity
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.

Not a regression, so a backport isn't strictly necessary.

Closes scylladb/scylladb#19792

* github.com:scylladb/scylladb:
  config: specialize from-string conversion for bool
  config: wrap boost::lexical_cast<> when converting from strings

(cherry picked from commit 9eb47b3ef0)
2024-09-13 17:54:37 +03:00
Piotr Smaron
b60f9ef4c2 cql: fix exception when validating KS in CREATE TABLE
c70f321c6f added an extra check if KS
exists. This check can throw `data_dictionary::no_such_keyspace`
exception, which is supposed to be caught and a more user-friendly
exception should be thrown instead.
This commit fixes the above problem and adds a testcase to validate it
doesn't appear ever again.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.
Additionally, I added an extra comment to both `no_such_keyspace` and
`no_such_column_family` exceptions explaining they should not be
returned directly to the caller, as they lack error code, which may not
trigger correct exceptions handling mechanisms on the driver side.

Fixes: #20097
(cherry picked from commit f1e8976fbe)

Closes scylladb/scylladb#20553
scylla-6.1.2 scylla-6.1.2-candidate-20240915043632
2024-09-13 11:36:51 +03:00
Piotr Dulikowski
00e96d4b70 Merge '[Backport 6.1]: hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.

Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylladb#20558
Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835

This is a backport of the original PR without the tests, done avoid the need of resolving merge conflicts in that area.

Closes scylladb/scylladb#20557

* github.com:scylladb/scylladb:
  hints: send hints with CL=ALL if target is leaving
  hints: inline do_send_one_mutation
2024-09-13 09:39:36 +02:00
Abhi
848054079b raft: Add descriptions for requested abort errors
Fixes: scylladb/scylladb#18902

This PR only improves error messages, no need to backport it.

(cherry picked from commit 9b09439065)

Closes scylladb/scylladb#20526
2024-09-13 10:13:49 +03:00
Botond Dénes
c80cefe422 docs/cql/ddl.rst: fix description of sstable_compression
ScyllaDB doesn't support custom compressors. The available compressors
are the only available ones, not the default ones.
Adjust the text to reflect this.

(cherry picked from commit 08f109724b)

Closes scylladb/scylladb#20524
2024-09-13 10:12:59 +03:00
Takuya ASADA
b07c74a65c install.sh: fix more incorrect permission on strict umask
Even after 13caac7, we still have more files incorrect permission, since
we use "cp -r" and creating new file with redirect.

To fix this, we need to replace "cp -r" with "cp -pr", and "chmod <perm>" on
newly created files.

Fixes #14383
Related #19775

(cherry picked from commit 9d7fed40b5)

Closes scylladb/scylladb#20432
2024-09-13 10:12:22 +03:00
Piotr Dulikowski
2556c7a0dc hints: send hints with CL=ALL if target is leaving
Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only
  send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destiantion of the hint is leaving.

Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835

(cherry picked from commit 61ac0a336d)
2024-09-12 10:55:29 +02:00
Piotr Dulikowski
132d77f447 hints: inline do_send_one_mutation
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.

(cherry picked from commit 8abb06ab82)
2024-09-12 10:55:21 +02:00
Gleb Natapov
bb9249f055 db/consistency_level: do not use result from hit weighted load balancer if it contains duplicates
Because of https://github.com/scylladb/scylladb/issues/9285 hit weighted
load balancer may sometimes return same node twice. It may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and it is rare lets introduce a
workaround. We will check for duplicates and will use non HWLB one if
one is found.

(cherry picked from commit e06a772b87)

Closes scylladb/scylladb#20468
2024-09-10 17:18:47 +03:00
Kamil Braun
e4a18b0858 test: test_raft_no_quorum: increase raft timeout in debug mode
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).

Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.

Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.

Fixes: scylladb/scylladb#20111
(cherry picked from commit 52fdf5b4c9)

Closes scylladb/scylladb#20477
2024-09-10 08:48:06 +03:00
Kefu Chai
105293b2ab docs: do not install scylla/ppa repo when perform upgrade
for following reasons:

1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing.
2. the ppa in question does not provide the packages used in production. it does provides the package for *building* scylla
3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages.

so, in this change, we remove all references to enabling the Scylla/PPA repository.

Fixes scylladb/scylladb#20449

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit fe0e961856)

Closes scylladb/scylladb#20453
2024-09-10 08:46:47 +03:00
Nadav Har'El
ad47c0e2f9 alternator ttl: fix use-after-free
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.

The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.

Fixes #19988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 15f8046fcb)

Closes scylladb/scylladb#20436
2024-09-10 08:43:14 +03:00
Kefu Chai
0eb66cbee5 sstables: correct the debugging message printed when removing temp dir
in 372a4d1b79, we introduced a change
which was for debugging the logging message. but the logging message
intended for printing the temp_dir not prints an `optional<int>`. this
is both confusing, and more importantly, it hurts the debuggability.

in this change, the related change is reverted.

Fixes scylladb/scylladb#20408

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit d26bb9ae30)

Closes scylladb/scylladb#20434
2024-09-10 08:42:29 +03:00
Kefu Chai
a2458f07d7 dist: drop %pretrans section
before this change, if user does not have `/bin/sh` around, when
installing scylla packages, the script in `%pretrans" is executed,
and fails due to missing `/bin/sh`. per
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans

> Note that the %pretrans scriptlet will, in the particular case of
> system installation, run before anything at all has been installed.
> This implies that it cannot have any dependencies at all. For this
> reason, %pretrans is best avoided, but if used it MUST (by necessity)
> be written in Lua. See
> https://rpm-software-management.github.io/rpm/manual/lua.html for more
> information.

but we were trying to warn users upgrading from scylla < 1.7.3, which
was released 7 years ago at the time of writing.

in this change, we drop the `%pretrans` section. hopefuly they will
find their way out if they still exist.

Fixes scylladb/scylladb#20321

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 6970c502c9)

Closes scylladb/scylladb#20384
2024-09-10 08:40:11 +03:00
Avi Kivity
b484effcad docs: cql: document ZstdCompressor for CREATE TABLE
Adjust the wording slightly to be less awkward.

(cherry picked from commit 60acfd8c08)

Closes scylladb/scylladb#20380
2024-09-10 08:39:08 +03:00
Raphael S. Carvalho
4c4d1cce14 storage_service: avoid processing same table unnecessarily in split monitor
If there's a token metadata for a given table, and it is in split mode,
it will be registered such that split monitor can look at it, for
example, to start split work, or do nothing if table completed it.

during topology change, e.g. drain, split is stalled since it cannot
take over the state machine.
It was noticed that the log is being spammed with a message saying the
table completed split work, since every tablet metadata update, means
waking up the monitor on behalf of a table. So it makes sense to
demote the logging level to debug. That persists until drain completes
and split can finally complete.

Another thing that was noticed is that during drain, a table can be
submitted for processing faster than the monitor can handle, so the
candidate queue may end up with multiple duplicated entries for same
table, which means unnecessary work. That is fixed by using a
sequenced set, which keeps the current FIFO behavior.

Fixes #20339.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 26facd807e)

Closes scylladb/scylladb#20343
2024-09-10 08:37:20 +03:00
Botond Dénes
c64ae3f839 Merge '[Backport 6.1] repair: throw if batchlog manager isn't initialized' from ScyllaDB
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.

Throw if batchlog manager isn't initialized.

Fixes:  #20236.

Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.

(cherry picked from commit d8e4393418)

(cherry picked from commit f38bb6483a)

 Refs #20251

Closes scylladb/scylladb#20351

* github.com:scylladb/scylladb:
  test: add test to ensure repair won't fail with uninitialized bm
  repair: throw if batchlog manager isn't initialized
2024-09-04 07:02:18 +03:00
Kamil Braun
f77686cefb Merge '[Backport 6.1] Fix node replace with inter-dc encryption enabled.' from Gleb Natapov
Currently if a coordinator and a node being replaced are in the same DC
while inter-dc encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because a coordinator uses non encrypted connection to push raft
data to the new node, but the new node will not accept such connection
until it knows which DC the coordinator belongs to and for that the raft
data needs to be transferred.

The series adds the test for this scenario and the fix for the
chicken&egg problem above.

The series (or at least the fix itself) is needs to be backported because
this is a serious regression.

Fixes: https://github.com/scylladb/scylladb/issues/19025

(cherry picked from commit 84757a4ed3)

(cherry picked from commit b98282a976)

(cherry picked from commit 2f1b1fd45e)

(cherry picked from commit 17f4a151ce)

(cherry picked from commit 32a59ba98f)

Refs https://github.com/scylladb/scylladb/pull/20290

Closes scylladb/scylladb#20374

* github.com:scylladb/scylladb:
  topology coordinator: fix indentation after the last patch
  topology coordinator: do not add replacing node without a ring to topology
  test: add test for replace in clusters with encryption enabled
  test.py: add server encryption support to cluster manager
  .gitignore: fix pattern for resources to match only one specific directory
2024-09-02 16:14:37 +02:00
Gleb Natapov
d6a1a55d6c topology coordinator: fix indentation after the last patch
(cherry picked from commit 32a59ba98f)
2024-09-01 11:57:34 +03:00
Gleb Natapov
9db819763b topology coordinator: do not add replacing node without a ring to topology
When only inter dc encryption is enabled a non encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If a nodes that initiates the connection knows that dst is in the same
dc and hence use non encrypted connection, but the dst not yet knows the
topology of the src such connection will not be allowed since dst cannot
guaranty that dst is in the same dc.

Currently, when topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to the
group0. The coordinator will try to send raft message to the new node
and (assuming only inter dc encryption is enabled and replacing node and
the coordinator are in the same dc) it will try to open regular, non encrypted,
connection to it. But the replacing node will not have the coordinator
in it's topology yet (it needs to sync the raft state for that). so it
will reject such connection.

To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens will be assigned to it. At this point a replacing node will
already make sure that its topology state is up-to-date (since it will
execute a raft barrier in join_node_response_params handler) and it knows
coordinator's topology. This aligns replace behaviour with bootstrap
since bootstrap also does not add a node without a ring to the topology.

The patch effectively reverts b8ee8911ca

Fixes: scylladb/scylladb#19025
(cherry picked from commit 17f4a151ce)
2024-09-01 11:57:25 +03:00
Gleb Natapov
4769e694d1 test: add test for replace in clusters with encryption enabled
(cherry picked from commit 2f1b1fd45e)
2024-09-01 11:56:37 +03:00
Gleb Natapov
74012c562a test.py: add server encryption support to cluster manager
(cherry picked from commit b98282a976)
2024-09-01 11:56:25 +03:00
Gleb Natapov
51215fb7f7 .gitignore: fix pattern for resources to match only one specific directory
(cherry picked from commit 84757a4ed3)
2024-09-01 11:54:42 +03:00
Laszlo Ersek
370bf14872 generic_server: make server::stop() idempotent
After server::shutdown(), make server::stop() more robust too, by allowing
callers (internal or external) to call it several times (not concurrently
though, just yet; see
<https://github.com/scylladb/scylladb/issues/20309>).

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 49bff3b1ab)
2024-08-30 16:17:44 +02:00
Laszlo Ersek
860a1872bc generic_server: coroutinize server::shutdown()
By turning server::shutdown() into a coroutine, we need not dynamically
allocate "nr_conn".

Verified as follows:

(1) In terminal #1:

    build/Dev/scylla --overprovisioned --developer-mode=yes \
        --memory=2G --smp=1 --default-log-level error \
        --logger-log-level cql_server=debug:cql_server_controller=debug

> INFO  [...] cql_server_controller - Starting listening for CQL clients
>                                     on 127.0.0.1:9042 (unencrypted,
>                                     non-shard-aware)
> INFO  [...] cql_server_controller - Starting listening for CQL clients
>                                     on 127.0.0.1:19042 (unencrypted,
>                                     shard-aware)

(2) In terminals #2 and #3:

    tools/cqlsh/bin/cqlsh.py

(3) Press ^C in terminal #1:

> DEBUG [...] cql_server - abort accept nr_total=2
> DEBUG [...] cql_server - abort accept 1 out of 2 done
> DEBUG [...] cql_server - abort accept 2 out of 2 done
> DEBUG [...] cql_server - shutdown connection nr_total=4
> DEBUG [...] cql_server - shutdown connection 1 out of 4 done
> DEBUG [...] cql_server - shutdown connection 2 out of 4 done
> DEBUG [...] cql_server - shutdown connection 3 out of 4 done
> DEBUG [...] cql_server - shutdown connection 4 out of 4 done
> INFO  [...] cql_server_controller - CQL server stopped

This patch is best viewed with "git show --word-diff=color".

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 1138347e7e)
2024-08-30 16:17:44 +02:00