Commit Graph

4008 Commits

Author SHA1 Message Date
Nadav Har'El
0c26032e70 test/cql-pytest: translate more Cassandra tests
This patch includes a translation of two more test files from
Cassandra's CQL unit test directory cql3/validation/operations.

All tests included here pass on Cassandra. Several tests fail on Scylla
and are marked "xfail". These failures discovered two previously-unknown
bugs:

    #12243: Setting USING TTL of "null" should be allowed
    #12247: Better error reporting for oversized keys during INSERT

And also added reproducers for two previously-known bugs:

    #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
    #6447: TTL unexpected behavior when setting to 0 on a table with
           default_time_to_live

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12248
2022-12-11 21:42:57 +02:00
Nadav Har'El
09a3c63345 cross-tree: allow std::source_location in clang 14
We recently (commit 6a5d9ff261) started
to use std::source_location instead of std::experimental::source_location.
However, this does not work on clang 14, because libc++ 12's
<source_location> only works if __builtin_source_location is available,
and that is not available on clang 14.

clang 15 is just three months old, and several relatively-recent
distributions still carry clang 14 so it would be nice to support it
as well.

So this patch adds a trivial compatibility header file which, when
included and compiled with clang 14, aliases the functional
std::experimental::source_location to std::source_location.

It turns out it's enough to include the new header file from three
headers that included <source_location> -  I guess all other uses
of source_location depend on those header files directly or indirectly.
We may later need to include the compatibility header file in additional
places, but for now we don't.

Refs #12259

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12265
2022-12-11 20:28:49 +02:00
Avi Kivity
e6ffc22053 Merge 'cql3: Server-side DESC statement' from Michał Jadwiszczak
This PR adds a server-side `DESCRIBE` statement, which is required by the latest cqlsh version.

The only change from the user perspective is that the `DESC ...` statement can be used with cqlsh version >= 6.0. Previously the statement was executed on the client side, but starting with Cassandra 4.0 and cqlsh 6.0, execution of describe was moved to the server side, so the user was unable to do `DESC ...` with Scylla and cqlsh 6.0.

Implemented describe statements:
- `DESC CLUSTER`
- `DESC [FULL] SCHEMA`
- `DESC [ONLY] KEYSPACE`
- `DESC KEYSPACES/TYPES/FUNCTIONS/AGGREGATES/TABLES`
- `DESC TYPE/FUNCTION/AGGREGATE/MATERIALIZED VIEW/INDEX/TABLE`
- `DESC`

[Cassandra's implementation for reference](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/DescribeStatement.java)

Changes in this patch:
- cql3::util: added `single_quote()` function
- added `data_dictionary::keyspace_element` interface
- implemented `data_dictionary::keyspace_element` for:
    - keyspace_metadata,
    - UDT, UDF, UDA
    - schema
- cql3::functions: added `get_user_functions()` and `get_user_aggregates()` to get all UDFs/UDAs in specified keyspace
- data_dictionary::user_types_metadata: added `has_type()` function
- extracted `describe_ring()` from storage_service to standalone helper function in `locator/util.hh`
- storage_proxy: added `describe_ring()` (implemented using helper function mentioned above)
- extended CQL grammar to handle describe statement
- increased version in `version.hh` to 4.0.0, so cqlsh will use server-side describe statement

Referring: https://github.com/scylladb/scylla/issues/9571, https://github.com/scylladb/scylladb/issues/11475

Closes #11106

* github.com:scylladb/scylladb:
  version: Increasing version
  cql-pytest: Add tests for server-side describe statement
  cql-pytest: creating random elements for describe's tests
  cql3: Extend CQL grammar with server-side describe statement
  cql3:statements: server-side describe statement
  data_dictonary: add `get_all_keyspaces()` and `get_user_keyspaces()`
  storage_proxy: add `describe_ring()` method
  storage_service, locator: extract describe_ring()
  data_dictionary:user_types_metadata: add has_type() function
  cql3:functions: `get_user_functions()` and `get_user_aggregates()`
  implement `keyspace_element` interface
  data_dictionary: add `keyspace_element` interface
  cql3: single_quote() util function
  view: row_lock: lock_ck: reindent
  test/topology: enable replace tests
  service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0`
  service: handle replace correctly with Raft enabled
  gms/gossiper: fetch RAFT_SERVER_ID during shadow round
  service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
2022-12-11 18:29:36 +02:00
Michał Jadwiszczak
3ddde7c5ad cql-pytest: Add tests for server-side describe statement 2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
f91d05df43 cql-pytest: creating random elements for describe's tests
Add helper functions to create random elements (keyspaces, tables, types)
to increase the coverage of describe statement's tests.

This commit also adds `random_seed` fixture. The fixture should be
always used when using random functions. In case of test's failure, the
seed will be present in test's signature and the case can be easily
recreated.
After the test finishes, the fixture restores state of `random` to
before-test state.
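
A minimal sketch of the seed-and-restore idea behind `random_seed`, written as a plain context manager so that it runs with the standard library alone. The names `seeded_random` and `random_name` are hypothetical and not taken from the patch; the actual fixture is a pytest fixture and may differ in detail.

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded_random(seed=None):
    """Seed `random` for the duration of a block, then restore its prior state.

    Hypothetical helper: it mirrors the behavior described for the
    `random_seed` fixture, not its actual implementation.
    """
    saved_state = random.getstate()      # remember the before-test state
    if seed is None:
        seed = random.randrange(2**32)   # pick a seed that can be reported
    random.seed(seed)
    try:
        yield seed                       # the caller can log or print the seed
    finally:
        random.setstate(saved_state)     # restore the before-test state


def random_name(prefix, length=8):
    """Build a random lowercase identifier such as a keyspace or table name."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return prefix + "_" + "".join(random.choice(letters) for _ in range(length))
```

Running the same block twice with the same seed yields the same generated names, which is what makes a failing case reproducible from the reported seed.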
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
673393d88a data_dictonary: add get_all_keyspaces() and get_user_keyspaces()
Adds functions to `data_dictionary::database` in order to obtain names
of all keyspaces/all user keyspaces.
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
29ad5a08a8 implement keyspace_element interface
This patch implements `data_dictionary::keyspace_element` interface
in: `keyspace_metadata`, `user_type_impl`, `user_function`,
`user_aggregate` and schema.
2022-12-10 12:34:09 +01:00
Kamil Braun
c43e64946a test/topology: enable replace tests
Also add some TODOs for enhancing existing tests.
2022-12-10 12:27:22 +01:00
Nadav Har'El
e47794ed98 test/cql-pytest: regression test for index scan with start token
When we have a table with partition key p and an indexed regular column
v, the test included in this patch checks the query

     SELECT p FROM table WHERE v = 1 AND TOKEN(p) > 17

This can work and not require ALLOW FILTERING, because the secondary index
posting-list of "v=1" is ordered in p's token order (to allow SELECT with
and without an index to return the same order - this is explained in
issue #7443). So this test should pass, and indeed it does on both current
Scylla, and Cassandra.
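
A toy model of why the token restriction needs no filtering, assuming the index posting list is kept sorted by the token of p as described above. The function `toy_token` is a stand-in based on CRC32, not the real Murmur3 partitioner, and all names here are hypothetical.

```python
import bisect
import zlib


def toy_token(p):
    """Stand-in for TOKEN(p); the real partitioner uses Murmur3, not CRC32."""
    return zlib.crc32(str(p).encode()) - 2**31


def build_posting_list(rows, v):
    """Index entries for value v: (token, p) pairs sorted in token order."""
    return sorted((toy_token(p), p) for p, value in rows if value == v)


def select_with_token_gt(posting_list, lower_bound):
    """Return p for entries with token > lower_bound as one contiguous slice."""
    tokens = [token for token, _ in posting_list]
    start = bisect.bisect_right(tokens, lower_bound)
    return [p for _, p in posting_list[start:]]
```

Because the entries are already in token order, `TOKEN(p) > 17` selects a suffix of the posting list, so no per-row filtering pass is needed in this model.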

However, it turns out that this was a bug - issue #7043 - in older
versions of Scylla, and only fixed in Scylla 4.6. In older versions,
the SELECT wasn't accepted, claiming it requires ALLOW FILTERING,
and if ALLOW FILTERING was added, the TOKEN(p) > 17 part was silently
ignored.

The fix for issue #7043 actually included regression tests, C++ tests in
test/boost/secondary_index_test.cc. But in this patch we also add a Python
test in test/cql-pytest.

One of the benefits of cql-pytest is that we can (and I did) run the same
test on Cassandra to verify we're not implementing a wrong feature.
Another benefit is that we can run a new test on an old version, and
not even require re-compilation: You can run this new test on any
existing installation of Scylla to check if it still has issue #7043.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12237
2022-12-09 09:33:16 +02:00
Pavel Emelyanov
6075e01312 test/lib: Remove sstable_utils.hh from simple_schema.hh
The latter is pretty popular test/lib header that disseminates the
former one over whole lot of unit tests. The former, in turn, naturally
includes sstables.hh thus making tons of unrelated tests depend on
sstables class unused by them.

However, simple removal doesn't work, because of local_shard_only bool
class definition in sstable_utils.hh used in simple_schema.hh. This
thing, in turn, is used in keys making helpers that don't belong to
sstable utils, so these are moved into simple_schema as well.

When done, this affects the mutation_source_test.hh, which needs the
local_shard_only bool class (and helps spreading the sstables.hh
throughout more unrelated tests) and a bunch of .cc test sources that
used sstable_utils.hh to indirectly include various headers of their
demand.

After patching, sstables.hh touches half as many tests. As a side
effect, half as many tests depend on sstables_manager.hh as well.

Continuation of 9bdea110a6

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12240
2022-12-08 15:37:33 +02:00
Nadav Har'El
4cdaba778d Merge 'Secondary indexes on static columns' from Piotr Dulikowski
This pull request introduces support for global secondary indexes based on static columns.

Local secondary indexes based on static columns are not planned to be supported and are explicitly forbidden. Because there is only one static row per partition and local indexes require the full partition key when querying, such indexes wouldn't be very useful and would only waste resources.

The index table for secondary indexes on static columns, unlike other secondary indexes, does not contain clustering keys from the base table. A static column's value determines a set of full partitions, so the clustering keys would be unnecessary.

The already existing logic for querying using secondary indexes works after introducing minimal modifications. The view update generation path now works on a common representation of static and clustering rows, but the new representation made it possible to keep most of the logic intact.

New cql-pytests are added. All but one of the existing tests for secondary indexes on static columns - ported from Cassandra - now work and have their `xfail` marks lifted; the remaining test requires support for collection indexing, so it will start working only after #2962 is fixed.

Materialized views with static rows as a key are __not__ implemented in this PR.

Fixes: #2963

Closes #11166

* github.com:scylladb/scylladb:
  test_materialized_view: verify that static columns are not allowed
  test_secondary_index: add (currently failing) test for static index paging
  test_secondary_index: add more tests for secondary indexes on static columns
  cassandra_tests: enable existing tests for static columns
  create_index_statement: lift restriction on secondary indexes on static rows
  db/view: fetch and process static rows when building indexes
  gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature
  create_index_statement: disallow creation of local indexes with static columns
  select_statement: prepare paging for indexes on static columns
  select_statement: do not attempt to fetch clustering columns from secondary index's table
  secondary_index_manager: don't add clustering key columns to index table of static column index
  replica/table: adjust the view read-before-write to return static rows when needed
  db/view: process static rows in view_update_builder::on_results
  db/view: adjust existing view update generation path to use clustering_or_static_row
  column_computation: adjust to use clustering_or_static_row
  db/view: add clustering_or_static_row
  deletable_row: add column_kind parameter to is_live
  view_info: adjust view_column to accept column_kind
  db/view: base_dependent_view_info: split non-pk columns into regular and static
2022-12-08 09:54:05 +02:00
Piotr Dulikowski
4883e43677 test_materialized_view: verify that static columns are not allowed
Adds a test which verifies that static columns are not allowed in
materialized views. Although we added support for static columns in
secondary indexes, which share a lot of code with materialized views,
static columns in materialized views are not yet ready to use.
2022-12-08 07:41:33 +01:00
Piotr Dulikowski
f864944dcb test_secondary_index: add (currently failing) test for static index paging
Currently, when executing queries accelerated by an index on a static
column, paging is unable to break base table partitions across pages and
is forced to return them in whole. This will cause problems if such a
query must return a very large base table partition because it will have
to be loaded into memory.

Fixing this issue will require a more sophisticated approach than what
was done in the PR. For the time being, an xfailing pytest is added
which should start passing after paging is improved.
2022-12-08 07:41:33 +01:00
Piotr Dulikowski
4f836115fd test_secondary_index: add more tests for secondary indexes on static columns
Adds cql-pytests which test the secondary index on static columns
feature.
2022-12-08 07:41:32 +01:00
Tomasz Grabiec
a46b2e4e4c Merge 'Make node replace procedure work with Raft' from Kamil Braun
We need to obtain the Raft ID of the replaced node during the shadow round and
place it in the address map. It won't be placed by the regular gossiping route
if we're replacing using the same IP, because we override the application state
of the replaced node. Even if we replace a node with a different IP, it is not
guaranteed that background gossiping manages to update the address map before we
need it, especially in tests where we set ring_delay to 0 and disable
wait_for_gossip_to_settle. The shadow round, on the other hand, performs a
synchronous request (and if it fails during bootstrap, bootstrap will fail -
because we also won't be able to obtain the tokens and Host ID of the replaced
node).

Fetch the Raft ID of the replaced node in `prepare_replacement_info`,
which runs the shadow round. Return it in `replacement_info`. Then
`join_token_ring` passes it to `setup_group0`, which stores it in the
address map. It does that after `join_group0` so the entry is
non-expiring (the replaced node is a member of group 0). Later in the
replace procedure, we call `remove_from_group0` for the replaced node.
`remove_from_group0` will be able to reverse-translate the IP of the
replaced node to its Raft ID using the address map.

Also remove an unconditional 60 seconds sleep from the replace code. Make it
dependent on ring_delay.

Enable the replace tests.

Modify some code related to removing servers from group 0 which depended on
storing IP addresses in the group 0 configuration.

Closes #12172

* github.com:scylladb/scylladb:
  test/topology: enable replace tests
  service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0`
  service: handle replace correctly with Raft enabled
  gms/gossiper: fetch RAFT_SERVER_ID during shadow round
  service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
2022-12-07 15:30:27 +01:00
Pavel Emelyanov
9bdea110a6 code: Reduce fanout of sstables(_manager)?.hh over headers
This change removes sstables.hh from some other headers replacing it
with version.hh and shared_sstable.hh. Also this drops
sstables_manager.hh from some more headers, because this header
propagates sstables.hh via self. That change is pretty straightforward,
but has a ricochet in database.hh that needs disk-error-handler.hh.

Without the patch, touching sstables/sstable.hh results in recompilation
of 409 targets; with the patch, 299 targets.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12222
2022-12-07 14:34:19 +02:00
Avi Kivity
444de2831e dirty_memory_manager: move to replica module
It's a replica-side thing, so move it there. The related
flush_permit and sstable_write_permit are moved alongside.
2022-12-06 22:24:17 +02:00
Avi Kivity
a038a35ad6 test: dirty_memory_manager_test: disambiguate classes named 'test_region_group'
There are two similarly named classes: ::test_region_group and
dirty_memory_manager_logalloc::test_region_group. Rename the
former to ::raii_region_group (that's what it's for) and the
latter to ::test_region_group, to reduce confusion.
2022-12-06 22:20:38 +02:00
Piotr Dulikowski
680423ad9d cassandra_tests: enable existing tests for static columns
Removes the "xfail" marker from the now-passing tests related to
secondary indexes on static columns.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
05d4328f02 deletable_row: add column_kind parameter to is_live
While deletable_row is used to hold regular columns of a clustering row,
its name or implementation doesn't suggest that it is a requirement. In
fact, some of its methods already take a column_kind parameter which is
used to interpret the kind of columns held in the row.

This commit removes the assumption about the column kind from the
`deletable_row::is_live` method.
2022-12-06 11:21:16 +01:00
Avi Kivity
6f2d060d12 Merge 'Make sstable_directory call sstable_manager for sstables' components' from Pavel Emelyanov
This PR hits two goals for "object storage" effort

1. Sstables loader "knows" that sstables components are stored in a Linux directory and uses utils/lister to access it. This is not going to work with sstables over object storage, the loader should be abstracted from the underlying storage.

2. Currently class keyspace and class column_family carry "datadir" and "all_datadirs" on board, which are paths on the local filesystem where sstable files are stored (those usually start with /var/lib/scylla/data). The paths include subdirs like "snapshots", "staging", etc. This is not going to look nice for object storage; the /var/lib/ prefix is excessive and meaningless in this case. Instead, ks and cf should know their "location" and some other component should know the directory in which the files are stored.

That said, this PR prepares distributed_loader and sstables_directory to stop using Linux paths explicitly by making both call sstables_manager to list and open sstable objects. After that, it will be possible to teach the manager to list sstables from object storage. This also opens the way to removing paths from the keyspace and column_family classes and replacing them with relative "location"s.

Closes #12128

* github.com:scylladb/scylladb:
  sstable_directory: Get components lister from manager
  sstable_directory: Extract directory lister
  sstable_directory: Remove sstable creation callback
  sstable_directory: Call manager to make sstables
  sstable_directory: Keep error handler generator
  sstable_directory: Keep schema_ptr
  sstable_directory: Use directory semaphore from manager
  sstable_directory: Keep reference on manager
  tests: Use sstables creation helper in some cases
  sstables_manager: Keep directory semaphore reference
  sstables, code: Wrap directory semaphore with concurrency
2022-12-05 18:54:17 +02:00
Gleb Natapov
022a825b33 raft: introduce not_a_member error and return it when non member tries to do add/modify_config
Currently, if a node that is outside of the config tries to add an entry
or modify the config, a transient error is returned, and this causes the
node to retry. But the error is not transient. If a node tries to do one
of the operations above, it means it was part of the cluster at some
point, but since a node with the same id should not be added back to a
cluster, if it is not in the cluster now it never will be.

Return a new error not_a_member to a caller instead.
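
A minimal sketch of the distinction this commit draws, assuming a caller that retries on transient errors. The classes `TransientError` and `NotAMember` and the helper `call_with_retry` are hypothetical Python stand-ins; they are not the C++ types used in the Raft code.

```python
class TransientError(Exception):
    """Hypothetical stand-in for an error that is worth retrying."""


class NotAMember(Exception):
    """Hypothetical stand-in for not_a_member: the caller is outside the config."""


def call_with_retry(operation, max_attempts=5):
    """Retry `operation` on transient errors only.

    NotAMember is deliberately not caught: a node that is not in the
    configuration now will never be, so retrying cannot succeed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
    raise ValueError("max_attempts must be at least 1")
```

With a permanent error type, the caller fails on the first attempt instead of looping until it runs out of attempts.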

Message-Id: <Y42mTOx8bNNrHqpd@scylladb.com>
2022-12-05 17:11:04 +01:00
Kamil Braun
3f8aaeeab9 test/topology: enable replace tests
Also add some TODOs for enhancing existing tests.
2022-12-05 11:50:07 +01:00
Pavel Emelyanov
abd3602b10 sstable_directory: Remove sstable creation callback
It's no longer used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
db657a8d1c sstable_directory: Keep error handler generator
Yet another continuation to previous patch -- IO error handlers
generator is also needed to create sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4281f4af42 sstable_directory: Keep schema_ptr
Continuation of one-before-previous patch. In order to create sstable
without external lambda the directory code needs schema.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
8df1bcb907 sstable_directory: Use directory semaphore from manager
After the previous patch, sstables_directory code no longer requires a
semaphore argument, because it can get one from the manager. This makes the
directory API shorter and simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4da941e159 sstable_directory: Keep reference on manager
The sstables_directory accesses /var/lib/scylla/data in two ways -- lists
files in it and opens sstables. The latter is abstracted with the help
of lambdas passed around, but the former (listing) is done by using
directory listers from utils.

Listing sstables components with directory lister won't work for object
storage, the directory code will need to call some abstraction layer
instead. Opening sstables with the help of a lambda is a bit of
overkill, having sstables manager at hand could make it much simpler.

That said, this patch makes sstables_directory reference sstables_manager
on start.

This change will also simplify directory semaphore usage (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
784d78810a tests: Use sstables creation helper in some cases
Several test cases push sstables creation lambda into
with_sstables_directory helper. There's a ready to use helper class that
does the same. Next patch will make additional use of that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
5e13ce2619 sstables_manager: Keep directory semaphore reference
Preparational patch. The semaphore will be used by sstables_directory in
next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:18 +03:00
Pavel Emelyanov
be8512d7cc sstables, code: Wrap directory semaphore with concurrency
Currently this is a sharded<semaphore> started/stopped in main and
referenced by database in order to be fed into sstables code. This
semaphore always comes with the "concurrency" parameter that limits the
parallel_for_each parallelism.

This patch wraps both together into directory_semaphore class. This
makes its usage simpler and will allow extending it in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 11:59:30 +03:00
Avi Kivity
02b66bb31a Merge 'Mark sstable::<directory accessing methods> private' from Pavel Emelyanov
One of the prerequisites to make sstables reside on object-storage is not to let the rest of the code "know" the filesystem path they are located on (because sometimes they will not be on any filesystem path). This patch makes the methods that can reveal this path back private so that later they can be abstracted out.

Closes #12182

* github.com:scylladb/scylladb:
  sstable: Mark some methods private
  test: Don't get sstable dir when known
  test: Use move_to_quarantine() helper
  test: Use sstable::filename() overload without dir name
  sstables: Reimplement batch directory sync after move
  table, tests: Make use of move_to_new_dir() default arg
  sstables: Remove fsync_directory() helper
  table: Simplify take_snapshot()'s collecting sstables names
2022-12-04 17:45:37 +02:00
Kamil Braun
b551cd254c test: test_raft_upgrade: fix test_recover_stuck_raft_upgrade flakiness
The test enables an error injection inside the Raft upgrade procedure
on one of the nodes which will cause the node to throw an exception
before entering `synchronize` state. Then it restarts other nodes with
Raft enabled, waits until they enter `synchronize` state, puts them in
RECOVERY mode, removes the error-injected node and creates a new Raft
group 0.

As soon as the other nodes enter `synchronize`, the test disabled the
error injection (the rest of the test was outside the `async with
inject_error(...)` block). There was a small chance that we disabled the
error injection before the node reached it. In that case the node also
entered `synchronize` and the cluster managed to finish the upgrade
procedure. We encountered this during next promotion.

Eliminate this possibility by extending the scope of the `async with
inject_error(...)` block, so that the RECOVERY mode steps on the other
nodes are performed within that block.
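
A toy illustration of the scoping fix, assuming an injection that is active only while the `async with` block is open. The helper `toy_inject_error`, the injection name and the step functions are all hypothetical; the real `inject_error` helper used by the test talks to a running node and is not reproduced here.

```python
import asyncio
from contextlib import asynccontextmanager

# Hypothetical registry of currently enabled error injections.
enabled_injections = set()


@asynccontextmanager
async def toy_inject_error(name):
    """Enable an injection for the duration of the block, then disable it."""
    enabled_injections.add(name)
    try:
        yield
    finally:
        enabled_injections.discard(name)


async def recovery_steps(log):
    """Record whether the injection is still active during the RECOVERY steps."""
    log.append(("recovery", "stop_before_synchronize" in enabled_injections))


async def fixed_test():
    """RECOVERY steps run inside the block, so the injection cannot lapse early."""
    log = []
    async with toy_inject_error("stop_before_synchronize"):
        log.append(("others_in_synchronize", True))
        await recovery_steps(log)
    return log
```

In this model the flaky variant would call `recovery_steps` after the block has exited, at which point the injection is already disabled.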

Closes #12162
2022-12-02 21:26:44 +01:00
Avi Kivity
94f18b5580 test: sstable_conforms_to_mutation_source: use do_with_async() where needed
The test clearly needs a thread (it converts a reader to a mutation
without waiting), so give it one.

Closes #12178
2022-12-02 20:48:37 +01:00
Pavel Emelyanov
fb63850f2c test: Don't get sstable dir when known
The sstable_move_test creates sstables in its own temp directories and
then requests these dirs' paths back from sstables. Test can come with
the paths it has at hand, no need to call sstables for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:13:58 +03:00
Pavel Emelyanov
4c742a658d test: Use move_to_quarantine() helper
Two places in tests move sstable to quarantine subdir by hand. There's
the class sstable method that does the same, so use it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:13:19 +03:00
Pavel Emelyanov
d6244b7408 test: Use sstable::filename() overload without dir name
The dir this place currently uses is the directory where the sstable was
created, so dropping this argument would just render the same path.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:12:21 +03:00
Pavel Emelyanov
1b42d5fce3 table, tests: Make use of move_to_new_dir() default arg
The method in question accepts boolean bit whether or not it should sync
directories at the end. It's always true but in one case, so there's the
default value for it. Make use of it.

Anticipating the suggestion to replace bool with bool_class -- next
patch will replace it with something else.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:07:16 +03:00
Tomasz Grabiec
1a6bf2e9ca Merge 'service/raft: specialized verb for failure detector pinger' from Kamil Braun
We used GOSSIP_ECHO verb to perform failure detection. Now we use
a special verb DIRECT_FD_PING introduced for this purpose.

There are multiple reasons to do so.

One minor reason: we want to use the same connection as other Raft
verbs: if we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.

Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction which required storing and updating gossiper generation
numbers. This also removes one dependency from Raft service code to
gossiper.

Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors during the replace operation to some of
the nodes. In Raft however, we want to mark servers as alive when they
are alive, including a server running on a node that's replacing
another node.

Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.

To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).
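
The destination check can be sketched as a small model in which two Raft IDs map to the same IP during a replace. Apart from `wrong_destination`, every name here (`Server`, `PingResult`, `ping`, the address map layout) is a hypothetical illustration rather than the actual RPC code.

```python
from dataclasses import dataclass
from enum import Enum


class PingResult(Enum):
    ALIVE = "alive"
    WRONG_DESTINATION = "wrong_destination"


@dataclass
class Server:
    raft_id: str

    def handle_ping(self, dst_id):
        """Answer only pings addressed to this server's own Raft ID."""
        if dst_id == self.raft_id:
            return PingResult.ALIVE
        return PingResult.WRONG_DESTINATION


def ping(address_map, network, dst_id):
    """Return True only if the server that answered is the intended one."""
    ip = address_map[dst_id]       # several IDs may map to the same IP
    server = network.get(ip)       # whoever currently listens on that IP
    if server is None:
        return False               # nothing reachable at that address
    return server.handle_ping(dst_id) is PingResult.ALIVE
```

When B replaces A on the same IP, a ping for B reports alive while a ping for A reports not alive, even though both reach the same process.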

Yet another reason: removes "Not ready to respond gossip echo
message" log spam during replace.

Closes #12107

* github.com:scylladb/scylladb:
  service/raft: specialized verb for failure detector pinger
  db: system_keyspace: de-staticize `{get,set}_raft_server_id`
  service/raft: make this node's Raft ID available early in group registry
2022-12-02 13:54:02 +01:00
Pavel Emelyanov
1d91914166 sstables: Drop set_generation() method
The method became unused since 70e5252a (table: no longer accept online
loading of SSTable files in the main directory) and the whole concept of
reshuffling sstables was dropped later by 7351db7c (Reshape upload files
and reshard+reshape at boot).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12165
2022-12-01 22:17:10 +02:00
Kamil Braun
cbdcc944b5 service/raft: specialized verb for failure detector pinger
We used GOSSIP_ECHO verb to perform failure detection. Now we use
a special verb DIRECT_FD_PING introduced for this purpose.

There are multiple reasons to do so.

One minor reason: we want to use the same connection as other Raft
verbs: if we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.

Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction which required storing and updating gossiper generation
numbers. This also removes one dependency from Raft service code to
gossiper.

Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors during the replace operation to some of
the nodes. In Raft however, we want to mark servers as alive when they
are alive, including a server running on a node that's replacing
another node.

Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.

To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).

Yet another reason: removes "Not ready to respond gossip echo
message" log spam during replace.
2022-12-01 20:54:18 +01:00
Kamil Braun
99fe580068 service/raft: make this node's Raft ID available early in group registry
Raft ID was loaded or created late in the boot procedure, in
`storage_service::join_token_ring`.

Create it earlier, as soon as it's possible (when `system_keyspace`
is started), pass it to `raft_group_registry::start` and store it inside
`raft_group_registry`.

We will use this Raft ID stored in group registry in following patches.
Also this reduces the number of disk accesses for this node's Raft ID.
It's now loaded from disk once, stored in `raft_group_registry`, then
obtained from there when needed.

This moves `raft_group_registry::start` a bit later in the startup
procedure - after `system_keyspace` is started - but it doesn't make
a difference.
2022-12-01 20:54:18 +01:00
Nadav Har'El
6fcb5302a6 alternator-test: xfail a flaky test exposing a known bug
In a recent commit 757d2a4, we removed the "xfail" mark from the test
test_manual_requests.py::test_too_large_request_content_length
because it started to pass on more modern versions of Python, with a
urllib3 bug fixed.

Unfortunately, the celebration was premature: It turns out that although
the test now *usually* passes, it sometimes fails. This is caused by
a Seastar bug scylladb/seastar#1325, which I opened #12166 to track
in this project. So unfortunately we need to add the "xfail" mark back
to this test.

Note that although the test will now be marked "xfail", it will actually
pass most of the time, so will appear as "xpass" to people running it.
I put a note in the xfail reason string as a reminder why this is
happening.

Fixes #12143
Refs #12166
Refs scylladb/seastar#1325

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12169
2022-12-01 20:00:46 +02:00
Kamil Braun
3cd035d1b9 test/pylib: scylla_cluster: remove ScyllaCluster.decommissioned field
The field was not used for anything. We can keep decommissioned server
in `stopped` field.

In fact it caused us a problem: since recently, we're using
`ScyllaCluster.uninstall` to clean-up servers after test suite finishes
(previously we were using `ScyllaServer.uninstall` directly). But
`ScyllaCluster.uninstall` didn't look into the `decommissioned` field,
so if a server got decommissioned, we wouldn't uninstall it, and it left
us some unnecessary artifacts even for successful tests. This is now
fixed.

Closes #12163
2022-12-01 19:07:26 +02:00
Avi Kivity
a4b77a5691 Merge 'Cleanup sstables::test_env's manager usage' from Pavel Emelyanov
Mainly this PR removes global db::config and feature service that are used by sstables::test_env as dependencies for embedded sstables_manager. Other than that -- drop unused methods, remove nested test_env-s and relax a few cases that use two temp dirs at a time for no gain.

Closes #12155

* github.com:scylladb/scylladb:
  test, utils: Use only one tempdir
  sstable_compaction_test: Dont create nested envs
  mutation_reader_test: Remove unused create_sstable() helper
  tests, lib: Move globals onto sstables::test_env
  tests: Use sstables::test_env.db_config() to access config
  features: Mark feature_config_from_db_config const
  sstable_3_x_test: Use env method to create sst
  sstable_3_x_test: Indentation fix after previous patch
  sstable_3_x_test: Use sstable::test_env
  test: Add config to sstable::test_env creation
  config: Add constexpr value for default murmur ignore bits
2022-12-01 17:47:25 +02:00
Pavel Emelyanov
4c6bfc078d code: Use http::re(quest|ply) instead of httpd:: ones
Recent seastar update deprecated those from httpd namespace.

fixes: #12142

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12161
2022-12-01 17:33:35 +02:00
Pavel Emelyanov
adc6ee7ea8 test, utils: Use only one tempdir
There's a do_with_cloned_tmp_directory that makes two temp dirs to toss
sstables between them. Make it go with just one, all the more so it
would resemble existing manipulations around staging/ subdir

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:57 +03:00
Pavel Emelyanov
15a7b9cafa sstable_compaction_test: Dont create nested envs
The "compact" test case runs in sstables::test_env and additionally
wraps it with another instance provided by do_with_tmp_directory helper.
It's simpler to create the temp dir by hand and use outer env.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:56 +03:00
Pavel Emelyanov
69fe5fd054 mutation_reader_test: Remove unused create_sstable() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:54 +03:00
Pavel Emelyanov
400bc2c11d tests, lib: Move globals onto sstables::test_env
There's a bunch of objects that are used by test_env as sstables_manager
dependencies. Now when no other code needs those globals they better sit
on the test_env next to the manager

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:36 +03:00