Commit Graph

6059 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
16b0eeb3d6 test: ManagerClient: servers_add: specify consistent-topology-changes assumption
ManagerClient.servers_add can be called only if the cluster uses
consistent topology changes. We add this specification to the
leading comment.
2024-01-02 12:19:31 +01:00
Benny Halevy
5abf556399 gms: endpoint_state: define application_state_map
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Nadav Har'El
91636f6d21 test/cql-pytest: reproducer of slightly too strict parser of timestamp
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.

Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16576
2023-12-28 19:01:25 +02:00
Avi Kivity
6394854f04 Merge 'Some cleanups in tests for tablets + MV ' from Nadav Har'El
This small series improves two things in the multi-node tests for tablet supports in materialized views:

1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and lookup the table's (or view's) name directly (requires a full-table of the tablets table, but that's entirely reasonable in a test).

The third patch in this series also fixes a comment typo discovered in a previous review.

Closes scylladb/scylladb#16440

* github.com:scylladb/scylladb:
  materialized views: fix typo in comment
  test_mv_tablets: simplify lookup of tablets
  alternator, tablets: improve Alternator LSI tablets test
2023-12-27 20:18:14 +02:00
Avi Kivity
3ce0576a31 Merge 'Sanitize keyspace_metadata creation' from Pavel Emelyanov
The amount of arguments needed to create ks metadata object is pretty large and there are many different ways it can be and it is created over the code. This set simplifies it for the most typical patterns.

closes: #16447
closes: #16449

Closes scylladb/scylladb#16565

* github.com:scylladb/scylladb:
  schema_tables: Use new_keyspace() sugar
  keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
  keyspace_metadata: Add default value for new_keyspace's durable_writes
  keyspace_metadata: Pack constructors with default arguments
2023-12-27 17:15:04 +02:00
Botond Dénes
1647b29cba tools/schema_loader: add db::config parameter to all load methods
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).

Fixes: #16480

Closes scylladb/scylladb#16495
2023-12-27 16:28:38 +02:00
Nadav Har'El
e6dc9bca0d Merge 'Profile dumping rest api support' from Eliran Sinvani
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM, however, this doesn't
suite all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now), graceful shutdown of ScyllaDB testing and
teardown procedures (which currently kills the nodes).

This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure in a permissive way that will not fail the teardown or test if the call doesn't succeed for whatever reason - after this change, all current
test.py suits except for pylib_test (expected) dumps profiles if instrumented and will be able to participate in coverage
reporting.

Refs #16323

Closes scylladb/scylladb#16557

* github.com:scylladb/scylladb:
  test.py: Dump coverage profile before killing a node
  rest api: Add an api for profile dumping
2023-12-27 12:06:39 +02:00
Eliran Sinvani
e49b3ffc89 test.py: Dump coverage profile before killing a node
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), this means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have been requiring us to shut down all nodes
gracefully which is not something we set out to do.
Here we use the rest API for dumping the coverage profile which will
cause the most minimal impact possible on the test runs.
If the dumping fails (due to the node doesn't support the API or due to
a real error in dumping we ignore it as it is not part of the system we
would like to test.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:17:26 +02:00
Nadav Har'El
fc71c34597 Merge 'select statement: verify EXECUTE permissions only for non native functions' from Eliran Sinvani
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are also unintenionally enforced also on native function. This commit introduce the distinction and only enforces the permissions on non native functions.

Fixes #16526

Manually verified (before and after change) with the reproducer supplied in #16526 and also with some the `min` and `max` native functions.
Also added test that checks for regression on native functions execution and verified that it fails on authorization before
the fix and passes after the fix.

Closes scylladb/scylladb#16556

* github.com:scylladb/scylladb:
  test.py: Add test for native functions permissions
  select statement: verify EXECUTE permissions only for non native functions
2023-12-26 18:14:21 +02:00
Pavel Emelyanov
a1ad2571fc keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
It's only testing code that wants to call new_keyspace with existing
schemas, all the other callers either construct the ks metadata
directly, or use convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't requires this
argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 13:00:44 +03:00
Eliran Sinvani
a336550041 test.py: Add test for native functions permissions
Native functions (non UDF/UDA functions), should be usable even if a
user is not granted EXECUTE permissions on them.

This is a regression test that was added following:
https://github.com/scylladb/scylladb/issues/16526

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-26 10:27:04 +02:00
Avi Kivity
3968fc11bf Merge 'cql: fix regression in SELECT * GROUP BY' from Nadav Har'El
This short series fixes a regression from Scylla 5.2 to Scylla 5.4 in "SELECT * GROUP BY" - this query was supposed to return just a single row from each partition (the first one in clustering order), but after the expression rewrite started to wrongly return all rows.

The series also includes a regression test that verifies that this query works doesn't work correctly before this series, but works with this patch - and also works as expected in Scylla 5.2 and in Cassadra.

Fixes #16531.

Closes scylladb/scylladb#16559

* github.com:scylladb/scylladb:
  test/cql-pytest: check that most aggregators don't take "*"
  cql-pytest: add reproducer for GROUP BY regression
  cql: fix regression in SELECT * GROUP BY
2023-12-25 19:53:55 +02:00
Pavel Emelyanov
ac3dd4bf5d test: Coroutinize some secondary_index_test cases
Now they are long then-chains that are hard to read

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16547
2023-12-25 18:08:19 +02:00
Nadav Har'El
55317666c6 test/cql-pytest: check that most aggregators don't take "*"
Although you can "SELECT COUNT(*)", this has special handling in the CQL
parser (it is converted into a special row-counting request) and you can't
give "*" to other aggregators - e.g., "SELECT SUM(*)". This patch includes
a simple test that confirms this.

I wanted to check this in relation to the previous patch, which did,
sort of, a "SELECT $$first$$(*)" - a syntax which this test shows
wouldn't have actually worked if we tried it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Nadav Har'El
e2773b4a3a cql-pytest: add reproducer for GROUP BY regression
test/cql-pytest/test_group_by.py has tests that verifies that requests
like

   SELECT p,c1,c2,v FROM tbl WHERE p=0 GROUP BY p

work as expected - the "GROUP BY p" means in this case that we should
only return the first row in the p=0 partition.

As a user discovered, it turns out that the almost identical request:

   SELECT * FROM tbl WHERE p=0 GROUP BY p

Doesn't work the same - before the fix in the previous patch, it
erroneously returned all rows in p=0, not just the first one.
The test in this patch demonstrates this - it fails on Scylla 5.4,
passes on Scylla 5.2 and on Cassandra - and passes when the fix
from the previous patch is used.

This patch includes another tiny test, to check the interaction of GROUP BY
with filtering. This second test passes on Scylla - but I want it in
anyway because it is yet another interaction that might break (the
user that reported #16531 also had filtering, and I was worried it might
have been related).

Refs #16531

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Pavel Emelyanov
1d2c871219 test: Add sanity tests for tablets initialization and altering
Check that the initial_tablets appears in system_schema.scylla_keyspaces
if turned on explicitly

Check that it's possible to change initial_tablets with ALTER KEYSPACE

Check that changing r.s. from simple to network-topology doesn't
activate tablets

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:09:01 +03:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kepd in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way the find appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hands (spoiler: not yet initialized, but
still) so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
a67c535539 keyspace_metadata: Carry optional<initial_tablets> on board
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.

This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:58:05 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
f88ba0bf5a locator: Wrap replication_strategy_config_options into replication_strategy_params
When replication strategy class is created caller parr const reference
on the config options which is, in turn, a map<string, string>. In the
future r.s. classes will need to get "scylla specific" info along with
legacy options and this patch prepares for that by passing more generic
params argument into constructor. Currently the only inhabitant of the
new params is the legacy options.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:53:03 +03:00
Nadav Har'El
f9f20e779c test_mv_tablets: simplify lookup of tablets
The tests looked up a table's tablets in an elaborate two-stage search -
first find the table's "id", and then look up this id in the list of
tablets. It is much simpler to just look up the table's name directly
in the list of tablets - although this name is not a key, an ALLOW
FILTERING search is good enough for a test.

As a bonus, with the new technique we don't care if the given name
is the name of a table or a view, further simplifying the test.

This is just a test code cleanup - there is no functional change in
the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:12:44 +02:00
Nadav Har'El
cdd5b19f12 alternator, tablets: improve Alternator LSI tablets test
The test test_tablet_alternator_lsi_consistency, checking that Alternator
LSI allow strongly-consistent reads even with tablets, used a large
cluster (10 nodes), to improve the chance of reaching an "unlucky" tablet
placement - and even then only failed in about half the runs without
the code fixed.

In this patch, we rewrite the test using a much more reliable approach:
We start only two nodes, and force the base's tablet onto one node, and
the view's table onto the second node. This ensures with 100% certainty
that the view update is remote, and the new test fails every single time
before the code fix (I reverted the fix to verify) - and passes after it.

The new test is not only more reliable, it's also significantly faster
because it doesn't need to start a 10-node cluster.

We can also remove the tag that excluded this test from debug build
mode tests because the 10-node boot was too slow.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:11:43 +02:00
Tomasz Grabiec
2590274f95 Merge 'Don't allow ALTER KEYSPACE to change replication strategy vnode/per-table flavor' from Pavel Emelyanov
This switch is currently possible, but results in not supported keyspace state

Closes scylladb/scylladb#16513

* github.com:scylladb/scylladb:
  test: Add a test that switching between vnodes and tablets is banned
  cql3/statements: Don't allow switching between vnode and per-table replication strategies
  cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
2023-12-22 17:22:36 +01:00
Kefu Chai
642652efab test/cql-pytest/test_tools.py: test shard-of with a single partition
test_scylla_sstable_shard_of takes lots of time preparing the keys for a
certain shard. with the debug build, it takes 3 minutes to complete the
test.

so in order to test the "shard-of" subcommand in an more efficient way,
in this change, we improve the test in two ways:

1. cache the output of 'scylla types shardof`. so we can avoid the
   overhead of running a seastar application repeatly for the
   same keys.
2. reduce the number of partitions from 42 to 1. as the number of
   partitions in an sstable does not matter when testing the
   output of "shard-of" command of a certain sstable. because,
   the sstable is always generated by a certain shard.

before this change, with pytest-profiling:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000  181.950   60.650 runner.py:219(call_and_report)
      4/3    0.000    0.000  181.948   60.649 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000  181.948   60.649 runner.py:318(from_call)
      4/3    0.000    0.000  181.948   60.649 runner.py:262(<lambda>)
    44/11    0.000    0.000  181.935   16.540 _hooks.py:427(__call__)
    43/11    0.000    0.000  181.935   16.540 _manager.py:103(_hookexec)
    43/11    0.000    0.000  181.935   16.540 _callers.py:30(_multicall)
      361    0.001    0.000  181.531    0.503 contextlib.py:141(__exit__)
   782/81    0.001    0.000  177.578    2.192 {built-in method builtins.next}
     1044    0.006    0.000   92.452    0.089 base_events.py:1894(_run_once)
       11    0.000    0.000   91.129    8.284 fixtures.py:686(<lambda>)
    17/11    0.000    0.000   91.129    8.284 fixtures.py:1025(finish)
        4    0.000    0.000   91.128   22.782 fixtures.py:913(_teardown_yield_fixture)
      2/1    0.000    0.000   91.055   91.055 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000   91.055   91.055 runner.py:119(runtestprotocol)
        2    0.000    0.000   91.052   45.526 conftest.py:50(cql)
        2    0.000    0.000   91.040   45.520 util.py:161(cql_session)
        1    0.000    0.000   91.040   91.040 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000   91.040   91.040 runner.py:509(teardown_exact)
     1945    0.002    0.000   90.722    0.047 events.py:82(_run)
```

after this change:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000    8.271    2.757 runner.py:219(call_and_report)
    44/11    0.000    0.000    8.270    0.752 _hooks.py:427(__call__)
    44/11    0.000    0.000    8.270    0.752 _manager.py:103(_hookexec)
    44/11    0.000    0.000    8.270    0.752 _callers.py:30(_multicall)
      4/3    0.000    0.000    8.269    2.756 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000    8.269    2.756 runner.py:318(from_call)
      4/3    0.000    0.000    8.269    2.756 runner.py:262(<lambda>)
       48    0.000    0.000    8.269    0.172 {method 'send' of 'generator' objects}
       27    0.000    0.000    5.671    0.210 contextlib.py:141(__exit__)
       11    0.000    0.000    4.297    0.391 fixtures.py:686(<lambda>)
      2/1    0.000    0.000    4.228    4.228 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000    4.228    4.228 runner.py:119(runtestprotocol)
        2    0.000    0.000    4.213    2.106 capture.py:877(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:509(teardown_exact)
        2    0.000    0.000    3.628    1.814 capture.py:872(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 runner.py:160(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 python.py:1797(runtest)
   114/81    0.001    0.000    3.505    0.043 {built-in method builtins.next}
       15    0.784    0.052    3.183    0.212 subprocess.py:417(check_output)
```

Fixes #16516
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16523
2023-12-22 15:20:03 +02:00
Tomasz Grabiec
9c7e5f6277 Merge 'Fix secondary index feature with tablets' from Nadav Har'El
Before this series, materialized views already work correctly on keyspaces with tablets, but secondary indexes do not. The goal of these series is make CQL secondary indexes fully supported on tablets:

1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396.
2. Then we need to keep the promise that our documentation makes - that **local** secondary index should be synchronously updated - Fixes #16371.

As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal. But writing reliable tests for these issues was the biggest effort that went into this series.

Closes scylladb/scylladb#16436

* github.com:scylladb/scylladb:
  secondary-index, tablets: ensure that LSI are synchronous
  test: add missing "tags" schema extension to cql_test_env
  mv, test: fix delay_before_remote_view_update injection point
  secondary index: fix view creation when using tablets
2023-12-21 23:37:00 +01:00
Botond Dénes
1ce07c6f27 test/cql-pytest: test_select_from_mutation_fragments: bump timeout for test_many_partitions
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10 minutes timeout for the scan
itself.
This is a follow-up to fb9379edf1, which
attempted to fix this, but didn't patch all the places doing slow scans.
This patch fixes the other scan, the one actually observed to time-out
in CI.

Fixes: #16145

Closes scylladb/scylladb#16370
2023-12-21 19:55:06 +02:00
Pavel Emelyanov
a03755d6d7 test: Add a test that switching between vnodes and tablets is banned
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:57:55 +03:00
Nadav Har'El
a41140f569 Merge 'scylla-sstable: handle attempt to load schema for non-existent tables more gracefully' from Botond Dénes
In other words, print more user-friendly messages, and avoid crashing.
Specifically:
* Don't crash when attempting to load schema tables from configured data-dir, while configuration does not have any configured data-directories.
* Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't.
* Add negative tests for schema-loading.

Fixes: https://github.com/scylladb/scylladb/issues/16459

Closes scylladb/scylladb#16494

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add test for failed schema loadig
  tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
  tools/schema_loader: also check for empty table/column mutations
  tools/schema_loader: log more details when loading schema from schema tables
2023-12-21 15:40:51 +02:00
Nadav Har'El
a613a3cad2 secondary-index, tablets: ensure that LSI are synchronous
CQL Local Secondary Index is a Scylla-only extension to Cassandra's
secondary index API where the index is separate per partition.
Scylla's documentation guarantees that:

  "As of Scylla Open Source 4.0, updates for local secondary indexes are
   performed synchronously. When updates are synchronous, the client
   acknowledges the write operation only after both the base table
   modification and the view up date are written."

This happened automatically with vnodes, because the base table and the
view have the same partition key, so base and view replicas are co-located,
and the view update is always local and therefore done synchronously.

But with tablets, this does NOT happen automatically - the base and view
tablets may be located on different nodes, and the view update may be
remote, and NOT synchronous.

So in this patch we explicitly mark the view as synchronous_update when
building the view for an LSI.

The bigger part of this patch is to add a test which reliably fails
before this patch, and passes after it. The test creates a two-node
cluster and a table with LSI, and pins the base's tablets to one node
and the view's to the second node, forcing the view updates to be
remote. It also uses an injection point to make the view update slower.
The test then writes to the base and immediately tries to use the index
to read. Before this patch, the read doesn't find the new data (contrary
to the guarantee in the documentation). After this patch, the read
does find the new data - because the write waited for the index to
be updated.

Fixes #16371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
7c5092cb8f test: add missing "tags" schema extension to cql_test_env
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initializations steps done in main.cc, otherwise various
requests work incorrectly. One of these steps that main.cc is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.

We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.

In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) by a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
8181e28731 secondary index: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

Unfortunately, We have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called to the notifier, so we must add it here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.

Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.

The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.

Fixes #16396

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Raphael S. Carvalho
ee203f846e test: Fix segfault when running offstrategy test
Observer, that references table_for_test, must of course, not
outlive table_for_test. Observer can be called later after the
last input sstable is removed from sstable manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16428
2023-12-20 19:04:41 +02:00
Raphael S. Carvalho
d1e6dfadea sstables: Harden estimate_droppable_tombstone_ratio() interface
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.

sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15931
2023-12-20 19:04:41 +02:00
Pavel Emelyanov
5866d265c3 Merge ' tools/utils: tool_app_template: handle the case of no args ' from Botond Dénes
Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because unconditional dereferencing of `argv[1]` to get the current operation.

To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2.

Fixes: #16451

Closes scylladb/scylladb#16456

* github.com:scylladb/scylladb:
  test: add regression tests for invoking tools with no args
  tools/utils: tool_app_template: handle the case of no args
  tools/utils: tool_app_template: remove "scylla-" prefix from app name
2023-12-20 19:04:40 +02:00
Kamil Braun
6fcaec75db Merge 'Add maintenance socket' from Mikołaj Grzebieluch
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.

To set up the maintenance socket, use the `maintenance-socket` option when starting the node.

* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.

The default value is `ignore`.

* With python driver

```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy

socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # Driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```

Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.

Closes scylladb/scylladb#16172

* github.com:scylladb/scylladb:
  test.py: add maintenance socket test
  test.py: enable maintenance socket in tests by default
  docs: add maintenance socket documentation
  main: add maintenance socket
  main: refactor initialization of cql controller and auth service
  auth/service: don't create system_auth keyspace when used by maintenance socket
  cql_controller: maintenance socket: fix indentation
  cql_controller: add option to start maintenance socket
  db/config: add maintenance_socket_enabled bool class
  auth: add maintenance_socket_role_manager
  db/config: add maintenance_socket variable
2023-12-20 19:04:40 +02:00
Botond Dénes
5ef0d16eb3 test/cql-pytest: test_tools.py: add test for failed schema loadig 2023-12-20 10:31:03 -05:00
Kamil Braun
ffb6ae917f Merge 'Add support for tablets in Alternator' from Nadav Har'El
The pull requests adds support for tablets in Alternator, and particularly focuses in getting Alternator's GSI and LSI (i.e., materialized views)  to work.

After this series support for tablets in Alternator _mostly_ work, but not completely:
1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams").
2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.

Because of these reasons, Alternator does not yet use tablets by default and it needs to be enabled explicitly be adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight.

Fixes #16203
Fixes #16313

Closes scylladb/scylladb#16353

* github.com:scylladb/scylladb:
  mv, tablets, alternator: test for Alternator LSI with tablets
  mv: coroutinize wait code for remote view updates
  mv, test: add injection point to delay remove view update
  alternator: explicitly request synchronous updates for LSI
  alternator: fix view creation when using tablets
  alternator: add experimental method to create a table with tablets
2023-12-20 10:00:31 +01:00
Kamil Braun
1f6460972b Merge 'Fix crash on table drop concurrent with streaming ' from Tomasz Grabiec
The observed crash was in the following piece on "cf" access:

        if (*table_is_dropped) {
            sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());

Fixes #16181

Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific.

Closes scylladb/scylladb#16341

* github.com:scylladb/scylladb:
  test: tablets: Add test case which tests table drop concurrent with migration
  tests: tablets: Do read barrier in get_tablet_replicas()
  streaming: Keep table by shared ptr to avoid crash on table drop
2023-12-20 09:57:06 +01:00
Kefu Chai
db9e314965 treewide: apply codespell to the comments in source code
for less spelling errors in comment.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16408
2023-12-20 10:25:03 +02:00
Nadav Har'El
2e031f2d8e mv, tablets, alternator: test for Alternator LSI with tablets
This patch adds a test (in the topology test framework) for issue #16313 -
the bug where Alternator LSI must use synchronous view updates but didn't.
This test fails with high probability (around 50%) before the previous patch,
which fixed this bug - and passes consistently after the patch (I ran it
100 times and it didn't fail even once).

This is the first test in the topology framework that uses the DynamoDB
API and not CQL. This required a couple of tiny convenience functions,
which are introduced in the only test file that uses them - but if we
want we can later move them out to a library file.

Unfortunately, the standard AWS SDK for Python - boto3 - is *not*
asynchronous, so this test is also not really asynchronous, and will
block the event loop while making requests to Alternator. However,
for now it doesn't matter (we do NOT run multiple tests in the same
event loop), and if it ever matters, I mentioned a couple of options
what we can do in a comment.

Because this test uses a 10-node cluster, it is skipped in debug-mode
runs. In a later patch we will replace it by a more efficent - and
more reliable - 2-node test.

Refs #16313

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-19 15:41:15 +02:00
Avi Kivity
15acceb69f Merge 'commitlog_test::test_commitlog_reader: handle segment_truncation' from Calle Wilund
Fixes #16312

This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: test does not use pre-allocated segments).

(First patch makes the test coroutinized to make for a nicer, easier fix change.

Closes scylladb/scylladb#16368

* github.com:scylladb/scylladb:
  commitlog_test::test_commitlog_reader: handle segment_truncation
  commitlog_test: coroutinize test_commitlog_reader
2023-12-19 15:33:38 +02:00
Botond Dénes
6abdced7b9 test: add regression tests for invoking tools with no args
This was recently found to produce a crash. Add a simple regression
test, to make sure future changes don't re-introduce problems with this
rarely used code-path.
2023-12-19 04:08:48 -05:00
Mikołaj Grzebieluch
ef10b497e1 test.py: add maintenance socket test
Test that when connecting to the maintenance socket, the user has superuser permissions,
even if the authentication is enabled on the regular port.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
e327478bb5 test.py: enable maintenance socket in tests by default 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
999be1d14b auth/service: don't create system_auth keyspace when used by maintenance socket
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.

The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.

The logic of regular CQL port's auth service won't be changed. For the maintenance
socket will be created a new separate auth service.
2023-12-18 17:58:13 +01:00
Tomasz Grabiec
84ea8b32b2 test: tablets: Restart cluster in a graceful manner to avoid connection drop in the middle of request serving
After restarting each node, we should wait for other nodes to notice
the node is UP before restarting the next server. Otherwise, the next
node we restart may not send the shutdown notification to the
previously restarted node, if it still sees it as down when we
initiate its shutdown. In this case, the node will learn about the
restart from gossip later, possible when we already started CQL
requests. When a node learns that some node restarted while it
considers it as UP, it will close connections to that node. This will
fail RPC sent to that node, which will cause CQL request to time-out.

Fixes #14746

Closes scylladb/scylladb#16010
2023-12-18 16:22:02 +01:00
Raphael S. Carvalho
63e4d6c965 test: Enable debug compaction logging for sstable_compaction_test
It will make it easier to understand obscure issues like
https://github.com/scylladb/scylladb/issues/13280.

Refs #13280.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16426
2023-12-18 16:57:46 +03:00
Kefu Chai
db16048761 test/pylib: avoid using asyncio.get_event_loop()
asyncio.get_event_loop() returns the current event loop. but if there
is not, the result of `get_event_loop_policy().get_event_loop()` is
returned. but this behavior is deprecated since Python 3.12, so let's
use asyncio.run() as recommended by
https://docs.python.org/3/library/asyncio-eventloop.html.
asyncio.run() was introduced by Python 3.7, so we should be able to
use it.

this change should silence the waring when running this script
as a stand-alone script with Python 3.12.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16385
2023-12-18 16:47:31 +03:00
Kamil Braun
3b108f2e31 Merge 'db: config: make consistent_cluster_management mandatory' from Patryk Jędrzejczak
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.

Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.

Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.

We also update documentation by removing information irrelevant in 5.5.

Fixes scylladb/scylladb#15854

Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.

Closes scylladb/scylladb#16334

* github.com:scylladb/scylladb:
  docs: update after making consistent_cluster_management mandatory
  system_keyspace, main, cql_test_env: fix indendations
  db: config: make consistent_cluster_management mandatory
  test: boost: schema_change_test: replace disable_raft_schema_config
  db: config: make override_decommission deprecated
  db: config: make force_schema_commit_log deprecated
2023-12-18 09:44:52 +01:00