Commit Graph

6991 Commits

Author SHA1 Message Date
Andrei Chekun
293cf355df [test.py] Fix log for failed node was nod added to failed directory
If something happens during nod adding to the cluster, it will not be registered as a part of the cluster. This leads to situations during log gathering that logs for a such node will be missing.
2024-06-17 11:16:55 +02:00
Andrei Chekun
7bbb8d9260 [test.py] Fix URl for failed logs directory in CI
Incorrect passing of the artifacts_dir_url parameter from test.py to pytest leads to the situation when it will pass None as a string and pytest will generate incorrect URL.
2024-06-17 11:16:48 +02:00
Kamil Braun
bbb424a757 Merge '[test.py] Add uniqueness to the test name' from Andrei Chekun
In CI test always executed with option --repeat=3 that leads to generate 3 test results with the same name. Junit plugin in CI cannot distinguish correctly the difference between these results. In case when we have two passes and one fail, the link to test result will sometimes be redirected to the incorrect one because the test name is the same. To fix this ReportPlugin added that will be responsible to modify the test case name during junit report generation adding to the test name mode and run id.

Fixes: https://github.com/scylladb/scylladb/issues/17851

Fixes: https://github.com/scylladb/scylladb/issues/15973

Closes scylladb/scylladb#19235

* github.com:scylladb/scylladb:
  [test.py] Add uniqueness to the test name
  [test.py] Refactor alternator, nodetool, rest_api
2024-06-14 17:59:07 +02:00
Kefu Chai
d498ca3afa test: randomized_nemesis_test: use BOOST_REQUIRE_* when appropriate
for better debuggability.

Refs #17030
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19282
2024-06-14 15:33:07 +03:00
Botond Dénes
b2ebc172d0 Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar
utils/chunked_vector::reserve_partial: fix usage in callers

The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.

Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.

This PR updates the method comment and all the callers to use the
right way.

Fixes #19254

Closes scylladb/scylladb#19279

* github.com:scylladb/scylladb:
  utils/large_bitset: remove unused includes identified by clangd
  utils/large_bitset: use thread::maybe_yield()
  test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
  utils/lsa/chunked_managed_vector: fix reserve_partial()
  utils/chunked_vector: return void from reserve_partial and make_room
  test/boost/chunked_vector_test: fix testcase tests_reserve_partial
  utils/chunked_vector::reserve_partial: fix usage in callers
2024-06-14 15:31:00 +03:00
Kamil Braun
982fa31250 Merge 'test: servers_add: fix the expected_error parameter' from Patryk Jędrzejczak
This PR fixes two problems with the `expected_error`
parameter in `server_add` and `servers_add`.
1. It didn't work in `server_add` if the cluster was empty
because of an incorrect attempt to connect the driver.
2. It didn't work in `servers_add` completely because the
`seeds` parameter was handled incorrectly.

This PR only adds improvements in the testing framework,
no need to backport it.

Closes scylladb/scylladb#19255

* github.com:scylladb/scylladb:
  test: manager_client, scylla_cluster: fix type annotations in add_servers
  test: manager_client: don't connect driver after failed server_{add, start}
  test: scylla_cluster: pass seeds to add_servers
2024-06-14 11:33:21 +02:00
Andrei Chekun
8d1d206aff [test.py] Add uniqueness to the test name
In CI test always executed with option --repeat=3 that leads to generate 3 test results with the same name. Junit plugin in CI cannot distinguish correctly the difference between these results. In case when we have two passes and one fail, the link to test result will sometimes be redirected to the incorrect one because the test name is the same.
To fix this ReportPlugin added that will be responsible to modify the test case name during junit report generation adding to the test name mode and run id.

Fixes: https://github.com/scylladb/scylladb/issues/17851

Fixes: https://github.com/scylladb/scylladb/issues/15973
2024-06-14 11:23:04 +02:00
Wojciech Mitros
9bae1814ab test: add test for failed view building write
For various reasons, a view building write may fail. When that
happens, the view building should not finish until these writes
are successfully retried and they should not interfere with any
writes that are performed to the base table while the view is
building.

The test introduced in this patch confirms that this is the case.

Refs scylladb/scylladb#19261

Closes scylladb/scylladb#19263
2024-06-14 10:38:21 +02:00
Lakshmi Narayanan Sreethar
310c5da4bb test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
Update the maximum size tested by the testcase. The test always created
only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (12.8 KB). So, use twice the
max_chunk_capacity as the test size distribution upper limit to verify
that partial_reserve can reserve multiple chunks.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
d4f8b91bd6 utils/lsa/chunked_managed_vector: fix reserve_partial()
Fix the method comment and return types of chunked_managed_vector's
reserve_partial() similar to chunked_vector's reserve_partial() as it
has the same issues mentioned in #19254. Also update the usage in the
chunked_managed_vector_test.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
29f036a777 test/boost/chunked_vector_test: fix testcase tests_reserve_partial
Fix the usage of reserve_partial in the testcase. Also update the
maximum chunk size used by the testcase. The test always created only
one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (128 KB). So, use smaller chunk size,
512 bytes, to verify that partial_reserve can reserve multiple chunks.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:43:07 +05:30
Kefu Chai
df094061e3 test: randomized_nemesis_test: define static variable
before this change, when linking randomized_nemesis_test with ld.lld:

```
[4/4] Linking CXX executable test/raft/RelWithDebInfo/randomized_nemesis_test
FAILED: test/raft/RelWithDebInfo/randomized_nemesis_test
: && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 --ld-path=ld.lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections test/raft/CMakeFiles/test-raft-helper.dir/RelWithDebInfo/helpers.cc.o test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o -o test/raft/RelWithDebInfo/randomized_nemesis_test -L/home/kefu/dev/scylladb/idl/absl::headers -Wl,-rpath,/home/kefu/dev/scylladb/idl/absl::headers  test/lib/RelWithDebInfo/libtest-lib.a  seastar/RelWithDebInfo/libseastar.a  /usr/lib64/libxxhash.so  seastar/RelWithDebInfo/libseastar_testing.a  test/lib/RelWithDebInfo/libtest-lib.a  -Xlinker --push-state -Xlinker --whole-archive  auth/RelWithDebInfo/libscylla_auth.a  -Xlinker --pop-state  /usr/lib64/libcrypt.so  cdc/RelWithDebInfo/libcdc.a  compaction/RelWithDebInfo/libcompaction.a  mutation_writer/RelWithDebInfo/libmutation_writer.a  -Xlinker --push-state -Xlinker --whole-archive  dht/RelWithDebInfo/libscylla_dht.a  -Xlinker --pop-state  types/RelWithDebInfo/libtypes.a  index/RelWithDebInfo/libindex.a  -Xlinker --push-state -Xlinker --whole-archive  locator/RelWithDebInfo/libscylla_locator.a  -Xlinker --pop-state  message/RelWithDebInfo/libmessage.a  gms/RelWithDebInfo/libgms.a  sstables/RelWithDebInfo/libsstables.a  readers/RelWithDebInfo/libreaders.a  schema/RelWithDebInfo/libschema.a  -Xlinker --push-state -Xlinker --whole-archive  tracing/RelWithDebInfo/libscylla_tracing.a  -Xlinker --pop-state  RelWithDebInfo/libscylla-main.a  abseil/absl/strings/RelWithDebInfo/libabsl_cord.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_info.a  abseil/absl/strings/RelWithDebInfo/libabsl_cord_internal.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_functions.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_handle.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_cord_state.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc32c.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_internal.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_cpu_detect.a  abseil/absl/strings/RelWithDebInfo/libabsl_str_format_internal.a  /usr/lib64/libz.so  service/RelWithDebInfo/libservice.a  node_ops/RelWithDebInfo/libnode_ops.a  service/RelWithDebInfo/libservice.a  node_ops/RelWithDebInfo/libnode_ops.a  -lsystemd  raft/RelWithDebInfo/libraft.a  repair/RelWithDebInfo/librepair.a  streaming/RelWithDebInfo/libstreaming.a  replica/RelWithDebInfo/libreplica.a  db/RelWithDebInfo/libdb.a  mutation/RelWithDebInfo/libmutation.a  data_dictionary/RelWithDebInfo/libdata_dictionary.a  cql3/RelWithDebInfo/libcql3.a  transport/RelWithDebInfo/libtransport.a  cql3/RelWithDebInfo/libcql3.a  transport/RelWithDebInfo/libtransport.a  lang/RelWithDebInfo/liblang.a  /usr/lib64/liblua-5.4.so  -lm  /usr/lib64/libsnappy.so.1.1.10  abseil/absl/container/RelWithDebInfo/libabsl_raw_hash_set.a  abseil/absl/hash/RelWithDebInfo/libabsl_hash.a  abseil/absl/hash/RelWithDebInfo/libabsl_city.a  abseil/absl/types/RelWithDebInfo/libabsl_bad_variant_access.a  abseil/absl/hash/RelWithDebInfo/libabsl_low_level_hash.a  abseil/absl/types/RelWithDebInfo/libabsl_bad_optional_access.a  abseil/absl/container/RelWithDebInfo/libabsl_hashtablez_sampler.a  abseil/absl/profiling/RelWithDebInfo/libabsl_exponential_biased.a  abseil/absl/synchronization/RelWithDebInfo/libabsl_synchronization.a  abseil/absl/debugging/RelWithDebInfo/libabsl_stacktrace.a  abseil/absl/synchronization/RelWithDebInfo/libabsl_graphcycles_internal.a  abseil/absl/synchronization/RelWithDebInfo/libabsl_kernel_timeout_internal.a  abseil/absl/debugging/RelWithDebInfo/libabsl_symbolize.a  abseil/absl/debugging/RelWithDebInfo/libabsl_debugging_internal.a  abseil/absl/base/RelWithDebInfo/libabsl_malloc_internal.a  abseil/absl/debugging/RelWithDebInfo/libabsl_demangle_internal.a  abseil/absl/time/RelWithDebInfo/libabsl_time.a  abseil/absl/strings/RelWithDebInfo/libabsl_strings.a  abseil/absl/strings/RelWithDebInfo/libabsl_strings_internal.a  abseil/absl/strings/RelWithDebInfo/libabsl_string_view.a  abseil/absl/base/RelWithDebInfo/libabsl_throw_delegate.a  abseil/absl/numeric/RelWithDebInfo/libabsl_int128.a  abseil/absl/base/RelWithDebInfo/libabsl_base.a  abseil/absl/base/RelWithDebInfo/libabsl_raw_logging_internal.a  abseil/absl/base/RelWithDebInfo/libabsl_log_severity.a  abseil/absl/base/RelWithDebInfo/libabsl_spinlock_wait.a  -lrt  abseil/absl/time/RelWithDebInfo/libabsl_civil_time.a  abseil/absl/time/RelWithDebInfo/libabsl_time_zone.a  rust/RelWithDebInfo/libwasmtime_bindings.a  rust/librust_combined.a  /usr/lib64/libdeflate.so  utils/RelWithDebInfo/libutils.a  /usr/lib64/libxxhash.so  /usr/lib64/libcryptopp.so  /usr/lib64/libboost_regex.so.1.83.0  /usr/lib64/libicui18n.so  /usr/lib64/libicuuc.so  /usr/lib64/libboost_unit_test_framework.so.1.83.0  seastar/RelWithDebInfo/libseastar_testing.a  seastar/RelWithDebInfo/libseastar.a  /usr/lib64/libboost_program_options.so  /usr/lib64/libboost_thread.so  /usr/lib64/libboost_chrono.so  /usr/lib64/libboost_atomic.so  /usr/lib64/libcares.so  /usr/lib64/libfmt.so.10.2.1  /usr/lib64/liblz4.so  -ldl  /usr/lib64/libgnutls.so  -latomic  /usr/lib64/libsctp.so  /usr/lib64/libprotobuf.so  /usr/lib64/libyaml-cpp.so  /usr/lib64/libhwloc.so  //usr/lib64/liburing.so  /usr/lib64/libnuma.so  /usr/lib64/libboost_unit_test_framework.so && :
ld.lld: error: undefined symbol: append_seq::magic
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38)
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38)
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(append_seq::append(int) const)
>>> referenced 5 more times
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

it turns out `append_seq::magic` is only declared, but never defined.
please note, the non-inline static member variable in its class
definition is not considered as a definition, see
[class.static.data](https://eel.is/c++draft/class.static.data#3)

> The declaration of a non-inline static data member in its class
> definition is not a definition and may be of an incomplete type
> other than cv void.

so, let's declare it as a `constexpr` instead. it implies `inline`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19283
2024-06-14 10:00:21 +03:00
Botond Dénes
bf429695b6 Merge 'test_tablets: add test_tablet_storage_freeing' from Michał Chojnowski
Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly.

Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed.

Should be backported to branch-6.0, because the tested problem was also affecting that branch.

Fixes #16946

Closes scylladb/scylladb#18906

* github.com:scylladb/scylladb:
  test_tablets: add test_tablet_storage_freeing
  test: pylib: add get_sstables_disk_usage()
2024-06-14 08:08:54 +03:00
Benny Halevy
fb3db7d81f perf-simple-query: add cpu_cycles / op metric
Example output:
```
bhalevy@[] scylla$ build/release/scylla perf-simple-query --default-log-level=error -c 1 --duration 10
random-seed=4058714023
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
86912.75 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42346 insns/op,   22811 cycles/op,        0 errors)
91348.29 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42306 insns/op,   22362 cycles/op,        0 errors)
87965.84 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42338 insns/op,   22966 cycles/op,        0 errors)
90793.67 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42351 insns/op,   22783 cycles/op,        0 errors)
90104.27 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42358 insns/op,   22875 cycles/op,        0 errors)
90397.13 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42355 insns/op,   22735 cycles/op,        0 errors)
89142.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42363 insns/op,   22996 cycles/op,        0 errors)
90410.40 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42363 insns/op,   22725 cycles/op,        0 errors)
88173.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42366 insns/op,   23160 cycles/op,        0 errors)
88416.51 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42379 insns/op,   23102 cycles/op,        0 errors)

median 90104.26849997675
median absolute deviation: 1244.02
maximum: 91348.29
minimum: 86912.75
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18818
2024-06-14 07:42:09 +03:00
Andrei Chekun
93b9b85c12 [test.py] Refactor alternator, nodetool, rest_api
Make alternator, nodetool and rest_api test directories as python packages.
Move scylla-gdb to scylla_gdb and make it python package.
2024-06-13 13:56:10 +02:00
Avi Kivity
f1819419cc Merge 'scylla-sstable: add method to load the schema from the sstable itself' from Botond Dénes
As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component.

This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now.

Fixes: https://github.com/scylladb/scylladb/issues/17869
Fixes: https://github.com/scylladb/scylladb/issues/18809

New functionality, no backport needed.

Closes scylladb/scylladb#19169

* github.com:scylladb/scylladb:
  tools/scylla-sstable: log loaded schema with trace level
  tools/scylla-sstable: load schema from the sstable as fallback
  tools/schema_loader: introduce load_schema_from_sstable()
  test/lib/random_schema: remove assert on min number of regular columns
  sstables: introduce load_metadata()
2024-06-13 12:21:09 +03:00
Nadav Har'El
44ea1993ba test/cql-pytest: tests CREATE/DROP INDEX during paged query
This patch includes extensive testing for what happens to an ongoing
paged query when a secondary index is suddenly added or dropped.
Issue #18992 was opened suggesting that this would be broken, and indeed
the tests included here show that it is indeed broken.

The four tests included in this patch are heavily commented to explain
what they are testing and why, but here is a short summary of what is
being tested by each of them:

1. A paged query filtering on v=17 continues correctly even if an
   index is created on v.

2. A paged query filtering on v1 and v2 where v2 is indexed,
   continues correctly even if an index is created on v1 (remember
   that Scylla prefers to use the first index mentioned in the query).

3. A paged query using an index on v continues correctly even if that
   index is deleted.

4. However, if the query doesn't say "ALLOW FILTERING", it cannot
   be continued after the index is deleted.

All these tests pass on Cassandra, but all of them except the fourth
fail on Scylla, reproducing issue #18992. Somewhat to my suprise, the
failure of the query in all the failed tests is silent (i.e., trying to
fetch the next page just fetches nothing and says the iteration is done).
I was expecting more dramatic failures ("marshaling error" messages,
crashes, etc.) but didn't get them.

Refs #18992

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19000
2024-06-13 08:39:22 +03:00
Botond Dénes
43c44f0af5 tools/scylla-sstable: load schema from the sstable as fallback
When auto-detecting the schema of the sstable, if all other methods
failed, load the schema from the sstable's serialization header. This
schema is incomplete. It is just enough to parse and display the content
of the sstable. Although parsing and displaying the content of the
sstable is all scylla-sstable does, it is more future-compatible to us
the full schema when possible. So the always-available but minimal
schema that each sstable has on itself, is used just as a fallback.

The test which tested the case when all schema load attempts fail,
doesn't work now, because loading the serialization header always
succeeds. So convert this test into two positive tests, testing the
serialization header schema fallback instead.
2024-06-13 01:32:17 -04:00
Botond Dénes
8f2ba03465 tools/schema_loader: introduce load_schema_from_sstable()
Allows loading the schema from an sstable's serialization header. This
schema is incomplete, but it is enough to parse and display the content
of the sstable.
2024-06-13 01:32:17 -04:00
Botond Dénes
0d7335dd27 test/lib/random_schema: remove assert on min number of regular columns
It is legal for a schema to have 0 regular columns, so remove the assert
on the schema specification's regular column count.
2024-06-13 01:32:17 -04:00
Piotr Dulikowski
0b5a0c969a Merge 'hinted handoff: migrate sync point to host ID' from Michael Litvak
Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs.
The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of new type the translation is avoided.

Fixes #18653

Closes scylladb/scylladb#19134

* github.com:scylladb/scylladb:
  db/hints: migrate sync point to host ID
  db/hints: rename sync point structures with _v1 suffix to _v1_v2
2024-06-13 06:16:00 +02:00
Patryk Jędrzejczak
a7ab9a015a test: manager_client, scylla_cluster: fix type annotations in add_servers 2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak
1eb25d22c6 test: manager_client: don't connect driver after failed server_{add, start}
If adding or starting a server fails expectedly, there is no reason
to update or connect the driver. Moreover, before this patch, we
couldn't use `server_add` and `servers_add` with `expected_error`
if the cluster was empty. After expected bootstrap failures, we
tried to connect the driver, which rightfully failed on
`assert len(hosts) > 0` in `cluster_con`.
2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak
8f486de8d3 test: scylla_cluster: pass seeds to add_servers
This parameter was incorrectly missing. For this reason,
`expected_error` was passed from `add_servers` to `add_server` as
`seeds`, which caused strange crashes.
2024-06-12 16:51:19 +02:00
Kefu Chai
223fba3243 test: memtable_test: increase unspooled_dirty_soft_limit
before this change, when performing memtable_test, we expect that
the memtables of ks.cf is the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessary the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadict test failures from
this test. as we might check `t.min_memtable_timestamp()` too soon.

after this change, we increase `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.

Fixes #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-12 19:17:27 +08:00
Kefu Chai
2df4e9cfc2 test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE
before this change, we verify the behavior of design under test using
`BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test
fails, the test just aborts. this is not very helpful for postmortem
debugging.

after this change, we use `BOOST_REQUIRE` macro for verifying the
behavior, so that Boost.Test prints out the condition if it does not
hold when we test it.

Refs #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-12 19:17:27 +08:00
Kefu Chai
4175e02d9d clustering_bounds_comparator: drop operator<< for bound_kind
turns out operator<< for bound_kind is not used anymore, so let's
drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19159
2024-06-11 18:01:06 +02:00
Avi Kivity
6608f49718 Merge 'make enable_compacting_data_for_streaming_and_repair truly live-update' from Botond Dénes
This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken.
This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series.

Fixes: https://github.com/scylladb/scylladb/issues/18674

- [x] This patch has to be backported because it fixes broken functionality

Closes scylladb/scylladb#18705

* github.com:scylladb/scylladb:
  test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update
  test/pylib: rest_client: add get_injection()
  api/error_injection: add getter for error_injection
  utils/error_injection: add set_parameter()
  replica/database: fix live-update enable_compacting_data_for_streaming_and_repair
2024-06-11 15:53:19 +03:00
Michael Litvak
afc9a1a8a6 db/hints: migrate sync point to host ID
Change the format of sync points to use host ID instead of IPs, to be
consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores
host IDs instead of IPs.
The encoding of sync points now always uses the new v3 format with host
IDs.
The decoding supports both formats with host IDs and IPs, so a sync point
contains now a variant of either types, and in the case of the new
format the translation from IP to host ID is avoided.
2024-06-11 11:07:00 +02:00
Botond Dénes
8ef4fbdb87 test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update
Avoid this the live-update feature of this config item breaking
silently.
2024-06-11 04:17:48 -04:00
Botond Dénes
0c61b1822c test/pylib: rest_client: add get_injection()
The /v2/error_injection/{injection} endpoint now has a GET method too,
expose this.
2024-06-11 04:17:48 -04:00
Pavel Emelyanov
1b9cedb3f3 test: Reduce failure detector timeout for failed tablets migration test
Most of the time this test spends waiting for a node to die. Helps 3x times

Was
  real	9m21,950s
  user	1m11,439s
  sys	1m26,022s

Now
  real	3m37,780s
  user	0m58,439s
  sys	1m13,698s

refs: #17764

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19222
2024-06-11 09:55:06 +02:00
Raphael S. Carvalho
7b41630299 replica: Refresh mutation source when allocating tablet replicas
Consider the following:

1) table A has N tablets and views
2) migration starts for a tablet of A from node 1 to 2.
3) migration is at write_both_read_old stage
4) coordinator will push writes to both nodes (pending and leaving)
5) A has view, so writes to it will also result in reads (table::push_view_replica_updates())
6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in)
7) so read on step 5 is not being able to find sstable set for tablet migrating in

Causes the following error:
"tablets - SSTable set wasn't found for tablet 21 of table mview.users"

which means loss of write on pending replica.

The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot.
It's not a problem to refresh the cache snapshot as long as the logical
state of the data hasn't changed, which is true when allocating new
tablet replicas. That's also done in the context of compactions for example.

Fixes #19052.
Fixes #19033.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19099
2024-06-11 06:59:04 +03:00
Calle Wilund
51c53d8db6 main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars
Fixes scylladb/scylla-pkg#3845

Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.

v2:
* Allow environment variables in reading obj storage config - allows CI to
  use real credentials in env without risking putting them info less seure
  files
* Don't write credentials info from miniserver into config, instead use said
  environment vars to propagate creds.

v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.

Closes scylladb/scylladb#19086
2024-06-11 06:59:04 +03:00
Nadav Har'El
73dfa4143a cql-pytest: translate Cassandra's tests for SELECT DISTINCT
This is a translation of Cassandra's CQL unit test source file
DistinctQueryPagingTest.java into our cql-pytest framework.

The 5 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for one already-known issue:

Refs #10354: SELECT DISTINCT should allow filter on static columns,
             not just partition keys

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18971
2024-06-11 06:59:04 +03:00
Michał Chojnowski
823da140dd test_tablets: add test_tablet_storage_freeing
Tests that tablet storage is freed after it is migrated away.

Fixes #16946
2024-06-10 14:25:37 +02:00
Michał Chojnowski
7741491b47 test: pylib: add get_sstables_disk_usage()
Adds an util for measuring the disk usage of the given table on the given
node.
Will be used in a follow-up patch for testing that sstables are freed
properly.
2024-06-10 14:25:37 +02:00
Botond Dénes
7b2aad56c4 test/boost/sstable_datafile_test: remove unused semaphores
The tests use the ones from test_env, the explicitely created ones are
unused.

Closes scylladb/scylladb#19167
2024-06-09 20:43:59 +03:00
Tomasz Grabiec
c8f71f4825 test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout
The check query may be executed on a node which doesn't yet see that
the downed server is down, as it is not shut down gracefully. The
query coordinator can choose the down node as a CL=1 replica for read
and time out.

To fix, wait for all nodes to notice the node is down before executing
the checking query.

Fixes #17938

Closes scylladb/scylladb#19137
2024-06-09 19:39:57 +03:00
Avi Kivity
7b301f0cb9 Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov
After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code.

This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737)

Closes scylladb/scylladb#19166

* github.com:scylladb/scylladb:
  code: Enlighten wasm headers usage
  lang: Unfriend wasm context from manager
  lang, cql3, schema_tables: Don't mess with db::config
  lang: Don't use db::config to create lua context
  lang: Don't use db::config to create wasm context
  lang: Drop manager::precompile() method
  cql3, schema_tables: Generalize function creation
  wasm: Replace startup_context with wasm_config
  lang: Add manager::start() method
  lang: Move manager to lang namespace
  lang: Move wasm::manager to its .cc/.hh files
2024-06-09 19:32:26 +03:00
Avi Kivity
b2a500a9a1 Merge 'alternator: keep TTL work in the maintenance scheduling group' from Botond Dénes
Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row have reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be causes by the reads and writes done on behalf of the alternator TTL, which looses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and sytsem (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.

Fixes: #18719

- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.

Closes scylladb/scylladb#18729

* github.com:scylladb/scylladb:
  alternator, scheduler: test reproducing RPC scheduling group bug
  main: add maintenance tenant to messaging_service's scheduling config
2024-06-09 19:20:18 +03:00
Nadav Har'El
13cf6c543d test/alternator: fix flaky test test_item_latency
The Alternator test test_metrics.py::test_item_latency confirms that
for several operation types (PutItem, GetItem, DeleteItem, UpdateItem)
we did not forget to measure their latencies.

The test checked that a latency was updated by checking that two metrics
increases:
    scylla_alternator_op_latency_count
    scylla_alternator_op_latency_sum

However, it turns out that the "sum" is only an approximate sum of all
latencies, and when the total sum grows large it sometimes does *not*
increase when a short latency is added to the statistics. When this
happens, this test fails on the assertion that the "sum" increases after
an operation. We saw this happening sometimes in CI runs.

The simple fix is to stop checking _sum at all, and only verify that
the _count increases - this is really an integer counter that
unconditionally increases when a latency is added to the histogram.

Don't worry that the strength of this test is reduced - this test was
never meant to check the accuracy or correctness of the histograms -
we should have different (and better) tests for that, unrelated to
Alternator. The purpose of *this* test is only to verify that for some
specific operation like PutItem, Alternator didn't forget to measure its
latency and update the histogram. We want to avoid a bug like we had
in counters in the past (#9406).

Fixes #18847.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19080
2024-06-09 19:19:09 +03:00
Kefu Chai
f4706be8a8 test: test_topology_ops: adapt to tablets
in e7d4e080, we reenabled the background writes in this test, but
when running with tablets enabled, background writes are still
disabled because of #17025, which was fixed last week. so we can
enable background writes with tablets.

in this change,

* background writes are enabled with tablets.
* increase the number of nodes by 1 so that we have enough nodes
  to fulfill the needs of tablets, which enforces that the number
  of replicas should always satisfy RF.
* pass rf to `start_writes()` explicitly, so we have less
  magic numbers in the test, and make the data dependencies
  more obvious.

Fixes #17589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18707
2024-06-08 17:46:37 +02:00
Gleb Natapov
34cf5c81f6 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes scylladb/scylladb#18863

Closes scylladb/scylladb#19138
2024-06-07 15:31:44 +02:00
Pavel Emelyanov
b854bf4b83 lang: Don't use db::config to create lua context
Similarly to previous patch, lua context needs db::config for creation.
It's better to get the configurables via lang::manager::config.

One thing to note -- lua config carries updateable_values on board, but
respective db::config options and _not_ LiveUpdate-able, so the lua
config could just use simple data types. This patch keeps updateable
values intact for brevity.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
783ccc0a74 lang: Don't use db::config to create wasm context
The managerr needs to get two "fuel" configurables from db::config in
order to create context. Instead of carrying db config from callers,
keep the options on existing lang::manager::config and use them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
fe7ff7172d wasm: Replace startup_context with wasm_config
The lang::manager starts with the help of a context because it needs to
have std::shared_ptr<> pointg to cross-shard shared wasm engine and
runner thread. For that a context is created in advance, that then helps
sharing the engine and runner across manager instances.

This patch removes the "context" and replaces it with classical
manager::config. With it, it's lang::manager who's now responsible for
initializing itself.

In order to have cross-shard engine and thread pointers, the start()
method uses invoke_on_others() facility to share the pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
0dad72b736 lang: Add manager::start() method
Just like any other sharded<> service, the lang::manager now starts and
stops in a classical sequence of

  await sharded<manager>::start()
  defer([] { await sharded<manager>::stop() })
  await sharded<manager>::invoke_on_all(&manager::start)

For now the method is no-op, next patches will start using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
f950469af5 lang: Move manager to lang namespace
And, while at it, rename local variable to refer to it to as "manager"
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
1dec79e97d lang: Move wasm::manager to its .cc/.hh files
It's going to become a facade in front of both -- wasm and lua, so keep
it in files with language independent names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00