In this patch, we port validation/entities/secondary_index_test.java,
resulting in 41 tests for various aspects of secondary indexes.
Some of the original Java tests required direct access to the Cassandra
internals not available through CQL, so those tests were omitted.
In porting these tests, I uncovered 9 previously-unknown bugs in Scylla:
Refs #8600: IndexInfo system table lists MV name instead of index name
Refs #8627: Cleanly reject updates with indexed values where value > 64k
Refs #8708: Secondary index is missing partitions with only a static row
Refs #8711: Finding or filtering with an empty string with a secondary
index seems to be broken
Refs #8714: Improve error message on unsupported restriction on partition
key
Refs #8717: Recent fix accidentally broke CREATE INDEX IF NOT EXISTS
Refs #8724: Wrong error message when attempting index of UDT column with
a duration
Refs #8744: Index-creation error message wrongly refers to "map" - it can
be any collection
Refs #8745: Secondary index CREATE INDEX syntax is missing the "values"
option
These tests also provide additional reproducers for already known issues:
Refs #2203: Add support for SASI
Refs #2962: Collection column indexing
Refs #2963: Static column indexing
Refs #4244: Add support for mixing token, multi- and single-column
restrictions
Due to these bugs, 15 out of the 41 tests here currently xfail. We actually
had more failing tests, but we fixed a few of the above issues before this
patch went in, so their tests are passing at the time of this submission.
All 41 tests pass when running against Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210531112354.970028-1-nyh@scylladb.com>
This patch is not backward compatible with its original,
but it's considered fine, since the original workload types were not
yet part of any release.
The changes include:
- instead of using 'unspecified' for declaring that there's no workload
type for a particular service level, NULL is used for that purpose;
NULL is the standard way of representing lack of data
- introducing a delete marker, which accompanies NULL and makes it
possible to distinguish between wanting to forcibly reset a workload
type to unspecified and not wanting to change the previous value
- updating the tests accordingly
These changes come in as a single patch, because they're intertwined
with each other and the tests for workload types are already in place;
an attempt to split them proved to be more complicated than it's worth.
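
The NULL-plus-delete-marker semantics described above can be sketched as follows. This is only an illustration: `WorkloadUpdate`, `apply_update` and the example type names are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkloadUpdate:
    """Hypothetical update record; value=None stands in for a NULL cell."""
    value: Optional[str] = None
    delete_marker: bool = False


def apply_update(current: Optional[str], update: WorkloadUpdate) -> Optional[str]:
    # The delete marker means "forcibly reset to no workload type".
    if update.delete_marker:
        return None
    # NULL without the marker means "leave the previous value alone".
    if update.value is None:
        return current
    # Otherwise the update carries a concrete workload type.
    return update.value
```

Without the marker, a bare NULL could not tell "reset" apart from "not specified in this statement", which is the ambiguity the second bullet addresses.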
Tests: unit(release)
Closes #8763
The compaction manager can start a very large number of compactions of fully
expired sstables in parallel, which may consume a significant amount of resources.
This problem is caused by weight being released too early in compaction, after
data is all compacted but before table is called to update its state, like
replacing sstables and so on.
Fully expired sstables aren't actually compacted, so the following can happen:
- compaction 1 starts for expired sst A with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 2 starts for expired sst B with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 3 starts for expired sst C with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 1 is done updating table state, so it finally completes and
releases all the resources.
- compaction 2 is done updating table state, so it finally completes and
releases all the resources.
- compaction 3 is done updating table state, so it finally completes and
releases all the resources.
This happens because, with an expired sstable, compaction will release weight
faster than it will update table state, as there's nothing to be compacted.
With my reproducer, it's very easy to reach 50 parallel compactions on a single
shard, but that number can be easily worse depending on the amount of sstables
with fully expired data, across all tables. This high parallelism can happen
only with a couple of tables, if there are many time windows with expired data,
as they can be compacted in parallel.
Prior to 55a8b6e3c9, weight was released earlier in compaction, before
last sstable was sealed, but right now, there's no need to release weight
earlier. Weight can be released in a much simpler way, after the compaction is
actually done. So such compactions will be serialized from now on.
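
A toy model of the scenario above, assuming all compactions share one weight and that a compaction may start as soon as that weight is free. The function and its parameters are hypothetical and only illustrate why releasing the weight early lets expired-sstable compactions pile up.

```python
def max_parallel(jobs, release_early):
    """Return the peak number of compactions alive at the same time.

    jobs: list of (compact_time, update_time) pairs, all using the same weight.
    release_early: release the weight after compacting (old behaviour)
                   instead of after the table state update (new behaviour).
    """
    weight_free_at = 0.0
    intervals = []
    for compact_time, update_time in jobs:
        start = weight_free_at
        end = start + compact_time + update_time
        weight_free_at = start + compact_time if release_early else end
        intervals.append((start, end))

    # Sweep over start/end events; ends sort before starts at equal times,
    # so back-to-back jobs are not counted as overlapping.
    events = [(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals]
    events.sort(key=lambda ev: (ev[0], ev[1]))
    alive = peak = 0
    for _, delta in events:
        alive += delta
        peak = max(peak, alive)
    return peak


# Fully expired sstables: nothing to compact, only the table update takes time.
expired = [(0.0, 1.0)] * 3
early = max_parallel(expired, release_early=True)
late = max_parallel(expired, release_early=False)
```

In this model `early` is 3 (all three compactions overlap) and `late` is 1 (they are serialized).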
Fixes #8710.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com>
[avi: drop now unneeded storage_service_for_tests]
user_defined_function_test fails sporadically in debug mode
due to lua timeout. Raise the timeout to avoid the failure, but
not so much that the test that expects a timeout becomes too slow.
Fixes #8746.
Closes #8747
This is another boring patch.
One of schema constructors has been deprecated for many years now but
was used in several places anyway. Usage of this constructor could
lead to data corruption when using MX sstables because this constructor
does not set schema version. MX reading/writing code depends on schema
version.
This patch replaces all the places the deprecated constructor is used
with schema_builder equivalent. The schema_builder sets the schema
version correctly.
Fixes #8507
Test: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>
Sometimes the cql-pytest tests run extremely slowly. This can be
a combination of running the debug build (which is naturally slow)
and a test machine which is overcommitted, or experiencing some
transient swap storm or some similar event. We don't want tests, which
we run on 100% reliable setups, to fail just because they run into
timeouts in Scylla when they run very slowly.
We already noticed this problem in the past, and increased the CQL client
timeout in conftest.py from the default of 10 seconds to 120 seconds -
the old default of 10 seconds was not enough for some long operations
(such as creating a table with multiple views) when the test ran very
slowly.
However, this only fixed the client-side timeout. We also have a bunch
of server-side timeouts, configured to all sorts of arbitrary (and
fairly small) numbers. For example, the server has a "write request
timeout" option, which defaults to just 2 seconds. We recently saw
this timeout exceeded in a slow run which tried to do a very large
write.
So this patch configures all the configurable server-side timeouts we
have to default to 300 seconds. This should be more than enough for even
the slowest runs (famous last words...). This default is not a good idea
on real multi-node clusters which are expected to deal with node loss,
but this is not the case in cql-pytest.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529213648.856503-1-nyh@scylladb.com>
When destroying a perf_sstable_test_env, an assert in the sstables_manager
destructor fails, because the manager hasn't been closed.
Fix by removing all references to sstables from perf_sstable_test_env,
and then closing the test_env (as well as the sstables_manager).
Fixes #8736
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Closes #8737
Reopening #8286 since the token metadata fix that allows `Everywhere` strategy tables to work with RBO (#8536) has been merged.
---
Currently when a node wants to create and broadcast a new CDC generation
it performs the following steps:
1. choose the generation's stream IDs and mapping (how this is done is
irrelevant for the current discussion)
2. choose the generation's timestamp by taking the current time
(according to its local clock) and adding 2 * ring_delay
3. insert the generation's data (mapping and stream IDs) into
system_distributed.cdc_generation_descriptions, using the
generation's timestamp as the partition key (we call this table
the "old internal table" below)
4. insert the generation's timestamp into the "CDC_STREAMS_TIMESTAMP"
application state.
The timestamp spreads epidemically through the gossip protocol. When
nodes see the timestamp, they retrieve the generation data from the
old internal table.
Unfortunately, due to the schema of the old internal table, where
the entire generation data is stored in a single cell, step 3 may fail for
sufficiently large generations (there is a size threshold for which step
3 will always fail - retrying the operation won't help). Also the old
internal table lies in the system_distributed keyspace that uses
SimpleStrategy with replication factor 3, which is also problematic; for
example, when nodes restart, they must reach at least 2 out of these 3
specific replicas in order to retrieve the current generation (we write
and read the generation data with QUORUM, unless we're a single-node
cluster, where we use ONE). Until this happens, a restarting
node can't coordinate writes to CDC-enabled tables. It would be better
if the node could access the last known generation locally.
The commit introduces a new table for broadcasting generation data with
the following properties:
- it uses a better schema that stores the data in multiple rows, each
of manageable size
- it resides in a new keyspace that uses EverywhereStrategy so the
data will be written to every node in the cluster that has a token in
the token ring
- the data will be written using CL=ALL and read using CL=ONE; thanks
to this, restarting node won't have to communicate with other nodes
to retrieve the data of the last known generation. Note that writing
with CL=ALL does not reduce availability: creating a new generation
*requires* all nodes to be available anyway, because they must learn
about the generation before their clocks go past the generation's
timestamp; if they don't, partitions won't be mapped to stream IDs
consistently across the cluster
- the partition key is no longer the generation's timestamp. Because it
was that way in the old internal table, it forced the algorithm to
choose the timestamp *before* the generation data was inserted into
the table. What if the inserting took a long time? It increased the
chance that nodes would learn about the generation too late (after
their clocks moved past its timestamp). With the new schema we will
first insert the generation data using a randomly generated UUID as
the partition key, *then* choose the timestamp, then gossip both the
timestamp and the UUID.
Observe that after a node learns about a generation broadcasted using
this new method through gossip it will retrieve its data very quickly
since it's one of the replicas and it can use CL=ONE as it was
written using CL=ALL.
The generation's timestamp and the UUID mentioned in the last point form
a "generation identifier" for this new generation. For passing these new
identifiers around, we introduce the cdc::generation_id_v2 type.
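
The new ordering (insert the data first, choose the timestamp afterwards, then gossip both) can be sketched like this. It is a simplified model, not the actual implementation: the dict stands in for the new table written with CL=ALL, the list stands in for gossip, `GenerationIdV2` only mirrors the idea of cdc::generation_id_v2, and reusing the "now + 2 * ring_delay" formula from the old procedure is an assumption.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationIdV2:
    """Hypothetical stand-in for the (timestamp, UUID) generation identifier."""
    timestamp: float
    data_id: uuid.UUID


def create_generation(table, gossip, data, ring_delay_s, now=time.time):
    # 1. Insert the generation data under a randomly generated UUID.
    #    However long this takes, no timestamp has been committed to yet.
    data_id = uuid.uuid4()
    table[data_id] = data

    # 2. Only now choose the timestamp (assumed: same formula as before).
    timestamp = now() + 2 * ring_delay_s

    # 3. Gossip both the timestamp and the UUID.
    gen_id = GenerationIdV2(timestamp, data_id)
    gossip.append(gen_id)
    return gen_id
```

Because the timestamp is taken after the insert completes, a slow insert no longer eats into the time nodes have to learn about the generation.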
Fixes #7961.
---
For optimal review experience it is best to first read the updated design notes (you can read them rendered here: https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md), specifically the ["Generation switching"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#generation-switching) section followed by the ["Internal generation descriptions table V1 and upgrade procedure"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#internal-generation-descriptions-table-v1-and-upgrade-procedure) section, then read the commits in topological order.
dtest gating run (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/1160/
unit tests (dev) passed locally
Closes #8643
* github.com:scylladb/scylla:
docs: update cdc.md with info about the new internal table
sys_dist_ks: don't create old CDC generations table on service initialization
sys_dist_ks: rename all_tables() to ensured_tables()
cdc: when creating new generations, use format v2 if possible
main: pass feature_service to cdc::generation_service
gms: introduce CDC_GENERATIONS_V2 feature
cdc: introduce retrieve_generation_data
test: cdc: include new generations table in permissions test
sys_dist_ks: increase timeout for create_cdc_desc
sys_dist_ks: new table for exchanging CDC generations
tree-wide: introduce cdc::generation_id_v2
This draft extends and obsoletes #8123 by introducing a way of determining the workload type from service level parameters, and then using this context to qualify requests for shedding.
The rough idea is that when the admission queue in the CQL server is hit, it might make more sense to start shedding surplus requests instead of accumulating them on the semaphore. The assumption is that interactive workloads are more interested in the success rate of as many requests as possible, and hanging on a semaphore reduces the chances for a request to succeed. Thus, it may make sense to shed some requests to reduce the load on this coordinator and let the existing requests finish.
It's a draft, because I only performed local guided tests. #8123 was followed by some experiments on a multinode cluster which I want to rerun first.
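
A minimal sketch of the decision described above, assuming that only requests attributed to an interactive workload are shed and everything else keeps waiting. The function name, its parameters and the returned strings are hypothetical.

```python
from typing import Optional


def admission_decision(slots_free: int, workload_type: Optional[str]) -> str:
    """Decide what to do with an incoming request at the admission queue."""
    if slots_free > 0:
        return "run"    # no overload: admit immediately
    if workload_type == "interactive":
        return "shed"   # failing fast beats hanging on the semaphore
    return "wait"       # other or unknown workloads queue up as before
```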
Closes #8680
* github.com:scylladb/scylla:
test: add a case for conflicting workload types
cql-pytest: add basic tests for service level workload types
docs: describe workload types for service levels
sys_dist_ks: fix redundant parsing in get_service_level
sys_dist_ks: make get_service_level exception-safe
transport: start shedding requests during potential overload
client_state: hook workload type from service levels
cql3: add listing service level workload type
cql3: add persisting service level workload type
qos: add workload_type service level parameter
The purpose of the class in question is to start sharded storage
service to make its global instance alive. I don't know when exactly
it happened but no code that instantiates this wrapper really needs
the global storage service.
Ref: #2795
tests: unit(dev), perf_sstable(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
The recent commit 0ef0a4c78d added helpful
error messages in case an index cannot be created because the intended
name of its materialized view is already taken - but accidentally broke
the "CREATE INDEX IF NOT EXISTS" feature.
The checking code was correct, but in the wrong place: we need to first
check whether the index already exists and "IF NOT EXISTS" was chosen -
and only do this new error checking if this is not the case.
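
The intended order of checks can be sketched as follows. The sets, the exception class and the "<index name>_index" naming of the backing view are assumptions made for this illustration, not the actual Scylla code.

```python
class InvalidRequest(Exception):
    pass


def create_index(indexes: set, tables: set, name: str, if_not_exists: bool) -> bool:
    """Return True if an index was created, False if the call was a no-op."""
    # First: an existing index combined with IF NOT EXISTS is a silent no-op.
    if name in indexes:
        if if_not_exists:
            return False
        raise InvalidRequest(f"Index {name} already exists")

    # Only then: check that the name of the backing view is still free.
    view_name = name + "_index"  # assumed naming scheme
    if view_name in tables:
        raise InvalidRequest(f"Cannot create index {name}: {view_name} is taken")

    indexes.add(name)
    tables.add(view_name)
    return True
```

If the view-name check ran first, a repeated CREATE INDEX IF NOT EXISTS would trip over the view created by the first call, which is the regression described above.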
This patch also includes a cql-pytest test for reproducing this bug.
The bug is also reproduced by the translated Cassandra unit tests
cassandra_tests/validation/entities/secondary_index_test.py::
testCreateAndDropIndex
and this is how I found this bug. After this patch, all these tests
pass.
Fixes #8717.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210526143635.624398-1-nyh@scylladb.com>
This warning triggers when a range-based for loop ("for (auto x : range)")
causes non-trivial copies, prompting the developer to replace it with a
capture by reference. A few minor violations in the test suite are corrected.
Closes #8699
Fixes #8270
If we have an allocation pattern where we leave large parts of segments "wasted" (typically because the segment has empty space, but cannot hold the mutation being added), we can have a disk usage that is below threshold, yet still get a disk _footprint_ that is over limit causing new segment allocation to stall.
We need to take a few things into account:
1.) Need to include wasted space in the threshold check. Whether or not disk is actually used does not matter here.
2.) If we stall a segment alloc, we should just flush immediately. No point in waiting for the timer task.
3.) Need to adjust the thresholds a bit. Depending on sizes, we should probably consider starting to flush once we've used up enough space to be in the last available segment, so a new one is hopefully available by the time we hit the limit.
Also fix an edge case (for tests), when we have too few segments to have an active one (i.e. we need to flush everything).
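
Point 1 can be illustrated with a toy model in which closed segments count their unused tail as wasted space. `Segment`, its fields and the numbers are hypothetical; the real commitlog accounting is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    size: int     # bytes the segment occupies on disk
    used: int     # bytes that actually hold mutations
    closed: bool  # a closed segment can no longer fill its free tail

    @property
    def wasted(self) -> int:
        return self.size - self.used if self.closed else 0


def usage(segments) -> int:
    return sum(s.used for s in segments)


def footprint(segments) -> int:
    # Usage plus slack: what the threshold check should look at.
    return sum(s.used + s.wasted for s in segments)


segments = [Segment(32, 20, True)] * 3 + [Segment(32, 4, False)]
threshold = 80
flush_by_usage = usage(segments) >= threshold          # 64 >= 80
flush_by_footprint = footprint(segments) >= threshold  # 100 >= 80
```

Here the usage-only check stays below the threshold while the footprint check already calls for a flush, which is the mismatch described above.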
Closes #8695
* github.com:scylladb/scylla:
commitlog_test: Add test case for usage/disk size threshold mismatch
commitlog: Flush all segments if we only have one.
commitlog: Always force flush if segment allocation is waiting
commitlog: Include segment wasted (slack) size in footprint check
commitlog: Adjust (lower) usage threshold
The old table won't be created in clusters that are bootstrapped after
this commit. It will stay in clusters that were upgraded from a version
before this commit.
Note that a fully upgraded cluster doesn't automatically create a new
generation in the new format. Even if the last generation was created
before the upgrade, the cluster will keep using it.
A new generation will be created in the new format when either:
1. a new node bootstraps (in the new version),
2. or the user runs checkAndRepairCdcStreams, which has a new check: if
the current generation uses the old format, the command will decide
that repair is needed, even if the generation is completely fine
otherwise (also in the new version).
During upgrade, while the CDC_GENERATIONS_V2 feature is still not
enabled, the user may still bootstrap a node in the old version of
Scylla or run checkAndRepairCdcStreams on a not-yet-upgraded node. In
that case a new generation will be created in the old format,
using the old table definitions.
test_phased_barrier_reassignment has a timeout to prevent the test from
hanging on failure, but it occasionally triggers in debug mode since
the timeout is quite low (1ms). Increase the timeout to prevent false
positives. Since the timeout only expires if the test fails, it will
have no impact on execution time.
Ref #8613
Closes #8692
Off-strategy compaction on a table using STCS is slow because of
the needless write amplification of 2. That's because STCS reshape
isn't taking advantage of the fact that sstables produced by
a repair-based operation are disjoint. So the ~256 input sstables
were compacted (in batches of 32) into larger sstables, which in
turn were compacted into even larger ones. That write amp is very
significant on large data sets, making the whole operation 2x
slower.
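
The write amplification of 2 follows from counting passes over the data. A small sketch, assuming every pass rewrites the whole data set once (the helper name is hypothetical):

```python
import math


def reshape_passes(n_sstables: int, batch: int) -> int:
    """Number of times the data is rewritten when compacting in batches."""
    passes = 0
    while n_sstables > 1:
        n_sstables = math.ceil(n_sstables / batch)
        passes += 1
    return passes


# 256 inputs in batches of 32 -> 8 intermediate sstables -> 1 final sstable.
batched = reshape_passes(256, 32)
# For comparison: a single pass over all inputs at once.
single = reshape_passes(256, 256)
```

With 256 inputs and batches of 32, `batched` is 2, matching the write amplification quoted above; `single` is 1.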
Fixes #8449.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210524213426.196407-1-raphaelsc@scylladb.com>
To keep our cql-pytest tests "correct", we should strive for them to pass on
Cassandra - unless they are testing a Scylla-only feature or a deliberate
difference between Scylla and Cassandra - in which case they should be marked
"scylla-only" and cause such tests to be skipped when running on Cassandra.
The following few small patches fix a few cases where our tests were failing on
Cassandra. In one case this even found a bug in the test (a trivial Python
mistake, but still).
Closes #8694
* github.com:scylladb/scylla:
test/cql-pytest: fix python mistake in an xfailing test
test/cql-pytest: mark some tests with scylla-only
test/cql-pytest: clean up test_create_large_static_cells_and_rows
This series fixes a minor validation issue with service level timeouts - negative values were not checked. This bug is benign because negative timeouts act just like a 0s timeout, but the original series claimed to validate against negative values, so it's hereby fixed.
More importantly however, this series follows by enabling cql-pytest to run service level tests and provides a first batch of them, including a missing test case for negative timeouts.
The idea is similar to what we already have in alternator test suite - authentication is unconditionally enabled, which doesn't affect any existing tests, but at the same time allows writing test cases which rely on authentication - e.g. service levels.
Closes #8645
* github.com:scylladb/scylla:
cql-pytest: introduce service level test suite
cql-pytest: add enabling authentication by default
qos: fix validating service level timeouts for negative values
Now RPC module has some basic testing coverage to
make sure RPC configuration is updated appropriately
on configuration changes (i.e. `add_server` and
`remove_server` are called when appropriate).
The test suite currently consists of the following
test-cases:
* Loading server instance with configuration from a snapshot.
* Loading server instance with configuration from a log.
* Configuration changes (remove + add node).
* Leader elections don't lead to RPC configuration changes.
* Voter <-> learner node transitions also don't change RPC
configuration.
* Reverting uncommitted configuration changes updates
RPC configuration accordingly (two cases: revert to
snapshot config or committed state from the log).
A few more refactorings are made along the way to be
able to reuse some existing functions from
`replication_test` in `rpc_test` implementation.
Please note, though, that there are still some functions
that are borrowed from `replication_test` but not yet
extracted to common helpers.
This is mostly because RPC tests don't need all
the complexity that `replication_test` has, thus,
some helpers are copied in a reduced form.
It would take some effort to refactor these bits to
fit both `replication_test` and `rpc_test` without
sacrificing convenience.
This will probably be addressed in another series later.
* manmanson/raft-rpc-tests-v9-alt3:
raft: add tests for RPC module
test: add CHECK_EVENTUALLY_EQUAL utility macro
raft: replication_test: reset test rpc network between test runs
raft: replication_test: extract tickers initialization into a separate func
raft: replication_test: support passing custom `apply_fn` to `change_configuration()`
raft: replication_test: introduce `test_server` aggregate struct
raft: replication_test: support voter<->learner configuration changes
raft: remove duplicate `create_command` function from `replication_test`
raft: avoid 'using' statements in raft testing helpers header
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
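
The 5% figure can be re-derived from the two "Elapsed (wall clock) time" lines above (the helper name is hypothetical):

```python
def to_seconds(m_ss: str) -> float:
    """Convert an 'm:ss.ff' wall-clock string to seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + float(seconds)


before = to_seconds("12:10.97")  # 730.97 s
after = to_seconds("11:32.36")   # 692.36 s
improvement_pct = (before - after) / before * 100
```

This gives an improvement of roughly 5.3% of wall clock time.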
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
The xfailing test cassandra_tests/validation/entities/collections_test.py::
testSelectionOfEmptyCollections had a Python mistake (using {} instead
of set() for an empty set), which resulted in its failure when run
against Cassandra. After this patch it passes on Cassandra and fails on
Scylla - as expected (this is why it is marked xfail).
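
For reference, the Python pitfall in question: empty braces create a dict, not a set, so comparing a result against {} does not compare it against an empty set.

```python
empty_braces = {}
empty_set = set()

braces_type = type(empty_braces).__name__  # 'dict'
set_type = type(empty_set).__name__        # 'set'

# An empty dict and an empty set never compare equal.
same = (empty_braces == empty_set)         # False
```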
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Tests which are known to test a Scylla-only feature (such as CDC)
or to rely on a known difference between Scylla and Cassandra
should be marked "scylla-only", so they are skipped when running
the tests against Cassandra (test/cql-pytest/run-cassandra) instead
of reporting errors.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test test_create_large_static_cells_and_rows had its own
implementation of "nodetool flush" using Scylla's REST API.
Now that we have a nodetool.flush() function for general use in
cql-pytest, let's use it and save a bit of duplication.
Another benefit is that now this test can be run (and pass) against
Cassandra.
To allow this test to run on Cassandra, I had to remove a
"USING TIMEOUT" which wasn't necessary for this test, and is
not a feature supported by Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Currently, gossip uses the updates of the gossip heartbeat from gossip
messages to decide if a node is up or down. This means if a node is
actually down but the gossip messages are delayed in the network, the
marking of node down can be delayed.
For example, a node sends 20 gossip messages in 20 seconds before it
is dead. Each message is delayed 15 seconds by the network for some
reason. A node receives those delayed messages one after another.
Those delayed messages will prevent this node from being marked as down,
because each heartbeat update is received just before the threshold to mark
a node down (around 20 seconds by default) is reached.
As a result, this node will not be marked as down in 20 * 15 seconds =
300 seconds, much longer than the ~20 seconds node down detection time
in normal cases.
In this patch, a new failure detector is implemented.
- Direct detection
The existing failure detector can get gossip heartbeat updates
indirectly. For example:
Node A can talk to Node B
Node B can talk to Node C
Node A can not talk to Node C, due to network issues
Node A will not mark Node C as down because Node A can get the heartbeat
of Node C from Node B indirectly.
This indirect detection is not very useful because when Node A decides
if it should send requests to Node C, the requests from Node A to C will
fail while Node A thinks it can communicate with Node C.
This patch changes the failure detection to be direct. It uses the
existing gossip echo message to detect directly. Gossip echo messages
will be sent to peer nodes periodically. A peer node will be marked as
down if a timeout threshold has been met.
Since the failure detection is peer to peer, it avoids the delayed
message issue mentioned above.
- Parallel detection
The old failure detector uses shard zero only. This new failure detector
utilizes all the shards to perform the failure detection, each shard
handling a subset of live nodes. For example, if the cluster has 32
nodes and each node has 16 shards, each shard will handle only 2 nodes.
In a 16-node cluster where each node has 16 shards, each shard will handle
only one peer node.
A gossip message will be sent to peer nodes every 2 seconds. The extra
echo messages traffic produced compared to the old failure detector is
negligible.
- Deterministic detection
Users can configure the failure_detector_timeout_in_ms to set the
threshold to mark a node down. It is the maximum time between two
successful echo messages before gossip marks a node down. It is easier to
understand than the old phi_convict_threshold.
- Compatible
This patch only uses the existing gossip echo message. Nodes with or without
this patch can work together.
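
The parallel and deterministic parts can be sketched as below. The round-robin split of peers across shards and the function names are assumptions made for illustration; only failure_detector_timeout_in_ms comes from the description above.

```python
def peers_for_shard(peers, shard: int, n_shards: int):
    """Assumed round-robin split of the peer list across shards."""
    return [p for i, p in enumerate(sorted(peers)) if i % n_shards == shard]


def is_down(last_echo_ok_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Convict a peer once no echo has succeeded within the timeout
    (timeout_ms plays the role of failure_detector_timeout_in_ms)."""
    return now_ms - last_echo_ok_ms > timeout_ms


peers = [f"10.0.0.{i}" for i in range(32)]
per_shard = [peers_for_shard(peers, s, 16) for s in range(16)]
```

With 32 peers and 16 shards every shard ends up watching 2 peers, as in the example above. The verdict depends only on the time since the last successful direct echo, so delayed gossip messages cannot postpone it.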
Fixes #8488
Closes #8036
The -Wunused-private-field was squelched when we switched to
clang to make the change easier. But it is a useful warning, so
re-enable it.
It found a serious bug (#8682) and a few minor instances of waste.
Closes #8683
* github.com:scylladb/scylla:
build: enable -Wunused-private-field warning
test: drop unused fields
table: drop unused field database_sstable_write_monitor::_compaction_manager
streaming: drop unused fields
sstables: mx reader: drop unused _column_value_length field
sstables: index_consumer: drop unused max_quantity field
compaction: resharding_compaction: drop unused _shard field
compaction: compaction_read_monitor: drop unused _compaction_manager field
raft: raft_services: drop unused _gossiper field
repair: drop unused _nr_peer_nodes field
redis: drop unused fields _storage_proxy and _requests_blocked_memory
mutation_rebuilder: drop unused field _remaining_limit
db: data_listeners: remove unused field _db
cql3: insert_json_statement: note bug with unused _if_not_exists
cql3: authorized_prepared_statement_cache: drop unused field _logger
auth: service_level_resource_view: drop unused field _resource
This patch adds an Alternator test, test_batch_get_item_large,
which checks a BatchGetItem with a moderately large (1.5 MB) response.
The test passes - we do not have a bug in BatchGetItem - but it
does reproduce issue #8522 - the long response is stored in memory as
one long contiguous string and causes a warning about an over-sized
allocation:
WARN ... seastar_memory - oversized allocation: 2281472 bytes.
Incidentally, this test also reproduces a second contiguous
allocation problem - issue #8183 (in BatchWriteItem which we use
in this test to set up the item to read).
Refs #8522
Refs #8183
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210520161619.110941-1-nyh@scylladb.com>
The gdb self-tests fail on aarch64 due to a failure to use thread-local
variables. I filed [1] so it can get fixed.
Meanwhile, disable the test so the build passes. It is sad, but the aarch64
build is not impacted by these failures.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=27886
Closes #8672
"
There is a lot of global stuff in repair -- a bunch of pointers to
sharded services, tracker, map of metas (maybe more). This set
removes the first group, all those services had become main-local
recently. Along the way a call to global storage proxy is dropped.
To get there the repair_service is turned into a "classical"
sharded<> service, gets all the needed dependencies by references
from main and spreads them internally where needed. Tracker and other
stuff is left global, but tracker is now the candidate for merging
with the now sharded repair_service, since it emulates the sharded
concept internally.
Overall the change is
- make repair_service sharded and put all dependencies on it at start
- have sharded<repair_service> in API and storage service
- carry the service reference down to repair_info and repair_meta
constructions to give them the dependencies
- use needed services in _info and _meta methods
tests: unit(dev), dtest.repair(dev)
"
* 'br-repair-service' of https://github.com/xemul/scylla: (29 commits)
repair: Drop most of globals from repair
repair: Use local references in messaging handler checks
repair: Use local references in create_writer()
repair: Construct repair_meta with local references
repair: Keep more stuff on repair_info
repair: Kill bunch of global usages from insert_repair_meta
repair: Pass repair service down to meta insertion
repair: Keep local migration manager on repair_info
repair: Move unused db captures
repair: Remove unused ms captures
repair: Construct repair_info with service
repair: Loop over repair sharded container
repair: Make sync_data_using_repair a method
repair: Use repair from storage service
repair: Keep repair on storage service
repair: Make do_repair_start a method
repair: Pass repair_service through the API until do_repair_start
repair: Fix indentation after previous patch
repair: Split sync_data_using_repair
repair: Turn repair_range a repair_info method
...
Following Nadav's advice, instead of ignoring the test
in sanitize/debug modes, the allocator simply has a special path
of failing sufficiently large allocation requests.
With that, a problem with the address sanitizer is bypassed
and other debug mode sanitizers can inspect and check
if there are no more problems related to wrapping the original
rapidjson allocator.
Closes #8539
Now RPC module has some basic testing coverage to
make sure RPC configuration is updated appropriately
on configuration changes (i.e. `add_server` and
`remove_server` are called when appropriate).
The test suite currently consists of the following
test-cases:
* Loading server instance with configuration from a snapshot.
* Loading server instance with configuration from a log.
* Configuration changes (remove + add node).
* Leader elections don't lead to RPC configuration changes.
* Voter <-> learner node transitions also don't change RPC
configuration.
* Reverting uncommitted configuration changes updates
RPC configuration accordingly (two cases: revert to
snapshot config or committed state from the log).
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
It would be good to have a `CHECK` variant in addition
to the existing `REQUIRE_EVENTUALLY_EQUAL` macro. It will be used
in raft RPC tests.
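The difference between the two variants can be sketched in Python. This is only an illustration, not the macro itself: it assumes the usual test-framework convention where a REQUIRE aborts the test case on failure while a CHECK records the failure and lets the test continue. All names below (`eventually_equal`, `Checker`, `require_eventually_equal`) are hypothetical.

```python
import time

def eventually_equal(read_value, expected, timeout=1.0, interval=0.01):
    """Poll read_value() until it equals expected or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if read_value() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

def require_eventually_equal(read_value, expected, **kwargs):
    """REQUIRE-style: abort the test immediately on failure."""
    if not eventually_equal(read_value, expected, **kwargs):
        raise AssertionError(f"value never became {expected!r}")

class Checker:
    """CHECK-style: record the failure and let the test continue."""
    def __init__(self):
        self.failures = []

    def check_eventually_equal(self, read_value, expected, **kwargs):
        ok = eventually_equal(read_value, expected, **kwargs)
        if not ok:
            self.failures.append(f"value never became {expected!r}")
        return ok
```

With the CHECK-style helper, a test can probe several conditions and report all of the failures at the end instead of stopping at the first one.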
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Currently, emulated rpc network is shared between all test cases
in `replication_test.cc` (see static `rpc::net` map).
However, its value is not reset before a subsequent test case runs,
which opens the possibility of heap-use-after-free bugs.
Also, make all `send_*` functions in the test rpc class throw an
error if the node being contacted is not in the network, instead of
performing a past-the-end access. This makes it safe to contact a
non-existent node, which will be used in RPC tests later.
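The two ideas (a shared network map that must be reset between test cases, and a send that fails cleanly for an unknown node) can be illustrated with a small Python analogue. This is a sketch only, not the C++ test code: `FakeRpc`, `UnknownNodeError` and `reset_network` are hypothetical names.

```python
class UnknownNodeError(Exception):
    """Raised when a message is sent to a node that is not in the network."""

class FakeRpc:
    # Shared by all instances, playing the role of a static network map.
    net = {}

    def __init__(self, node_id):
        self.node_id = node_id
        self.inbox = []
        FakeRpc.net[node_id] = self

    @classmethod
    def reset_network(cls):
        # Call between test cases so that no stale peers survive.
        cls.net.clear()

    def send(self, dest, message):
        peer = FakeRpc.net.get(dest)
        if peer is None:
            # Fail explicitly instead of touching a missing entry.
            raise UnknownNodeError(f"node {dest} is not in the network")
        peer.inbox.append((self.node_id, message))
```

Because `send` raises for a missing destination, a test can deliberately contact a removed node and assert on the error.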
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The first patch adds a nodetool-like capability to the cql-pytest framework.
It is *not* meant to be used to test nodetool itself, but rather to give CQL
tests the ability to use nodetool operations - currently only one operation -
"nodetool flush".
We try to use Scylla's REST API, if possible, and only fall back to using an
external "nodetool" command when the REST API is not available - i.e., when
testing Cassandra. The benefit of using the REST API is that we don't need
to run the jmx server to test Scylla.
The second patch is an example of using the new nodetool flush feature
in a test that needs to flush data to reproduce a bug (which has already
been fixed).
Closes #8622
* github.com:scylladb/scylla:
cql-pytest: reproducer for issue #8138
cql-pytest: add nodetool flush feature
We add a reproducing test for issue #8138, where if we write to a
TWCS table, scanning it would yield no rows - and worse - crash the
debug build.
This test requires "nodetool flush" to force the read to happen from
sstables, hence the nodetool feature was implemented in the previous
patch (on Scylla, it uses the REST API - not actually running nodetool
or requiring JMX).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a nodetool-compatible capability to the cql-pytest
framework. It is *not* meant to be used to test nodetool itself, but
rather to give CQL tests the ability to use nodetool operations -
currently one operation - "nodetool flush".
Use it in a test as:
    import nodetool
    nodetool.flush(cql, table)
I chose a functional API with parameters ("cql") instead of a fixture
with an implied connection so that in the future we may allow
multiple nodes and this API will allow sending nodetool requests to
different nodes. However, multi-node support is not implemented yet,
nor used in any of the existing tests.
The implementation uses Scylla's REST API if available, or if not, falls
back to using an external "nodetool" command (which can be overridden
using the NODETOOL environment variable). This way, both cql-pytest/run
(Scylla) and cql-pytest/run-cassandra (Cassandra) now correctly support
these nodetool operations, and we still don't need to run JMX to test
Scylla.
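The fallback logic described above can be sketched roughly as follows. This is a minimal illustration, not the code added by the patch: the REST call is represented by a hypothetical `rest_flush` callable passed in by the caller, and the function signature differs from the real `nodetool.flush(cql, table)`. Only the NODETOOL environment variable and the external "nodetool flush" command come from the description above.

```python
import os
import subprocess

def nodetool_command(env=None):
    """Return the external nodetool executable, overridable via NODETOOL."""
    env = os.environ if env is None else env
    return env.get("NODETOOL", "nodetool")

def flush(keyspace, table, rest_flush=None, run=subprocess.run, env=None):
    """Flush a table, preferring the REST API and falling back to nodetool.

    rest_flush is a hypothetical callable that performs the REST request
    and raises ConnectionError when the REST API is unreachable.
    Returns which path was taken: "rest" or "nodetool".
    """
    if rest_flush is not None:
        try:
            rest_flush(keyspace, table)
            return "rest"
        except ConnectionError:
            pass  # REST API not available (e.g. Cassandra): fall back.
    run([nodetool_command(env), "flush", keyspace, table], check=True)
    return "nodetool"
```

Passing `run` and `env` as parameters is a choice made for this sketch so that the fallback path can be exercised without a real nodetool binary.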
The reason we want to support nodetool.flush() is to reproduce bugs that
depend on data reaching disk. We already had such a reproducer in
test_large_cells_rows.py - it too did something similar - but it was
Scylla-only (using only the REST API). Instead of copying such code to
multiple places, it is better to have a common nodetool.flush() function, as
done in this patch. The test in test_large_cells_rows.py can later be
changed to use the new function.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Today's alloc() accepts migrate-fn, size and alignment. None of the
callers really need to provide anything special for the migrate-fn, and
all are happy with the default alignof() for alignment. The
simplification is to provide an alloc() that only accepts the size
argument and does the rest itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>