26744 Commits

Author SHA1 Message Date
Alejo Sanchez
52188016af raft: replication test: create_server in raft_cluster
Remove the global create_raft_server() and replace with a
create_server() helper in replication_test().

This will allow not requiring the user of raft_cluster to create special
objects.

Note this does not move(apply) anymore as it's kept in raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:02 -04:00
Alejo Sanchez
1edcb6e647 raft: replication test: reset snapshots
When stopping a server also delete snapshots and persisted snapshots.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:46:11 -04:00
Alejo Sanchez
453f19cf0e raft: replication test: reset server helper
Add a helper to reset a server in raft_cluster.

Besides simplifying code and preventing errors, this will help move
create_raft_server logic to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
d3b7f21b88 raft: replication test: pause tickers before stopping
Pause tickers before stopping servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
30c9daafd2 raft: replication test: tick helper
Move test tick handling to raft_cluster as helper method.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
2e61c507d2 raft: replication test: tickers on raft_cluster
Move tickers to raft_cluster helper class. Ticker initialization and
pause is done automatically at start_all() and stop_all().

Add temporary helpers to manage specific tickers. These might be removed
later once proper node abort and reset are implemented.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
aea77871c4 raft: replication test: cluster tracking leader
Track current leader inside helper class.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
ca8e55613e raft: replication test: elect first leader in raft_cluster
Run first leader election inside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
322802308c raft: replication test: use id 0 for rpc tests
raft_cluster at the moment only allows sequential 0 based ids.

The code was generating ids over this and causing problems for code
changes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
c1a6e81002 raft: replication test: fix partition wait log
When partitioning, don't wait_log on servers outside configuration.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:20 -04:00
Alejo Sanchez
6db730c500 raft: replication test: partition helper
Add a partition handling helper to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
848c244932 raft: replication test: track in_configuration in raft_cluster
Keep track of servers in configuration inside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
16728b8966 raft: replication test: use cluster saved apply function
Use apply function saved in cluster at creation time.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
3daed889b8 raft: replication test: change_configuration in raft_cluster
Move change_configuration to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
102b8e71bb raft: replication test: free_election in raft_cluster
Move free_election to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
60d4d06861 raft: replication test: wait_log_all in raft_cluster
Move wait_log_all to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
d1ba0fe719 raft: replication test: wait_log in raft_cluster
Move wait_log to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
3e4871b884 raft: replication test: elect_new_leader in raft_cluster
Move elect_new_leader to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
59b9642be5 raft: replication test: elapse_election in raft_cluster
Move elapse_election to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
b3e2b54913 raft: replication test: move add_entry up
Style.

Move definition of add_entry and add_remaining_entries with the rest of
raft_cluster definitions.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
8cd2abe72b raft: replication test: remove spurious check
Going forward the leader is always in configuration and up to date.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2d51d1bbc5 raft: replication test: raft_cluster add_entries
Move add_entries() to raft_cluster and provide a helper to add remaining
entries.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2a1e7a15a6 raft: replication test: calculate first value helper
Helper to calculate the first value number to be added after the snapshot
and the leader's initial log.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
e2f425e210 raft: replication test: initial state helper
Move initial_state preparation to its own helper function.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
d2c0308a85 raft: replication test: move declarations up
Move declarations near the top of the file for following refactors to
raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
a3700a6d0a raft: replication test: move up set_config
Move set_config above raft_cluster for a subsequent commit.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
57da05c986 raft: replication test: use disconnect() helper
For rpc tests, use raft_cluster::disconnect() instead of the local
connected reference.

This removes connected object use outside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
54c919b726 raft: replication test: add connectivity helpers
Add connectivity helpers disconnect(server, except) and connect_all() so
users of raft_cluster don't need to keep a connectivity object
pointer.
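
As a rough illustration of what such helpers do, here is a minimal, self-contained sketch. Only the names disconnect(server, except) and connect_all() come from the commit message; the class name connectivity_sketch, the is_connected() query, the use of std::optional for `except`, and the pair-set representation are assumptions made for this example and may not match the test code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>

// Hypothetical stand-in for the connectivity state kept inside raft_cluster;
// the real helper's types and signatures may differ.
class connectivity_sketch {
    std::size_t _servers;
    // Unordered links, stored as (low id, high id), that are currently cut.
    std::set<std::pair<std::size_t, std::size_t>> _cut;

    static std::pair<std::size_t, std::size_t> link(std::size_t a, std::size_t b) {
        return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
    }
public:
    explicit connectivity_sketch(std::size_t servers) : _servers(servers) {}

    // Cut every link of `server`, optionally sparing the link to `except`.
    void disconnect(std::size_t server, std::optional<std::size_t> except = std::nullopt) {
        for (std::size_t other = 0; other < _servers; ++other) {
            if (other == server || (except && *except == other)) {
                continue;
            }
            _cut.insert(link(server, other));
        }
    }

    // Restore full connectivity between all servers.
    void connect_all() { _cut.clear(); }

    bool is_connected(std::size_t a, std::size_t b) const {
        return _cut.count(link(a, b)) == 0;
    }
};
```

With three servers, disconnect(0, 2) in this sketch isolates server 0 from server 1 while keeping its link to server 2, and connect_all() undoes every cut.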

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
5e324f3438 raft: replication test: rpc with raft_cluster
Use raft_cluster for rpc tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
752d53a909 raft: replication test: use parallel start/stop
Start and stop servers in parallel.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
bcf5181697 raft: replication test: cluster class
Use raft_cluster class to handle servers.

First part of this change.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
5fc0a1251d raft: replication test: helper uuid to local id
Add a helper to convert from UUID to size_t id.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
7e93501d4c raft: replication test: use optional
Instead of tracking with a boolean use an optional for partition leader.
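
The idea can be sketched as follows. This is an illustrative example only: the struct names, the field names, and the pick_leader() helper are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Before (hypothetical): a flag that has to be kept in sync by hand.
struct partition_with_flag {
    bool have_leader = false;  // `leader` is meaningless unless this is true
    std::size_t leader = 0;
};

// After (hypothetical): presence and value travel together.
struct partition_with_optional {
    std::optional<std::size_t> leader;  // empty means "no leader requested"
};

// Use the requested leader if there is one, otherwise keep the current one.
inline std::size_t pick_leader(const partition_with_optional& p, std::size_t current_leader) {
    return p.leader.value_or(current_leader);
}
```

Since server id 0 is a valid id, a sentinel value could not stand in for "no leader", which is one reason an optional reads more clearly here than a flag plus a value.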

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
ccb85bce02 raft: replication test: wait log on next leader only
When there's a defined next leader, only wait for log propagation for
this follower.

Splits wait_log() into waiting for one follower with wait_log() and
waiting for all followers with wait_log_all().

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2aa1646e35 raft: replication test: remove wait after adding entries
Remove log wait after adding entries. It was added to handle some debug
hangs but it is not good for testing.

There are already wait logs at proper code locations.
(e.g. elect_new_leader, partition)

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
0216d0a7b0 raft: replication test: remove unused param
elect_new_leader doesn't need to know configuration anymore.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
effcb7c5f6 raft: tests: move conversion helpers to header
Move replication test helpers to header.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
7327cbd871 raft: replication test: use structs to avoid alias
Use structs for test commands.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Piotr Dulikowski
b0c22f2e39 repair: trigger repair abort_source only from shard 0
When user requests repair to be forcefully aborted, the `_abort_all_as`
abort source could be modified from multiple shards in parallel by the
`tracker::abort_all_repairs()` function, which can lead to undefined
behavior and to a crash. This commit makes sure that `_abort_all_as` is
used only from shard 0 when repair is aborted.

Fixes #8693

Closes #8734
2021-05-31 15:57:31 +03:00
Avi Kivity
e96ff3d82d dist: add new docker building process
The new process has the following differences from the Dockerfile
based image:

 - Using buildah commands instead of a Dockerfile. This is more flexible
   since we don't need to pack everything into a "build context" and
   transfer it to the container; instead we interact with the container
   as we build it.
 - Using packages instead of a remote yum repository. This makes it
   easy to create an image in one step (no need to create a repository,
   promote, then download the packages back via yum). It means that
   the image cannot be upgraded via yum, but container images are
   usually just replaced with a new version.
 - Build output is an OCI archive (e.g. a tarball), not a docker image
   in a local repository. This means the build process can later be
   integrated into ninja, since the artifact is just a file. The file
   can be uploaded into a repository or made available locally with
   skopeo.
 - Any build mode is supported, not just release. This can be used
   for quick(er) testing with dev mode.

I plan to integrate it further into the build system, but currently
this is blocked on a buildah bug [1].

[1] https://github.com/containers/buildah/issues/3262

Closes #8730
2021-05-31 10:05:22 +03:00
Nadav Har'El
2440569984 secondary index: fix error message which erroneously referred to "map"
The value of a frozen collection may only be indexed (using a secondary
index) in full - it is not allowed to index only the keys for example -
"CREATE INDEX idx ON table (keys(v))" is not allowed.

The error message referred to a frozen<map>, but the problem can happen
on any frozen collection (e.g., a frozen set), not just a frozen map,
so it can be confusing to a user who used a frozen set and gets an
error about a frozen map.

So this patch fixes the error message to refer to a "frozen collection".

Note that the Cassandra error message in this case is different - it
reads: "Frozen collections are immutable and must be fully indexed".

Fixes #8744.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529094056.825117-1-nyh@scylladb.com>
2021-05-30 23:23:20 +03:00
Botond Dénes
cd6bbd37a4 utils/utf8.c: move includes outside of namespaces
Including in the middle of a namespace is not a good practice.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210528142502.962947-1-bdenes@scylladb.com>
2021-05-30 23:23:20 +03:00
Raphael S. Carvalho
a7cdd846da compaction: Prevent tons of compaction of fully expired sstable from happening in parallel
Compaction manager can start many compactions of fully expired sstables in
parallel, which may consume a significant amount of resources.
This problem is caused by weight being released too early in compaction, after
data is all compacted but before table is called to update its state, like
replacing sstables and so on.
Fully expired sstables aren't actually compacted, so the following can happen:
- compaction 1 starts for expired sst A with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 2 starts for expired sst B with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 3 starts for expired sst C with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 1 is done updating table state, so it finally completes and
releases all the resources.
- compaction 2 is done updating table state, so it finally completes and
releases all the resources.
- compaction 3 is done updating table state, so it finally completes and
releases all the resources.

This happens because, with an expired sstable, compaction will release weight
faster than it will update table state, as there's nothing to be compacted.

With my reproducer, it's very easy to reach 50 parallel compactions on a single
shard, but that number can easily be worse depending on the number of sstables
with fully expired data, across all tables. This high parallelism can happen
with only a couple of tables, if there are many time windows with expired data,
as they can be compacted in parallel.

Prior to 55a8b6e3c9, weight was released earlier in compaction, before
last sstable was sealed, but right now, there's no need to release weight
earlier. Weight can be released in a much simpler way, after the compaction is
actually done. So such compactions will be serialized from now on.
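
The effect of releasing weight only at the end can be sketched with a small RAII guard. This is a simplified model, not the compaction manager's code: weight_registry, weight_guard and run_compaction are hypothetical names, and the sketch assumes that a given weight admits one compaction at a time.

```cpp
#include <cassert>
#include <set>

// Hypothetical registry: a weight can be held by one compaction at a time.
class weight_registry {
    std::set<int> _held;
public:
    bool try_register(int weight) { return _held.insert(weight).second; }
    void release(int weight) { _held.erase(weight); }
};

// Releases the weight when the whole job ends, not when the data is compacted.
class weight_guard {
    weight_registry& _registry;
    int _weight;
public:
    weight_guard(weight_registry& registry, int weight) : _registry(registry), _weight(weight) {}
    weight_guard(const weight_guard&) = delete;
    weight_guard& operator=(const weight_guard&) = delete;
    ~weight_guard() { _registry.release(_weight); }
};

// Returns false if another compaction with the same weight is still running.
template <typename Compact, typename UpdateTable>
bool run_compaction(weight_registry& registry, int weight, Compact compact, UpdateTable update_table) {
    if (!registry.try_register(weight)) {
        return false;
    }
    weight_guard guard(registry, weight);
    compact();       // nothing to do for a fully expired sstable
    update_table();  // the weight is still held here
    return true;     // guard releases the weight on return
}
```

In this model, a second compaction with the same weight that is attempted while the first is still updating table state is refused, so the two run one after the other.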

Fixes #8710.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com>

[avi: drop now unneeded storage_service_for_tests]
2021-05-30 23:22:51 +03:00
Benny Halevy
1c0769d789 table: clear: make exception safe
It is currently possible that _memtables->add_memtable()
will throw after _memtables->clear(), leaving the memtables
list completely empty.  However, we do rely on always
having at least one allocated in the memtables list
as active_memtable() references a lw_shared_ptr<memtable>
at the back of the memtables vector, and it is expected
to always be allocated via add_memtable() upon construction
and after clear().

This change moves the implementation of this convention
to memtable_list::clear() and makes the latter exception safe
by first allocating the to-be-added empty memtable and
only then clearing the vector.
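
The allocate-first ordering can be sketched as below. This is a simplified illustration that uses std::shared_ptr and a stub type instead of lw_shared_ptr<memtable>; memtable_stub and memtable_list_sketch are hypothetical names, and the real memtable_list::clear() may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct memtable_stub {};  // stand-in for the real memtable type

// Hypothetical, simplified list; only the ordering inside clear() matters.
class memtable_list_sketch {
    std::vector<std::shared_ptr<memtable_stub>> _memtables;
public:
    memtable_list_sketch() {
        _memtables.push_back(std::make_shared<memtable_stub>());
    }

    void clear() {
        // Step 1: allocate the replacement. If this throws, the list is untouched.
        auto fresh = std::make_shared<memtable_stub>();
        // Step 2: drop the old memtables. clear() does not throw and keeps capacity.
        _memtables.clear();
        // Step 3: capacity is at least 1 because the list was never empty, so
        // this push_back does not reallocate, and moving a shared_ptr cannot throw.
        _memtables.push_back(std::move(fresh));
    }

    const std::shared_ptr<memtable_stub>& active() const { return _memtables.back(); }
    std::size_t size() const { return _memtables.size(); }
};
```

The only operation that can fail runs before the list is modified, so active() always finds a memtable at the back of the vector.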

Refs #8749

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>
2021-05-30 13:22:52 +03:00
Avi Kivity
791412b046 test: user_defined_function_test: raise Lua timeout
user_defined_function_test fails sporadically in debug mode
due to Lua timeout. Raise the timeout to avoid the failure, but
not so much that the test that expects timeout becomes too slow.

Fixes #8746.

Closes #8747
2021-05-30 13:10:57 +03:00
Piotr Jastrzebski
76d7c761d1 schema: Stop using deprecated constructor
This is another boring patch.

One of schema constructors has been deprecated for many years now but
was used in several places anyway. Usage of this constructor could
lead to data corruption when using MX sstables because this constructor
does not set schema version. MX reading/writing code depends on schema
version.

This patch replaces all the places the deprecated constructor is used
with schema_builder equivalent. The schema_builder sets the schema
version correctly.

Fixes #8507

Test: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>
2021-05-30 11:58:27 +03:00
Nadav Har'El
1507bbb35a cql-pytest: increase default server-side timeouts
Sometimes the cql-pytest tests run extremely slowly. This can be
a combination of running the debug build (which is naturally slow)
and a test machine which is overcommitted, or experiencing some
transient swap storm or some similar event. We don't want tests, which
we run on 100% reliable setups, to fail just because they run into
timeouts in Scylla when they run very slowly.

We already noticed this problem in the past, and increased the CQL client
timeout in conftest.py from the default of 10 seconds to 120 seconds -
the old default of 10 seconds was not enough for some long operations
(such as creating a table with multiple views) when the test ran very
slowly.

However, this only fixed the client-side timeout. We also have a bunch
of server-side timeouts, configured to all sorts of arbitrary (and
fairly small) numbers. For example, the server has a "write request
timeout" option, which defaults to just 2 seconds. We recently saw
this timeout exceeded in a slow run which tried to do a very large
write.

So this patch configures all the configurable server-side timeouts we
have to default to 300 seconds. This should be more than enough for even
the slowest runs (famous last words...). This default is not a good idea
on real multi-node clusters which are expected to deal with node loss,
but this is not the case in cql-pytest.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529213648.856503-1-nyh@scylladb.com>
2021-05-30 01:20:14 +03:00
Avi Kivity
d23bebf5c2 Merge "Unexport storage service dependencies" from Pavel E
"
Right now storage service is used as "provider" of other
services -- database, feature service and tokens. This set
unexports the first pair. This drops a bunch of calls for
global storage service instances from the places that don't
really need it.

tests: unit(dev), start-stop
"

* 'br-pupate-storage-service' of https://github.com/xemul/scylla:
  storage-service: Don't export features
  api: Get features from proxy
  storage-service: Don't export database
  storage-service: Turn some global helpers into methods
  storage-service: Open-code simple config getters
  view: Get database from storage_proxy
  main: Use local database instance
  api: Use database from http_ctx
2021-05-29 20:52:47 +03:00
Pavel Emelyanov
598bbfab15 storage-service: Don't export features
Now storage service uses the feature service instance internally
and doesn't need to provide getter for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:16:12 +03:00
Pavel Emelyanov
651568318d api: Get features from proxy
The reset_local_schema call needs proxy and feature service to do its
job. Right now the features are retrieved from global storage service,
but they are present on the proxy as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:15:15 +03:00