26734 Commits

Author SHA1 Message Date
Alejo Sanchez
6db730c500 raft: replication test: partition helper
Add a partition handling helper to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
848c244932 raft: replication test: track in_configuration in raft_cluster
Keep track of servers in configuration inside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
16728b8966 raft: replication test: use cluster saved apply function
Use apply function saved in cluster at creation time.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
3daed889b8 raft: replication test: change_configuration in raft_cluster
Move change_configuration to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
102b8e71bb raft: replication test: free_election in raft_cluster
Move free_election to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
60d4d06861 raft: replication test: wait_log_all in raft_cluster
Move wait_log_all to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
d1ba0fe719 raft: replication test: wait_log in raft_cluster
Move wait_log to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
3e4871b884 raft: replication test: elect_new_leader in raft_cluster
Move elect_new_leader to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
59b9642be5 raft: replication test: elapse_election in raft_cluster
Move elapse_election to raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
b3e2b54913 raft: replication test: move add_entry up
Style.

Move definition of add_entry and add_remaining_entries with the rest of
raft_cluster definitions.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
8cd2abe72b raft: replication test: remove spurious check
Going forward the leader is always in configuration and up to date.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2d51d1bbc5 raft: replication test: raft_cluster add_entries
Move add_entries() to raft_cluster and provide a helper to add remaining
entries.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2a1e7a15a6 raft: replication test: calculate first value helper
Helper to calculate what's the value number to be added after snapshot
and leader initial log.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
e2f425e210 raft: replication test: initial state helper
Move initial_state preparation to its own helper function.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
d2c0308a85 raft: replication test: move declarations up
Move declarations near the top of the file for following refactors to
raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
a3700a6d0a raft: replication test: move up set_config
Move set_config above raft_cluster for a subsequent commit.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
57da05c986 raft: replication test: use disconnect() helper
For rpc tests, use raft_cluster::disconnect() instead of the local
connected reference.

This removes connected object use outside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
54c919b726 raft: replication test: add connectivity helpers
Add connectivity helpers disconnect(server, except) and connect_all() so
users of raft_cluster don't need to keep a connectivity object
pointer.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
5e324f3438 raft: replication test: rpc with raft_cluster
Use raft_cluster for rpc tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
752d53a909 raft: replication test: use parallel start/stop
Start and stop servers in parallel.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
bcf5181697 raft: replication test: cluster class
Use raft_cluster class to handle servers.

First part of this change.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
5fc0a1251d raft: replication test: helper uuid to local id
Add a helper to convert from UUID to size_t id.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
7e93501d4c raft: replication test: use optional
Instead of tracking with a boolean use an optional for partition leader.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
ccb85bce02 raft: replication test: wait log on next leader only
When there's a defined next leader, only wait for log propagation for
this follower.

Splits wait_log() into waiting for one follower with wait_log() and
waiting for all followers with wait_log_all().

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
2aa1646e35 raft: replication test: remove wait after adding entries
Remove log wait after adding entries. It was added to handle some debug
hangs but it is not good for testing.

There are already wait logs at proper code locations.
(e.g. elect_new_leader, partition)

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
0216d0a7b0 raft: replication test: remove unused param
elect_new_leader doesn't need to know configuration anymore.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
effcb7c5f6 raft: tests: move conversion helpers to header
Move replication test helpers to header.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Alejo Sanchez
7327cbd871 raft: replication test: use structs to avoid alias
Use structs for test commands.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 21:50:19 -04:00
Piotr Dulikowski
b0c22f2e39 repair: trigger repair abort_source only from shard 0
When user requests repair to be forcefully aborted, the `_abort_all_as`
abort source could be modified from multiple shards in parallel by the
`tracker::abort_all_repairs()` function, which can lead to undefined
behavior and to a crash. This commit makes sure that `_abort_all_as` is
used only from shard 0 when repair is aborted.

Fixes #8693

Closes #8734
2021-05-31 15:57:31 +03:00
Avi Kivity
e96ff3d82d dist: add new docker building process
The new process has the following differences from the Dockerfile
based image:

 - Using buildah commands instead of a Dockerfile. This is more flexible
   since we don't need to pack everything into a "build context" and
   transfer it to the container; instead we interact with the container
   as we build it.
 - Using packages instead of a remote yum repository. This makes it
   easy to create an image in one step (no need to create a repository,
   promote, then download the packages back via yum). It means that
   the image cannot be upgraded via yum, but container images are
   usually just replaced with a new version.
 - Build output is an OCI archive (e.g. a tarball), not a docker image
   in a local repository. This means the build process can later be
   integrated into ninja, since the artifact is just a file. The file
   can be uploaded into a repository or made available locally with
   skopeo.
 - Any build mode is supported, not just release. This can be used
   for quick(er) testing with dev mode.

I plan to integrate it further into the build system, but currently
this is blocked on a buildah bug [1].

[1] https://github.com/containers/buildah/issues/3262

Closes #8730
2021-05-31 10:05:22 +03:00
Nadav Har'El
2440569984 secondary index: fix error message which erroneously referred to "map"
The value of a frozen collection may only be indexed (using a secondary
index) in full - it is not allowed to index only the keys for example -
"CREATE INDEX idx ON table (keys(v))" is not allowed.

The error message referred to a frozen<map>, but the problem can happen
on any frozen collection (e.g., a frozen set), not just a frozen map,
so it can confuse a user who used a frozen set and gets an
error about a frozen map.

So this patch fixes the error message to refer to a "frozen collection".

Note that the Cassandra error message in this case is different - it
reads: "Frozen collections are immutable and must be fully indexed".

Fixes #8744.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529094056.825117-1-nyh@scylladb.com>
2021-05-30 23:23:20 +03:00
Botond Dénes
cd6bbd37a4 utils/utf8.c: move includes outside of namespaces
Including in the middle of a namespace is not a good practice.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210528142502.962947-1-bdenes@scylladb.com>
2021-05-30 23:23:20 +03:00
Raphael S. Carvalho
a7cdd846da compaction: Prevent tons of compaction of fully expired sstable from happening in parallel
Compaction manager can start tons of compactions of fully expired sstables in
parallel, which may consume a significant amount of resources.
This problem is caused by weight being released too early in compaction, after
data is all compacted but before table is called to update its state, like
replacing sstables and so on.
Fully expired sstables aren't actually compacted, so the following can happen:
- compaction 1 starts for expired sst A with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 2 starts for expired sst B with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 3 starts for expired sst C with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 1 is done updating table state, so it finally completes and
releases all the resources.
- compaction 2 is done updating table state, so it finally completes and
releases all the resources.
- compaction 3 is done updating table state, so it finally completes and
releases all the resources.

This happens because, with expired sstable, compaction will release weight
faster than it will update table state, as there's nothing to be compacted.

With my reproducer, it's very easy to reach 50 parallel compactions on a single
shard, but that number can be easily worse depending on the amount of sstables
with fully expired data, across all tables. This high parallelism can happen
only with a couple of tables, if there are many time windows with expired data,
as they can be compacted in parallel.

Prior to 55a8b6e3c9, weight was released earlier in compaction, before
last sstable was sealed, but right now, there's no need to release weight
earlier. Weight can be released in a much simpler way, after the compaction is
actually done. So such compactions will be serialized from now on.
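
The serialization described above can be sketched as follows. This is a minimal illustration, not the code from this patch: weight_tracker, weight_registration and run_compaction are hypothetical names, and the only point shown is that the weight stays registered until the table state update has finished.

```cpp
#include <cassert>
#include <set>

// Hypothetical tracker of weights held by running compactions.
class weight_tracker {
    std::set<int> _held;
public:
    // Returns false if a compaction with this weight is still running.
    bool try_register(int w) { return _held.insert(w).second; }
    void release(int w) { _held.erase(w); }
    bool is_held(int w) const { return _held.count(w) != 0; }
};

// Hypothetical RAII guard: the weight is released only when the guard
// goes out of scope, i.e. after the compaction is actually done.
class weight_registration {
    weight_tracker& _tracker;
    int _weight;
public:
    weight_registration(weight_tracker& t, int w) : _tracker(t), _weight(w) {}
    ~weight_registration() { _tracker.release(_weight); }
    weight_registration(const weight_registration&) = delete;
    weight_registration& operator=(const weight_registration&) = delete;
};

// Returns false if the compaction could not start because the weight is taken.
template <typename UpdateTableState>
bool run_compaction(weight_tracker& tracker, int weight, UpdateTableState update_table_state) {
    if (!tracker.try_register(weight)) {
        return false;
    }
    weight_registration reg(tracker, weight);
    // For a fully expired sstable there is nothing to compact here.
    update_table_state(); // the weight is still held during this step
    return true;
}
```

With this shape, a second compaction with the same weight cannot start while the first one is still updating table state, which is the window the early release left open.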

Fixes #8710.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com>

[avi: drop now unneeded storage_service_for_tests]
2021-05-30 23:22:51 +03:00
Benny Halevy
1c0769d789 table: clear: make exception safe
It is currently possible that _memtables->add_memtable()
will throw after _memtables->clear(), leaving the memtables
list completely empty.  However, we do rely on always
having at least one allocated in the memtables list
as active_memtable() references a lw_shared_ptr<memtable>
at the back of the memtables vector, and it is expected
to always be allocated via add_memtable() upon construction
and after clear().

This change moves the implementation of this convention
to memtable_list::clear() and makes the latter exception safe
by first allocating the to-be-added empty memtable and
only then clearing the vector.
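
The allocate-first ordering can be sketched as below. This is a simplified stand-in, assuming a std::vector of std::shared_ptr instead of lw_shared_ptr; the fail_next_alloc flag is a hypothetical hook added only to simulate an allocation failure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real memtable type.
struct memtable {};

class memtable_list {
    std::vector<std::shared_ptr<memtable>> _memtables;
public:
    // Test hook (an assumption for this sketch): makes the next allocation throw.
    bool fail_next_alloc = false;

    memtable_list() { _memtables.push_back(new_memtable()); }

    std::shared_ptr<memtable> new_memtable() {
        if (fail_next_alloc) {
            fail_next_alloc = false;
            throw std::bad_alloc();
        }
        return std::make_shared<memtable>();
    }

    // Exception-safe clear: allocate the replacement first, then clear.
    void clear() {
        auto fresh = new_memtable();            // may throw; the list is untouched if it does
        _memtables.clear();                     // does not throw and keeps the capacity
        _memtables.push_back(std::move(fresh)); // capacity is already >= 1, so no reallocation
    }

    memtable& active_memtable() { return *_memtables.back(); }
    std::size_t size() const { return _memtables.size(); }
};
```

If the allocation throws, the list still holds its previous memtable, so active_memtable() remains valid.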

Refs #8749

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>
2021-05-30 13:22:52 +03:00
Avi Kivity
791412b046 test: user_defined_function_test: raise Lua timeout
user_defined_function_test fails sporadically in debug mode
due to Lua timeout. Raise the timeout to avoid the failure, but
not so much that the test that expects timeout becomes too slow.

Fixes #8746.

Closes #8747
2021-05-30 13:10:57 +03:00
Piotr Jastrzebski
76d7c761d1 schema: Stop using deprecated constructor
This is another boring patch.

One of schema constructors has been deprecated for many years now but
was used in several places anyway. Usage of this constructor could
lead to data corruption when using MX sstables because this constructor
does not set schema version. MX reading/writing code depends on schema
version.

This patch replaces all the places the deprecated constructor is used
with schema_builder equivalent. The schema_builder sets the schema
version correctly.

Fixes #8507

Test: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>
2021-05-30 11:58:27 +03:00
Nadav Har'El
1507bbb35a cql-pytest: increase default server-side timeouts
Sometimes the cql-pytest tests run extremely slowly. This can be
a combination of running the debug build (which is naturally slow)
and a test machine which is overcommitted, or experiencing some
transient swap storm or some similar event. We don't want tests, which
we run on 100% reliable setups, to fail just because they run into
timeouts in Scylla when they run very slowly.

We already noticed this problem in the past, and increased the CQL client
timeout in conftest.py from the default of 10 seconds to 120 seconds -
the old default of 10 seconds was not enough for some long operations
(such as creating a table with multiple views) when the test ran very
slowly.

However, this only fixed the client-side timeout. We also have a bunch
of server-side timeouts, configured to all sorts of arbitrary (and
fairly small) numbers. For example, the server has a "write request
timeout" option, which defaults to just 2 seconds. We recently saw
this timeout exceeded in a slow run which tried to do a very large
write.

So this patch configures all the configurable server-side timeouts we
have to default to 300 seconds. This should be more than enough for even
the slowest runs (famous last words...). This default is not a good idea
on real multi-node clusters which are expected to deal with node loss,
but this is not the case in cql-pytest.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529213648.856503-1-nyh@scylladb.com>
2021-05-30 01:20:14 +03:00
Avi Kivity
d23bebf5c2 Merge "Unexport storage service dependencies" from Pavel E
"
Right now storage service is used as a "provider" of other
services -- database, feature service and tokens. This set
unexports the first pair. This drops a bunch of calls for
global storage service instances from the places that don't
really need it.

tests: unit(dev), start-stop
"

* 'br-pupate-storage-service' of https://github.com/xemul/scylla:
  storage-service: Don't export features
  api: Get features from proxy
  storage-service: Don't export database
  storage-service: Turn some global helpers into methods
  storage-service: Open-code simple config getters
  view: Get database from storage_proxy
  main: Use local database instance
  api: Use database from http_ctx
2021-05-29 20:52:47 +03:00
Pavel Emelyanov
598bbfab15 storage-service: Don't export features
Now storage service uses the feature service instance internally
and doesn't need to provide getter for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:16:12 +03:00
Pavel Emelyanov
651568318d api: Get features from proxy
The reset_local_schema call needs proxy and feature service to do its
job. Right now the features are retrieved from global storage service,
but they are present on the proxy as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:15:15 +03:00
Pavel Emelyanov
b990b764ca storage-service: Don't export database
Now storage service uses the database instance internally and
doesn't need to provide getter for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:13:27 +03:00
Pavel Emelyanov
0651038f29 storage-service: Turn some global helpers into methods
There are two static helpers used by storage service that grab
global storage service. To simplify these two turn both into
storage service methods and use 'this' inside.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:12:25 +03:00
Pavel Emelyanov
5ae8accfed storage-service: Open-code simple config getters
There are two db::config getters in storage_service.cc that
are used only once. Both call for global storage service, but
since they are called from storage service it's simpler to break
this loop and make storage service get needed config options
directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:11:24 +03:00
Pavel Emelyanov
1ce0682821 view: Get database from storage_proxy
The db::view code already uses proxy rather actively, so instead of
depending on the storage service being at hand it's better to make
db::view require the proxy. For now -- via global instance.

There's one dependency on storage service left after this patch --
to get the tokens. This piece is to be fixed later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:09:32 +03:00
Pavel Emelyanov
6d53ddaa5f main: Use local database instance
All start-stop code in main has the sharded<database> at hand, so there's
no need to get it from global storage service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:08:57 +03:00
Pavel Emelyanov
e476247763 api: Use database from http_ctx
Instead of getting database from global storage service it's simpler
and better to grab it from the http context at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-28 18:08:25 +03:00
Asias He
e86d39faf0 storage_service: Update peer table only if the peer is part of the ring
Consider the following procedure:

- n1, n2, n3

- n3 is network partitioned from the cluster

- n4 replaces n3

- n3 has the network partition fixed

- n1 learns n3 as NORMAL status and calls
  storage_service::handle_state_normal which in turn calls
  update_peer_info, all columns except tokens column in system.peers are
  written

- n1 restarts before figuring out that n4 is the new owner and deleting the
  entry for n3 in system.peers

- n3 is removed from gossip by all the nodes in the cluster
  automatically because they detect the collision and remove n3

- n1 restarts, leaving the entry in system.peers for n3 forever

To fix, we can update peer tables only if the node is part of the ring.

Fixes #8729

Closes #8742
2021-05-28 15:03:26 +02:00
Avi Kivity
b6c49fd320 Update seastar submodule
  > Merge "memory: optimize thread-local initialization" from Avi
  > Merge "Move priority classes manipulations from io-queue" from Pavel E
  > gate: add default move assignment operator
2021-05-28 11:47:54 +03:00
Pavel Emelyanov
526d31734c scylla-gdb: scylla_io_queues: Support new registered classes layout
Starting from seastar commit 5dae0cf3c48159990f51e5d38495af5ae224c2f8
all the registered classes info was moved into io_priority_class::_infos
array.

tests: scylla-gdb(release, old and new seastars)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210528083941.27990-1-xemul@scylladb.com>
2021-05-28 11:47:38 +03:00
Avi Kivity
0acf5bfca6 build: enable -Wreturn-std-move
Clang warns when "return std::move(x)" is needed to elide a copy,
but the call to std::move() is missing. We disabled the warning during
the migration to clang. This patch re-enables the warning and fixes
the places it points out, usually by adding std::move() and in one
place by converting the returned variable from a reference to a local,
so normal copy elision can take place.
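
The two kinds of fix can be illustrated with a small sketch. The types and counters below are hypothetical and exist only to make copies and moves observable. The assumption is that returning a local whose type differs from the return type (here derived returned as base) is one case where language rules before C++20 select the copy constructor unless std::move() is written explicitly.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counters, used only to observe which constructor runs.
int copy_count = 0;
int move_count = 0;

struct base {
    base() = default;
    base(const base&) { ++copy_count; }
    base(base&&) noexcept { ++move_count; }
};

struct derived : base {};

// The local has a different type than the return type. An explicit
// std::move() selects base(base&&) regardless of the language version.
base make_base_with_move() {
    derived d;
    return std::move(d);
}

// The returned variable is a local object of the return type (not a
// reference), so copy elision or an implicit move applies and no
// std::move() is needed.
base make_base_from_local() {
    base result;
    return result;
}
```

Neither function should invoke the copy constructor; the first one relies on the explicit std::move(), the second on the returned variable being a plain local.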

Closes #8739
2021-05-27 21:16:26 +03:00