Style.
Move definition of add_entry and add_remaining_entries with the rest of
raft_cluster definitions.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Helper to calculate what's the value number to be added after snapshot
and leader initial log.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
For rpc tests, use raft_cluster::disconnect() instead of the local
connected reference.
This removes connected object use outside raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add connectivity helpers disconnect(server, except) and connect_all() to
so users of raft_cluster don't need to keep the a connectivity object
pointer.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When there's a defined next leader, only wait for log propagation for
this follower.
Splits wait_log() to waiting for one follower with wait_log() and
waiting for all followers with wait_log().
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Remove log wait after adding entries. It was added to handle some debug
hangs but it is not good for testing.
There are already wait logs at proper code locations.
(e.g. elect_new_leader, partition)
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When user requests repair to be forcefully aborted, the `_abort_all_as`
abort source could be modified from multiple shards in parallel by the
`tracker::abort_all_repairs()` function, which can lead to undefined
behavior and to a crash. This commit makes sure that `_abort_all_as` is
used only from shard 0 when repair is aborted.
Fixes#8693Closes#8734
The new process has the following differences from the Dockerfile
based image:
- Using buildah commands instead of a Dockerfile. This is more flexible
since we don't need to pack everything into a "build context" and
transfer it to the container; instead we interact with the container
as we build it.
- Using packages instead of a remote yum repository. This makes it
easy to create an image in one step (no need to create a repository,
promote, then download the packages back via yum. It means that
the image cannot be upgraded via yum, but container images are
usually just replaced with a new version.
- Build output is an OCI archive (e.g. a tarball), not a docker image
in a local repoistory. This means the build process can later be
integrated into ninja, since the artifact is just a file. The file
can be uploaded into a repository or made available locally with
skopeo.
- any build mode is supported, not just release. This can be used
for quick(er) testing with dev mode.
I plan to integrate it further into the build system, but currently
this is blocked on a buildah bug [1].
[1] https://github.com/containers/buildah/issues/3262Closes#8730
The value of a frozen collection may only be indexed (using a secondary
index) in full - it is not allowed to index only the keys for example -
"CREATE INDEX idx ON table (keys(v))" is not allowed.
The error message referred to a frozen<map>, but the problem can happen
on any frozen collection (e.g., a frozen set), not just a frozen map,
so can be confusing to a user who used a frozen set, and getting an
error about a frozen map.
So this patch fixes the error message to refer to a "frozen collection".
Note that the Cassandra error message in this case is different - it
reads: "Frozen collections are immutable and must be fully indexed".
Fixes#8744.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529094056.825117-1-nyh@scylladb.com>
Compaction manager can start tons of compaction of fully expired sstable in
parallel, which may consume a significant amount of resources.
This problem is caused by weight being released too early in compaction, after
data is all compacted but before table is called to update its state, like
replacing sstables and so on.
Fully expired sstables aren't actually compacted, so the following can happen:
- compaction 1 starts for expired sst A with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 2 starts for expired sst B with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 3 starts for expired sst C with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 1 is done updating table state, so it finally completes and
releases all the resources.
- compaction 2 is done updating table state, so it finally completes and
releases all the resources.
- compaction 3 is done updating table state, so it finally completes and
releases all the resources.
This happens because, with expired sstable, compaction will release weight
faster than it will update table state, as there's nothing to be compacted.
With my reproducer, it's very easy to reach 50 parallel compactions on a single
shard, but that number can be easily worse depending on the amount of sstables
with fully expired data, across all tables. This high parallelism can happen
only with a couple of tables, if there are many time windows with expired data,
as they can be compacted in parallel.
Prior to 55a8b6e3c9, weight was released earlier in compaction, before
last sstable was sealed, but right now, there's no need to release weight
earlier. Weight can be released in a much simpler way, after the compaction is
actually done. So such compactions will be serialized from now on.
Fixes#8710.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com>
[avi: drop now unneeded storage_service_for_tests]
It is currently possible that _memtables->add_memtable()
will throw after _memtables->clear(), leaving the memtables
list completely empty. However, we do rely on always
having at least one allocated in the memtables list
as active_memtable() references a lw_shared_ptr<memtable>
at the back of the memtables vector, and it expected
to always be allocated via add_memtable() upon construction
and after clear().
This change moves the implementation of this convention
to memtable_list::clear() and makes the latter exception safe
by first allocating the to-be-added empty memtable and
only then clearing the vector.
Refs #8749
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>
user_defined_function_test fails sporadically in debug mode
due to lua timeout. Raise the timeout to avoid the failure, but
not so much that the test that expects timout becomes too slow.
Fixes#8746.
Closes#8747
This is another boring patch.
One of schema constructors has been deprecated for many years now but
was used in several places anyway. Usage of this constructor could
lead to data corruption when using MX sstables because this constructor
does not set schema version. MX reading/writing code depends on schema
version.
This patch replaces all the places the deprecated constructor is used
with schema_builder equivalent. The schema_builder sets the schema
version correctly.
Fixes#8507
Test: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>
Sometimes the cql-pytest tests run extremely slowly. This can be
a combination of running the debug build (which is naturally slow)
and a test machine which is overcommitted, or experiencing some
transient swap storm or some similar event. We don't want tests, which
we run on a 100% reliable setups, to fail just because they run into
timeouts in Scylla when they run very slowly.
We already noticed this problem in the past, and increased the CQL client
timeout in conftest.py from the default of 10 seconds to 120 seconds -
the old default of 10 seconds was not enough for some long operations
(such as creating a table with multiple views) when the test ran very
slowly.
However, this only fixed the client-side timeout. We also have a bunch
of server-side timeouts, configured to all sorts of arbitrary (and
fairly small) numbers. For example, the server has a "write request
timeout" option, which defaults to just 2 seconds. We recently saw
this timeout exceeded in a slow run which tried to do a very large
write.
So this patch configures all the configurable server-side timeouts we
have to default to 300 seconds. This should be more than enough for even
the slowest runs (famous last words...). This default is not a good idea
on real multi-node clusters which are expected to deal with node loss,
but this is not the case in cql-pytest.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210529213648.856503-1-nyh@scylladb.com>
"
Right now storage service is used as "provider" of another
services -- database, feature service and tokens. This set
unexports the first pair. This dropps a bunch of calls for
global storage service instances from the places that don't
really need it.
tests: unit(dev), start-stop
"
* 'br-pupate-storage-service' of https://github.com/xemul/scylla:
storage-service: Don't export features
api: Get features from proxy
storage-service: Don't export database
storage-service: Turn some global helpers into methods
storage-service: Open-code simple config getters
view: Get database from stprage_proxy
main: Use local database instance
api: Use database from http_ctx
Now storage service uses the feature service instance internally
and doesn't need to provide getter for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reset_local_schema call needs proxy and feature service to do its
job. Right now the features are retrived from global storage service,
but they are present on the proxy as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now storage service uses the database instance internally and
doesn't need to provide getter for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two static helpers used by storage service that grab
global storage service. To simplify these two turn both into
storage service methods and use 'this' inside.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two db::config getters in storage_service.cc that
are used only once. Both call for global storage service, but
since they are called from storage service it's simpler to break
this loop and make storage service get needed config options
directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::view code already uses proxy rather actively, so instead of
depending on the storage service to be at hands it's better to make
db::view require the proxy. For now -- via global instance.
There's one dependency on storage service left after this patch --
to get the tokens. This piece is to be fixed later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All start-stop code in main has the sharded<database> at hands, there's
no need in getting it from global storage service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of getting database from global storage service it's simpler
and better to grab it from the http context at hands.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Consider the following procedure:
- n1, n2, n3
- n3 is network partitioned from the cluster
- n4 replaces n3
- n3 has the network partition fixed
- n1 learns n3 as NORMAL status and calls
storage_service::handle_state_normal which in turn calls
update_peer_info, all columns except tokens column in system.peers are
written
- n1 restarts before figure out n4 is the new owner and deletes the
entry for n3 in system.peers
- n3 is removed from gossip by all the nodes in the cluster
automatically because they detect the collision and removes n3
- n1 restarts, leaving the entry in system.peers for n3 forever
To fix, we can update peer tables only if the node is part of the ring.
Fixes#8729Closes#8742
> Merge "memory: optimize thread-local initialization" from Avi
> Merge "Move priority classes manipulations from io-queue" from Pavel E
> gate: add default move assignment operator
Starting from seastar commit 5dae0cf3c48159990f51e5d38495af5ae224c2f8
all the registered classes info was moved into io_priority_class::_infos
array.
tests: scylla-gdb(release, old and new seastars)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210528083941.27990-1-xemul@scylladb.com>
Clang warns when "return std::move(x)" is needed to elide a copy,
but the call to std::move() is missing. We disabled the warning during
the migration to clang. This patch re-enables the warning and fixes
the places it points out, usually by adding std::move() and in one
place by converting the returned variable from a reference to a local,
so normal copy elision can take place.
Closes#8739