configure.py uses the deprecated Python function tempfile.mktemp().
Because this function is labeled a "security risk" it is also a magnet
for automated security scanners... So let's replace it with the
recommended tempfile.mkstemp() and avoid future complaints.
The actual security implications of this mktemp() call is negligible to
non-existent: First it's just the build process (configure.py), not
the build product itself. Second, the worst that an attacker (which
needs to run in the build machine!) can do is to cause a compilation
test in configure.py to fail because it can't write to its output file.
Reported by @srikanthprathi
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220111121924.615173-1-nyh@scylladb.com>
"
The first patch introduces evictable_reader_v2, and the second one
further simplifies it. We clone instead of converting because there
is at least one downstream (by way of multishard_combining_reader) use
that is not itself straightforward to convert at the moment
(multishard_mutation_query), and because evictable_reader instances
cannot be {up,down}graded (since users also access the undelying
buffers). This also means that shard_reader, reader_lifecycle_policy
and multishard_combining_reader have to be cloned.
"
* tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla:
convert make_multishard_streaming_reader() to flat_mutation_reader_v2
convert table::make_streaming_reader() to flat_mutation_reader_v2
convert make_flat_multi_range_reader() to flat_mutation_reader_v2
view_update_generator: remove unneeded call to downgrade_to_v1()
introduce multishard_combining_reader_v2
introduce shard_reader_v2
introduce the reader_lifecycle_policy_v2 abstract base
evictable_reader_v2: further code simplifications
introduce evictable_reader_v2 & friends
"
The fix is in keeping shared proxy pointer on query_pager.
tests: unit(dev)
"
* 'br-keep-proxy-on-pager-2' of https://github.com/xemul/scylla:
pager: Use local proxy pointer
pager: Keep shared pointer to proxy onboard
After the rewrite of the test/scylla-gdb, the test for "scylla fiber"
was disabled - and this patch brings it back.
For the "scylla fiber" operation to do something interesting (and not just
print an error message and seem to succeed...) it needs a real task pointer.
The old code interrupted Scylla in a breakpoint and used get_local_tasks(),
but in the new test framework we attach to Scylla while it's idle, so
there are no ready tasks. So in this patch we use the find_vptrs()
function to find a continuation from http_server::do_accept_one() - it
has an interesting fiber of 5 continuations.
After this patch all 33 tests in test/scylla-gdb/test_misc.py pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220110211813.581807-1-nyh@scylladb.com>
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
The change exposes a dependency on the construction
order of static variables (which isn't defined), so we remove
the dependency in the first two patches.
Closes#9891
* github.com:scylladb/scylla:
replica: move distributed_loader into replica module
tracing: make sure keyspace and table names are available to static constructors
auth: make sure keyspace and table names are available to static constructors
This series gets rid of the global repair_tracker
and thread-local node_ops_metrics instances.
It does so by first, make the repair_tracker sharded,
with an instance per repair_service shard.
The, exposing the repair_service::repair_tracker
and keeping a reference to the repair_service in repair_info.
Then the node_ops_metrics instances are moved from
thread-local global variables to class repair_service.
The motivation for this series is two fold:
1. There is a global effor the get rid of global services
and instantiate all services on the stack of main() or cql_test_env.
2. As part of https://github.com/scylladb/scylla/issues/9809,
we would like to eventually use a generci job tracer for both repair
and compaction, so this would be one of the prelimanry steps to get there.
Refs #9809
Test: unit(release) (including scylla-gdb)
Dtest: repair_additional_test.py::TestRepairAdditional::{test_repair_disjoint_row_2nodes,test_repair_joint_row_3nodes_2_diff_shard_count} replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap[rbo_enabled]
(Still seeing https://github.com/scylladb/scylla/issues/9785 but nothing worse)
* github.com:bhalevy/scylla.git deglobalize-repair-tracker-v4
repair: repair_tracker: get rid of _the_tracker
repair: repair_service: move free abort_repair_node_ops function to repair_service
repair_service: deglobalize node_ops_metrics
repair: node_ops_metrics: fixup indentation
repair: node_ops_metrics: declare in header file
repair: repair_info: add check_in_shutdown method
repair: use repair_info to get to the repair tracker
repair: move tracker-dependent free functions to repair_service
repair: tracker: mark get function const
repair_service: add repair_tracker getter
repair: make repair_tracker sharded
repair: repair_tracker: get rid of unused abort_all_abort_source
repair: repair_tracker: get rid of unused shutdown abort source
Mechanical changes and a resulting downgrade in one caller (which is
itself converted later).
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Almost all mechanical: not passing a `reader` parameter around when we
know it's the `_reader` member, folding a short one-use method into
its caller.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Cloning instead of converting because there is at least one
downstream (via multishard_combining_reader) use that is not
straightforward to convert (multishard_mutation_query).
The clone is mostly mechanical and much simpler than the original,
because it does not have to deal with range tombstones when deciding
if it is safe to pause the wrapped reader, and also does not have to
trim any range tombstones.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
run_with_compaction_disabled() is used to temporarily disable compaction
for a table T. Not only regular compaction, but all types.
Turns out it's stopping all types but it's only preventing new regular
compactions from starting. So major for example can start even with
compaction temporarily disabled. This is fixed by not allowing
compaction of any type if disabled. This wasn't possible before as
scrub incorrectly ran entirely with compaction disabled, so it wouldn't
be able to start, but now it only disables compaction while retrieving
its candidate list.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220107154942.59800-1-raphaelsc@scylladb.com>
Reshape isn't mandatory for correctness, unlike resharding.
So we can allow boot to continue even in face of reshape
failure. Without this, boot will fail right away due to
unhandled exception. This is intended to make population
more resilient as any exception, even "benign" ones,
may cause boot to fail. It's better to allow boot to
continue from where it left off, as if there's an exception
like io error, or OOM, population will be unable to
complete anyway.
This patch was written based on observation that dangling
errors in interposer consumer used by compaction can cause
a different exception to be triggered, like broken_promise,
when user asked reshape to stop. This can no longer happen
now, but better safe than sorry.
So regular compaction can now pick on backlog once node is
online.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220107130539.14899-1-raphaelsc@scylladb.com>
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
Static constructors (specifically for the `system_keyspaces` global variable)
need their dependencies to be already constructed when their own
construction begins. Because tracing uses seastar::sstring, which is not
constexpr, we must change it to std::string_view (which is). Change
the type and perform the required adjustments. The definition is moved
to the header file for simplicity.
Do not depend on the_repair_tracker().
With that, the_repair_tracker() is no longer used
and should be deleted.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Embed the node_ops_metrics instance in
a sharded repair_service member.
Test: curl -silent http://127.0.0.1:9180/metrics | grep node_ops | grep -v "^#"
on a freshly started scylla instance.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For de-globalizing its thread-local instance
by placing a node_ops_metrics member in repair_service.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
These functions are called from the api layer.
Continue to hide the repair tracker from the caller
but use the repair_service already available
at the api layer to invoke the respective high-level
methods without requiring `the_repair_tracker()`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And rename the global repair_tracker getter to
`the_repair_tracker` as the first step to get rid of it.
repair_service methods now use the repair_service::repair_tracker
method.
The global getter was renamed to `the_repair_tracker()`
temporarily while eliminating it in this series
to help distinguish it from repair_service::repair_tracker().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than keeping all shards' semaphore
and repair_info:s on the tracker's single-shard instance,
instantiate it on all shards, tracking the local
repair jobs on its local shard.
For now, until it's deglobalized, turn
_the_tracker into static thread_local pointer.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There are few places that need storage proxy and that use
global method to acheive it. Since previous patch there's
a pager local non-null pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pagers are created by alternator and select statement, both
have the proxy reference at hands. Next, the pager's unique_ptr
is put on the lambda of its fetch_page() continuation and thus
it survives the fetch_page execution and then gets destroyed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
* seastar 655078dfdb...28fe4214e5 (2):
> program_options: avoid including boost/program_options.hpp when possible
> smp: split smp_options out of smp.hh
sstables/sstables.cc uses seastar::metrics but was missing an include of
<seastar/core/metrics.hh>. It probably received this include through
some other random included Seastar header (e.g., smp.hh).
Now that we're reducing the unnecessary inclusions in Seastar (an ongoing
effort of Seastar patches), it is no longer included implicitly, and we
need to include it explicitly in sstables.cc.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220109162823.511781-1-nyh@scylladb.com>
As already noted in commit eac6fb8, many of the scylla-gdb tests fail on
aarch64 for various reasons. The solution used in that commit was to
have test/scylla-gdb/run pretend to succeed - without testing anything -
when not running on x86_64. This workaround was accidentally lost when
scylla-gdb/run was recently rewritten.
This patch brings this workaround back, but in a slightly different form -
Instead of the run script not doing anything, the tests do get called, but
the "gdb" fixture in test/scylla-gdb/conftest.py causes each individual
test to be skipped.
The benefit of this approach is that it can easily be improved in the
future to only skip (or xfail) specific tests which are known to fail on
aarch64, instead of all of them - as half of the tests do pass on aarch64.
Fixes#9892.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220109152630.506088-1-nyh@scylladb.com>
This is a consolidation of #9714 and #9709 PRs by @elcallio that were reviewed by @asias
The last comment on those was that they should be consolidated in order not to create a security degradation for
ec2 setups.
For some cases it is impossible to determine dc or rack association for nodes on outgoing connections.
One example is when some IPs are hidden behind Nat layer.
In some cases this creates problems where one side of the connection is aware of the rack/dc association where the
other doesn't.
The solution here is a two stage one:
1. First add a gossip reverse lookup that will help us determine the rack/dc association for a broader (hopefully all) range
of setups and NAT situations.
2. When this fails - be more strict about downgrading a node which tries to ensure that both sides of the connection will at least
downgrade the connection instead of just fail to start when it is not possible for one side to determine rack/dc association.
Fixes#9653
/cc @elcallio @asias
Closes#9822
* github.com:scylladb/scylla:
messaging_service: Add reverse mapping of private ip -> public endpoint
production_snitch_base: Do reverse lookup of endpoint for info
messaging_service: Make dc/rack encryption check for connection more strict
init.hh relies on boost::program_options but forgot to include the
header file <boost/program_options.hpp> for it. Today, this doesn't
matter, because Seastar unnecessarily includes <boost/program_options.hpp>
from unrelated header files (such as smp.hh) - so it ends up not being
missing.
But we plan to clean up Seastar from those unnecessary includes, and
then including what we need in init.hh will become important.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220109123152.492466-1-nyh@scylladb.com>
Static constructors (specifically for the `system_keyspaces` global variable)
need their dependencies to be already constructed when their own
construction begins. Enforce that for auth keyspace and table names
using the constinit keyword.
Move the ::database, ::keyspace, and ::table classes to a new replica
namespace and replica/ directory. This designates objects that only
have meaning on a replica and should not be used on a coordinator
(but note that not all replica-only classes should be in this module,
for example compaction and sstables are lower-level objects that
deserve their own modules).
The module is imperfect - some additional classes like distributed_loader
should also be moved, but there is only one way to untie Gordian knots.
Closes#9872
* github.com:scylladb/scylla:
replica: move ::database, ::keyspace, and ::table to replica namespace
database: Move database, keyspace, table classes to replica/ directory
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
Tables waiting for a chance to run reshape wouldn't trigger stop
exception, as the exception was only being triggered for ongoing
compactions. Given that stop reshape API must abort all ongoing
tasks and all pending ones, let's change run_custom_job() to
trigger the exception if it found that the pending task was
asked to stop.
Tests:
dtest: compaction_additional_test.py::TestCompactionAdditional::test_stop_reshape_with_multiple_keyspaces
unit: dev
Fixes#9836.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211223002157.215571-1-raphaelsc@scylladb.com>
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
This patch is an almost complete rewrite of the test/scylla-gdb
framework for testing Scylla's gdb commands.
The goals of this rewrite are described in issue #9864. In short, the
goals are:
1. Use pytest to define individual test cases instead one long Python
script. This will make it easier to add more tests, to run only
individual tests (e.g., test/scylla-gdb/run somefile.py::sometest),
to understand which test failed when it fails - and a lot of other
pytest conveniences.
2. Instead of an ad-hoc shell script to run Scylla, gdb, and the test,
use the same Python code which is used in other test suites (alternator,
cql-pytest, redis, and more). The resulting handling of the temporary
resources (processes, directories, IP address) is more robust, and
interrupting test/scylla-gdb/run will correctly kill its child
processes (both Scylla and gdb).
All existing gdb tests (except one - more on this below...) were
easily rewritten in the new framework.
The biggest change in this patch is who starts what. Before this patch,
"run" starts gdb, which in turn starts Scylla, stops it on a breakpoint,
and then runs various tests. After this patch, "run" starts Scylla on
its own (like it does in test/cql-pytest/run, et al.), and then gdb runs
pytest - and in a pytest fixture attaches to the running Scylla process.
The biggest benefit of this approach is that "run" is aware of both gdb
and Scylla, and can kill both with abruptly with SIGKILL to end the test.
But there's also a downside to this change: One of the tests (of "scylla
fiber") needs access to some task object. Before this patch, Scylla was
stopped on a breakpoint, and a task was available at that point. After
this patch, we attach gdb to an idle Scylla, and the test cannot find
any task to use. So the test_fiber() test fails for now.
One way we could perhaps fix it is to add a breakpoint and "continue"
Scylla a bit more after attaching to it. However, I could find the right
breakpoint - and we may also need to send a request to Scylla to
get it to reach that breakpoint. I'm still looking for a better way
to have access to some "task" object we can test on.
Fixes#9864.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220102221534.1096659-1-nyh@scylladb.com>