Merged patch series from Piotr Sarna:
This series adds HTTPS support for Alternator.
The series comes with --https option added to alternator-test, which makes
the test harness run all the tests with HTTPS instead of HTTP. All the tests
pass, albeit with security warnings that a self-signed x509 certificate was
used and it should not be trusted.
Fixes#5042
Refs scylladb/seastar#685
Patches:
docs: update alternator entry on HTTPS
alternator-test: suppress the "Unverified HTTPS request" warning
alternator-test: add HTTPS info to README.md
alternator-test: add HTTPS to test_describe_endpoints
alternator-test: add --https parameter
alternator: add HTTPS support
config: add alternator HTTPS port
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.
The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().
File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).
Issues that are fixed are #4685 and #4836.
Tested as follows:
1) Patched the code in order to trigger the race with (a lot) higher
probability and running slightly modified hinted handoff replace
dtest with a debug binary for 100 times. Side effect of this
testing was discovering of #4836.
2) Using the same patch as above tested that there are no crashes and
nodes survive stop/start sequences (they were not without this series)
in the context of all hinted handoff dtests. Ran the whole set of
tests with dev binary for 10 times.
"
* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
hinted handoff: make taking file_update_mutex safe
db::hints::manager::drain_for(): fix alignment
db::hints::manager: serialize calls to drain_for()
db::hints: cosmetics: identation and missing method qualifier
"
This is a collection of assorted patches that will be needed for LWT.
Most of them are trivial, but one touches a lot of files, so have a
good chance to cause rebase headache (I already had to rebase it on
top of Alternator). Lets push them earlier instead of carrying them in
the lwt branch.
"
* 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev:
lwt: make _last_timestamp_micros static
lwt: Add client_state::get_timestamp_for_paxos() function
lwt: Pass client_state reference all the way to storage_proxy::query
exceptions: Add a constructor for unavailable_exception that allows providing a custom message
serializer: Add std::variant support
lwt: Add missing functions to utils/UUID_gen.hh
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copies
or destroyed, they can cause leaks or premature frees of the schema
in its original shard since reference count operations in are not atomic.
Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.
Fixes#5104 (direct bug)
Fixes#5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes#4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)
Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test
Copying schema_ptrs across shards results in memory corruption since
lw_shared_ptr does not use atomic operations for reference counts.
Prevent that by converting schema_ptr:s to global_schema_ptr:s before
shipping them across shards in the map operation, and converting them
back to local schema_ptr:s in the reduce operation.
This allows keys from different stages in the schema's like to compare equal.
This is safe since the partition key cannot change, unlike the rest of the schema.
More importantly, it will allow us to compare keys made local after a pass through
global_schema_ptr, which does not guarantee that the schema_ptr conversion will be
the same even when starting with the same global_schema_ptr.
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.
At runtime a user type has direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when a
entry in system.types changes.
Fixes#5049
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With this patch db::cql_type_parser::raw_builder creates a local copy
of the list of existing types and uses that internally. By doing that
build() should have no observable behavior other than returning the
new types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We were never passing a null pointer and never saving a copy of the
lw_shared_ptr. Passing a reference is more flexible as not all callers
are required to hold the user_types_metadata in a lw_shared_ptr.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Before this patch, if the _gate is closed, with_gate throws and
forward_to is not executed. When the promise<> p is destroyed it marks
its _task as a broken promise.
What happens next depends on the branch.
On master, we warn when the shared_future is destroyed, so this patch
changes the warning from a broken_promise to a gate closed.
On 3.1, we warn when the promises in shared_future::_peers are
destroyed since they no longer have a future attached: The future that
was attached was the "auto f" just before the with_gate call, and it
is destroyed when with_gate throws. The net result is that this patch
fixes the warning in 3.1.
I will send a patch to seastar to make the warning on master more
consistent with the warning in 3.1.
Fixes#4394
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190917211915.117252-1-espindola@scylladb.com>
Currently, if updating bookkeeping operations for view building fails,
we log the error message and continue. However, during shutdown,
some errors are more likely to happen due to existing issues
like #4384. To differentiate actual errors from semi-expected
errors during shutdown, the latter are now logged with a warning
level instead of error.
Fixes#4954
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.
The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.
Fixes#4982.
Tests:
- unit (release)
- manual, replay with memtable flush failin and without failing
Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
This reverts commit 7f64a6ec4b.
Fixes#5011
The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.
The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.
So far we had the "--alternator-port" option allowing to configure the port
on which the Alternator server listens on, but the server always listened
to any address. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.
So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.
With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Introduce mutation_fragment_stream_validator class and use it as a
Filter to flat_mutation_reader::consume_in_thread from
sstable::write_components to validate partition region and optionally
clustering key monotonicity.
Fixes#4803
key monotonicity validation requires an overhead to store the last key and also to compare
therefore provide an option to enable/disable it (disabled by default).
Refs #4804
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In preparation to the schema push/pull migrating to use canonical
mutations, convert the method producing the schema mutations to return a
vector of canonical mutations. The only user, MIGRATION_REQUEST verb,
converts the canonical mutations back to frozen mutations. This is very
inefficient, but this path will only be used in mixed clusters. After
all nodes are upgraded the verb will be sending the canonical mutations
directly instead.
Stopping services which occurs in a destructor of deferred_action
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.
Fixes#4875
This patches silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that this was
indeed the actual intent of the author, or that the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will be soon followed-up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they did the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
The endpoint directories scanned by space_watchdog may get deleted
by the manager::drain_for().
If a deleted directory is given to a lister::scan_dir() this will end up
in an exception and as a result a space_watchdog will skip this round
and hinted handoff is going to be disabled (for all agents including MVs)
for the whole space_watchdog round.
Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
end_point_hints_manager::file_update_mutex is taken by space_watchdog
but while space_watchdog is waiting for it the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().
This will end up in a use-after-free event.
Let's change the end_point_hints_manager's API in a way that would prevent
such an unsafe locking:
- Introduce the with_file_update_mutex().
- Make end_point_hints_manager::file_update_mutex() method private.
Fixes#4685Fixes#4836
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
If drain_for() is running together with itself: one instance for the local
node and one for some other node, erasing of elements from the _ep_managers
map may lead to a use-after-free event.
Let's serialize drain_for() calls with a semaphore.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"
Current admission control takes a permit when cql requests starts and
releases it when reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do like complete a write or do a read repair, and some because
a read/write may stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.
The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.
Fixes#4768
"
* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
transport: add a metric to follow memory available for service permit.
storage_proxy: store a permit in a read executor
storage_proxy: store a permit in a write response handler
Pass service permit to storage_proxy
transport: introduce service_permit class and use it instead of semaphore_units
transport: hold admission a permit until a reply is sent
transport: remove cql server load balancer
Current cql transport code acquire a permit before processing a query and
release it when the query gets a reply, but some quires leave work behind.
If the work is allowed to accumulate without any limit a server may
eventually run out of memory. To prevent that the permit system should
account for the background work as well. The patch is a first step in
this direction. It passes a permit down to storage proxy where it will
be later hold by background work.
The handler is intended to be called when internal invariants are
violated and the operation cannot safely continue. The handler either
throws (default) or aborts, depending on configuration option.
Passing --abort-on-internal-error on the command line will switch to
aborting.
The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
Data listener reads are implemented as flat_mutation_readers, which
take a reference to the listener and then execute asynchronously.
The listener can be removed between the time when the reference is
taken and actual execution, resulting in a dangling pointer
dereference.
Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes
don't need protection because they execute atomically.
Fixes#4661.
Tests: unit (dev)
We were using segment::_closed to decide whether _file was already
closed. Unfortunately they are not exactly the same thing. As far as
I understand it, segments can be closed and reused without actually
closing the file.
Found with a seastar patch that asserts on destroying an open
append_challenged_posix_file_impl.
Fixes#4745.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190721171332.7995-1-espindola@scylladb.com>
If a schema was created before computed columns were implemented,
its token column may not have been marked as computed.
To remedy this, if no computed column is found, the schema
will be recreated.
The code will work correctly even without this patch in order to support
upgrading from legacy versions, but it's still important: it transforms
token columns from the legacy format to new computed format, which will
eventually (after a few release cycles) allow dropping the support for
legacy format altogether.
Computed columns feature should be checked before creating
index schemas the new way - by adding computed column names
to system_schema.computed_columns.
Fixes a segfault when querying for an empty keyspace.
Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combinind_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.
Fixes#4689.