Commit Graph

67 Commits

Author SHA1 Message Date
Gleb Natapov
1c04b95c68 paxos: do not signal semaphore if it was not acquired
The guard signals a semaphore during destruction if it is marked as
locked, but currently it may be marked as locked even if locking failed.
Fix this by using semaphore_units instead of managing the locked flag
manually.

Fixes: https://github.com/scylladb/scylladb/issues/19698
(cherry picked from commit 87beebeed0)
2024-07-18 15:34:16 +00:00
Tomasz Grabiec
c9294b1642 lwt: Avoid deprecated sharder::shard_of()
Instead, use shard_for_reads(). The justification is that:

 1) In cas_shard(), we need to pick a single request coordinator.
    shard_for_reads() gives that, which is equivalent to shard_of()
    if there is no intra-node migration.

 2) In paxos handler for prepare(), the shard we execute it on is
    the shard from which we read, so shard_for_reads() is the one.

 3) Updates of paxos state are separate CQL requests, and use their
    own sharding.

 4) Handler for learn is executing updates using calls to
    storage_proxy::mutate_locally() which will use the right sharder for writes

However, the code is still not prepared for intra-node migration, and
possibly regular migration too in case of abandoned requests, because
the locking of paxos state assumes that the shard is static. That
would have to be fixed separately, e.g. by locking both shards
(shard_for_writes()) during migration, so that the set of locked
shards always intersects during migration and local serialization of
paxos state updates is achieved. I left FIXMEs for that.
2024-05-16 00:28:47 +02:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Kefu Chai
185b503b73 service/paxos: add fmt::formatter for paxos::promise
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::promise`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Kefu Chai
cb6c7bb9bf service/paxos: add fmt::formatter for paxos::proposal
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::proposal`,
but its operator<< is preserved, as it is still used by our generic
formatter for std::tuple<> which uses operator<< for printing the
elements in it, so operator<< of this class is indirectly used.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
ece2bd2f6e service: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16764
2024-01-15 13:29:33 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Petr Gusev
da1e6751e9 table: rename cache_truncation_record -> set_truncation_time
This is a refactoring commit without observable
changes in behaviour.

There is a truncation_record struct, but in this method we
only care about time, so rename it (and other related methods)
appropriately to avoid confusion.
2023-10-03 17:11:35 +04:00
Pavel Emelyanov
c9530eae4e migration_manager: Add system_keyspace argument to get_schema_mapping()
It will need one to pass to db::schema_tables code. The caller is paxos
code with sys_ks local variable at hand

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-28 15:54:19 +03:00
Pavel Emelyanov
b9ef16c06f db/system_keyspace: Make paxos methods non-static
The service::paxos_state methods that call those already have system
keyspace reference at hand and can call method on an object

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Pavel Emelyanov
d9ba8eb8df service/paxos: Add db::system_keyspace& argument to some methods
The paxos_state's .prepare(), .accept(), .learn() and .prune() methods
access system keyspace via its static methods. The only caller of those
(storage_proxy::remote) already has the sharded system k.s. reference
and can pass its .local() one as argument

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-19 19:32:10 +03:00
Kefu Chai
bab16eb30e treewide: remove #includes not use directly
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.

in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.

if a forward declaration suffice, the declaration is added instead.

see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 17:36:31 +08:00
Gleb Natapov
1ce7ad1ee6 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more then once
If on the first call the capture is destroyed the second call may crash.

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
2023-02-26 16:13:16 +02:00
Avi Kivity
e2f6e0b848 utils: move hashing related files to utils/ module
Closes #12884
2023-02-17 07:19:52 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Gleb Natapov' via ScyllaDB development
15ebd59071 lwt: upgrade stored mutations to the latest schema during prepare
Currently they are upgraded during learn on a replica. The are two
problems with this.  First the column mapping may not exist on a replica
if it missed this particular schema (because it was down for instance)
and the mapping history is not part of the schema. In this case "Failed
to look up column mapping for schema version" will be thrown. Second lwt
request coordinator may not have the schema for the mutation as well
(because it was freed from the registry already) and when a replica
tries to retrieve the schema from the coordinator the retrieval will fail
causing the whole request to fail with "Schema version XXXX not found"

Both of those problems can be fixed by upgrading stored mutations
during prepare on a node it is stored at. To upgrade the mutation its
column mapping is needed and it is guarantied that it will be present
at the node the mutation is stored at since it is pre-request to store
it that the corresponded schema is available. After that the mutation
is processed using latest schema that will be available on all nodes.

Fixes #10770

Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>
2023-01-17 11:14:46 +01:00
Amnon Heiman
99a060126d database: Reduce the number of per-table metrics
This patch reduces the number of metrics that is reported per table, when
the per-table flag is on.

When possible, it moves from time_estimated_histogram and
timed_rate_moving_average_and_histogram to use the unified timer.

Instead of a histogram per shard, it will now report a summary per shard
and a histogram per node.

Counters, histograms, and summaries will not be reported if they were
never used.

The API was updated accordingly so it would not break.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2022-07-27 16:58:52 +03:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Mikołaj Sielużycki
6f1b6da68a compile: Fix headers so that *-headers targets compile cleanly.
Closes #10273
2022-03-25 16:19:26 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Avi Kivity
7bdc999bba service: paxos_state: wean off get_local_storage_proxy()
Instead of calling get_local_storage_proxy in paxos_state, get it from the
caller (who is, in fact, storage_proxy or one of its components).

Some of the callers, although they are storage_proxy components, don't
have a storage_proxy reference handy and so they ignomiously call
get_local_storage_proxy() themselves. This will be adjusted later.

The other callers who are, in fact, storage_proxy, have to take special
care not to cross a shard boundary. When they do, smp::submit_to()
is converted to sharded::invoke_on() in order to get the correct local instance.

Test: unit (dev)

Closes #9824
2021-12-20 00:31:13 +02:00
Pavel Emelyanov
5cfeac0c90 paxos: Drop forward declarations of seastar pointers
They will break compilation after next seastar update, but the
good news is that scylla compiles even without them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211202173643.1070-1-xemul@scylladb.com>
2021-12-02 19:49:03 +02:00
Benny Halevy
ce3fcc121e paxos_state: prepare: handle exception getting data or digest
This exception is ignored by design, but if it's left unhandled,
it generates `Exceptional future ignored` warnings, like the following.

Also, ignore f2 if f1 failed since we return early in this case.

```
[shard 5] seastar - Exceptional future ignored: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem), backtrace: 0x431689e 0x4316d40 0x43170e8 0x3f35486 0x218d14a 0x3f8002f 0x3f81217 0x3f9f868 0x3f4b76a /opt/scylladb/libreloc/libpthread.so.0+0x93f8 /opt/scylladb/libreloc/libc.so.6+0x101902#012
N7seastar12continuationINS_8internal22promise_base_with_typeISt7variantIJN5utils4UUIDEN7service5paxos7promiseEEEEEZZZZNS7_11paxos_state7prepareEN7tracing15trace_state_ptrENS_13lw_shared_ptrIK6schemaEERKN5query12read_commandERK13partition_keyS5_bNSI_16digest_algorithmENSt6chrono10time_pointINS_12lowres_clockENSQ_8durationIlSt5ratioILl1ELl1000EEEEEEENK3$_0clEvENUlvE_clEvENKUlSB_E_clESB_EUlT_E_ZNS_6futureISt5tupleIJNS13_IvEENS13_IS14_IJNSE_INSI_6resultEEE17cache_temperatureEEEEEEE14then_impl_nrvoIS12_NS13_IS9_EEEET0_OS11_EUlOSA_RS12_ONS_12future_stateIS1B_EEE_S1B_EE#012
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<seastar::with_semaphore<seastar::semaphore_default_exception_factory, seastar::lowres_clock, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#1}>(seastar::basic_semaphore<seastar::semaphore_default_exception_factory, seastar::lowres_clock>&, unsigned long, seastar::lowres_clock::duration, std::result_of&&)::{lambda(seastar::basic_semaphore)#1}::operator()<seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock> >(seastar::basic_semaphore)::{lambda()#1}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock> >(seastar::future<std::variant<utils::UUID, service::paxos::promise> >&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, seastar::semaphore_units<seastar::semaphore_default_exception_factory, seastar::lowres_clock>&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#012
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<service::paxos::paxos_state::key_lock_map::with_locked_key<service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#1}>(dht::token const&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::result_of)::{lambda()#1}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, {lambda()#1}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#012
seastar::continuation<seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::finally_body<service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}, false>, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_wrapped_nrvo<seastar::future<std::variant<utils::UUID, service::paxos::promise> >, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}>(service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}&&)::{lambda(seastar::internal::promise_base_with_type<std::variant<utils::UUID, service::paxos::promise> >&&, service::paxos::paxos_state::prepare(tracing::trace_state_ptr, seastar::lw_shared_ptr<schema const>, query::read_command const&, partition_key const&, utils::UUID, bool, query::digest_algorithm, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::$_0::operator()() const::{lambda()#2}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#012
seastar::continuation<seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >, service::storage_proxy::init_messaging_service()::$_51::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, query::read_command, partition_key, utils::UUID, bool, query::digest_algorithm, std::optional<tracing::trace_info>) const::{lambda(seastar::lw_shared_ptr<schema const>)#1}::operator()(seastar::lw_shared_ptr<schema const>)::{lambda()#1}::operator()() const::{lambda(std::variant<utils::UUID, service::paxos::promise>)#1}, seastar::future<std::variant<utils::UUID, service::paxos::promise> >::then_impl_nrvo<{lambda()#1}, {lambda()#1}<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > > >({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >&&, {lambda()#1}&, seastar::future_state<std::variant<utils::UUID, service::paxos::promise> >&&)#1}, std::variant<utils::UUID, service::paxos::promise> >#012
seastar::continuation<seastar::internal::promise_base_with_type<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >, seastar::future<seastar::foreign_ptr<std::unique_ptr<std::variant<utils::UUID, service::paxos::promise>, std::default_delete<std::variant<utils::UUID, service::paxos::promise> > > > >::finally_body<seastar::smp::submit_to<service::storage_proxy::init_messaging_service()::$_51::operator()(seastar::rpc::client_info const&, seastar::rpc::opt_time_point, query::read_command, partition_key, utils::UUID, bool, query::digest_algorithm, std::optional<tracing::trace_info>) const::{lambda(seastar::lw_shared_ptr<schema const>)#1}::operator()(seastar::lw_shared_ptr<schema const>)::{lambda()#1}>(unsigned int, se
```

Refs #7779
Refs #9331

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210919053007.13960-1-bhalevy@scylladb.com>
2021-09-19 11:58:21 +03:00
Pavel Emelyanov
c39f04fa6f code: Remove storage-service header from irrelevant places
Some .cc files over the code include the storage service
for no real need. Drop the header and include (in some)
what's really needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:50:19 +03:00
Avi Kivity
4d70f3baee storage_proxy: change unordered_set<inet_address> to small_vector in write path
The write paths in storage_proxy pass replica sets as
std::unordered_set<gms::inet_address>. This is a complex type, with
N+1 allocations for N members, so we change it to a small_vector (via
inet_address_vector_replica_set) which requires just one allocation, and
even zero when up to three replicas are used.

This change is more nuanced than the corresponding change to the read path
abe3d7d7 ("Merge 'storage_proxy: use small_vector for vectors of
inet_address' from Avi Kivity"), for two reasons:

 - there is a quadratic algorithm in
   abstract_write_response_handler::response(): it searches for a replica
   and erases it. Since this happens for every replica, it happens N^2/2
   times.
 - replica sets for writes always include all datacenters, while reads
   usually involve just one datacenter.

So, a write to a keyspace that has 5 datacenters will invoke 15*(15-1)/2
=105 compares.

We could remove this by sending the index of the replica in the replica
set to the replica and ask it to include the index in the response, but
I think that this is unnecessary. Those 105 compares need to be only
105/15 = 7 times cheaper than the corresponding unordered_set operation,
which they surely will. Handling a response after a cross-datacenter round
trip surely involves L3 cache misses, and a small_vector reduces these
to a minimum compared to an unordered_set with its bucket table, linked
list walking and managent, and table rehashing.

Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000
 --task-quota-ms show two allocations removed (as expected) and a nice
reduction in instructions executed.

before: median 204842.54 tps ( 54.2 allocs/op,  13.2 tasks/op,   49890 insns/op)
after:  median 206077.65 tps ( 52.2 allocs/op,  13.2 tasks/op,   49138 insns/op)

Closes #8847
2021-06-17 13:46:40 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
2187a59089 treewide: move service::cas_request out from storage_proxy.hh
And remove all remaining inclusions of `storage_proxy.hh` in the
headers.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-06 19:18:49 +03:00
Konstantin Osipov
c83cf1f965 uuid: switch the API to use std::chrono
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.

The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.

For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.

* switch  get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
  min_time_UUID(), max_time_UUID(), create_time_safe() to
  std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
  redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
2021-04-06 17:12:54 +03:00
Benny Halevy
088f92e574 paxos_state: learn: fix injected error description
It was copy-pasted from another injection point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201220091439.3604201-1-bhalevy@scylladb.com>
2021-01-24 11:51:23 +02:00
Gleb Natapov
85cffd1aeb lwt: rewrite storage_proxy::cas using coroutings
Makes code much simpler to understand.

Message-Id: <20201201160213.GW1655743@scylladb.com>
2020-12-17 18:15:35 +01:00
Pavel Solodovnikov
055fd3d8ad lwt: store column_mapping's for each table schema version upon a DDL change
This patch introduces a new system table: `system.scylla_table_schema_history`,
which is used to keep track of column mappings for obsolete table
schema versions (i.e. schema becomes obsolete when it's being changed
by means of `CREATE TABLE` or `ALTER TABLE` DDL operations).

It is populated automatically when a new schema version is being
pulled from a remote in get_schema_definition() at migration_manager.cc
and also when schema change is being propagated to system schema tables
in do_merge_schema() at schema_tables.cc.

The data referring to the most recent table schema version is always
present. Other entries are garbage-collected when the corresponding
table schema version is obsoleted (they will be updated with a TTL equal
to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`).

In case we failed to persist column mapping after a schema change,
missing entries will be recreated on node boot.

Later, the information from this table is used in `paxos_state::learn`
callback in case we have a mismatch between the most recent schema
version and the one that is stored inside the `frozen_mutation`
for the accepted proposal.

Such situation may arise under following circumstances:
 1. The previous LWT operation crashed on the "accept" stage,
    leaving behind a stale accepted proposal, which waits to be
    repaired.
 2. The table affected by LWT operation is being altered, so that
    schema version is now different. Stored proposal now references
    obsolete schema.
 3. LWT query is retried, so that Scylla tries to repair the
    unfinished Paxos round and apply the mutation in the learn stage.

When such mismatch happens, prior to that patch the stored
`frozen_mutation` is able to be applied only if we are lucky enough
and column_mapping in the mutation is "compatible" with the new
table schema.

It wouldn't work if, for example, the columns are reordered, or
some columns, which are referenced by an LWT query, are dropped.

With this patch we try to look up the column mapping for
the obsolete schema version, then upgrade the stored mutation
using obtained column mapping and apply an upgraded mutation instead.

In case we don't find a column_mapping we just return an error
from the learn stage.

Tests: unit(dev, debug), dtests(paxos_tests.py:TestPaxos.schema_mismatch_*_test)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-10-15 19:24:30 +03:00
Pavel Solodovnikov
88ba184247 paxos: use schema_registry when applying accepted proposal if there is schema mismatch
Try to look up and use schema from the local schema_registry
in case when we have a schema mismatch between the most recent schema
version and the one that is stored inside the `frozen_mutation` for
the accepted proposal.

When such situation happens the stored `frozen_mutation` is able
to be applied only if we are lucky enough and column_mapping in
the mutation is "compatible" with the new table schema.

It wouldn't work if, for example, the columns are reordered, or
some columns, which are referenced by an LWT query, are dropped.

With the patch we are able to mitigate these cases as long as the
referenced schema is still present in the node cache (e.g.
it didn't restart/crash or the cache entry is not too old
to be evicted).

Tests: unit(dev, debug), dtest(paxos_tests.schema_mismatch_*_test)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200827150844.624017-1-pa.solodovnikov@scylladb.com>
2020-08-27 19:04:09 +02:00
Pavel Solodovnikov
9aa4712270 lwt: introduce paxos_grace_seconds per-table option to set paxos ttl
Previously system.paxos TTL was set as max(3h, gc_grace_seconds).

Introduce new per-table option named `paxos_grace_seconds` to set
the amount of seconds which are used to TTL data in paxos tables
when using LWT queries against the base table.

Default value is equal to `DEFAULT_GC_GRACE_SECONDS`,
which is 10 days.

This change allows to easily test various issues related to paxos TTL.

Fixes #6284

Tests: unit (dev, debug)

Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>
2020-08-17 16:44:14 +02:00
Botond Dénes
159d37053d storage_proxy: use read_command::max_result_size to pass max result size around
Use the recently added `max_result_size` field of `query::read_command`
to pass the max result size around, including passing it to remote
nodes. This means that the max result size will be sent along each read,
instead of once per connection.
As we want to select the appropriate `max_result_size` based on the type
of the query as well as based on the query class (user or internal) the
previous method won't do anymore. If the remote doesn't fill this
field, the old per-connection value is used.
2020-07-28 18:00:29 +03:00
Amnon Heiman
186301aff8 per table metrics: change estimated_histogram to time_estimated_histogram
This patch changes the per table latencies histograms: read, write,
cas_prepare, cas_accept, and cas_learn.

Beside changing the definition type and the insertion method, the API
was changed to support the new metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Rafael Ávila de Espíndola
555d8fe520 build: Be consistent about system versus regular headers
We were not consistent about using '#include "foo.hh"' instead of
'#include <foo.hh>' for scylla's own headers. This patch fixes that
inconsistency and, to enforce it, changes the build to use -iquote
instead of -I to find those headers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200608214208.110216-1-espindola@scylladb.com>
2020-06-10 15:49:51 +03:00
Alejo Sanchez
59d60ae672 paxos: fix indentation
Fix indentation

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-06-03 14:47:18 +02:00
Alejo Sanchez
019c96cfda paxos: add error injections
Adds error injections on critical points for:

    prepare
    accept
    learn
    release_semaphore_for_key

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-06-03 14:44:53 +02:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Avi Kivity
beaeda5234 database: remove variadic future from query() and query_mutations()
Variadic futures are deprecated; replace with future<std::tuple<...>>.

Tests: unit (dev)
2020-05-17 18:45:38 +02:00
Gleb Natapov
97af6bb0bd lwt: make load_paxos_state to take partition_key_view instead of a deference
Some caller have partition_key_view, but not partition_key, so thy need
to create a temporary and copy just to pass a reference. Change it by
accepting a view.
2020-04-22 13:51:43 +03:00
Gleb Natapov
c970da3811 lwt: do not copy proposal in paxos_state::accept
A proposal is passed as a reference and all callers have it in stable
memory until the call ends, so it is safe to use the reference
everywhere.
2020-04-22 13:51:43 +03:00
Konstantin Osipov
18b9bb57ac lwt: rename metrics to match accepted terminology
Rename inherited metrics cas_propose and cas_commit
to cas_accept and cas_learn respectively.

A while ago we made a decision to stick to widely accepted
terms for Paxos rounds: prepare, accept, learn. The rest
of the code is using these terms, so rename the metrics
to avoid confusion/technical debt.

While at it, rename a few internal methods and functions.

Fixes #6169

Message-Id: <20200414213537.129547-1-kostja@scylladb.com>
2020-04-15 12:20:30 +02:00
Pavel Solodovnikov
3206c1bf66 paxos_state: introduce error injections for testing timeouts in paxos stages
The following sleep injections are added to paxos_state:
 * paxos_state_prepare_timeout (timeouts in paxos_state::prepare)
 * paxos_state_accept_timeout (timeouts in paxos_state::accept)
 * paxos_state_learn_timeout (timeouts in paxos_state::learn)

Tests: unit ({dev}), unit ({debug})

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200403092107.181057-1-alejo.sanchez@scylladb.com>
2020-04-08 10:47:15 +02:00
Gleb Natapov
8a408ac5a8 lwt: remove entries from system.paxos table after successful learn stage
The learning stage of PAXOS protocol leaves behind an entry in
system.paxos table with the last learned value (which can be large). In
case not all participants learned it successfully next round on the same
key may complete the learning using this info. But if all nodes learned
the value the entry does not serve useful purpose any longer.

The patch adds another round, "prune", which is executed in background
(limited to 1000 simultaneous instances) and removes the entry in
case all nodes replied successfully to the "learn" round.  It uses the
ballot's timestamp to do the deletion, so not to interfere with the
next round. Since deletion happens very close to previous writes it will
likely happen in memtable and will never reach sstable, so that reduces
memtable flush and compaction overhead.

Fixes #5779

Message-Id: <20200330154853.GA31074@scylladb.com>
2020-03-30 21:02:14 +03:00