Commit Graph

555 Commits

Author SHA1 Message Date
Kefu Chai
0b69a1badc transport: cast unaligned<T> to T for formatting it
in fmt v10, it does not cast unaligned<T> to T when formatting it,
instead it insists on finding a matched fmt::formatter<> specialization for it.
that's why we have FTBFS with fmt v10 when printing
these packed<T> variables with fmtlib v10.

in this change, we just cast them to the underlying types before
formatting them. because seastar::unaligned<T> does not provide
a method for accessing the raw value, neither does it provide
a type alias of the type of the underlying raw value, we have
to cast to the type without deducing it from the printed value.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16167
2023-11-27 15:26:13 +02:00
Kefu Chai
a9c1a435ec result_message: add formatter for result_message::rows
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
`cql_transport::messages::result_message::rows`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16143
2023-11-23 11:12:55 +02:00
Tomasz Grabiec
b06a0078fb Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko
There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed.

The info is send when driver asks about the information that the specific tablet contains and it is directed to the wrong node/shard so it could use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.

Mechanism for sending custom_payload added.

Sending custom_payload tested using three node cluster and cqlsh queries. I used RF=1 so choosing wrong node was testable.

I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.

Automatic tests added.

Closes scylladb/scylladb#15410

* github.com:scylladb/scylladb:
  docs: add documentation about sending tablet info to protocol extensions
  Add tests for sending tablet info
  cql3: send tablet if wrong node/shard is used during modification statement
  cql3: send tablet if wrong node/shard is used during select statement
  locator: add function to check locality
  locator: add function to check if host is local
  transport: add function to add tablet info to the result_message
  transport: add support for setting custom payload
2023-11-22 17:44:07 +02:00
sylwiaszunejko
93420353f4 transport: add function to add tablet info to the result_message 2023-11-21 15:15:20 +01:00
sylwiaszunejko
75b3dbf7ea transport: add support for setting custom payload
A custom payload can now be added to response_message.
If it is set, it will be sent to client and the custom_payload
flag will be set.

write_string_bytes_map method is added to response class
and a missing custom_payload flag is added to
cql_frame_flags.
2023-11-21 15:09:36 +01:00
Kefu Chai
15bfa09454 treewide: do not mark return value const if this has no effect
this change is a cleanup.

to mark a return value without value semantics has no effect. these
`const` specifier useless. so let's drop them.

and, if we compile the tree with `-Wignore-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:46:19 +08:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Botond Dénes
844a0e426f Merge 'Mark counters with skip when empty' from Amnon Heiman
This series mark multiple high cardinality counters with skip_when_empty flag.
After this patch the following counters will not be reported if they were never used:
```
scylla_transport_cql_errors_total
scylla_storage_proxy_coordinator_reads_local_node
scylla_storage_proxy_coordinator_completed_reads_local_node
scylla_transport_cql_errors_total
```
Also marked, the CAS related CQL operations.
Fixes #12751

Closes scylladb/scylladb#13558

* github.com:scylladb/scylladb:
  service/storage_proxy.cc: mark counters with skip_when_empty
  cql3/query_processor.cc: mark cas related metrics with skip_when_empty
  transport/server.cc: mark metric counter with skip_when_empty
2023-09-19 15:02:39 +03:00
Avi Kivity
ab6988c52f Merge "auth: do not grant permissions to creator without actually creating" from Wojciech Mitros
Currently, when creating the table, permissions may be mistakenly
granted to the user even if the table is already existing. This
can happen in two cases:

The query has a IF NOT EXISTS clause - as a result no exception
is thrown after encountering the existing table, and the permission
granting is not prevented.
The query is handled by a non-zero shard - as a result we accept
the query with a bounce_to_shard result_message, again without
preventing the granting of permissions.
These two cases are now avoided by checking the result_message
generated when handling the query - now we only grant permissions
when the query resulted in a schema_change message.

Additionally, a test is added that reproduces both of the mentioned
cases.

CVE-2023-33972

Fixes #15467.

* 'no-grant-on-no-create' of github.com:scylladb/scylladb-ghsa-ww5v-p45p-3vhq:
  auth: do not grant permissions to creator without actually creating
  transport: add is_schema_change() method to result_message
2023-09-18 21:47:28 +03:00
Pavel Emelyanov
b42391bfbe transport: Shutdown server on disablebinary
... and do the real "sharded::stop" in the background. On node shutdown
it needs to pick up all dangling background stopping.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
bc2d44994a transport/controller: Coroutinize do_stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Pavel Emelyanov
7701aa0789 transport/controller: Coroutinize stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Nadav Har'El
548386a0bb treewide: reduce include of cql_statement.hh
ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the
most expensive header files in the project - being included (mostly
indirectly) in 129 source files, and costing a total of 844 CPU seconds
of compilation.

This patch is an attempt, only *partially* successful, to reduce the
number of times that cql_statement.hh is included. It succeeds in
lowering the number 129 to 99, but not less :-( One of the biggest
difficulties in reducing it further is that query_processor.hh includes
a lot of templated code, which needs stuff from cql_statement.hh.
The solution should be to un-template the functions in
query_processor.hh and move them from the header to a source file, but
this is beyond the scope of this patch and query_processor.hh appears
problematic in other respects as well.

Unfortunately the compilation speedup by this patch is negligible
(the `du -bc build/dev/**/*.o` metric shows less than 0.01% reduction).
Beyond the fact that this patch only removes 30% of the inclusions of
this header, it appears that most of the source files that no longer
include cql_statement.hh after this patch, included anyway many of the
other headers that cql_statement.hh included, so the saving is minimal.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15212
2023-09-08 13:23:50 +03:00
Amnon Heiman
1abcd4bb11 transport/server.cc: mark metric counter with skip_when_empty
This patch mark scylla_transport_cql_errors_total with skip_when_empty
flag.

It reduces the overhead for metrics for types that are never reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2023-08-23 09:30:35 -04:00
Gleb Natapov
4ffc39d885 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNsynXayKim2XAFr@scylladb.com>
2023-08-17 15:52:48 +03:00
Avi Kivity
d57a951d48 Revert "cql3: Extend the scope of group0_guard during DDL statement execution"
This reverts commit 70b5360a73. It generates
a failure in group0_test .test_concurrent_group0_modifications in debug
mode with about 4% probability.

Fixes #15050
2023-08-15 00:26:45 +03:00
Gleb Natapov
70b5360a73 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
2023-08-13 14:19:39 +03:00
Kefu Chai
565f5c7380 transport: correct format string when printing logging message
we print the stream id in the logging messages, but in this case,
we forgot to pass `stream` to `log::debug()`. but the placeholder
for `stream` was added. if the underlying fmtlib actually formats
the argument with the format string, it would throw.

fortunately, we don't enable debug level logging often, guess that's
why we haven't spotted this issue yet.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14620
2023-07-13 11:21:43 +03:00
Calle Wilund
20e9619bb1 transport: Try to do early, transport based auth if possible
Bypassing the need for an AUTH message+response. I.e. do auth _without_ client having
login specified.
2023-06-26 15:00:21 +00:00
Wojciech Mitros
7883a88abd transport: add is_schema_change() method to result_message
In the next patch, we will want to observe when the result
message is a schema change and handle it differently than
when it is not. This patch adds a helper method for that,
which should be more readable than a dynamic_pointer_cast
and a comparison with nullptr.
2023-06-26 12:22:14 +02:00
Kefu Chai
c3d91f5190 tracing: drop trace(.., std::string&&) overload
this change is a follow-up of 4f5fcb02fd,
the goal is to avoid the programming oversights like

```c++
trace(trace_ptr, "foo {} with {} but {} is {}");
```

as `trace(const trace_state_ptr& p, const std::string& msg)` is
a better match than the templated one, i.e.,
`trace(const trace_state_ptr& p, fmt::format_string<T...> fmt, T&&...
args)`. so we cannot detect this with the compile-time format checking.

so let's just drop this overload, and update its callers to use
the other overload.

The change was suggested by Avi. the example also came from him.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14188
2023-06-10 20:09:35 +03:00
Avi Kivity
26c8470f65 treewide: use #include <seastar/...> for seastar headers
We treat Seastar as an external library, so fix the few places
that didn't do so to use angle brackets.

Closes #14037
2023-06-06 08:36:09 +03:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Kefu Chai
b76877fd99 transport: capture reference to temp value by value
`current_scheduling_group()` returns a temporary value, and `name()`
returns a reference, so we cannot capture the return value by reference,
and use the reference after this expression is evaluated. this would
cause undefined behavior. so let's just capture it by value.

this change also silence following warning from GCC-13:

```
/home/kefu/dev/scylladb/transport/server.cc:204:11: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
  204 |     auto& cur_sg_name = current_scheduling_group().name();
      |           ^~~~~~~~~~~
/home/kefu/dev/scylladb/transport/server.cc:204:56: note: the temporary was destroyed at the end of the full expression ‘seastar::current_scheduling_group().seastar::scheduling_group::name()’
  204 |     auto& cur_sg_name = current_scheduling_group().name();
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
```

Fixes #13719
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13724
2023-05-01 22:40:36 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Botond Dénes
3e92bcaa20 Merge 'utils: redesign reusable_buffer' from Michał Chojnowski
Common compression libraries work on contiguous buffers.
Contiguous buffers are a problem for the allocator. However, as long as they are short-lived,
we can avoid the expensive allocations by reusing buffers across tasks.

This idea is already applied to the compression of CQL frames, but with some deficiencies.
`utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion.

Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually *increases* memory pressure significantly, instead of reducing it.
`zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard.

Closes #13324

* github.com:scylladb/scylladb:
  zstd: share buffers between compressor instances
  utils: redesign reusable_buffer
2023-04-27 09:09:09 +03:00
Michał Chojnowski
bf26a8c467 utils: redesign reusable_buffer
Large contiguous buffers put large pressure on the allocator
and are a common source of reactor stalls. Therefore, Scylla avoids
their use, replacing it with fragmented buffers whenever possible.
However, the use of large contiguous buffers is impossible to avoid
when dealing with some external libraries (i.e. some compression
libraries, like LZ4).

Fortunately, calls to external libraries are synchronous, so we can
minimize the allocator impact by reusing a single buffer between calls.

An implementation of such a reusable buffer has two conflicting goals:
to allocate as rarely as possible, and to waste as little memory as
possible. The bigger the buffer, the more likely that it will be able
to handle future requests without reallocation, but also the memory
memory it ties up.

If request sizes are repetitive, the near-optimal solution is to
simply resize the buffer up to match the biggest seen request,
and never resize down.

However, if we anticipate pathologically large requests, which are
caused by an application/configuration bug and are never repeated
again after they are fixed, we might want to resize down after such
pathological requests stop, so that the memory they took isn't tied
up forever.

The current implementation of reusable buffers handles this by
resizing down to 0 every 100'000 requests.

This patch attempts to solve a few shortcomings of the current
implementation.
1. Resizing to 0 is too aggressive. During regular operation, we will
surely need to resize it back to the previous size again. If something
is allocated in the hole left by the old buffer, this might cause
a stall. We prefer to resize down only after pathological requests.
2. When resizing, the current implementation allocates the new buffer
before freeing the old one. This increases allocator pressure for no
reason.
3. When resizing up, the buffer is resized to exactly the requested
size. That is, if the current size is 1MiB, following requests
of 1MiB+1B and 1MiB+2B will both cause a resize.
It's preferable to limit the set of possible sizes so that every
reset doesn't tend to cause multiple resizes of almost the same size.
The natural set of sizes is powers of 2, because that's what the
underlying buddy allocator uses. No waste is caused by rounding up
the allocation to a power of 2.
4. The interval of 100'000 uses is both too low and too arbitrary.
This is up for discussion, but I think that it's preferable to base
the dynamics of the buffer on time, rather than the number of uses.
It's more predictable to humans.

The implementation proposed in this patch addresses these as follows:
1. Instead of resizing down to 0, we resize to the biggest size
   seen in the last period.
   As long as at least one maximal (up to a power of 2) "normal" request
   appears each period, the buffer will never have to be resized.
2. The capacity of the buffer is always rounded up to the nearest
   power of 2.
3. The resize down period is no longer measured in number of requests
   but in real time.

Additionally, since a shared buffer in asynchronous code is quite a
footgun, some rudimentary refcounting is added to assert that only
one reference to the buffer exists at a time, and that the buffer isn't
downsized while a reference to it exists.

Fixes #13437
2023-04-26 22:09:17 +02:00
Tomasz Grabiec
41e69836fd db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() 2023-04-24 10:49:37 +02:00
Kefu Chai
ebf5e138e8 redis,thrift,transport: make timeout_config live-updateable
* timeout_config
  - add `updated_timeout_config` which represents an always-updated
    options backed by `utils::updateable_value<>`. this class is
    used by servers which need to access the latest timeout related
    options. the existing `timeout_config` is more like a snapshot
    of the `updated_timeout_config`. it is used in the use case where
    we don't need to most updated options or we update the options
    manually on demand.
* redis, thrift, transport: s/timeout_config/updated_timeout_config/
  when appropriate. use the improved version of timeout_config where
  we need to have the access to the most-updated version of the timeout
  options.

Fixes #10172
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:17:45 +08:00
Kefu Chai
1cc28679bc transport: mark cql_server::timeout_config() const
this function returns a const reference to member variable, so we
can mark it with the `const` specifier for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
d72ab78ffd transport: drop unused member function
since `cql_server::connection::timeout_config()` is used nowhere,
let's just drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
c642ca9e73 redis,thrift,transport: initialize _config with std::move(config)
instead of copying the `config` parameter, move away from it.

this change also prepares for a non-copyable config. if the class
of `config` is not copyable, we will not be able to initialize
the member variable by copying from the given `config` parameter.
after the live-updateable config change, the `_config` member
variable will contain instances of utils::observer<>, which is
not copyable, but is move-constructable, hence in this change,
we just move away from the give `config`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
e0ac2eb770 redis,thrift,transport: pass config via sharded_parameter
* pass config via sharded_parameter
* initialize config using designated initializer

this change paves the road to servers with live-updateable timeout
options.

before this change, the servers initialize a domain specific combo
config, like `redis_server_config`,  with the same instance of a
timeout_config, and pass the combox config as a ctor parameter to
construct each sharded service instance. but this design assumes
the value semantic of the config class, say, it should be copyable.
but if we want to use utils::updateable_value<> to get updated
option values, we would have to postpone the instantiation of the
config until the sharded service is about to be initialized.

so, in this change, instead of taking a domain specific config created
before hand, all services constructed with a `timeout_config` will
take a `sharded_parameter()` for creating the config. also, take
this opportunity to initialize the config using designated initializer.
for two reasons:

* less repeatings this way. we don't have to repeat the variable
  name of the config being initialized for each member variable.
* prepare for some member variables which do not have a default
  constructor. this applies to the timeout_config's updater which
  will not have a default constructor, as it should be initialized
  by db::config and a reference to the timeout_config to be updated.

we will update the `timeout_config` side in a follow-up commit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:00 +08:00
Botond Dénes
19560419d2 Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity
An assortment of patches that reduce our incompatibilities with the upcoming gcc 13.

Closes #13243

* github.com:scylladb/scylladb:
  transport: correctly format unknown opcode
  treewide: catch by reference
  test: raft: avoid confusing string compare
  utils, types, test: extract lexicographical compare utilities
  test: raft: fsm_test: disambiguate raft::configuration construction
  test: reader_concurrency_semaphore_test: handle all enum values
  repair: fix signed/unsigned compare
  repair: fix incorrect signed/unsigned compare
  treewide: avoid unused variables in if statements
  keys: disambiguate construction from initializer_list<bytes>
  cql3: expr: fix serialize_listlike() reference-to-temporary with gcc
  compaction: error on invalid scrub type
  treewide: prevent redefining names
  api: task_manager: fix signed/unsigned compare
  alternator: streams: fix signed/unsigned comparison
  test: fix some mismatched signed/unsigned comparisons
2023-03-24 15:16:05 +02:00
Vlad Zolotarov
f94bbc5b34 transport: add per-scheduling-group CQL opcode-specific metrics
This patch extends a previous patch that added these metrics globally:
 - cql_requests_count
 - cql_request_bytes
 - cql_response_bytes

This patch adds a "scheduling_group_name" label to these metrics and changes corresponding
counters to be accounted on a per-scheduling-group level.

As a bonus this patch also marks all 3 metrics as 'skip_when_empty'.

Ref #13061

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20230321201412.3004845-1-vladz@scylladb.com>
2023-03-22 13:27:48 +02:00
Avi Kivity
19810cfc5e transport: correctly format unknown opcode
gcc allows an enum to contain values outside its members. For extra
safety, as this can be user visible, format the unknown opcode and
return it.
2023-03-21 15:43:00 +02:00
Avi Kivity
e75009cd49 treewide: catch by reference
gcc rightly warns about capturing by value, so capture by
reference.
2023-03-21 15:43:00 +02:00
Kefu Chai
21a7c439bb build: cmake: find Snappy before using it
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Vlad Zolotarov
ae6724f155 transport: refactor CQL metrics
This patch reorganizes and extends CQL related metrics.
Before this patch we only had counters for specific CQL requests.

However, many times we need to reason about the size of CQL queries: corresponding
requests and response sizes.

This patch adds corresponding metrics:
  - Arranges all 3 per-opcode statistics counters in a single struct.
  - Defines a vector of such structs for each CQL opcode.
  - Adjusts statistics updates accordingly - the code is much simpler
    now.
  - Removes old metrics that were accounting some CQL opcodes.
  - Adds new per-opcode metrics for requests number, request and response sizes:
     - New metrics are of a derived kind - rate() should be applied to them.
     - There are 3 new metrics names:
       - 'cql_requests_count'
       - 'cql_request_bytes'
       - 'cql_response_bytes'
     - New metrics have a per-opcode label - 'kind'.

 For example:

 A number of response bytes for an EXECUTE opcode on shard 0 looks as follows:

 scylla_transport_cql_response_bytes{kind="EXECUTE",shard="0"}

Ref #13061

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20230302154816.299721-1-vladz@scylladb.com>
2023-03-07 12:02:34 +02:00
Kefu Chai
563fbb2d11 build: cmake: extract more subsystem out into its own CMakeLists.txt
namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica,
service, tools, tracing and transport.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
412953fdd5 compress, transport: do not detect LZ4_compress_default()
`LZ4_compress_default()` was introduced in liblz4 v1.7.3, despite
that the release note (https://github.com/lz4/lz4/releases/tag/v1.7.3)
of v1.7.3 didn't mention this. if we check the commit which added
this API, we can find all releases including it: see
```
$ git tag --contains 1b17bf2ab8cf66dd2b740eca376e2d46f7ad7041
lz4-r130
r129
r130
r131
rc129v0
v1.7.3
v1.7.4
v1.7.4.2
v1.7.5
v1.8.0
v1.8.1
v1.8.1.2
v1.8.2
v1.8.3
v1.9.0
v1.9.1
v1.9.2
v1.9.3
v1.9.4
```

and v1.7.3 was released in Nov 17, 2016. some popular distros
releases also package new enough liblz4:

- fedora 35 ships lz4-devel 1.9.3,
- CentOS 7 ships lz4-devel 1.8.3
- debian 10 ships liblz4-dev 1.8.3
- ubuntu 18.04 ships liblz4-dev r131

so, in this change, we drop the support of liblz4 < 1.7.3 for better
code readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12971
2023-02-23 14:39:20 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Petr Gusev
3263523b54 transport server: fix "request size too large" handling
Calling _read_buf.close() doesn't imply eof(), some data
may have already been read into kernel or client buffers
and will be returned next time read() is called.
When the _server._max_request_size limit was exceeded
and the _read_buf was closed, the process_request method
finished and we started processing the next request in
connection::process. The unread data from _read_buf was
treated as the header of the next request frame, resulting
in "Invalid or unsupported protocol version" error.

The existing test_shed_too_large_request was adjusted.
It was originally written with the assumption that the data
of a large query would simply be dropped from the socket
and the connection could be used to handle the
next requests. This behaviour was changed in scylladb#8800,
now the connection is closed on the Scylla side and
can no longer be used. To check there are no errors
in this case, we use Scylla metrics, getting them
from the Scylla Prometheus API.
2023-02-08 00:07:08 +04:00
Petr Gusev
0904f98ebf transport server: log failed requests with debug level
These logs can be helpful for debugging, e.g. if an error
was not handled correctly by the client driver, or another
error occurred while handling it.
2023-02-08 00:07:08 +04:00
Petr Gusev
a4cf509c3d transport server: fix unexpected server errors handling
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was timeout error.

A new test_batch_with_error is added. It is quite
difficult to reproduce error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.

Closes: scylladb#12104
2023-02-08 00:07:02 +04:00
Petr Gusev
bd80a449d5 transport server: log client errors with debug level
Ideally, these errors should be transparently delivered
to the client, but in practice, due to various
flaws/bugs in scylla and/or the driver,
they can be lost, which enormously complicates troubleshooting.

const socket_address& get_remote_address() is needed for its
convenient conversion to string, which includes ip and port.
2023-02-07 13:53:38 +04:00
Avi Kivity
0b418fa7cf cql3, transport, tests: remove "unset" from value type system
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.

Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.

This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.

This is what this long patch does. It performs the following:

(general)

1. unset is removed from the possible values of cql3::raw_value and
   cql3::raw_value_view.

(external->cql3)

2. query_options is fortified with a vector of booleans,
   unset_bind_variable_vector, where each boolean corresponds to a bind
   variable index and is true when it is unset.
3. To avoid churn, two compatiblity structs are introduced:
   cql3::raw_value{,_view}_vector_with_unset, which can be constructed
   from a std::vector<raw_value{,_view/}>, which is what most callers
   have. They can also be constructed with explicit unset vectors, for
   the few cases they are needed.

(cql3->variables)

4. query_options::get_value_at() now throws if the requested bind variable
   is unset. This replaces all the throwing checks in expression evaluation
   and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
   unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
   an expression, and can be queried whether an unset is present. Two
   conditions are checked: the expression must be a singleton bind
   variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
   new subclasses of cql3::operation. cql3::operation_no_unset_support
   ignores unsets completely. cql3::operation_skip_if_unset checks if
   an operand is unset (luckily all operations have at most one operand that
   tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
   to check for should_skip_operation(). This are the loops around
   operations in update_statement and delete_statement, and the checks
   for unset in attributes (LIMIT and PER PARTITION LIMIT)

(tests)

9. Many unset tests are removed. It's now impossible to enter an
   unset value into the expression evaluation machinery (there's
   just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
   since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
   Since unsets are now checked very early, we don't know the context
   where they happen. It would be possible to reintroduce it (by adding
   a format string parameter to cql3::unset_operation_guard), but it
   seems not to be worth the effort. Usage of unsets is rare, and it is
   explicit (at least with the Python driver, an unset cannot be
   introduced by ommission).

I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.

Closes #12517
2023-01-16 21:10:56 +02:00
Avi Kivity
7a8a442c1e transport: drop some dead code around v1 and v2 protocols
In 424dbf43f ("transport: drop cql protocol versions 1 and 2"),
we dropped support for protocols 1 and 2, but some code remains
that checks for those versions. It is now dead code, so remove it.

Closes #12497
2023-01-12 12:52:19 +02:00
Avi Kivity
2739ac66ed treewide: drop cql_serialization_format
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).

A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.

Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.

One test checking the 16-bit serialization format is removed.
2023-01-03 19:54:13 +02:00
Avi Kivity
424dbf43f3 transport: drop cql protocol versions 1 and 2
Version 3 was introduced in 2014 (Cassandra 2.1) and was supported
in the very first version of Scylla (2a7da21481 "CQL binary protocol").

Cassandra 3.0 (2015) dropped protocols 1 and 2 as well.
It's safe enough to drop it now, 9 years after introduction of v3
and 7 years after Cassandra stopped supporting it.

Dropping it allows dropping cql_serialization_format, which causes
quite a lot of pain, and is probably broken. This will be dropped in the
following patch.
2023-01-03 19:47:49 +02:00