Commit Graph

4972 Commits

Author SHA1 Message Date
Nadav Har'El
87f29d8fd2 db/tags: drop unsafe update_tags() utility function
The previous patches introduced the function modify_tags() as a
safe version of update_tags(), and switched all uses of update_tags()
to use modify_tags().

So now that the unsafe update_tags() is no longer use, we can drop it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 13:35:17 +02:00
Nadav Har'El
fbdf52acf6 db/tags: add safe modify_tags() utility functions
The existing utility function update_tags() for modifying tags in a
schema (used mainly by Alternator) is not safe for concurrent operations:
The function first reads the old tags, then modifies them and writes
them back. If two such calls happen concurrently, both calls may read
the same old tags, make different modifications, and then both write
the new tags, with one's write overwriting the other's.

So in this patch, we introduce a new utility function, modify_tags(),
to provide a concurrency-safe read-modify-write operation on tags.
The new function takes a modification function and calls the read,
modify and write steps together under a single lock. The new function
also takes a table name instead of a schema object - because we need
to read the schema under the lock, because might have already been
changed by some other concurrent operation.

This patch only introduces the new function, it doesn't change any
code to use it yet, and doesn't remove the unsafe update_tags() function.
We'll do those things in the next patches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 11:51:01 +02:00
Avi Kivity
beaa5a9117 Merge 'wasm: move compilation to an alien thread' from Wojciech Mitros
The compilation of wasm UDFs is performed by a call to a foreign
function, which cannot be divided with yielding points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.

The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.

All seastar shards reference the thread through a std::shared_ptr
to a `alien_thread_runner`.

Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with focus on simplicity more
than on performance. The tasks are stored in an std::queue, reading
and writing to it is synchronized using an std::mutex for reading/
writing to the queue, and an std::condition_variable waiting until
the queue has elements.

When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.

Fixes #12904

Closes #13051

* github.com:scylladb/scylladb:
  wasm: move compilation to an alien thread
  wasm: convert compilation to a future
2023-03-12 19:29:11 +02:00
Nadav Har'El
843a5dfc15 Merge 'Allow setting permissions for user-defined functions' from Wojciech Mitros
This series aims to allow users to set permissions on user-defined functions.

The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions

Fixes: #5572
Fixes: #10633

Closes #12869

* github.com:scylladb/scylladb:
  cql3: allow UDTs in permissions on UDFs
  cql3: add type_parser::parse() method taking user_types_metadata
  schema_change_test: stop using non-existent keyspace
  cql3: fix parameter names in function resource constructors
  cql3: handle complex types as when decoding function permissions
  cql3: enforce permissions for ALTER FUNCTION
  cql-pytest: add a (failing) test case for UDT in UDF
  cql-pytest: add a test case for user-defined aggregate permissions
  cql-pytest: add tests for function permissions
  cql3: enforce permissions on function calls
  selection: add a getter for used functions
  abstract_function_selector: expose underlying function
  cql3: enforce permissions on DROP FUNCTION
  cql3: enforce permissions for CREATE FUNCTION
  client_state: add functions for checking function permissions
  cql-pytest: add a case for serializing function permissions
  cql3: allow specifying function permissions in CQL
  auth: add functions_resource to resources
2023-03-12 14:04:34 +02:00
Pavel Emelyanov
2f316880ae large_data_handler: Increase verbosity on shutdown
It may hang waiting for background handlers, so it's good to know if
they exist at all

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 19:45:18 +03:00
Pavel Emelyanov
2000494881 large_data_handler: Coroutinize .stop() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 19:06:14 +03:00
Wojciech Mitros
4f0b3539c5 cql3: add type_parser::parse() method taking user_types_metadata
In a future patch, we don't have access to a `user_types_storage`
while we want to parse a type, but we do have access to a
`user_types_metadata`, which is enough to parse the type.
We add a variant of the `type_parser::parse()` that takes
a `user_types_metadata` instead of a `user_types_storage` to be
able to parse a type also in the described context.
2023-03-10 11:02:33 +01:00
Wojciech Mitros
2fd6d495fa wasm: move compilation to an alien thread
The compilation of wasm UDFs is performed by a call to a foreign
function, which cannot be divided with yielding points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.

The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.

All seastar shards reference the thread through a std::shared_ptr
to a `alien_thread_runner`.

Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with focus on simplicity more
than on performance. The tasks are stored in an std::queue, reading
and writing to it is synchronized using an std::mutex for reading/
writing to the queue, and an std::condition_variable waiting until
the queue has elements.

When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.
2023-03-09 11:54:38 +01:00
Kefu Chai
0b3d25ab1b build: cmake: add missing linkages
these dependencies were found when trying to compile
`user_function_test`. whenever a library libfoo references another one,
say, libbar, the corresponding linkage from libfoo to libbar is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Nadav Har'El
cdedc79050 cql: add configurable restriction of minimum RF
We have seen users unintentionally use RF=1 or RF=2 for a keyspace.
We would like to have an option for a minimal RF that is allowed.

Cassandra recently added, in Cassandra 4.1 (see apache/cassandra@5fdadb2
and https://issues.apache.org/jira/browse/CASSANDRA-14557), exactly such
a option, called "minimum_keyspace_rf" - so we chose to use the same option
name in Scylla too. This means that unlike the previous "safe mode"
options, the name of this option doesn't start with "restrict_".

The value of the minimum_keyspace_rf option is a number, and lower
replication factors are rejected with an error like:

  cqlsh> CREATE KEYSPACE x WITH REPLICATION = { 'class' : 'SimpleStrategy',
         'replication_factor': 2 };

  ConfigurationException: Replication factor replication_factor=2 is
  forbidden by the current configuration setting of minimum_keyspace_rf=3.
  Please increase replication factor, or lower minimum_keyspace_rf set in
  the configuration.

This restriction applies to both CREATE KEYSPACE and ALTER KEYSPACE
operations. It applies to both SimpleStrategy and NetworkTopologyStrategy,
for all DCs or a specific DC. However, a replication factor of zero (0)
is *not* forbidden - this is the way to explicitly request not to
replicate (at all, or in a specific DC).

For the time being, minimum_keyspace_rf=0 is still the default, which
means that any replication factor is allowed, as before. We can easily
change this default in a followup patch.

Note that in the current implementation, trying to use RF below
minimum_keyspace_rf is always an error - we don't have a syntax
to make into just a warning. In any case the error message explains
exactly which configuration option is responsible for this restriction.

Fixes #8891.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #9830
2023-03-07 19:04:06 +02:00
Kamil Braun
2b44631ded Merge 'storage_service: Make node operations safer by detecting asymmetric abort' from Tomasz Grabiec
This patch fixes a problem which affects decommission and removenode
which may lead to data consistency problems under conditions which
lead one of the nodes to unliaterally decide to abort the node
operation without the coordinator noticing.

If this happens during streaming, the node operation coordinator would
proceed to make a change in the gossiper, and only later dectect that
one of the nodes aborted during sending of decommission_done or
removenode_done command. That's too late, because the operation will
be finalized by all the nodes once gossip propagates.

It's unsafe to finalize the operation while another node aborted. The
other node reverted to the old topolgy, with which they were running
for some time, without considering the pending replica when handling
requests. As a result, we may end up with consistency issues. Writes
made by those coordinators may not be replicated to CL replicas in the
new topology. Streaming may have missed to replicate those writes
depending on timing.

It's possible that some node aborts but streaming succeeds if the
abort is not due to network problems, or if the network problems are
transient and/or localized and affect only heartbeats.

There is no way to revert after we commit the node operation to the
gossiper, so it's ok to close node_ops sessions before making the
change to the gossiper, and thus detect aborts and prevent later aborts
after the change in the gossiper is made. This is already done during
bootstrap (RBNO enabled) and replacenode. This patch canges removenode
to also take this approach by moving sending of remove_done earlier.

We cannot take this approach with decommission easily, because
decommission_done command includes a wait for the node to leave the
ring, which won't happen before the change to the gossiper is
made. Separating this from decommission_done would require protocol
changes. This patch adds a second-best solution, which is to check if
sessions are still there right before making a change to the gossiper,
leaving decommission_done where it was.

The race can still happen, but the time window is now much smaller.

The PR also lays down infrastructure which enables testing the scenarios. It makes node ops
watchdog periods configurable, and adds error injections.

Fixes #12989
Refs #12969

Closes #13028

* github.com:scylladb/scylladb:
  storage_service: node ops: Extract node_ops_insert() to reduce code duplication
  storage_service: Make node operations safer by detecting asymmetric abort
  storage_service: node ops: Add error injections
  service: node_ops: Make watchdog and heartbeat intervals configurable
2023-03-07 17:36:51 +01:00
Wojciech Mitros
4609a45ce3 wasm: convert compilation to a future
After we move the compilation to a alien thread, the completion
of the compilation will be signaled by fulfilling a seastar promise.
As a result, the `precompile` function will return a future, and
because of that, other functions that use the `precompile` functions
will also become futures.
We can do all the neccessary adjustments beforehand, so that the actual
patch that moves the compilation will contain less irrelevant changes.
2023-03-07 14:27:38 +01:00
Avi Kivity
6aa91c13c5 Merge 'Optimize topology::compare_endpoints' from Benny Halevy
The code for compare_endpoints originates at the dawn of time (bc034aeaec)
and is called on the fast path from storage_proxy via `sort_by_proximity`.

This series considerably reduces the function's footprint by:
1. carefully coding the many comparisons in the function so to reduce the number of conditional banches (apparently the compiler isn't doing a good enough job at optimizing it in this case)
2. avoid sstring copy in topology::get_{datacenter,rack}

Closes #12761

* github.com:scylladb/scylladb:
  topology: optimize compare_endpoints
  to_string: add print operators for std::{weak,partial}_ordering
  utils: to_sstring: deinline std::strong_ordering print operator
  move to_string.hh to utils/
  test: network_topology: add test_topology_compare_endpoints
2023-03-07 15:17:19 +02:00
Wojciech Mitros
d4851ccae7 treewide: rename the "xwasm" UDF language to "wasm"
When the WASM UDFs were first introduced, the LANGUAGE required in
the CQL statements to use them was "xwasm", because the ABI for the
UDFs was still not specified and changes to it could be backwards
incompatible.
Now, the ABI is stabilized, but if backwards incompatible changes
are made in the future, we will add a new ABI version for them, so
the name "xwasm" is no longer needed and we can finally
change it to "wasm".

Closes #13089
2023-03-07 10:21:11 +02:00
Botond Dénes
d1619eb38a Merge 'Remove qctx from helpers that retrieve truncation record' from Pavel Emelyanov
There are two places that do it -- commitlog and batchlog replayers. Both can have local system-keyspace reference and use system-keyspace local query-processor for it. The peering save_truncation_record() is not that simple and is not patched by this PR

Closes #13087

* github.com:scylladb/scylladb:
  system_keyspace: Unstatic get_truncation_record()
  system_keyspace: Unstatic get_truncated_at()
  batchlog_manager: Add system_keyspace dependency
  main: Swap batchlog manager and system keyspace starts
  system_keyspace: Unstatic get_truncated_position()
  system_keyspace: Remove unused method
  commitlog: Create commitlog_replayer with system keyspace
  test: Make cql_test_env::get_system_keyspace() return sharded
  commiltlog: Line-up field definitions
2023-03-07 10:19:55 +02:00
Kefu Chai
020483aa59 Update seastar submodule and main
this change also includes change to main, to make this commit compile.
see below:

* seastar 9b6e181e42...9cbc1fe889 (46):
  > Merge 'Make io-tester jobs share sched classes' from Pavel Emelyanov
  > io_tester.md: Update the `rps` configuration option description
  > io_tester: Add option to limit total number of requests sent
  > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov
  > io_tester: Add option to share classes between jobs
  > rpc: Abort connection if send_entry() fails
  > Merge 'build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS' from Kefu Chai
  > build: cooking.sh: use the same BUILD_SHARED_LIBS when building ingredients
  > build: cooking.sh: use the same generator when building ingredients
  > core/memory: handle `strerror_r` returning static string
  > Merge 'build, rpc: lz4 related cleanups' from Kefu Chai
  > build, rpc: do not support lz4 < 1.7.3
  > build: set the correct version when finding lz4
  > build: include CheckSymbolExists
  > rpc: do not include lz4.h in header
  > build: set CMP0135 for Cooking.cmake
  > docs: drop building-*.md
  > Merge 'seastar-addr2line: cleanups' from Kefu Chai
  > seastar-addr2line: refactor tests using unittest
  > seastar-addr2line: extract do_test() and main()
  > seastar-addr2line: do not import unused modules
  > scheduling: add a `rename` callback to scheduling_group_key_config
  > reactor: syscall thread: wakeup up reactor with finer granularity
  > build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS
  > build: extract dpdk_extra_cflags out
  > core/sstring: remove a temporary variable
  > Merge 'treewide: include what we use, and add a checkheaders target' from Kefu Chai
  > perftune.py: auto-select the same number of IRQ cores on each NUMA
  > prometheus: remove unused headers
  > core/sstring: define <=> operator for sstring
  > Merge 'core: s/reserve_additional_memory/reserve_additional_memory_per_shard/' from Kefu Chai
  > include: do not include <concepts> directly
  > coding_style: note on self-contained header requirement
  > circileci: build checkheaders in addition to default target
  > build: add checkheaders target
  > net/toeplitz: s/u_int/unsigned/
  > net/tcp-stack: add forward declaration for seastar::socket
  > core, net, util: include used headers

* main: set reserved memory for wasm on per-shard basis

  this change is a follow-up of
  f05d612da8 and
  4a0134a097.

  this change depends on the related change in Seastar to reserve
  additional memory on a per-shard basis.

  per Wojciech Mitros's comment:

  > it should have probably been 50MB per shard

  in other words, as we always execute the same set of udf on all
  shards. and since one cannot predict the number of shards, but she
  could have a rough estimation on the size of memory a regular (set
  of) udf could use. so a per-shard setting makes more sense.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-06 18:41:34 +02:00
Pavel Emelyanov
1be9b0df50 system_keyspace: Unstatic get_truncation_record()
Now when both callers of this method are non-static, it can be made
non-static too. While at it make two more changes:

1. move the thing to private
2. remove explicit cql3::query_processor::cache_internal::yes argument,
   the system_keyspace::execute_cql() applies it on itw own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
109e032f61 system_keyspace: Unstatic get_truncated_at()
It's called from batchlog replayer which now has local system keyspace
reference and can use it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
1907518034 batchlog_manager: Add system_keyspace dependency
The manager will need system ks to get truncation record from, so add it
explicitly. Start-stop sequence no allows that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
dcbe3e467b system_keyspace: Unstatic get_truncated_position()
It's called from commitlog replayer which has system keyspace instance
on board and can use it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
2501ba3887 system_keyspace: Remove unused method
The get_truncated_position() overload that filters records by shard is
nowadays unused. Drop one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
47b61389b5 commitlog: Create commitlog_replayer with system keyspace
The replayer code needs system keyspace to fetch truncation records
from, thus it needs this explicit dependency. By the time it runs system
keyspace is fully initialized already

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:36 +03:00
Pavel Emelyanov
73ab1bd74b commiltlog: Line-up field definitions
Just a cosmetic change, so that next patch adding a new member to the
class looks nice

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:15:27 +03:00
Botond Dénes
e70be47276 Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund
Fixes #12810

We did not update total_size_on_disk in commitlog totals when use o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.

Closes #12950

* github.com:scylladb/scylladb:
  commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
  commitlog: change type of stored size
2023-03-02 12:39:11 +02:00
Botond Dénes
46efdfa1a1 Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr
The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()`

Fixes: #12249

Closes #12978

* github.com:scylladb/scylladb:
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  row_cache: pass partition_start though nonforwardable reader
2023-03-01 09:58:14 +02:00
Botond Dénes
84e26ed9c3 Merge 'Enable RBNO by default' from Asias He
This pr fixes the seastar::rpc::closed_error error in the test_topology suite and enables RBNO by default.

Closes #12970

* github.com:scylladb/scylladb:
  Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops""
  storage_service: Wait for normal state handler to finish in replace
  storage_service: Wait for normal state handler to finish in bootstrap
2023-03-01 07:55:46 +02:00
Botond Dénes
eb10623dd2 Merge 'build: cmake: sync with configure.py (7/n)' from Kefu Chai
- build: cmake: support thrift < 0.11.0
- build: cmake: find Thrift before using it
- build: cmake: find libxcrypt before using it
- build: cmake: expose scylla_gen_build_dir from "interface"
- build: cmake: extract thrift out
- build: cmake: link the whole auth
- build: cmake: extract mutation,db,replica,streaming out

Closes #12990

* github.com:scylladb/scylladb:
  build: cmake: extract mutation,db,replica,streaming out
  build: cmake: link the whole auth
  build: cmake: extract thrift out
  build: cmake: expose scylla_gen_build_dir from "interface"
  build: cmake: find libxcrypt before using it
  build: cmake: find Thrift before using it
  build: cmake: support thrift < 0.11.0
2023-03-01 07:35:21 +02:00
Petr Gusev
64427b9164 flat_mutation_reader_v2: drop forward_buffer_to
This is just a strange method I came across.
It effectively does nothing but clear_buffer().
2023-02-28 23:00:02 +04:00
Kefu Chai
af3968bf6e build: cmake: extract mutation,db,replica,streaming out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Tomasz Grabiec
5c8ad2db3c service: node_ops: Make watchdog and heartbeat intervals configurable
Will be useful for writing tests which trigger failures, and for
warkarounds in production.
2023-02-28 11:31:55 +01:00
Kefu Chai
5bf6e9ba97 db::commitlog::replay_position: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
56c9c9d29e db: schema_tables: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
ab5d772d63 db/view: use operator<=> to define comparison operators
also, there is no need to define operator!=() if
operator==() is defined, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Asias He
8fb786997a Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops""
This reverts commit fd4ee4878a.
2023-02-28 09:00:13 +08:00
Pavel Emelyanov
f51762c72a headers: Refine view_update_generator.hh and around
The initial intent was to reduce the fanout of shared_sstable.hh through
v.u.g.hh -> cql_test_env.hh chain, but it also resulted in some shots
around v.u.g.hh -> database.hh inclusion.

By and large:
- v.u.g.hh doesn't need database.hh
- cql_test_env.hh doesn't need v.u.g.hh (and thus -- the
  shared_sstable.hh) but needs database.hh instead
- few other .cc files need v.u.g.hh directly as they pulled it via
  cql_test_env.hh before
- add forward declarations in few other places

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12952
2023-02-22 09:32:30 +02:00
Calle Wilund
97881091d3 commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
Fixes #12810

We did not update total_size_on_disk in commitlog totals when use o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken
comparisons in delete_segments.
2023-02-21 16:35:23 +00:00
Calle Wilund
64102780fe commitlog: Use static (reused) regex for (left over) descriptor parse
Refs #11710

Allows reusing regex for segment matching (for opening left-over segments after crash).
Should remove any stalls caused by commitlog replay preparation.

v2: Add unit test for descriptor parsing

Closes #12112
2023-02-21 18:34:04 +02:00
Tomasz Grabiec
c8e2bf1596 db: schema_tables: Optimize schema merge
Currently, applying a schema change on a replica works like this:

  Collect all affected keyspaces from incoming mutations
  Read current state of schema
  Apply the mutations
  Read new state of schema

The "Read ... state of schema" step reads all kinds of schema
objects. In particular, to read the "table" objects, it does the
following:

  for every affected keyspace k:
       read all mutations from system_schema.tables for k
       extract all existing table names from those mutations
       for every existing table:
             read mutations from {tables, columns, indexes, view_virtual_columns, ...} for that table

As you can see, the number of reads performed is O(nr tables in a
keyspace), not O(nr tables in a change). This means that making a
sequence of schema changes, like adding a table, is quadratic.

Another aspect which magnifies this is that we don't read those tables
using a single scan, but issue individual queries for each table
separately.

This patch optimizes this by considering only affected tables when
reading schema for the purpose of diff calculation.

When mutations contain multi-table deletions, we still read the
set of tables, like before. This could be optimized by looking
at the database to get the list, but it's not part of the patch.

I tested this using a test case provided by Kamil (kbr-scylla@53fe154)

  ./test.py --mode debug test_many_schema_changes -s

The test bootstraps a cluster and then creates about 40 schema
changes. Then a new node is bootstrapped and replays those changes via
group0.

In debug mode, each change takes roughly 2s to process before the
patch, and 0.5s after the patch.

The whole replay is reduced to 56% of what was before:

Before (1m19s) :

INFO  2023-01-20 19:44:35,848 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 19:45:54,844 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

After (45s):

INFO  2023-01-20 22:02:51,869 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 22:03:36,834 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

Closes #12592

Closes #12592
2023-02-21 17:26:57 +02:00
Calle Wilund
6f972ee68b commitlog: change type of stored size
known_size() is technically not a size_t.
2023-02-21 15:26:02 +00:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.

the building systems are updated accordingly.

the source files referencing `types.hh` were updated using following
command:

```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Avi Kivity
e2f6e0b848 utils: move hashing related files to utils/ module
Closes #12884
2023-02-17 07:19:52 +02:00
Kefu Chai
2f0cb9e68f db/virtual_table: mark the dtor of base class virtual
as `my_result_collector` has virtual function, and its dtor is not
marked virtual, Clang complains. let's mark its base class virtual
to be on the safe side.

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'my_result_collector' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<my_result_collector>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/db/virtual_table.cc:134:25: note: in instantiation of member function 'std::unique_ptr<my_result_collector>::~unique_ptr' requested here
        auto consumer = std::make_unique<my_result_collector>(s, permit, &pr, std::move(reader_and_handle.second));
                        ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12879
2023-02-17 07:11:18 +02:00
Botond Dénes
dc3d47b1e4 Merge 'Get compaction history without using qctx' from Pavel Emelyanov
There are two methods to mess with compaction history -- update and get. The former had been patched to use local system-keyspace instance by 907fd2d3 (system_keyspace: De-static compaction history update) now it's time for the latter (spoiler: it's only used by the API handler)

Closes #12889

* github.com:scylladb/scylladb:
  system_keyspace; Make get_compaction_history non static and drop qctx
  api, compaction_manager: Get compaction history via manager
  system_keyspace: Move compaction_history_entry to namespace scope
2023-02-16 19:05:48 +02:00
Pavel Emelyanov
e234726123 system_keyspace; Make get_compaction_history non static and drop qctx
Now the call is done via the system_keyspace instance, so it can be
unmarked static and can use the local query processor instead of global
qctx.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-16 11:28:04 +03:00
Pavel Emelyanov
d0e47ace16 system_keyspace: Move compaction_history_entry to namespace scope
It's now a sub-class and it makes forward-declaration in another unit
impossible

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-16 11:24:23 +03:00
Pavel Emelyanov
737f4acc10 features: Enable persisted features on all shards
Commit 1365e2f13e (gms: feature_service: re-enable features on node
startup) re-enabled features on feature service very early, so that on
boot a node sees its "correct" features state before it starts loading
system tables and replaying commitlog.

However, checking features happens on all shards independently, so
re-enabling should also happen on all shards.

One faced problem is in extract_scylla_specific_keyspace_info(). This
helper is used when loading non-system keyspace to read scylla-specific
keyspace options. The helper is called on all shards and on all-but-zero
it evaluates the checked SCYLLA_KEYSPACES feature to false leaving the
specific data empty. As the result, different shards have different view
of keyspaces' configuration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12881
2023-02-16 00:52:05 +01:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Benny Halevy
25ebc63b82 move to_string.hh to utils/
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:04 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00