Commit Graph

659 Commits

Author SHA1 Message Date
Botond Dénes
7af0690762 mutation/mutation_compactor: drop v2 from compactor and related names 2025-05-09 07:53:29 -04:00
Botond Dénes
b5170e27d0 replica/table: s/make_reader_v2/make_mutation_reader/ 2025-05-09 07:53:29 -04:00
Botond Dénes
ca7f557e86 readers/multishard: drop v2 from reader and related names 2025-05-09 07:53:29 -04:00
Botond Dénes
4d92bc8b2f readers/evictable: drop v2 from reader and related names 2025-05-09 07:53:28 -04:00
Raphael S. Carvalho
c77f710a0c sstables: Fix quadratic space complexity in partitioned_sstable_set
Interval map is very susceptible to quadratic space behavior when
it's flooded with many entries overlapping all (or most of)
intervals, since each such entry will have presence on all
intervals it overlaps with.

A trigger we observed was memtable flush storm, which creates many
small "L0" sstables that spans roughly the entire token range.

Since we cannot rely on insertion order, solution will be about
storing sstables with such wide ranges in a vector (unleveled).

There should be no consequence for single-key reads, since upper
layer applies an additional filtering based on token of key being
queried.
And for range scans, there can be an increase in memory usage,
but not significant because the sstables span an wide range and
would have been selected in the combined reader if the range of
scan overlaps with them.

Anyway, this is a protection against storm of memtable flushes
and shouldn't be the common scenario.

It works both with tablets and vnodes, by adjusting the token
range spanned by compaction group accordingly.

Fixes #23634.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Wojciech Mitros
bf7bba9634 mv: add a test for dropping an index while it's building
Dropping an index is a schema change of its base table and
a schema drop of the index's materialized view. This combination
of schema changes used to cause issues during view building, because
when a view schema was dropped, it wasn't getting updated with the
new version of the base schema, and while the view building was
in progress, we would update the base schema for the base table
mutation reader and try generating updates with a view schema that
wasn't compatible with the base schema, failing on an `on_internal_error`.

In this patch we add a test for this scenario. We create an index,
halt its view building process using an injection, and drop it.
If no errors are thrown, the test succeeds.

The test was failing before https://github.com/scylladb/scylladb/pull/23337
and is passing afterwards.
2025-04-24 01:09:32 +02:00
Wojciech Mitros
d77f11d436 base_info: remove the lw_shared_ptr variant
The base_dependent_view_info is no longer needed to be shared or
modified in the view_info, so we no longer need to keep it as
a shared pointer.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
d7bd86591e view_info: don't re-set base_info after construction
In the previous commits we made sure that the base info is not dependent
on the base schema version, and the info dependent on the base schema
version is calculated when it's needed. In this patch we remove the
unnecessary re-setting of the base_info.

The set_base_info method isn't removed completely, because it also has
a secondary function - zeroing the view_info fields other than base_info.
Because of this, in this patch we rename it accordingly and limit its
use to the updates caused by a base schema change.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
ea462efa3d base_info: remove base_info snapshot semantics
The base info in view schemas no longer changes on base schema
updates, so saving the base info with a view schema from a specific
point in time doesn't provide any additional benefits.
In this patch we remove the code using the base_and_view snapshots
as it's no longer useful.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
ad55935411 base_info: remove base schema from the base_info
The base info now only contains values which are not reliant on the
base schema version. We remove the the base schema from the base info
to make it immutable regardless of base schema version, at the point
of this patch it's also not needed anywhere - the new base info can
replace the base schema in most places, and in the few (view_updates)
where we need it, we pull the most recent base schema version from
the database.

After this change, the base info no longer changes in a view schema
after creation, so we'll no longer get errors when we try generating
view updates with a base_info that's incompatible with a specific
base schema version.

Fixes #9059
Fixes #21292
Fixes #22410
2025-04-24 01:08:39 +02:00
Wojciech Mitros
05fce91945 schema_registry: store base info instead of base schema for view entries
In the following patch we plan to remove the base schema from the base_info
to make the base_info immutable. To do that, we first prepare the schema
registry for the change; we need to be able to create view schemas from
frozen schemas there and frozen schemas have no information about the base
table. Unless we do this change, after base schemas are removed from the
base info, we'll no longer be able to load a view schema to the schema registry
without looking up the base schema in the database.

This change also required some updates to schema building:
* we add a method for unfreezing a view schema with base info instead of
a base schema
* we make it possible to use schema_builder with a base info instead of
a base schema
* we add a method for creating a view schema from mutations with a base info
instead of a base schema
* we add a view_info constructor withat base info instead of a base schema
* we update the naming in schema_registry to reflect the usage of base info
instead of base schema
2025-04-24 01:08:39 +02:00
Wojciech Mitros
6e539c2b4d base_info: make members non-const
In the following patches we'll add the base info instead of the
base schema to various places (schema building, schema registry).
There, we'll sometimes need to update the base_info fields, which
we can't do with const members. There's also a place (global_schema_ptr)
where we won't be able to use the base_info_ptr (a shared pointer to the
base_info), so we can't just use the base_info_ptr everywhere instead.

In this patch we unmark these members as const.
In the following patches we'll remove the methods for changing the
base_info in the view schema, so it will remain effectively const.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
32258d8f9a view_info: move the base info to a separate header
In the following commits the base_depenedent_view_info will be needed
in many more places. To avoid including the whole db/view/view.hh
or forward declaring (where possible) the base info, we move it to
a separate header which can be included anywhere at almost no cost.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
a3d2cd6b5e view_info: move computation of view pk columns not in base pk to view_updates
In preparation of making the base_info immutable, we want to get rid of
any base_dependent_view_info fields that can change when base schema
is updated.
The _base_regular_columns_in_view_pk and _base_static_columns_in_view_pk
base column_ids of corresponding base columns and they can change
(decrease) when an earlier column is dropped in the base table.
view_updates is the only location where these values are used and calculating
them is not expensive when comparing to the overall work done while performing
a view update - we iterate over all view primary key columns and look them up
in the base table.
With this in mind, we can just calculate them when creating a view_updates
object, instead of keeping them in the base_info. We do that in this patch.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
a33963daef view_info: move base-dependent variables into base_info
The has_computed_column_depending_on_base_non_primary_key
and is_partition_key_permutation_of_base_partition_key variables
in the view_info depend on the base table so they should be in the
base_dependent_view_info instead of view_info.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
900687c818 view_info: set base info on construction
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.

Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database

Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
2025-04-24 01:08:39 +02:00
Botond Dénes
c29c696780 readers: mv from_mutations_v2.hh from_mutations.hh
Completely mechanical change.
2025-04-16 04:46:08 -04:00
Botond Dénes
b104862702 tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
Completely mechanical change.
2025-04-16 04:46:07 -04:00
Botond Dénes
7547d0c6a9 readers: mv from_fragments_v2.hh from_fragments.hh
Completely mechanical change.
2025-04-16 04:35:00 -04:00
Benny Halevy
e1fe82ed33 utils: phased_barrier, pluggable: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Amnon Heiman
19a414598b db/view/view.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_view_builder_builds_in_progress

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Kefu Chai
6e4cb20a69 tree: implement boost::accumulate with std::ranges library
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23062
2025-02-26 23:22:02 +02:00
Andrzej Jackowski
b4f0a5149a db: cql3: add comments regarding unsafe interval<clustering_key_prefix>
class clustering_range is a range of Clustering Key Prefixes implemented
as interval<clustering_key_prefix>. However, due to the nature of
Clustering Key Prefix, the ordering of clustering_range is complex and
does not satisfy the invariant of interval<>. To be more specific, as a
comment in interval<> implementation states: “The end bound can never be
smaller than the start bound”. As a range of CKP violates the invariant,
some algorithms, like intersection(), can return incorrect results.
For more details refer to scylladb#8157, scylladb#21604, scylladb#22817.

This commit:
 - Add a WARNING comment to discourage usage of clustering_range
 - Add WARNING comments to potentially incorrect uses of
   interval<clustering_key_prefix> non-trivial methods
 - Add a FIXME comment to incorrect use of
   interval<clustering_key_prefix_view>::deoverlap and WARNING comments
   to related interval<clustering_key_prefix_view> misuse.

Closes scylladb/scylladb#22913
2025-02-26 12:01:28 +01:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Nadav Har'El
cae8a7222e alternator: fix view build on oversized GSI key attribute
Before this patch, the regular_column_transformation constructor, which
we used in Alternator GSIs to generates a view key from a regular-column
cell, accepted a cell of any size. As a reviewer (Avi) noticed, very
long cells are possible, well beyond what Scylla allows for keys (64KB),
and because regular_column_transformation stores such values in a
contiguous "bytes" object it can cause stalls.

But allowing oversized attributes creates an even more accute problem:
While view building (backfilling in DynamoDB jargon), if we encounter
an oversized (>64KB) key, the view building step will fail and the
entire view building will hang forever.

This patch fixes both problems by adding to regular_column_transformation's
constructor the check that if the cell is 64KB or larger, an empty value
is returned for the key. This causes the backfilling to silently skip
this item, which is what we expect to happen (backfilling cannot do
anything to fix or reject the pre-existing items in the best table).

A test test_gsi_updatetable.py::test_gsi_backfill_oversized_key is
introduced to reproduce this problem and its fix. The test adds a 65KB
attribute to a base table, and then adds GSIs to this table with this
attribute as its partition key or its sort key. Before this patch, the
backfilling process for the new GSIs hangs, and never completes.
After this patch, the backfilling completes and as expected contains
other base-table items but not the item with the oversized attribute.
The new test also passes on DynamoDB.

However, while implementing this fix I realized that issue #10347 also
exists for GSIs. Issue #10347 is about the fact that DynamoDB limits
partition key and sort key attributes to 2048 and 1024 bytes,
respectively. In the fix described above we only handled the accute case
of lengths above 64 KB, but we should actually skip items whose GSI
keys are over 2048 or 1024 bytes - not 64KB. This extra checking is
not handled in this patch, and is part of a wider existing issue:
Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:50 +01:00
Nadav Har'El
7a0027bacc mv: clean up do_delete_old_entry
The function do_delete_old_entry() had an if() which was supposedly for
the case of collection column indexing, and which our previous patch
that improved this function to support caller-specified deletion_ts
left behind.

As a reviewer noticed, the new tombstone-setting code was in an "else"
of that existing if(), and it wasn't clear what happens if we get to that
else in the collection column indexing. So I reviewed the code and added
breakpoints and realized that in fact, do_delete_old_entry() is never
called for the collection-indexing case, which has its own
update_entry_for_computed_column() which view_updates::generate_update()
calls instead of the do_delete_old_entry() function and its friends.
So it appears that do_delete_old_entry() doesn't need that special
case at all, which simplifies it.

We should eventually simplify this code further. In particular, the
function generate_update() already knows the key of the rows it
adds or deletes so do_delete_old_entry() and its friends don't need
to call get_view_rows() to get it again. But these simplifications
and other will need to come in a later patch series, this one is
already long enough :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
bc7b5926d2 mv: support regular_column_transformation key columns in view
In an earlier patch, we introduced regular_column_transformation,
a new type of computed column that does a computation on a cell in
regular column in the base and returns a potentially transformed cell
(value or deletion, timestamp and ttl). In *this* patch, we wire the
materialized view code to support this new kind of computed column that
is usable as a materialized-view key column. This new type of computed
column is not yet used in this patch - this will come in the next
patch, where we will use it for Alternator GSIs.

Before this patch, the logic of deciding when the view update needs
to create a new row or delete a new one, and which timestamp and ttl
to give to the new row, could depend on one (or two - in Alternator)
cells read from base-table regular columns. In this patch, this logic
is rewritten - the notion of "base table regular columns" is generalized
to the notion of "updatable view key columns" - these are view key
columns that an update may change - because they really are base regular
columns, or a computed function of one (regular_column_transformation).

In some sense, the new code is easier to understand - there is no longer
a separate "compute_row_marker()" function, rather the top-level
generate_update() is now in charge of finding the "updatable view key
columns" and calculate the row marker (timestamp and ttl) as part
of deciding what needs to be done.

But unfortunately the code still has separate code paths for "collection
secondary indexing", and also for old-style column_computation (basically,
only token_column_computation). Perhaps in the future this can be further
simplified.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
c8ea9f8470 mv: introduce regular_column_transformation, a new type of computed column
In the patches that follow, we want Alternator to be able to use as a
key for a materialized view (GSI) not a real column from the schema,
but rather an attribute value deserialized from a member of the ":attrs"
map.

For this, we need the ability for materialized view to define a key
column which is computed as function of a real column (":attrs").

We already have an MV feature which we called "computed column"
(column_computation), but it is wholy inadequate for this job:
column_computation can only take a partition key, and produce a value -
while we need it to take a regular column (one member of ":attrs"),
not just the partition key, and return a cell - value or deletion,
timestamp and TTL.

So in this patch we introduce a new type of computed column, which we
called "regular_column_transformation" since it intends to perform some
sort of transformation on a single column (or more accurately, a single
atomic cell). The limitation that this function transforms a single
column only is important - if we had a function of multiple columns,
we wouldn't know which timestamp or ttl it should use for the result
if the two columns had different timestamps or TTLs.

The new class isn't wired to anything yet: The MV code cannot handle
it yet, and the Alternator code will not use it yet.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Michael Litvak
6d34125eb7 view_builder: fix loop in view builder when tokens are moved
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.

A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.

The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.

To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.

Fixes scylladb/scylladb#21829

Closes scylladb/scylladb#22493
2025-01-30 14:35:18 +02:00
Benny Halevy
dd21d591f6 network_topology_strategy_test: add tablets rack_aware_view_pairing tests
Test the simple case of base/view pairing with replication_factor
that is a multiple of the number of racks.

As well as the complex case when simple_tablets_rack_aware_view_pairing
is not possible.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
249b793674 view: get_view_natural_endpoint: implement rack-aware pairing for tablets
Enabled with the tablets_rack_aware_view_pairing cluster feature
rack-aware pairing pairs base to view replicas that are in the
same dc and rack, using their ordinality in the replica map

We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
  is a multiple of the number of racks and the minimum number of nodes
  per rack in the dc is greater than or equal to rf / nr_racks.

  In this case (that includes the single rack case), all racks would
  have the same number of replicas, so we first filter all replicas
  by dc and rack, retaining their ordinality in the process, and
  finally, we pair between the base replicas and view replicas,
  that are in the same rack, using their original order in the
  tablet-map replica set.

  For example, nr_racks=2, rf=4:
  base_replicas = { N00, N01, N10, N11 }
  view_replicas = { N11, N12, N01, N02 }
  pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
  Note that we don't optimize for self-pairing if it breaks pairing ordinality.

- Complex rack-aware pairing: when the replication factor is not
  a multiple of nr_racks.  In this case, we attempt best-match
  pairing in all racks, using the minimum number of base or view replicas
  in each rack (given their global ordinality), while pairing all the other
  replicas, across racks, sorted by their ordinality.

  For example, nr_racks=4, rf=3:
  base_replicas = { N00, N10, N20 }
  view_replicas = { N11, N21, N31 }
  pairing would be: { N00, N31 }*, { N10, N11 }, { N20, N21 }
  * cross-rack pair

  If we'd simply stable-sort both base and view replicas by rack,
  we might end up with much worse pairing across racks:
  { N00, N11 }*, { N10, N21 }*, { N20, N31 }*
  * cross-rack pair

Fixes scylladb/scylladb#17147

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
0e388a1594 view: get_view_natural_endpoint: handle case when there are too few view replicas
Currently, when reducing RF, we may drop replicas from
the view before dropping replicas from the base table.
Since get_view_natural_endpoint is allowed to return
a disengaged optional if it can't find a pair for the
base replica, replcace the exiting assertion with code
handling this case, and count those events in a new
table metric: total_view_updates_failed_pairing.

Note that this does not fix the root cause for the issue
which is the unsynchronized dropping of replicas, that
should be atomic, using a single group0 transaction.

Refs scylladb/scylladb#21492

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
858b0a51f8 view: get_view_natural_endpoint: track replica locator::nodes
Rather than tracking only the replica host_id, keep
track of the locator:::node& to prepare for
rack-aware pairing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
cadd33bdf6 view: get_view_natural_endpoint: refactor predicate function
Simplify the function logic by calculating the predicate
function once, before scanning all base and view replicas,
rather than testing the different options in the inner loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
97f85e52f7 view: get_view_natural_endpoint: clarify documentation
"self-pairing" is enabled only when use_legacy_self_pairing
is enabled.  That is currently unclear in the documentation
comment for this function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
6d4de30a3a view: mutate_MV: optimize remote_endpoints filtering check
Currently we always lookup both `my_address` and *target_endpoint
in remote_endpoints. But if my_address is in remote_endpoints
in some cases the second lookup is not needed, so do it only
to decide whether to swap target_endpoint with my_address, if
found in remote_endpoints, or to remove that match, if
*target_endpoint is already pending as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
91d3bf8ebc view: mutate_MV: lookup base and view erms synchronously
Although at the moment storage_service::replicate_to_all_cores
may yield between updating the base and view tables with
a new effective_replication_map, scylladb/scylladb#21781
was submitted to change that so that they are updated
atomically together.

This change prepares for the above change, and is harmless
at the moment.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
d04cdce0fc view: mutate_MV: calculate keyspace-dependent flags once
All view live in the same keyspace as their base
table, so calculate the keyspace-dependent flags
once, outside the per-view update loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Kamil Braun
89ee2a6834 Merge 'drop ip addresses from token metadata' from Gleb
Now that all topology related code uses host ids there is not point to
maintain ip to id (and back) mappings in the token metadata. After the
patch the mapping will be maintained in the gossiper only. The rest of
the system will use host ids and in rare cases where translation is
needed (mostly for UX compatibility reasons) the translation will be
done using gossiper.

Fixes: scylladb/scylla#21777

* 'gleb/drop-ip-from-tm-v3' of github.com:scylladb/scylla-dev: (57 commits)
  hint manager: do not translate ip to id in case hint manager is stopped already
  locator: token_metadata: drop update_host_id() function that does nothing now
  locator: topology: drop indexing by ips
  repair: drop unneeded code
  storage_service: use host_id to look for a node in on_alive handler
  storage_proxy: translate ips to ids in forward array using gossiper
  locator: topology: remove unused functions
  storage_service: check for outdated ip in on_change notification in the peers table
  storage_proxy: translate id to ip using address map in tablets's describe_ring code instead of taking one from the topology
  topology coordinator: change connection dropping code to work on host ids
  cql3: report host id instead of ip in error during SELECT FROM MUTATION_FRAGMENTS query
  locator: drop unused function from tablet_effective_replication_map
  api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead
  locator: token_metadata: remove unused ip based functions
  locator: network_topology_strategy: use host_id based function to check number of endpoints in dcs
  gossiper: drop get_unreachable_token_owners functions
  storage_service: use gossiper to map ip to id in node_ops operations
  storage_service: fix indentation after the last patch
  storage_service: drop loops from node ops replace_prepare handling since there can be only one replacing node
  token_metadata: drop no longer used functions
  ...
2025-01-17 11:00:52 +01:00
Gleb Natapov
122d58b4ad api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead 2025-01-16 16:37:07 +02:00
Gleb Natapov
844cb090bf view: do not use get_endpoint_for_host_id_if_known to check if a node is part of the topology
Check directly in the topology instead.
2025-01-15 16:30:28 +02:00
Michael Litvak
7a6aec1a6c view_builder: hold semaphore during entire startup
Guard the whole view builder startup routine by holding the semaphore
until it's done instead of releasing it early, so that it's not
intercepted by migration notifications.
2025-01-14 12:31:29 +02:00
Michael Litvak
1104411f83 view_builder: pass view name by value to write_view_build_status
The function write_view_build_status takes two lambda functions and
chooses which of them to run depending on the upgrade state. It might
run both of them.

The parameters ks_name and view_name should be passed by value instead
of by reference because they are moved inside each lambda function.
Otherwise, if both lambdas are run, the second call operates on invalid
values that were moved.
2025-01-14 12:31:29 +02:00
Michael Litvak
b1be2d3c41 view_builder: write status to tables before starting to build
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.

Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in incorrect state indicating the
view building is not complete.

Fixes scylladb/scylladb#20638
2025-01-14 12:31:20 +02:00
Michael Litvak
2a8ff478f0 view_builder: register listener for new views before reading views
When starting the view builder, we find all existing views in
`calculate_shard_build_step` and then register a listener for new views.
Between these steps we may yield and create a new view, then we miss
initializing the view build step for the new view, and we won't start
building it.

To fix this we first register the listener and then read existing views,
so a view can't be missed.

Fixes scylladb/scylladb#20338

Closes scylladb/scylladb#22184
2025-01-09 13:18:28 +02:00
Kefu Chai
e4463b11af treewide: replace boost::algorithm::join() with fmt::join()
Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve
performance and reduce dependency on Boost. `fmt::join()` allows direct
formatting of ranges and tuples with custom separators without creating
intermediate strings.

When formatting comma-separated values into another string, fmt::join()
avoids the overhead of temporary string creation that
`boost::algorithm::join()` requires. This change also helps streamline
our dependencies by leveraging the existing fmt library instead of
Boost.Algorithm.

To avoid the ambiguity, some caller sites were updated to call
`seastar::format()` explicitly.

See also

- boost::algorithm::join():
  https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp
- fmt::join():
  https://fmt.dev/11.0/api/#ranges-api

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22082
2025-01-07 12:45:05 +02:00
Wojciech Mitros
37a25d3af4 mv: avoid stalls when calculating affected clustering ranges
Currently, when finishing db::view::calculate_affected_clustering_ranges
we deoverlap, transform and copy all ranges prepared before. This
is all done within a single continuation and can cause stalls.

We fix this by adding yields after each transform and moving elements
to the final vector one by one instead of copying them all at the end.

After this change, the longest continuation in this code will be
deoverlapping the initial ranges (and one transform). While it has
a relatively high computational complexity (we sort all ranges), it
should execute quickly because we're operating on views there and
we don't need to copy the actual bytes. If we encounter a stall there,
we'll need to implement an asynchronous `deoverlap` method.

Fixes scylladb/scylladb#21843

Closes scylladb/scylladb#21846
2024-12-19 12:50:30 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Botond Dénes
34a8b492be Merge 'materialized view: make flow-control maximum delay configurable' from Piotr Dulikowski
This pull request is continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test.

Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays.

This hard-coded one maximum second delay was considered *huge* - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latecy for writes wasn't having any noticable affect on slowing down the client.

So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before.

Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it.

The new parameter's help string mentions both these use cases of the parameter.

Fixes #18187

This is new functionality, no need to backport to any open source release.

Closes scylladb/scylladb#21647

* github.com:scylladb/scylladb:
  materialized views: test for the MV delay configuration parameter
  service: add injection for skipping view update backlog
  materialized view: make flow-control maximum delay configurable
2024-12-16 14:20:33 +02:00
muthu90tech
e49381119d locator: topology: use node& instead of node*
This change goes thru locator:topology to use node&
instead of node* where nullptr is not possible. There are
places where the node object is used in unordered_set, in
those cases the node is wrapped in std::reference_wrapper.

Fixes scylladb/scylladb#20357

Closes scylladb/scylladb#21863
2024-12-12 13:22:55 +01:00