Commit Graph

2795 Commits

Author SHA1 Message Date
Benny Halevy
243dc2efce hints: host_filter: check topology::has_endpoint if enabled_selectively
Don't call get_datacenter(ep) without checking
first has_endpoint(ep) since the former may abort
on internal error if the endpoint is not listed
in topology.

Refs #11870

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12054
2022-11-24 14:33:06 +03:00
Kamil Braun
d7649a86c4 Merge 'Build up to support of dynamic IP address changes in Raft' from Konstantin Osipov
We plan to stop storing IP addresses in Raft configuration, and instead
use the information disseminated through gossip to locate Raft peers.

Implement patches that are building up to that:
* improve Raft API of configuration change notifications
* disseminate raft host id in Gossip
* avoid using Raft addresses from Raft configuraiton, and instead
  consistently use the translation layer between raft server id <-> IP
  address

Closes #11953

* github.com:scylladb/scylladb:
  raft: persist the initial raft address map
  raft: (upgrade) do not use IP addresses from Raft config
  raft: (and gossip) begin gossiping raft server ids
  raft: change the API of conf change notifications
2022-11-18 11:38:19 +01:00
Botond Dénes
437fcdeeda Merge 'Make use of enum_set in directory lister' from Pavel Emelyanov
The lister accepts sort of a filter -- what kind of entries to list, regular, directories or both. It currently uses unordered_set, but enum_set is shorter and better describes the intent.

Closes #12017

* github.com:scylladb/scylladb:
  lister: Make lister::dir_entry_types an enum_set
  database: Avoid useless local variable
2022-11-18 12:15:26 +02:00
Asias He
4571fcf9e7 token_metadata: Rename is_member to is_normal_token_owner
The name is_normal_token_owner is more clear than is_member.
The is_normal_token_owner reflects what it really checks.
2022-11-18 09:29:20 +08:00
Pavel Emelyanov
bc62ca46d4 lister: Make lister::dir_entry_types an enum_set
This type is currently an unordered_set, but only consists of at most
two elements. Making it an enum_set renders it into a size_t variable
and better describes the intention.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:01:45 +03:00
Konstantin Osipov
262566216b raft: persist the initial raft address map 2022-11-17 14:26:36 +03:00
Konstantin Osipov
051dceeaff raft: (and gossip) begin gossiping raft server ids
We plan to use gossip data to educate Raft RPC about IP addresses
of raft peers. Add raft server ids to application state, so
that when we get a notification about a gossip peer we can
identify which raft server id this notification is for,
specifically, we can find what IP address stands for this server
id, and, whenever the IP address changes, we can update Raft
address map with the new address.

On the same token, at boot time, we now have to start Gossip
before Raft, since Raft won't be able to send any messages
without gossip data about IP addresses.
2022-11-17 12:07:31 +03:00
Benny Halevy
94f2e95a2f view: get_view_natural_endpoint: get topology from erm
Get the topology for the effective replication map rather
than from the storage_proxy to ensure its synchronized
with the natural endpoints.

Since there's no preemption between the two calls
currently there is no issue, so this is merely a clean up
of the code and not supposed to fix anything.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Nadav Har'El
2f2f01b045 materialized views: fix view writes after base table schema change
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.

This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.

This patch also includes a reproducing test that failed before this
patch, and passes afterwords. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.

Fixes #10026
Fixes #11542
2022-11-16 13:58:21 +02:00
Pavel Emelyanov
bd48fdaad5 Merge 'handle_state_normal: do not update topology of removed endpoint' from Benny Halevy
Currently, when replacing a node ip, keeping the old host,
we might end up with the the old endpoint in system.peers
if it is inserted back into the topology by `handle_state_normal`
when on_join is called with the old endpoint.

Then, later on, on_change sees that:
```
    if (get_token_metadata().is_member(endpoint)) {
        co_await do_update_system_peers_table(endpoint, state, value);
```

As described in #11925.

Fixes #11925

Closes #11930

* github.com:scylladb/scylladb:
  storage_service, system_keyspace: add debugging around system.peers update
  storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed
2022-11-14 13:58:28 +03:00
Botond Dénes
21bc37603a Merge 'utils: config_src: add set_value_on_all_shards functions' from Benny Halevy
Currently when we set a single value we need
to call broadcast_to_all_shards to let observers on all
shards get notified of the new value.

However, the latter broadcasts all value to all shards
so it's terribly inefficient.

Instead, add async set_value_on_all_shards functions
to broadcast a value to all shards.

Use those in system_keyspace for db_config_table virtual table
and in task_manager_test to update the task_manager ttl.

Refs #7316

Closes #11893

* github.com:scylladb/scylladb:
  tests: check ttl on different shards
  utils: config_src: add set_value_on_all_shards functions
  utils: config_file: add config_source::API
2022-11-10 07:16:39 +02:00
Botond Dénes
3aff59f189 Merge 'staging sstables: filter tokens for view update generation' from Benny Halevy
This mini-series introduces dht::tokens_filter and uses it for consuming staging sstable in the view_update_generator.

The tokens_filter uses the token ranges owned by the current node, as retrieved by get_keyspace_local_ranges.

Refs #9559

Closes #11932

* github.com:scylladb/scylladb:
  db: view_update_generator: always clean up staging sstables
  compaction: extract incremental_owned_ranges_checker out to dht
2022-11-10 07:00:51 +02:00
Benny Halevy
38d8777d42 storage_service, system_keyspace: add debugging around system.peers update
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 14:45:47 +02:00
Benny Halevy
1a183047c0 utils: config_src: add set_value_on_all_shards functions
Currently when we set a single value we need
to call broadcast_to_all_shards to let observers on all
shards get notified of the new value.

However, the latter broadcasts all value to all shards
so it's terribly inefficient.

Instead, add async set_value_on_all_shards functions
to broadcast a value to all shards.

Use those in system_keyspace for db_config_table virtual table
and in task_manager_test to update the task_manager ttl.

Refs #7316

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 11:55:14 +02:00
Benny Halevy
10f8f13b90 db: view_update_generator: always clean up staging sstables
Since they are currently not cleaned up by cleanup compaction
filter their tokens, processing only tokens owned by the
current node (based on the keyspace replication strategy).

Refs #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:38:22 +02:00
Avi Kivity
3d345609d8 config: disable "mc" format sstables for new data
"md" format was introduced in 4.3, in 3530e80ce1, two years ago.
Disable the option to create new sstables with the "mc" format.

Closes #11265
2022-11-08 08:36:27 +02:00
Aleksandra Martyniuk
e2c7c1495d repair: change UUID to task_id
Change type of repair id from utils::UUID to task_id to distinguish
them from ids of other entities.
2022-10-31 10:07:08 +01:00
Benny Halevy
40cd685371 storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint
Allow specifying the dead node to ignore either as host_id
or ip address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Pavel Emelyanov
64c9359443 storage_proxy: Don't use default-initialized endpoint in get_read_executor()
After calling filter_for_query() the extra_replica to speculate to may
be left default-initialized which is :0 ipv6 address. Later below this
address is used as-is to check if it belongs to the same DC or not which
is not nice, as :0 is not an address of any existing endpoint.

Recent move of dc/rack data onto topology made this place reveal itself
by emitting the internal error due to :0 not being present on the
topology's collection of endpoints. Prior to this move the dc filter
would count :0 as belonging to "default_dc" datacenter which may or may
not match with the dc of the local node.

The fix is to explicitly tell set extra_replica from unset one.

fixes: #11825

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11833
2022-10-25 09:16:50 +03:00
Botond Dénes
e981bd4f21 Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El
As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.

In this series fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it).

Fixes #11801

Closes #11808

* github.com:scylladb/scylladb:
  test/alternator: add test for issue 11801
  MV: fix handling of view update which reassign the same key value
  materialized views: inline used-once and confusing function, replace_entry()
2022-10-21 10:49:28 +03:00
Pavel Emelyanov
52d6e56a10 system_keyspace: Don't use global snitch instance
There are two places to patch: .start() and .setup() and both only need
snitch to get local dc/rack from, nothing more. Thus both can live with
the explicit argument for now

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-20 12:29:26 +03:00
Avi Kivity
69199dbfba Merge 'schema_tables: limit concurrency' from Benny Halevy
To prevent stalls due to large number of tables.

Fixes scylladb/scylladb#11574

Closes #11689

* github.com:scylladb/scylladb:
  schema_tables: merge_tables_and_views reindent
  schema_tables: limit paralellism
2022-10-19 18:40:45 +03:00
Nadav Har'El
8f4243b875 MV: fix handling of view update which reassign the same key value
When a materialized view has a key (in Alternator, this can be two
keys) which was a regular column in the base table, and a base
update modifies that regular column, there are two distinct cases:

1. If the old and new key values are different, we need to delete the
   old view row, and create a new view row (with the different key).

2. If the old and new key values are the same, we just need to update
   the pre-existing row.

It's important not to confuse the two cases: If we try to delete and
create the *same* view row in the same timestamp, the result will be
that the row will be deleted (a tombstone wins over data if they have the
same timestamp) instead of updated. This is what we saw in issue #11801.

We had a bug that was seen when an update set the view key column to
the old value it already had: To compare the old and new key values
we used the function compare_atomic_cell_for_merge(), but this compared
not just they values but also incorrectly compared the metadata such as
a the timestamp. Because setting a column to the same value changes its
timestamp, we wrongly concluded that these to be different view keys
and used the delete-and-create code for this case, resulting in the
view row being deleted (as explained above).

The simple fix is to compare just the key values - not looking at
the metadata.

See tests reproducing this bug and confirming its fix in the next patch.

Fixes #11801

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-10-19 13:43:12 +03:00
Nadav Har'El
e1f8cb6521 materialized views: inline used-once and confusing function, replace_entry()
The replace_entry() function is nothing more than a convenience for
calling delete_old_entry() and then create_entry(). But it is only used
once in the code, and we can just open-code the two calls instead of
the one.

The reason I want to change it now is that the shortcut replace_entry()
helped hide a bug (#11801) - replace_entry() works incorrectly if the
old and new row have the same key, because if they do we get a deletion
and creation of the same row with the same timestamp - and the deletion
wins. Having the two calls not hidden by a convenience function makes
this potential problem more apparent.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-10-19 13:25:34 +03:00
Benny Halevy
ce22dd4329 schema_tables: merge_tables_and_views reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 13:05:41 +03:00
Benny Halevy
7ccb0e70f0 schema_tables: limit paralellism
To prevent stalls due to large number of tables.

Fixes scylladb/scylladb#11574

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 13:05:38 +03:00
Botond Dénes
2d581e9e8f Merge "Maintain dc/rack by topology" from Pavel Emelyanov
"
There's an ongoing effort to move the endpoint -> {dc/rack} mappings
from snitch onto topology object and this set finalizes it. After it the
snitch service stops depending on gossiper and system keyspace and is
ready for de-globalization. As a nice side-effect the system keyspace no
longer needs to maintain the dc/rack info cache and its starting code gets
relaxed.

refs: #2737
refs: #2795
"

* 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits)
  system_keyspace: Dont maintain dc/rack cache
  system_keyspace: Indentation fix after previous patch
  system_keyspace: Coroutinuze build_dc_rack_info()
  topology: Move all post-configuration to topology::config
  snitch: Start early
  gossiper: Do not export system keyspace
  snitch: Remove gossiper reference
  snitch: Mark get_datacenter/_rack methods const
  snitch: Drop some dead dependency knots
  snitch, code: Make get_datacenter() report local dc only
  snitch, code: Make get_rack() report local rack only
  storage_service: Populate pending endpoint in on_alive()
  code: Populate pending locations
  topology: Put local dc/rack on topology early
  topology: Add pending locations collection
  topology: Make get_location() errors more verbose
  token_metadata: Add config, spread everywhere
  token_metadata: Hide token_metadata_impl copy constructor
  gosspier: Remove messaging service getter
  snitch: Get local address to gossip via config
  ...
2022-10-19 06:50:21 +03:00
Tomasz Grabiec
4ff204c028 Merge 'cache: make all removals of cache items explicit' from Michał Chojnowski
This series is a step towards non-LRU cache algorithms.

Our cache items are able to unlink themselves from the LRU list. (In other words, they can be unlinked solely via a pointer to the item, without access to the containing list head). Some places in the code make use of that, e.g. by relying on auto-unlink of items in their destructor.

However, to implement algorithms smarter than LRU, we might want to update some cache-wide metadata on item removal. But any cache-wide structures are unreachable through an item pointer, since items only have access to themselves and their immediate neighbours. Therefore, we don't want items to unlink themselves — we want `cache.remove(item)`, rather than `item.remove_self()`, because the former can update the metadata in `cache`.

This series inserts explicit item unlink calls in places that were previously relying on destructors, gets rid of other self-unlinks, and adds an assert which ensures that every item is explicitly unlinked before destruction.

Closes #11716

* github.com:scylladb/scylladb:
  utils: lru: assert that evictables are unlinked before destruction
  utils: lru: remove unlink_from_lru()
  cache: make all cache unlinks explicit
2022-10-17 12:47:02 +02:00
Michał Chojnowski
d785364375 cache: make all cache unlinks explicit
Our LSA cache is implemented as an auto_unlink Boost intrusive list, meaning
that elements of the list unlink themselves from the list automatically on
destruction. Some parts of the code rely on that, and don't unlink them
manually.

However, this precludes accurate bookkeeping about the cache. Elements only have
access to themselves and their neighbours, not to any bookkeeping context.
Therefore, a destructor cannot update the relevant metadata.

In this patch, we fix this by adding explicit unlink calls to places where it
would be done by a destructor. In a following patch, we will add an assert to
the destructor to check that every element is unlinked before destruction.
2022-10-17 12:07:27 +02:00
Avi Kivity
19e62d4704 commitlog: delete unused "num_deleted" variable
Since d478896d46 we update the variable, but never read it.
Clang 15 notices and complains. Remove the variable to make it
happy.

Closes #11765
2022-10-13 15:11:32 +02:00
Raphael S. Carvalho
ec79ac46c9 db/view: Add visibility to view updating of Staging SSTables
Today, we're completely blind about the progress of view updating
on Staging files. We don't know how long it will take, nor how much
progress we've made.

This patch adds visibility with a new metric that will inform
the number of bytes to be processed from Staging files.
Before any work is done, the metric tell us the total size to be
processed. As view updating progresses, the metric value is
expected to decrease, unless work is being produced faster than
we can consume them.

We're piggybacking on sstables::read_monitor, which allows the
progress metric to be updated whenever the SSTable reader makes
progress.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11751
2022-10-12 16:57:37 +03:00
Kamil Braun
08e654abf5 Merge 'raft: (service) cleanups on the path for dynamic IP address support' from Konstantin Osipov
In preparation for supporting IP address changes of Raft Group 0:
1) Always use start_server_for_group0() to start a server for group 0.
   This will provide a single extension point when it's necessary to
   prompt raft_address_map with gossip data.
2) Don't use raft::server_address in discovery, since going forward
   discovery won't store raft::server_address. On the same token stop
   using discovery::peer_set anywhere outside discovery (for persistence),
   use a peer_list instead, which is easier to marshal.

Closes #11676

* github.com:scylladb/scylladb:
  raft: (discovery) do not use raft::server_address to carry IP data
  raft: (group0) API refactoring to avoid raft::server_address
  raft: rename group0_upgrade.hh to group0_fwd.hh
  raft: (group0) move the code around
  raft: (discovery) persist a list of discovered peers, not a set
  raft: (group0) always start group0 using start_server_for_group0()
2022-10-11 13:43:41 +02:00
Pavel Emelyanov
8b8b37cdda system_keyspace: Dont maintain dc/rack cache
Some good news finally. The saved dc/rack info about the ring is now
only loaded once on start. So the whole cache is not needed and the
loading code in storage_service can be greatly simplified

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
775f42c8d1 system_keyspace: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
8f1df240c7 system_keyspace: Coroutinuze build_dc_rack_info()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
4206b1f98f snitch, code: Make get_datacenter() report local dc only
The continuation of the previous patch -- all the code uses
topology::get_datacenter(endpoint) to get peers' dc string. The topology
still uses snitch for that, but it already contains the needed data.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Pavel Emelyanov
6c6711404f snitch, code: Make get_rack() report local rack only
All the code out there now calls snitch::get_rack() to get rack for the
local node. For other nodes the topology::get_rack(endpoint) is used.
Since now the topology is properly populated with endpoints, it can
finally be patched to stop using snitch and get rack from its internal
collections

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Konstantin Osipov
3e46c32d7b raft: (discovery) do not use raft::server_address to carry IP data
We plan to remove IP information from Raft addresses.
raft::server_address is used in Raft configuration and
also in discovery, which is a separate algorithm, as a handy data
structure, to avoid having new entities in RPC.

Since we plan to remove IP addresses from Raft configuration,
using raft::server_address in discovery and still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others - would not.

So switch to an own data structure for the purposes of discovery,
discovery_peer, which contains a pair ip, raft server id.

Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
2022-10-10 16:24:33 +03:00
Pavel Emelyanov
b1f4273f0d large_data_handler: Use local system_keyspace to update entries
The l._d._h.'s way to update system keyspace is not like in other code.
Instead of a dedicated helper on the system_keyspace's side it executes
the insertion query directly with the help of qctx.

Now when the l._d._h. has the weak system keyspace reference it can
execute queries on _it_ rather than on the qctx.

Just like in previous patch, it needs to keep the sys._k.s. weak
reference alive until the query's future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
907fd2d355 system_keyspace: De-static compaction history update
Compaction manager now has the weak reference on the system keyspace
object and can use it to update its stats. It only needs to take care
and keep the shared pointer until the respective future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
3e0b61d707 compaction_manager: Relax history paths
There's a virtual method on table_state to update the entry in system
keyspace. It's an overkill to facilitate tests that don't want this.
With new system_keyspace weak referencing it can be made simpled by
moving the updating call to the compaction_manager itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
f9b57df471 database: Plug/unplug system_keyspace
There's a circular dependency between system_keyspace and database. The
former needs the latter because it needs to execula local requests via
query_processor. The latter needs the former via compaction manager and
large data handler, database depends on both and these too need to
insert their entries into system keyspace.

To cut this loop the compaction manager and large data handler both get
a weak reference on the system keysace. Once system keyspace starts is
activcates this reference via the database call. When system keyspace is
shutdown-ed on stop, it deactivates the reference.

Technically the weak reference is implemented by marking the system_k.s.
object as async_sharded_service, and the "reference" in question is the
shared_from_this() pointer. When compaction manager or large data
handler need to update a system keyspace's table, they both hold an
extra reference on the system keyspace until the entry is committed,
thus making sure that sys._k.s. doesn't stop from under their feet. At
the same time, unplugging the reference on shutdown makes sure that no
new entries update will appear and the system_k.s. will eventually be
released.

It's not a C++ classical reference, because system_keyspace starts after
and stops before database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Konstantin Osipov
224dd9ce1e raft: rename group0_upgrade.hh to group0_fwd.hh
The plan is to add other group-0-related forward declarations
to this file, not just the ones for upgrade.
2022-10-10 15:58:48 +03:00
Pavel Emelyanov
caed12c8f2 system_keyspace: Add .shutdown() method
Many services out there have one (sometimes called .drain()) that's
called early on stop and that's responsible for prearing the service for
stop -- aborting pending/in-flight fibers and alike.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 15:29:33 +03:00
Pavel Emelyanov
53bad617c0 virtual_tables: Use token_metadata.is_member()
This method just jumps into topology.has_endpoint(). The change is
for consistency with other users of it and as a preparation for
topology.has_endpoint() future enhancements

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 12:16:19 +03:00
Botond Dénes
b247f29881 Merge 'De-static system_keyspace::get_{saved|local}_tokens()' from Pavel Emelyanov
Yet another user of global qctx object. Making the method(s) non-static requires pushing the system_keyspace all the way down to size_estimate_virtual_reader and a small update of the cql_test_env

Closes #11738

* github.com:scylladb/scylladb:
  system_keyspace: Make get_{local|saved}_tokens non static
  size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges()
  cql_test_env: Keep sharded<system_keyspace> reference
  size_estimate_virtual_reader: Keep system_keyspace reference
  system_keyspace: Pass sys_ks argument to install_virtual_readers()
  system_keyspace: Make make() non-static
  distributed_loader: Pass sys_ks argument to init_system_keyspace()
  system_keyspace: Remove dangling forward declaration
2022-10-07 11:28:32 +03:00
Avi Kivity
20bad62562 Merge 'Detect and record large collections' from Benny Halevy
This series adds support for detecting collections that have too many items
and recording them in `system.large_cells`.

A configuration variable was added to db/config: `compaction_collection_items_count_warning_threshold` set by default to 10000.
Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table.  Documentation has been updated respectively.

A new column was added to system.large_cells: `collection_items`.
Similar to the `rows` column in system.large_partition, `collection_items` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't.  Note that the collection may be recorded in system.large_cells either due to its size, like any other cell, and/or due to the number of items in it, if it cross the said threshold.

Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions is a smaller change overall, hence it was preferred.

Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature - `LARGE_COLLECTION_DETECTION` - to enable using it.
The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled.

In addition, unit tests were added in sstable_3_x_test for testing large cells detection by cell size, and large_collection detection by the number of items.

Closes #11449

Closes #11674

* github.com:scylladb/scylladb:
  sstables: mx/writer: optimize large data stats members order
  sstables: mx/writer: keep large data stats entry as members
  db: large_data_handler: dynamically update config thresholds
  utils/updateable_value: add transforming_value_updater
  db/large_data_handler: cql_table_large_data_handler: record large_collections
  db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler
  db/large_data_handler: cql_table_large_data_handler: move ctor out of line
  docs: large-rows-large-cells-tables: fix typos
  db/system_keyspace: add collection_elements column to system.large_cells
  gms/feature_service: add large_collection_detection cluster feature
  test: sstable_3_x_test: add test_sstable_too_many_collection_elements
  test: lib: simple_schema: add support for optional collection column
  test: lib: simple_schema: build schema in ctor body
  test: lib: simple_schema: cql: define s1 as static only if built this way
  db/large_data_handler: maybe_record_large_cells: consider collection_elements
  db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries
  sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells
  sstables: mx/writer: add large_data_type::elements_in_collection
  db/large_data_handler: get the collection_elements_count_threshold
  db/config: add compaction_collection_elements_count_warning_threshold
  test: sstable_3_x_test: add test_sstable_write_large_cell
  test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler
  test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells
  test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T
  test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random
2022-10-06 18:28:21 +03:00
Pavel Emelyanov
59da903054 system_keyspace: Make get_{local|saved}_tokens non static
Now all callers have system_keyspace reference at hand. This removes one
more user of the global qctx object

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 18:02:09 +03:00
Pavel Emelyanov
b03f1e7b17 size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges()
This method static calls system_keyspace::get_local_tokens(). Having the
system_keyspace reference will make this method non-static

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 18:00:09 +03:00
Pavel Emelyanov
34e8e5959f size_estimate_virtual_reader: Keep system_keyspace reference
The s._e._v._reader::fill_buffer() method needs system keyspace to get
node's local tokens. Now it's a static method, having system_keyspace
reference will make it non-static

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:58:07 +03:00