Commit Graph

2777 Commits

Author SHA1 Message Date
Pavel Emelyanov
64c9359443 storage_proxy: Don't use default-initialized endpoint in get_read_executor()
After calling filter_for_query(), the extra_replica to speculate to may
be left default-initialized, which is the :0 IPv6 address. Later below,
this address is used as-is to check whether it belongs to the same DC,
which is not nice, as :0 is not the address of any existing endpoint.

The recent move of dc/rack data onto topology made this place reveal
itself by emitting an internal error, since :0 is not present in the
topology's collection of endpoints. Prior to this move, the dc filter
would count :0 as belonging to the "default_dc" datacenter, which may or
may not match the dc of the local node.

The fix is to explicitly tell a set extra_replica from an unset one.

fixes: #11825

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11833
2022-10-25 09:16:50 +03:00
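The fix above amounts to carrying the "unset" state in the type instead of smuggling it through a default-initialized (:0) address. A minimal sketch with invented names (not the actual storage_proxy code), using std::optional:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the bug: a default-initialized endpoint ("::")
// is not a real node, so code that later asks "is this endpoint in my
// DC?" gets nonsense.
struct endpoint {
    std::string addr = "::";  // value of a default-initialized endpoint
};

// Before the fix: callers cannot tell "unset" from "the :: address".
inline endpoint filter_for_query_before(bool speculate) {
    endpoint extra_replica{};            // stays "::" unless set below
    if (speculate) {
        extra_replica.addr = "10.0.0.2";
    }
    return extra_replica;                // ambiguous sentinel escapes
}

// After the fix: the unset state is explicit in the type.
inline std::optional<endpoint> filter_for_query_after(bool speculate) {
    if (!speculate) {
        return std::nullopt;             // explicitly "no extra replica"
    }
    return endpoint{"10.0.0.2"};
}
```

With the optional, the DC check simply doesn't run when no extra replica was chosen.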
Botond Dénes
e981bd4f21 Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El
As described in issue #11801, we saw in Alternator that when a GSI has both partition and sort keys which were non-key attributes in the base table, updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted.

This series fixes this bug (it was a bug in our materialized views implementation) and adds a reproducing test (plus a few more tests for similar situations which worked before the patch and continue to work after it).

Fixes #11801

Closes #11808

* github.com:scylladb/scylladb:
  test/alternator: add test for issue 11801
  MV: fix handling of view update which reassigns the same key value
  materialized views: inline used-once and confusing function, replace_entry()
2022-10-21 10:49:28 +03:00
Pavel Emelyanov
52d6e56a10 system_keyspace: Don't use global snitch instance
There are two places to patch, .start() and .setup(), and both only need
the snitch to get the local dc/rack from, nothing more. Thus both can
live with an explicit argument for now.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-20 12:29:26 +03:00
Avi Kivity
69199dbfba Merge 'schema_tables: limit concurrency' from Benny Halevy
To prevent stalls due to a large number of tables.

Fixes scylladb/scylladb#11574

Closes #11689

* github.com:scylladb/scylladb:
  schema_tables: merge_tables_and_views reindent
  schema_tables: limit parallelism
2022-10-19 18:40:45 +03:00
Nadav Har'El
8f4243b875 MV: fix handling of view update which reassigns the same key value
When a materialized view has a key (in Alternator, this can be two
keys) which was a regular column in the base table, and a base
update modifies that regular column, there are two distinct cases:

1. If the old and new key values are different, we need to delete the
   old view row, and create a new view row (with the different key).

2. If the old and new key values are the same, we just need to update
   the pre-existing row.

It's important not to confuse the two cases: If we try to delete and
create the *same* view row in the same timestamp, the result will be
that the row will be deleted (a tombstone wins over data if they have the
same timestamp) instead of updated. This is what we saw in issue #11801.

We had a bug that was seen when an update set the view key column to
the old value it already had: to compare the old and new key values
we used the function compare_atomic_cell_for_merge(), but this compared
not just the values but also, incorrectly, the metadata such as
the timestamp. Because setting a column to the same value changes its
timestamp, we wrongly concluded that these were different view keys
and used the delete-and-create code for this case, resulting in the
view row being deleted (as explained above).

The simple fix is to compare just the key values - not looking at
the metadata.

See tests reproducing this bug and confirming its fix in the next patch.

Fixes #11801

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-10-19 13:43:12 +03:00
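The comparison bug above can be condensed into a toy model (invented types, not the real compare_atomic_cell_for_merge()). A cell carries a value and a write timestamp; setting a column to the value it already had bumps the timestamp, so a whole-cell comparison reports "different" and triggers delete-then-create of the same view row, where the same-timestamp tombstone wins:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for an atomic cell.
struct cell {
    std::string value;
    int64_t timestamp;
};

// Buggy check: metadata leaks into the key comparison.
inline bool keys_differ_buggy(const cell& old_c, const cell& new_c) {
    return old_c.value != new_c.value || old_c.timestamp != new_c.timestamp;
}

// Fixed check: only the key value decides whether the view row moves.
inline bool keys_differ_fixed(const cell& old_c, const cell& new_c) {
    return old_c.value != new_c.value;
}
```

With the fixed check, a rewrite of the same key value takes the update path instead of delete-and-create.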
Nadav Har'El
e1f8cb6521 materialized views: inline used-once and confusing function, replace_entry()
The replace_entry() function is nothing more than a convenience for
calling delete_old_entry() and then create_entry(). But it is only used
once in the code, so we can just open-code the two calls at its single
call site.

The reason I want to change it now is that the shortcut replace_entry()
helped hide a bug (#11801) - replace_entry() works incorrectly if the
old and new row have the same key, because if they do we get a deletion
and creation of the same row with the same timestamp - and the deletion
wins. Having the two calls not hidden by a convenience function makes
this potential problem more apparent.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-10-19 13:25:34 +03:00
Benny Halevy
ce22dd4329 schema_tables: merge_tables_and_views reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 13:05:41 +03:00
Benny Halevy
7ccb0e70f0 schema_tables: limit parallelism
To prevent stalls due to a large number of tables.

Fixes scylladb/scylladb#11574

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 13:05:38 +03:00
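The idea behind the patch above can be sketched with a plain bounded-batch loop (invented helper; the real code uses Seastar concurrency primitives and yields to the reactor between units of work):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Instead of kicking off work for every table at once -- which can
// stall when there are thousands of tables -- process at most
// max_concurrent units per batch. Returns the number of batches, i.e.
// the number of points where a real reactor would get to run other
// tasks.
inline size_t process_in_batches(size_t n_tables, size_t max_concurrent,
                                 const std::function<void(size_t)>& work) {
    size_t batches = 0;
    for (size_t start = 0; start < n_tables; start += max_concurrent) {
        size_t end = std::min(start + max_concurrent, n_tables);
        for (size_t i = start; i < end; ++i) {
            work(i);                 // at most max_concurrent per batch
        }
        ++batches;                   // a real reactor would yield here
    }
    return batches;
}
```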
Botond Dénes
2d581e9e8f Merge "Maintain dc/rack by topology" from Pavel Emelyanov
"
There's an ongoing effort to move the endpoint -> {dc/rack} mappings
from snitch onto topology object and this set finalizes it. After it the
snitch service stops depending on gossiper and system keyspace and is
ready for de-globalization. As a nice side-effect the system keyspace no
longer needs to maintain the dc/rack info cache and its starting code gets
relaxed.

refs: #2737
refs: #2795
"

* 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits)
  system_keyspace: Dont maintain dc/rack cache
  system_keyspace: Indentation fix after previous patch
  system_keyspace: Coroutinize build_dc_rack_info()
  topology: Move all post-configuration to topology::config
  snitch: Start early
  gossiper: Do not export system keyspace
  snitch: Remove gossiper reference
  snitch: Mark get_datacenter/_rack methods const
  snitch: Drop some dead dependency knots
  snitch, code: Make get_datacenter() report local dc only
  snitch, code: Make get_rack() report local rack only
  storage_service: Populate pending endpoint in on_alive()
  code: Populate pending locations
  topology: Put local dc/rack on topology early
  topology: Add pending locations collection
  topology: Make get_location() errors more verbose
  token_metadata: Add config, spread everywhere
  token_metadata: Hide token_metadata_impl copy constructor
  gossiper: Remove messaging service getter
  snitch: Get local address to gossip via config
  ...
2022-10-19 06:50:21 +03:00
Tomasz Grabiec
4ff204c028 Merge 'cache: make all removals of cache items explicit' from Michał Chojnowski
This series is a step towards non-LRU cache algorithms.

Our cache items are able to unlink themselves from the LRU list. (In other words, they can be unlinked solely via a pointer to the item, without access to the containing list head). Some places in the code make use of that, e.g. by relying on auto-unlink of items in their destructor.

However, to implement algorithms smarter than LRU, we might want to update some cache-wide metadata on item removal. But any cache-wide structures are unreachable through an item pointer, since items only have access to themselves and their immediate neighbours. Therefore, we don't want items to unlink themselves — we want `cache.remove(item)`, rather than `item.remove_self()`, because the former can update the metadata in `cache`.

This series inserts explicit item unlink calls in places that were previously relying on destructors, gets rid of other self-unlinks, and adds an assert which ensures that every item is explicitly unlinked before destruction.

Closes #11716

* github.com:scylladb/scylladb:
  utils: lru: assert that evictables are unlinked before destruction
  utils: lru: remove unlink_from_lru()
  cache: make all cache unlinks explicit
2022-10-17 12:47:02 +02:00
Michał Chojnowski
d785364375 cache: make all cache unlinks explicit
Our LSA cache is implemented as an auto_unlink Boost intrusive list, meaning
that elements of the list unlink themselves from the list automatically on
destruction. Some parts of the code rely on that, and don't unlink them
manually.

However, this precludes accurate bookkeeping about the cache. Elements only have
access to themselves and their neighbours, not to any bookkeeping context.
Therefore, a destructor cannot update the relevant metadata.

In this patch, we fix this by adding explicit unlink calls to places where it
would be done by a destructor. In a following patch, we will add an assert to
the destructor to check that every element is unlinked before destruction.
2022-10-17 12:07:27 +02:00
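The bookkeeping argument above can be illustrated with a toy cache (hand-rolled on std::list rather than a Boost intrusive list, names invented): removal routed through the cache can update cache-wide metadata, which a self-unlinking destructor, seeing only the item, cannot reach.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

struct item {
    std::string key;
    size_t size;
};

class cache {
    std::list<item> _lru;
    size_t _total_bytes = 0;  // cache-wide metadata an item can't reach
public:
    std::list<item>::iterator add(item it) {
        _total_bytes += it.size;
        return _lru.insert(_lru.begin(), std::move(it));
    }
    // cache.remove(item) rather than item.remove_self(): only here can
    // the cache-wide counters be kept accurate.
    void remove(std::list<item>::iterator it) {
        _total_bytes -= it->size;
        _lru.erase(it);
    }
    size_t total_bytes() const { return _total_bytes; }
    size_t count() const { return _lru.size(); }
};
```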
Avi Kivity
19e62d4704 commitlog: delete unused "num_deleted" variable
Since d478896d46 we update the variable, but never read it.
Clang 15 notices and complains. Remove the variable to make it
happy.

Closes #11765
2022-10-13 15:11:32 +02:00
Raphael S. Carvalho
ec79ac46c9 db/view: Add visibility to view updating of Staging SSTables
Today, we're completely blind to the progress of view updating
on Staging files. We don't know how long it will take, nor how much
progress we've made.

This patch adds visibility with a new metric that reports
the number of bytes still to be processed from Staging files.
Before any work is done, the metric tells us the total size to be
processed. As view updating progresses, the metric value is
expected to decrease, unless work is being produced faster than
we can consume it.

We're piggybacking on sstables::read_monitor, which allows the
progress metric to be updated whenever the SSTable reader makes
progress.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11751
2022-10-12 16:57:37 +03:00
Kamil Braun
08e654abf5 Merge 'raft: (service) cleanups on the path for dynamic IP address support' from Konstantin Osipov
In preparation for supporting IP address changes of Raft Group 0:
1) Always use start_server_for_group0() to start a server for group 0.
   This will provide a single extension point when it's necessary to
   prompt raft_address_map with gossip data.
2) Don't use raft::server_address in discovery, since going forward
   discovery won't store raft::server_address. By the same token, stop
   using discovery::peer_set anywhere outside discovery (for persistence),
   and use a peer_list instead, which is easier to marshal.

Closes #11676

* github.com:scylladb/scylladb:
  raft: (discovery) do not use raft::server_address to carry IP data
  raft: (group0) API refactoring to avoid raft::server_address
  raft: rename group0_upgrade.hh to group0_fwd.hh
  raft: (group0) move the code around
  raft: (discovery) persist a list of discovered peers, not a set
  raft: (group0) always start group0 using start_server_for_group0()
2022-10-11 13:43:41 +02:00
Pavel Emelyanov
8b8b37cdda system_keyspace: Dont maintain dc/rack cache
Some good news, finally. The saved dc/rack info about the ring is now
only loaded once on start, so the whole cache is not needed and the
loading code in storage_service can be greatly simplified.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
775f42c8d1 system_keyspace: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
8f1df240c7 system_keyspace: Coroutinuze build_dc_rack_info()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:18:31 +03:00
Pavel Emelyanov
4206b1f98f snitch, code: Make get_datacenter() report local dc only
The continuation of the previous patch -- all the code uses
topology::get_datacenter(endpoint) to get peers' dc string. The topology
still uses snitch for that, but it already contains the needed data.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Pavel Emelyanov
6c6711404f snitch, code: Make get_rack() report local rack only
All the code out there now calls snitch::get_rack() to get the rack of
the local node only. For other nodes, topology::get_rack(endpoint) is
used. Since the topology is now properly populated with endpoints, it
can finally be patched to stop using the snitch and get the rack from
its internal collections.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-11 05:17:08 +03:00
Konstantin Osipov
3e46c32d7b raft: (discovery) do not use raft::server_address to carry IP data
We plan to remove IP information from Raft addresses.
raft::server_address is used in the Raft configuration, and also in
discovery (a separate algorithm) as a handy data structure that avoids
introducing new entities in RPC.

Since we plan to remove IP addresses from the Raft configuration,
using raft::server_address in discovery and still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others it would not.

So switch to a dedicated data structure for the purposes of discovery,
discovery_peer, which contains an (IP, raft server id) pair.

Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
2022-10-10 16:24:33 +03:00
Pavel Emelyanov
b1f4273f0d large_data_handler: Use local system_keyspace to update entries
The large_data_handler's way of updating the system keyspace is unlike
the rest of the code. Instead of a dedicated helper on the
system_keyspace side, it executes the insertion query directly with the
help of qctx.

Now that the large_data_handler has a weak system_keyspace reference,
it can execute queries on _it_ rather than on the qctx.

Just like in the previous patch, it needs to keep the system_keyspace
weak reference alive until the query's future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
907fd2d355 system_keyspace: De-static compaction history update
The compaction manager now has a weak reference to the system keyspace
object and can use it to update its stats. It only needs to take care
to keep the shared pointer alive until the respective future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
3e0b61d707 compaction_manager: Relax history paths
There's a virtual method on table_state to update the entry in the
system keyspace. It's overkill, there only to facilitate tests that
don't want this. With the new system_keyspace weak referencing, this can
be made simpler by moving the updating call to the compaction_manager
itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
f9b57df471 database: Plug/unplug system_keyspace
There's a circular dependency between system_keyspace and database. The
former needs the latter because it needs to execute local requests via
query_processor. The latter needs the former via the compaction manager
and the large data handler: database depends on both, and these two need
to insert their entries into the system keyspace.

To cut this loop, the compaction manager and large data handler both get
a weak reference to the system keyspace. Once the system keyspace starts,
it activates this reference via a database call. When the system keyspace
is shut down on stop, it deactivates the reference.

Technically, the weak reference is implemented by marking the
system_keyspace object as an async_sharded_service, and the "reference"
in question is the shared_from_this() pointer. When the compaction
manager or the large data handler needs to update a system keyspace
table, it holds an extra reference on the system keyspace until the
entry is committed, thus making sure that system_keyspace isn't stopped
from under its feet. At the same time, unplugging the reference on
shutdown makes sure that no new entry updates will appear and the
system_keyspace will eventually be released.

It's not a classical C++ reference, because system_keyspace starts after
and stops before database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
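The plug/unplug lifetime scheme above can be sketched with standard smart pointers (simplified, invented names; the real code uses Seastar's async_sharded_service rather than bare std::shared_ptr):

```cpp
#include <cassert>
#include <memory>

// Stand-in for system_keyspace; shared_from_this() is the "reference".
struct sys_keyspace : std::enable_shared_from_this<sys_keyspace> {
    int entries = 0;
};

// Stand-in for compaction_manager / large_data_handler.
class handler {
    std::weak_ptr<sys_keyspace> _sys_ks;
public:
    void plug(std::shared_ptr<sys_keyspace> ks) { _sys_ks = ks; }
    void unplug() { _sys_ks.reset(); }   // shutdown: no new updates start
    // Returns false when the keyspace is already gone / unplugged.
    bool update_entry() {
        auto ks = _sys_ks.lock();        // pin for the duration of the update
        if (!ks) {
            return false;
        }
        ++ks->entries;                   // "commit" the entry
        return true;                     // pin released here
    }
};
```

The weak pointer cuts the ownership cycle: the handler never keeps the keyspace alive on its own, but each in-flight update pins it until the "commit" completes.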
Konstantin Osipov
224dd9ce1e raft: rename group0_upgrade.hh to group0_fwd.hh
The plan is to add other group-0-related forward declarations
to this file, not just the ones for upgrade.
2022-10-10 15:58:48 +03:00
Pavel Emelyanov
caed12c8f2 system_keyspace: Add .shutdown() method
Many services out there have one (sometimes called .drain()) that's
called early on stop and is responsible for preparing the service for
stopping: aborting pending/in-flight fibers and the like.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 15:29:33 +03:00
Pavel Emelyanov
53bad617c0 virtual_tables: Use token_metadata.is_member()
This method just jumps into topology.has_endpoint(). The change is made
for consistency with other users of it and as a preparation for future
topology.has_endpoint() enhancements.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 12:16:19 +03:00
Botond Dénes
b247f29881 Merge 'De-static system_keyspace::get_{saved|local}_tokens()' from Pavel Emelyanov
Yet another user of the global qctx object. Making the method(s) non-static requires pushing the system_keyspace all the way down to size_estimate_virtual_reader, plus a small update to the cql_test_env.

Closes #11738

* github.com:scylladb/scylladb:
  system_keyspace: Make get_{local|saved}_tokens non static
  size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges()
  cql_test_env: Keep sharded<system_keyspace> reference
  size_estimate_virtual_reader: Keep system_keyspace reference
  system_keyspace: Pass sys_ks argument to install_virtual_readers()
  system_keyspace: Make make() non-static
  distributed_loader: Pass sys_ks argument to init_system_keyspace()
  system_keyspace: Remove dangling forward declaration
2022-10-07 11:28:32 +03:00
Avi Kivity
20bad62562 Merge 'Detect and record large collections' from Benny Halevy
This series adds support for detecting collections that have too many items
and recording them in `system.large_cells`.

A configuration variable was added to db/config: `compaction_collection_items_count_warning_threshold` set by default to 10000.
Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table.  Documentation has been updated accordingly.

A new column was added to system.large_cells: `collection_items`.
Similar to the `rows` column in system.large_partitions, `collection_items` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't.  Note that the collection may be recorded in system.large_cells either due to its size, like any other cell, and/or due to the number of items in it, if it crosses the said threshold.

Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions and is a smaller change overall, hence it was preferred.

Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature - `LARGE_COLLECTION_DETECTION` - to enable using it.
The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled.

In addition, unit tests were added in sstable_3_x_test for testing large cells detection by cell size, and large_collection detection by the number of items.

Closes #11449

Closes #11674

* github.com:scylladb/scylladb:
  sstables: mx/writer: optimize large data stats members order
  sstables: mx/writer: keep large data stats entry as members
  db: large_data_handler: dynamically update config thresholds
  utils/updateable_value: add transforming_value_updater
  db/large_data_handler: cql_table_large_data_handler: record large_collections
  db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler
  db/large_data_handler: cql_table_large_data_handler: move ctor out of line
  docs: large-rows-large-cells-tables: fix typos
  db/system_keyspace: add collection_elements column to system.large_cells
  gms/feature_service: add large_collection_detection cluster feature
  test: sstable_3_x_test: add test_sstable_too_many_collection_elements
  test: lib: simple_schema: add support for optional collection column
  test: lib: simple_schema: build schema in ctor body
  test: lib: simple_schema: cql: define s1 as static only if built this way
  db/large_data_handler: maybe_record_large_cells: consider collection_elements
  db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries
  sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells
  sstables: mx/writer: add large_data_type::elements_in_collection
  db/large_data_handler: get the collection_elements_count_threshold
  db/config: add compaction_collection_elements_count_warning_threshold
  test: sstable_3_x_test: add test_sstable_write_large_cell
  test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler
  test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells
  test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T
  test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random
2022-10-06 18:28:21 +03:00
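The detection rule this series adds can be sketched as a simple predicate (invented names; the thresholds follow the description above, with the collection-items threshold defaulting to 10000):

```cpp
#include <cassert>
#include <cstdint>

struct large_data_config {
    uint64_t cell_size_threshold = 1024 * 1024;       // illustrative default
    uint64_t collection_elements_threshold = 10000;   // per the series
};

// A cell is recorded in system.large_cells if its size exceeds the cell
// threshold OR, when it is a collection, if its element count exceeds
// the collection-items threshold. collection_elements is 0 for
// non-collection cells.
inline bool should_record_large_cell(const large_data_config& cfg,
                                     uint64_t cell_size,
                                     uint64_t collection_elements) {
    return cell_size > cfg.cell_size_threshold
        || collection_elements > cfg.collection_elements_threshold;
}
```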
Pavel Emelyanov
59da903054 system_keyspace: Make get_{local|saved}_tokens non static
Now all callers have a system_keyspace reference at hand. This removes
one more user of the global qctx object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 18:02:09 +03:00
Pavel Emelyanov
b03f1e7b17 size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges()
This method statically calls system_keyspace::get_local_tokens(). Having
a system_keyspace reference will make this method non-static.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 18:00:09 +03:00
Pavel Emelyanov
34e8e5959f size_estimate_virtual_reader: Keep system_keyspace reference
The size_estimates_virtual_reader::fill_buffer() method needs the system
keyspace to get the node's local tokens. It's currently a static method;
having a system_keyspace reference will make it non-static.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:58:07 +03:00
Pavel Emelyanov
04552f2d58 system_keyspace: Pass sys_ks argument to install_virtual_readers()
The size-estimates virtual reader will need it; now it's available as
"this" from the system_keyspace::make() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:57:13 +03:00
Pavel Emelyanov
1938412d7a system_keyspace: Make make() non-static
This helper needs a system_keyspace reference, and using "this" for it
looks natural. Also, this de-static-ification makes it possible to put
some sense into the invoke_on_all() call from init_system_keyspace().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:56:11 +03:00
Pavel Emelyanov
e996503f0d system_keyspace: Remove dangling forward declaration
It doesn't match the real system_keyspace_make() definition and is in
fact not needed, as there's another "real" one in database.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:54:22 +03:00
Benny Halevy
2c4ff71d2b db: large_data_handler: dynamically update config thresholds
Make the various large data thresholds live-updateable,
and construct the observers and updaters in
cql_table_large_data_handler to dynamically update
the base large_data_handler class threshold members.

Fixes #11685

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-05 10:53:40 +03:00
Avi Kivity
37c6b46d26 dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty"
The "virtual dirty" term is not very informative. "Virtual" means
"not real", but it doesn't say in which way it isn't real.

In this case, virtual dirty refers to real dirty memory, minus
the portion of memtables that has been written to disk (but not
yet sealed - in that case it would not be dirty in the first
place).

I chose to refer to "the portion of memtables that has been written
to disk" as "spooled memory". At least the unique term will cause
people to look it up, and it may be easier to remember. From that
we get "unspooled memory".

I plan to further change the accounting to account for spooled memory
rather than unspooled, as that is a more natural term, but that is left
for later.

The documentation, config item, and metrics are adjusted. The config
item is practically unused so it isn't worth keeping compatibility here.
2022-10-04 14:03:59 +03:00
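The renamed quantity is simple arithmetic; a minimal sketch with invented names of the relation described above:

```cpp
#include <cassert>
#include <cstdint>

struct dirty_stats {
    uint64_t real_dirty;   // all memory held by unsealed memtables
    uint64_t spooled;      // portion already written to disk, not yet sealed
};

// "Unspooled dirty" (formerly "virtual dirty") is the real dirty memory
// minus the spooled portion.
inline uint64_t unspooled_dirty(const dirty_stats& s) {
    return s.real_dirty - s.spooled;
}
```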
Benny Halevy
46ebffcc93 db/large_data_handler: cql_table_large_data_handler: record large_collections
When the large_collection_detection cluster feature is enabled,
select the internal_record_large_cells_and_collections method
to record the large collection cell, also storing the
collection_elements column.

We want to do that only when the cluster feature is enabled
to facilitate rollback in case a rolling upgrade is aborted;
otherwise system.large_cells won't be backward compatible
and will have to be deleted manually.

Delete the sstable's entries from system.large_cells if it contains
elements_in_collection above the threshold.

Closes #11449

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:10 +03:00
Benny Halevy
3f8bba202f db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler
For recording collection_elements of large_collections when
the large_collection_detection feature is enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:10 +03:00
Benny Halevy
dc4e7d8e01 db/large_data_handler: cql_table_large_data_handler: move ctor out of line
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:09 +03:00
Benny Halevy
2f49eebb04 db/system_keyspace: add collection_elements column to system.large_cells
And bump the schema version offset since the new schema
should be distinguishable from the previous one.

Refs scylladb/scylladb#11660

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:08 +03:00
Benny Halevy
6dadca2648 db/large_data_handler: maybe_record_large_cells: consider collection_elements
Detect large_collections when the number of collection_elements
is above the configured threshold.

Next step would be to record the number of collection_elements
in the system.large_cells table, when the respective
cluster feature is enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:05 +03:00
Benny Halevy
27ee75c54e db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries
Log at debug level when deleting a large data entry
from a system table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:04 +03:00
Benny Halevy
a107f583fd db/large_data_handler: get the collection_elements_count_threshold
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:31:11 +03:00
Benny Halevy
167ec84eeb db/config: add compaction_collection_elements_count_warning_threshold
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:31:10 +03:00
Botond Dénes
5621cdd7f9 db/view/view_builder: don't drop partition and range tombstones when resuming
The view builder builds the views from a given base table in
view_builder::batch_size batches of rows. After processing this many
rows, it suspends so the view builder can switch to building views for
other base tables in the name of fairness. When resuming the build step
for a given base table, it reuses the reader used previously (which also
serves the role of a snapshot, pinning the sstables read from). The
compactor, however, is created anew. As the reader can be in the middle
of a partition, the view builder injects a partition start into the
compactor to prime it for continuing the partition. This, however, only
included the partition key, crucially missing any active tombstones: the
partition tombstone or -- since the v2 transition -- an active range
tombstone. This could result in base rows covered by either of these
being resurrected, and in the view builder generating view updates for
them.
This patch solves this by using the detach-state mechanism of the
compactor, which was explicitly developed for situations like this (in
the range scan code): resuming a read with the readers kept but the
compactor recreated.
Also included are two test cases reproducing the problem, one with a
range tombstone, the other with a partition tombstone.

Fixes: #11668

Closes #11671
2022-10-03 11:28:22 +03:00
Botond Dénes
ad04f200d3 Merge 'database: automatically take snapshot of base table views' from Benny Halevy
The logic to reject explicit snapshot of views/indexes was improved in aa127a2dbb. However, we never implemented auto-snapshot of
views/indexes when taking a snapshot of the base table.

This is implemented in this patch.

The implementation is built on top of
ba42852b0e
so it would be hard to backport to 5.1 or earlier
releases.

Fixes #11612

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11616

* github.com:scylladb/scylladb:
  database: automatically take snapshot of base table views
  api: storage_service: reject snapshot of views in api layer
2022-09-29 13:33:31 +03:00
Benny Halevy
d32c497cd9 database: automatically take snapshot of base table views
The logic to reject explicit snapshot of views/indexes
was improved in aa127a2dbb.
However, we never implemented auto-snapshot of
views/indexes when taking a snapshot of the base table.

This is implemented in this patch.

The implementation is built on top of
ba42852b0e
so it would be hard to backport to 5.1 or earlier
releases.

Fixes #11612

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 11:02:54 +03:00
Benny Halevy
55b0b8fe2c api: storage_service: reject snapshot of views in api layer
Rather than pushing the check down to
`snapshot_ctl::take_column_family_snapshot`, just check
explicitly when taking a snapshot of a particular
table by name over the api.

Other paths that call snapshot_ctl::take_column_family_snapshot
are internal and already use it to snapshot views.

With that, we can get rid of the allow_view_snapshots flag
that was introduced in aab4cd850c.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 10:44:56 +03:00
Benny Halevy
fcbbc3eb9c db/large_data_handler: print static cell/collection description in log warning
When warning about a large cell/collection in a static row,
print that fact in the log warning to make it clearer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-25 14:37:42 +03:00