Define table_schema_version as a distinct tagged_uuid class,
so it can be differentiated from other uuid-class types,
in particular table_id.
Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
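
The tagging idea can be sketched as follows. This is a minimal illustration, not the actual utils::tagged_uuid: the `uuid` struct, the tag struct names and the `low_bits` helper are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a 128-bit UUID value.
struct uuid {
    std::uint64_t msb = 0;
    std::uint64_t lsb = 0;
    bool operator==(const uuid& o) const { return msb == o.msb && lsb == o.lsb; }
};

// The Tag parameter is never instantiated; it only makes every
// instantiation of the template a distinct, unrelated type.
template <typename Tag>
struct tagged_uuid {
    uuid id;
    bool operator==(const tagged_uuid& o) const { return id == o.id; }
};

using table_id = tagged_uuid<struct table_id_tag>;
using table_schema_version = tagged_uuid<struct table_schema_version_tag>;

// Same representation, but the compiler keeps the two apart.
static_assert(!std::is_same_v<table_id, table_schema_version>,
              "the two ids must be distinct types");
static_assert(!std::is_convertible_v<table_id, table_schema_version>,
              "no implicit conversion between differently tagged ids");

// Accepts only a table_id; passing a table_schema_version does not compile.
inline std::uint64_t low_bits(const table_id& t) {
    return t.id.lsb;
}
```

With this shape, mixing up a table id and a schema version becomes a compile-time error rather than a silent bug.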
Fixes #11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add include statements to satisfy dependencies.
Delete the now-unneeded include directives from the upper-level
source files.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
idl definition files are not intended for direct
inclusion in .cc files.
The data types they represent are supposed to be defined
in a regular C++ header, so define them in db/hints/sync_point.hh
and include it rather than idl/hinted_handoff.idl.hh.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.
Since we now synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.
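
A rough sketch of the simplified interface, assuming a plain system_clock time_point in place of the database clock type. The function name `truncate_on_all_shards` and its return value are hypothetical and exist only to show the time being resolved once and shared by every shard.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <vector>

// Assumption: system_clock stands in for the real database clock.
using time_point = std::chrono::system_clock::time_point;

// Hypothetical stand-in for truncate_table_on_all_shards: returns the
// truncation time that each shard would record.
inline std::vector<time_point> truncate_on_all_shards(
        unsigned shard_count,
        std::optional<time_point> truncated_at_opt = std::nullopt) {
    assert(shard_count > 0);
    // Resolve the time exactly once, before fanning out to the shards,
    // so no per-shard coordination callback is needed.
    const time_point truncated_at =
        truncated_at_opt.value_or(std::chrono::system_clock::now());
    return std::vector<time_point>(shard_count, truncated_at);
}
```

Callers that care about a specific point in time pass it explicitly; others omit it and get a single "now" shared by all shards.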
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
timestamp_func
Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until drop started.
Fixes #11232
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there's no other
place to get that data from.
The whole dc/rack info is now moving to topology, so this set patches
consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.
Two difficult cases are db::is_local() and db::count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"
* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
consistency_level: Remove is_local() and count_local_endpoints()
storage_proxy: Use topology::local_endpoints_count()
storage_proxy: Use proxy's topology for DC checks
storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
storage_proxy: Use topology local_dc_filter in its methods
storage_proxy: Mark some digest_read_resolver methods private
forwarding_service: Use topology local_dc_filter
storage_service: Use topology local_dc_filter
consistency_level: Use topology local_dc_filter
consitency-level: Call count_local_endpoints from topology
consistency_level: Get datacenter from topology
replication_strategy: Remove hold snitch reference
effective_replication_map: Get datacenter from topology
topology: Add local-dc detection shugar
No code uses them now -- switched to use topology -- so these two can be
dropped together with their calls for global snitch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similar to previous patch, in those places with keyspace object at
hand the topology can be obtained from ks' replication map
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In some of db/consistency_level.cc helpers the topology can be
obtained from keyspace's effective replication map
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #11184
Not including it here can cause our estimate of "delete or not" after replay
to be skewed in favour of retaining segments as (new) recycles (or even flip
a counter), and if we have repeated crash+restarts we could be accumulating
an effectively ever-increasing segment footprint
Closes #11205
Introduce a `remote` class that handles all remote communication in `storage_proxy`: sending and receiving RPCs, checking the state of other nodes by accessing the gossiper, and fetching schema.
The `remote` object lives inside `storage_proxy` and right now it's initialized and destroyed together with `storage_proxy`.
The long game here is to split the initialization of `storage_proxy` into two steps:
- the first step, which constructs `storage_proxy`, initializes it "locally" and does not require references to `messaging_service` and `gossiper`.
- the second step will take those references and add the `remote` part to `storage_proxy`.
This will allow us to remove some cycles from the service (de)initialization order and in general clean it up a bit. We'll be able to start `storage_proxy` right after the `database` (without messaging/gossiper). Similar refactors are planned for `query_processor`.
Closes #11088
* github.com:scylladb/scylladb:
service: storage_proxy: pass `migration_manager*` to `init_messaging_service`
service: storage_proxy: `remote`: make `_gossiper` a const reference
gms: gossiper: mark some member functions const
db: consistency_level: `filter_for_query`: take `const gossiper&`
replica: table: `get_hit_rate`: take `const gossiper&`
gms: gossiper: move `endpoint_filter` to `storage_proxy` module
service: storage_proxy: pass `shared_ptr<gossiper>` to `start_hints_manager`
service: storage_proxy: establish private section in `remote`
service: storage_proxy: remove `migration_manager` pointer
service: storage_proxy: remove calls to `storage_proxy::remote()` from `remote`
service: storage_proxy: remove `_gossiper` field
alternator: ttl: pass `gossiper&` to `expiration_service`
service: storage_proxy: move `truncate_blocking` implementation to `remote`
service: storage_proxy: introduce `is_alive` helper
service: storage_proxy: remove `_messaging` reference
service: storage_proxy: move `connection_dropped` to `remote`
service: storage_proxy: make `encode_replica_exception_for_rpc` a static function
service: storage_proxy: move `handle_write` to `remote`
service: storage_proxy: move `handle_paxos_prune` to `remote`
service: storage_proxy: move `handle_paxos_accept` to `remote`
service: storage_proxy: move `handle_paxos_prepare` to `remote`
service: storage_proxy: move `handle_truncate` to `remote`
service: storage_proxy: move `handle_read_digest` to `remote`
service: storage_proxy: move `handle_read_mutation_data` to `remote`
service: storage_proxy: move `handle_read_data` to `remote`
service: storage_proxy: move `handle_mutation_failed` to `remote`
service: storage_proxy: move `handle_mutation_done` to `remote`
service: storage_proxy: move `handle_paxos_learn` to `remote`
service: storage_proxy: move `receive_mutation_handler` to `remote`
service: storage_proxy: move `handle_counter_mutation` to `remote`
service: storage_proxy: remove `get_local_shared_storage_proxy`
service: storage_proxy: (de)register RPC handlers in `remote`
service: storage_proxy: introduce `remote`
"
The helper is in charge of receiving INTERNAL_IP app state from
gossiper join/change notifications, updating system.peers with it
and kicking messaging service to update its preferred ip cache
along with initiating clients reconnection.
Effectively this helper duplicates the topology tracking code in
storage-service notifiers. Removing it reduces the amount of code and drops
a bunch of unwanted cross-components dependencies, in particular:
- one qctx call is gone
- snitch (almost) no longer needs to get messaging from gossiper
- public:private IP cache becomes local to messaging and can be
moved to topology at low cost
A nice minor side effect -- this helper was left unsubscribed
from gossiper on stop and snitch rename. Now it's all gone.
"
* 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla:
snitch: Remove reconnectable snitch helper
snitch, storage_service: Move reconnect to internal_ip kick
snitch, storage_service: Move system.peers preferred_ip update
snitch: Export prefer-local
Calling WebAssembly UDFs requires a wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.
This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using LRU strategy and their size is based on the size of their wasm memories.
The instances stored in the cache are also dropped when the UDF is dropped itself. For that reason,
the first patch modifies the current implementation of UDF dropping, so that the instance dropping may be added
later. The patch also removes the need of compiling the UDF again when dropping it.
The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh` and the main ways of using it are the `run_script` methods from `wasm.hh`.
The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm the instance reuse, size limits, instance eviction after timeout and after dropping the UDF.
Closes #10306
* github.com:scylladb/scylladb:
wasm: test instances reuse
wasm: reuse UDF instances
schema_tables: simplify merge_functions and avoid extra compilation
Currently the INTERNAL_IP state is updated using the reconnectable helper,
which subscribes to on_join/on_change events from gossiper. The same
subscription exists in storage service (it's a bit more elaborate, as it
checks whether the node is part of the ring, which is OK).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Code that waited for all remote view updates was already there. This
commit modifies the conditions of this wait to take into account the
"synchronous mode" (enabled when db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY is
set).
This commit defines a new tag key (SYNCHRONOUS_VIEW_UPDATES_TAG_KEY) to
be used for marking "synchronous mode" views. This key is used in
`cf_prop_defs::apply_to_builder` if the properties contain
KW_SYNCHRONOUS_UPDATES.
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.
I have extracted the `get_tags_of_table`, `find_tag` and `update_tags`
methods to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.
The return type of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`.
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
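
The split between a nullable lookup and a throwing wrapper can be sketched roughly as below. This is only an illustration: `table_schema` is a hypothetical stand-in, `std::string` replaces `sstring`, and `std::runtime_error` replaces `alternator::api_error`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using tags_map = std::map<std::string, std::string>;

// Hypothetical stand-in for a schema that may or may not carry tags.
struct table_schema {
    const tags_map* tags = nullptr;
};

// Generic layer: reports "no tags" as nullptr and knows nothing about
// the exception types of any particular front end.
inline const tags_map* get_tags_of_table(const table_schema& s) {
    return s.tags;
}

// Front-end wrapper: restores the throwing behavior for callers that
// want it, using an exception type owned by that front end.
inline const tags_map& get_tags_of_table_or_throw(const table_schema& s) {
    const tags_map* tags = get_tags_of_table(s);
    if (!tags) {
        throw std::runtime_error("table has no tags");
    }
    return *tags;
}

// Usage sketch.
inline void tags_example() {
    tags_map m{{"key", "value"}};
    table_schema tagged{&m};
    table_schema untagged{};
    assert(get_tags_of_table(untagged) == nullptr);
    assert(get_tags_of_table_or_throw(tagged).at("key") == "value");
}
```

Keeping the exception in the wrapper means the generic helper has no dependency on the module that defines the exception.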
When executing a wasm UDF, most of the time is spent on
setting up the instance. To minimize its cost, we reuse
the instance using wasm::instance_cache.
This patch adds a wasm instance cache, that stores
a wasmtime instance for each UDF and scheduling group.
The instances are evicted using LRU strategy. The
cache may store some entries for the UDF after evicting
the instance, but they are evicted when the corresponding
UDF is dropped, which greatly limits their number.
The size of stored instances is estimated using the size
of their WASM memories. In order to be able to read the
size of memory, we require that the memory is exported
by the client.
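
A simplified sketch of such a cache follows. It is not the code from `lang/wasm_instance_cache.hh`: the `instance` struct, the integer scheduling group id, the method names and the policy of refusing entries larger than the whole budget are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a wasmtime instance; only the size of its
// exported memory matters for accounting here.
struct instance {
    std::size_t memory_bytes;
};

class instance_cache {
    using key = std::pair<std::string, int>;  // (UDF name, scheduling group id)
    struct entry { key k; instance inst; };
    std::list<entry> _lru;                    // front = most recently used
    std::map<key, std::list<entry>::iterator> _index;
    std::size_t _total = 0;
    std::size_t _max_total;

    void erase(const key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) { return; }
        assert(_total >= it->second->inst.memory_bytes);
        _total -= it->second->inst.memory_bytes;
        _lru.erase(it->second);
        _index.erase(it);
    }
public:
    explicit instance_cache(std::size_t max_total) : _max_total(max_total) {}

    // Returns the cached instance, or nullptr; a hit becomes most recently used.
    instance* get(const std::string& udf, int group) {
        auto it = _index.find({udf, group});
        if (it == _index.end()) { return nullptr; }
        _lru.splice(_lru.begin(), _lru, it->second);
        return &it->second->inst;
    }

    void put(const std::string& udf, int group, instance inst) {
        key k{udf, group};
        erase(k);
        if (inst.memory_bytes > _max_total) { return; }  // assumed policy
        _total += inst.memory_bytes;
        _lru.push_front(entry{k, inst});
        _index[k] = _lru.begin();
        // Evict least recently used entries until the budget is met.
        while (_total > _max_total) {
            key victim = _lru.back().k;
            erase(victim);
        }
    }

    // Drops every entry of a UDF, across all scheduling groups.
    void drop_udf(const std::string& udf) {
        for (auto it = _lru.begin(); it != _lru.end();) {
            if (it->k.first == udf) {
                _total -= it->inst.memory_bytes;
                _index.erase(it->k);
                it = _lru.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t total_bytes() const { return _total; }
};
```

The list keeps recency order while the map gives direct lookup by (UDF, scheduling group), so hits, insertions and evictions avoid scanning the whole cache.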
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Currently, we have 2 merge_functions methods, where one does nothing but
call the other. We can replace them with a single one.
The merge_functions method compiles a UDF (using create_func) only to
read its signature. We can avoid that by reading it from the row ourselves.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
This PR removes all code that used classes `restriction`, `restrictions` and their children.
There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`.
Each function was reimplemented to operate on the new expression representation and eventually these fields weren't needed anymore.
After that the restriction classes weren't used anymore and could be deleted as well.
Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions.
Closes #11069
* github.com:scylladb/scylla:
cql3: Remove all remaining restrictions code
cql3: Move a function from restrictions class to the test
cql3: Remove initial_key_restrictions
cql3: expr: Remove convert_to_restriction
cql3: Remove _new from _new_nonprimary_key_restrictions
cql3: Remove _nonprimary_key_restrictions field
cql3: Reimplement uses of _nonprimary_key_restrictions using expression
cql3: Keep a map of single column nonprimary key restrictions
cql3: Remove _new from _new_clustering_columns_restrictions
cql3: Remove _clustering_columns_restrictions from statement_restrictions
cql3: Use a variable instead of dynamic cast
cql3: Use the new map of single column clustering restrictions
cql3: Keep a map of single column clustering key restrictions
cql3: Return an expression in get_clustering_columns_restrctions()
cql3: Reimplement _clustering_columns_restrictions->has_supporting_index()
cql3: Don't create single element conjunction
cql3: Add expr::index_supports_some_column
cql3: Reimplement has_unrestricted_components()
cql3: Reimplement _clustering_columns_restrictions->need_filtering()
cql3: Reimplement num_prefix_columns_that_need_not_be_filtered
cql3: Use the new clustering restrictions field instead of ->expression
cql3: Reimplement _clustering_columns_restrictions->size() using expressions
cql3: Reimplement _clustering_columns_restrictions->get_column_defs() using expressions
cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions
cql3: expr: Add has_only_eq_binops function
cql3: Reimplement _clustering_columns_restrictions->empty() using expressions
All parts of the code that use _nonprimary_key_restrictions
are changed to use _new_nonprimary_key_restrictions instead.
I decided not to split this into multiple commits,
as there aren't a lot of changes and they are
analogous to the ones done before for partition
and clustering columns.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This PR extends #9209. It consists of 2 main points:
- To enable parallelization of user-defined aggregates, a reduction function was added to the UDA definition. The reduction function is optional and has to be a scalar function that takes 2 arguments of the UDA's state type and returns the UDA's state.
- All currently implemented native aggregates got a reducible counterpart, which returns its state as the final result, so it can be reduced with other results. Hence all native aggregates can now be distributed.
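
The contract of a reduction function (two states in, one state out) can be illustrated with an average-like aggregate. The sketch below is plain C++, not CQL or the actual forward_service code, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// State of a hypothetical "average" aggregate.
struct avg_state {
    std::int64_t sum = 0;
    std::int64_t count = 0;
};

// State function: folds one row value into the state.
inline avg_state accumulate(avg_state s, std::int64_t v) {
    return avg_state{s.sum + v, s.count + 1};
}

// Reduction function: takes two states and returns a state, so that
// partial results computed independently can be merged.
inline avg_state reduce(avg_state a, avg_state b) {
    return avg_state{a.sum + b.sum, a.count + b.count};
}

// Final function: turns the merged state into the result.
inline double finalize(avg_state s) {
    return s.count == 0 ? 0.0 : static_cast<double>(s.sum) / s.count;
}

// Runs the state function over one node's share of the rows.
inline avg_state aggregate_part(const std::vector<std::int64_t>& rows) {
    avg_state s;
    for (std::int64_t v : rows) {
        s = accumulate(s, v);
    }
    return s;
}
```

Because the partial states are merged before the final function runs, the result matches what a single pass over all rows would produce.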
Local 3-node cluster made with current master. `node1` updated to this branch. Accessing node with `ccm <node-name> cqlsh`
I've tested the following from both the old and the new node:
- creating UDA with reduce function - not allowed
- selecting count(*) - distributed
- selecting other aggregate function - not distributed
Fixes: #10224
Closes #10295
* github.com:scylladb/scylla:
test: add tests for parallelized aggregates
test: cql3: Add UDA REDUCEFUNC test
forward_service: enable multiple selection
forward_service: support UDA and native aggregate parallelization
cql3:functions: Add cql3::functions::functions::mock_get()
cql3: selection: detect parallelize reduction type
db,cql3: Move part of cql3's function into db
selection: detect if selectors factory contains only simple selectors
cql3: reducible aggregates
DB: Add `scylla_aggregates` system table
db,gms: Add SCYLLA_AGGREGATES schema features
CQL3: Add reduce function to UDA
gms: add UDA_NATIVE_PARALLELIZED_AGGREGATION feature
"
Same thing was done for compaction class some time ago, now
it's time for streaming to keep repair-generated IO in bounds.
This set mostly resembles the one for compaction IO class with
the exception that boot-time reshard/reshape currently runs in
streaming class, but that's not great if the class is throttled,
so the set also moves boot-time IO into default IO class.
"
* 'br-streaming-class-throttling-2' of https://github.com/xemul/scylla:
distributed_loader: Populate keyspaces in default class
streaming: Maintain class bandwidth
streaming: Pass db::config& to manager constructor
config: Add stream_io_throughput_mb_per_sec option
sstables: Keep priority class on sstable_directory
get_clustering_columns_restrctions() used to return
a shared pointer to the clustering_restrictions class.
Now everything is being converted to expression,
so it should return an expression as well.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This patch makes memtable_flush_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch makes compaction_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
It's going to control the bandwidth for the streaming prio class.
For now it's just added but doesn't work for real
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Moving `function`, `function_name` and `aggregate_function` into
db namespace to avoid including cql3 namespace into query-request.
For now, only minimal subset of cql3 function was moved to db.
There is no need for utils::make_joinpoint now
that the function calls replica::database::drop_table_on_all_shards.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So that the dropped table's directory can be
removed after it has been dropped on all shards
if it has no snapshots.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Runs drop_column_family on all database shards.
Will be extended later to consider removing the table directory.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that we use emit_only_live_rows::no everywhere we can remove this
template parameter. Only the template parameter is removed, the
internal logic around it is left in place (will be removed in a next
patch), by hard-wiring `only_live()`.
emit_only_live_rows is a convenience so downstream consumers of the
mutation compactors don't have to check the `bool is_live` already
passed to them. This convenience however causes a template parameter and
additional logic for the compactor. As the most prominent of these
consumers (the query result builder) will soon have to switch to
emit_only_live_rows::no for other reasons anyway (it will want to count
tombstones), we take the opportunity to switch everybody to ::no. This
can be done with very little additional complexity to these consumers --
basically an additional if or two.
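
The "additional if" on the consumer side might look roughly like this. The `row`, `result_builder` and `compact` names are hypothetical and far simpler than the real mutation compactor interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct row {
    int key;
};

// Hypothetical consumer that receives every row together with the
// is_live flag and does the filtering itself.
struct result_builder {
    std::vector<int> live_keys;
    std::size_t dead_rows = 0;

    void consume(const row& r, bool is_live) {
        if (!is_live) {          // the extra check moved into the consumer
            ++dead_rows;         // dead rows can now be counted
            return;
        }
        live_keys.push_back(r.key);
    }
};

// Hypothetical compactor: forwards all rows, live or not.
template <typename Consumer>
void compact(const std::vector<std::pair<row, bool>>& input, Consumer& consumer) {
    for (const auto& [r, is_live] : input) {
        consumer.consume(r, is_live);
    }
}
```

The compactor no longer needs a mode switch, and a consumer that wants to count dead rows gets that information for free.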
This prepares the ground for removing this template parameter and the
associated logic from the compactor.
Recently we noticed a regression where with certain versions of the fmt
library,
SELECT value FROM system.config WHERE name = 'experimental_features'
returns string numbers, like "5", instead of feature names like "raft".
It turns out that the fmt library keeps changing its overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have two conflicting ways to print it:
1. We have an explicit operator<<.
2. We have an *implicit* convertor to the type held by T.
We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertible to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.
The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.
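
The language-level difference can be checked with a small sketch. `option_like` is a hypothetical, heavily reduced stand-in for enum_option, and the enum names are made up; the sketch shows only the C++ conversion rules and says nothing about fmt itself.

```cpp
#include <cassert>
#include <type_traits>

enum old_style_feature { OLD_RAFT = 5 };   // unscoped enum
enum class scoped_feature { RAFT = 5 };    // scoped enum

// Hypothetical wrapper with an implicit conversion to the held type.
template <typename T>
struct option_like {
    T value;
    operator T() const { return value; }
};

// Wrapper -> unscoped enum -> int is a valid implicit conversion chain.
static_assert(std::is_convertible_v<option_like<old_style_feature>, int>,
              "unscoped enum makes the wrapper convertible to int");

// A scoped enum never converts to int implicitly, so the chain is broken.
static_assert(!std::is_convertible_v<option_like<scoped_feature>, int>,
              "scoped enum keeps the wrapper from converting to int");
```

With the scoped enum, any int-based printing path is simply unavailable, leaving the explicit operator<< as the only way to format the value.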
I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit convertor would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).
Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.
This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.
Fixes #11003.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11004