Commit Graph

2699 Commits

Author SHA1 Message Date
Avi Kivity
421557b40a Merge "Provide DC/RACK when populating topology" from Pavel E
"
The topology object maintains all sorts of node/DC/RACK mappings on
board. When new entries are added to it, the DC and RACK are taken
from the global snitch instance which, in turn, checks gossiper,
system keyspace and its local caches.

This set makes the topology population API require DC and RACK via a
call argument. In most cases the populating code is the
storage service, which knows exactly where to get those from.

After this set it will be possible to remove the dependency knot
consisting of the snitch, gossiper, system keyspace and messaging.
"

* 'br-topology-dc-rack-info' of https://github.com/xemul/scylla:
  topology: Use the provided dc/rack info
  test: Provide testing dc/rack infos
  storage_service: Provide dc/rack for snitch reconfiguration
  storage_service: Provide dc/rack from system ks on start
  storage_service: Provide dc/rack from gossiper for replacement
  storage_service: Provide dc/rack from gossiper for remotes
  storage_service,dht,repair: Provide local dc/rack from system ks
  system_keyspace: Cache local dc-rack on .start()
  topology: Some renames after previous patch
  topology: Require entry in the map for update_normal_tokens()
  topology: Make update_endpoint() accept dc-rack info
  replication_strategy: Accept dc-rack as get_pending_address_ranges argument
  dht: Carry dc-rack over boot_strapper and range_streamer
  storage_service: Make replacement info a real struct
2022-08-31 12:53:06 +03:00
Tomasz Grabiec
ae8d2a550d db: schema_tables: Make table creation shadow earlier concurrent changes
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:

scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.

The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.

If the schemas were the same, or did not conflict in a way that leads
to an abort, the current behavior would be to drop the older table as if
the last CREATE TABLE had been preceded by a DROP TABLE.

The proposed fix is to make the CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different: the older table will be dropped.

Fixes #11396
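
A toy model of the shadowing semantics (plain C++ with hypothetical structures, not the real schema-merge code): each CREATE carries a tombstone timestamp, and merged column entries older than that tombstone are dropped instead of coexisting at the same partition-key position:

```
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Toy model of one table's schema rows: partition-key column definitions,
// keyed by their position in the partition key, each with a write timestamp.
struct column_entry { std::string name; int64_t ts; };

struct table_defs {
    int64_t tombstone_ts = 0;                 // shadows anything strictly older
    std::map<int, column_entry> pk_columns;   // position -> definition
};

// A CREATE TABLE at time `ts` also writes a tombstone covering all earlier
// schema changes of the table, so two concurrent creations cannot leave two
// live entries for the same partition-key position after merging.
void apply_create(table_defs& t, int64_t ts, const std::map<int, std::string>& cols) {
    t.tombstone_ts = std::max(t.tombstone_ts, ts);
    for (const auto& [pos, name] : cols) {
        t.pk_columns[pos] = {name, ts};
    }
}

// Merge/compaction step: anything older than the tombstone is dropped.
void purge_shadowed(table_defs& t) {
    std::erase_if(t.pk_columns,
                  [&](const auto& e) { return e.second.ts < t.tombstone_ts; });
}
```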
2022-08-29 12:06:02 +02:00
Tomasz Grabiec
661db2706f db: schema_tables: Fix formatting 2022-08-26 17:37:48 +02:00
Pavel Emelyanov
a03d6f7751 system_keyspace: Cache local dc-rack on .start()
There's an endpoint:{dc,rack} cache in the system keyspace, but the
local node is not there, because this data is populated from the peers
table, while the local node's dc/rack is in the snitch (or the system.local table).

At the same time, storage_service::join_cluster() and whoever it calls
(e.g. repair) will need this info on start, and it's convenient
to have this data in the sys-ks cache.

It's not in the peers part of the cache because the next branch removes this
map, and it would be very clumsy to have a whole container with just
one entry in it.

There's code in system_keyspace::setup() that gets the local node's
dc/rack and commits it into the system.local table. However, putting
the data into the cache is done in .start(). This is because cql-test-env
needs this data cached too, but it doesn't call sys_ks.setup(). This will
be cleaned up some other day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:47:30 +03:00
Avi Kivity
0dbcd13a0f config: change logging::settings constructor call to use designated initializer
Safer wrt reordering, and more readable too.

Closes #11382
2022-08-26 06:14:01 +03:00
Wojciech Mitros
49dba4f0c1 functions: fix dropping of a keyspace with an aggregate in it
Currently, if a keyspace has an aggregate and the keyspace
is dropped, the keyspace becomes corrupted and another keyspace
with the same name cannot be created again.

This is caused by the fact that when removing an aggregate, we
call create_aggregate() to get values for its name and signature.
In the create_aggregate(), we check whether the row and final
functions for the aggregate exist.
Normally, that's not an issue, because when dropping an existing
aggregate alone, we know that its UDFs also exist. But when dropping
an entire keyspace, we first drop the UDFs, making us unable to drop
the aggregate afterwards.

This patch fixes this behavior by removing the create_aggregate() call
from the aggregate dropping implementation and replacing it with
specific calls that get the aggregate name and signature.

Additionally, a test that would previously fail is added to
cql-pytest/test_uda.py where we drop a keyspace with an aggregate.

Fixes #11327

Closes #11375
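
A small standalone sketch of the difference (hypothetical types, not the real UDA code): reading the stored name/signature directly works even after the aggregate's functions are gone, while reconstructing the aggregate first does not:

```
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct stored_aggregate {
    std::string name;
    std::vector<std::string> signature;
    std::string state_func;   // a UDF the aggregate depends on
};

std::set<std::string> existing_functions;   // stand-in for the functions table

// Old approach: rebuild the full aggregate object, which validates that its
// functions still exist -- this throws while dropping a whole keyspace,
// because the UDFs were already dropped first.
stored_aggregate create_aggregate_from_row(const stored_aggregate& row) {
    if (!existing_functions.count(row.state_func)) {
        throw std::runtime_error("state function does not exist");
    }
    return row;
}

// Fixed approach: take the name and signature straight from the row, without
// validating the (possibly already dropped) functions.
std::pair<std::string, std::vector<std::string>>
aggregate_drop_key(const stored_aggregate& row) {
    return {row.name, row.signature};
}
```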
2022-08-25 16:28:57 +02:00
Kamil Braun
7e56251aea service/raft: introduce group0_upgrade_state
Define an enum class, `group0_upgrade_state`, describing the state of
the upgrade procedure (implemented in later commits).

Provide IDL definitions for (de)serialization.

The node will have its current upgrade state stored on disk in
`system.scylla_local` under the `group0_upgrade_state` key. If the key
is not present we assume `use_pre_raft_procedures` (meaning we haven't
started upgrading yet or we're at the beginning of upgrade).

Introduce `system_keyspace` accessor methods for storing and retrieving
the on-disk state.
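
A minimal sketch of the on-disk handling described above (hypothetical helper and a map standing in for system.scylla_local): the state is a string stored under a key, and a missing key maps to `use_pre_raft_procedures`:

```
#include <map>
#include <string>

// Only `use_pre_raft_procedures` is named here; the further states of the
// upgrade procedure are defined in later commits.
enum class group0_upgrade_state {
    use_pre_raft_procedures,
};

// Toy stand-in for the system.scylla_local key/value table.
std::map<std::string, std::string> scylla_local;

std::string load_group0_upgrade_state_str() {
    auto it = scylla_local.find("group0_upgrade_state");
    if (it == scylla_local.end()) {
        // Missing key: we haven't started upgrading yet, or are at the very
        // beginning of the upgrade.
        return "use_pre_raft_procedures";
    }
    return it->second;
}
```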
2022-08-19 19:15:19 +02:00
Kamil Braun
547134faf4 db: system_keyspace: introduce load_peers
Load the addresses of our peers from `system.peers`.

Will be used by the Raft upgrade procedure to obtain the set of all
peers.
2022-08-19 19:15:18 +02:00
Piotr Sarna
cf30d4cbcf Merge 'Secondary index of collection columns' from Nadav Har'El
This pull request introduces global secondary-indexing for non-frozen collections.

The intent is to enable such queries:

```
CREATE TABLE test(id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));

-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);

SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```

We use the all-familiar materialized views (MVs) here. Scylla treats all the
collections the same way - they're a list of (key, value) pairs. In the case
of sets, the value type is a dummy one. In the case of lists, the key type is
TIMEUUID. When describing the design, I will ignore the fact that there is more
than one collection type. Suppose that the columns in the base table
were as follows:

```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```

The MV schema is as follows (the column names might differ from those
in the base table). All the columns here form the primary
key.

```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```

The reason for the last additional column is that the values from a collection might not be unique.

Fixes #2962
Fixes #8745
Fixes #10707

This patch does not implement **local** secondary indexes for collection columns: Refs #10713.

Closes #10841

* github.com:scylladb/scylladb:
  test/cql-pytest: un-xfail yet another passing collection-indexing test
  secondary index: fix paging in map value indexing
  test/cql-pytest: test for paging with collection values index
  cql, view: rename and explain bytes_with_action
  cql, index: make collection indexing a cluster feature
  test/cql-pytest: failing tests for oversized key values in MV and SI
  cql: fix secondary index "target" when column name has special characters
  cql, index: improve error messages
  cql, index: fix default index name for collection index
  test/cql-pytest: un-xfail several collecting indexing tests
  test/cql-pytest/test_secondary_index: verify that local index on collection fails.
  docs/design-notes/secondary_index: add `VALUES` to index target list
  test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
  cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
  cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
  test/boost/secondary_index_test: update for non-frozen indexes on collections
  test/cql-pytest: Uncomment collection indexes tests that should be working now
  cql, index: don't use IS NOT NULL on collection column
  cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
  cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accommodate indexes over collections
  cql3, index: Use entries() indexes on collections for queries
  cql3, index: Use keys() and values() indexes on collections for queries.
  types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
  cql3/statements/index_target: throw exception to signal that we didn't miss returning from function
  db/view/view.cc: compute view_updates for views over collections
  view info: has_computed_column_depending_on_base_non_primary_key
  column_computation: depends_on_non_primary_key_column
  schema, index/secondary_index_manager: make schema for index-induced mv
  index/secondary_index_manager: extract keys, values, entries types from collection
  cql3/statements/: validate CREATE INDEX for index over a collection
  cql3/statements/create_index_statement,index_target: rewrite index target for collection
  column_computation.hh, schema.cc: collection_column_computation
  column_computation.hh, schema.cc: compute_value interface refactor
  Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
2022-08-16 14:18:51 +02:00
Botond Dénes
d56dcb842c db/virtual_table: add virtual destructor to virtual_table
It should have had one: derived instances are stored and destroyed via
the base class. The only reason this hasn't caused bugs yet is that
derived instances happen not to have any non-trivial members yet.

Closes #11293
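
A self-contained reminder of why the base class needs one (generic C++, not the Scylla code): deleting a derived object through a base pointer without a virtual destructor is undefined behaviour, and would bite as soon as a derived table gains non-trivial members:

```
#include <memory>
#include <string>

struct virtual_table_base {
    virtual ~virtual_table_base() = default;  // without this, the delete below is UB
};

struct some_virtual_table : virtual_table_base {
    std::string name = "system.some_table";   // non-trivial member
};

int main() {
    // Instances are stored and destroyed via the base class, as described above.
    std::unique_ptr<virtual_table_base> t = std::make_unique<some_virtual_table>();
    // unique_ptr<base> invokes ~virtual_table_base(); it must be virtual so that
    // ~some_virtual_table() (and the string's destructor) actually run.
}
```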
2022-08-15 16:58:05 +03:00
Botond Dénes
a9573b84c5 Merge 'commitlog: Revert/modify fac2bc4 - do footprint add in delete' from Calle Wilund
Fixes #11184
Fixes #11237

In the previous (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.

This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a few inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock at start because we cannot get to the actual replay that will eventually free things.

Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.

The simplest fix is to add the footprint in the delete call instead. This will lock out segment creation until the delete call is done, but this is fast. It also ensures that only the replay shard is involved.

To further emphasize this, don't store the segments found on the init scan in all shard instances;
instead, retrieve them (based on the low time-pos for the current gen) when required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and we can also potentially ask for the list twice.
More to the point, it goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is
considered a candidate for recycling and included in the footprint.

Closes #11251

* github.com:scylladb/scylladb:
  commitlog: Make get_segments_to_replay on-demand
  commitlog: Revert/modify fac2bc4 - do footprint add in delete
2022-08-15 09:10:32 +03:00
Kamil Braun
b4c5b79f5e db: system_distributed_keyspace: don't call on_internal_error in check_exists
The function `check_exists` checks whether a given table exists, giving
an error otherwise. It previously used `on_internal_error`.

`check_exists` is used in some old functions that insert CDC metadata into
CDC tables. These tables are no longer used in newer Scylla versions
(they were replaced with other tables with a different schema), and this
function is no longer called. The table definitions were removed and
these tables are no longer created. They will only exist in clusters
that were upgraded from old versions of Scylla (4.3) through a sequence
of upgrades.

If you tried to upgrade from a very old version of Scylla which had
neither the old nor the new tables to a modern version, say from 4.2 to
5.0, you would get `on_internal_error` from this `check_exists`
function. Fortunately:
1. we don't support such upgrade paths
2. `on_internal_error` in production clusters does not crash the system,
   only throws. The exception would be caught, printed, and the system
   would keep running (just without CDC - until you finished the upgrade
   and called the proper nodetool command to fix the CDC module).

Unfortunately, there is a dtest (`partitioner_tests.py`) which performs
an unsupported upgrade scenario - it starts Scylla from Cassandra (!)
work directories, which is like upgrading from a very old version of
Scylla.

This dtest was not failing because another bug masked the problem.
When we try to fix the bug - see #11225 - the dtest starts hitting the
assertion in `check_exists`. Because it's a test, we configure
`on_internal_error` to crash the system.

The point of this commit is to not crash the system in this rare
scenario which happens only in some weird tests. We now throw
`std::runtime_error` instead of calling `on_internal_error`. In the
dtest, we already ignore the resulting CDC error appearing in the logs
(see scylladb/scylla-dtest#2804). Together with this change, we'll be
able to fix the #11225 bug and pass this test.

Closes #11287
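
A minimal stand-in for the change (hypothetical signature, not the actual helper): the missing-table case is reported with a regular exception rather than `on_internal_error`, so a legacy/unsupported cluster logs an error instead of aborting in test configurations:

```
#include <set>
#include <stdexcept>
#include <string>

std::set<std::string> existing_tables;   // stand-in for the known schema

void check_exists(const std::string& ks, const std::string& table) {
    if (!existing_tables.count(ks + "." + table)) {
        // Previously: on_internal_error(...), which aborts when Scylla is
        // configured to crash on internal errors (as tests are).
        throw std::runtime_error("table " + ks + "." + table + " does not exist");
    }
}
```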
2022-08-14 13:12:03 +03:00
Nadav Har'El
5d556115a1 cql, view: rename and explain bytes_with_action
The structure "bytes_with_action" was very hard to understand because of
its mysterious and general-sounding name, and the lack of comments.

In this patch I add a large comment explaining its purpose, and rename
it to a more suitable name, view_key_and_action, which suggests that
each such object is about one view key (where to add a view row), and
an additional "action" that we need to take beyond adding the view row.

This is the best I can do to make this code easier to understand without
completely reorganizing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
32289d681f db/view/view.cc: compute view_updates for views over collections
For collection indexes, the logic of computing values for each of the columns
needed to change, since a single column might produce more
than one value as a result.

The liveness info from individual cells of the collection impacts the
liveness info of the resulting rows. Therefore the control flow needed to be
rewritten - instead of functions getting a row from get_view_row and
later computing and applying row markers, they compute these values
themselves.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:49 +03:00
Michał Radwański
112086767c view info: has_computed_column_depending_on_base_non_primary_key
In the case of secondary indexes, if an index does not contain any base
column outside the base's primary key, then it is assumed that
during an update, a change to some cells of the base table cannot cause
us to be dealing with a different row in the view. This, however,
doesn't take into account the possibility of computed columns which in
fact do depend on some non-primary-key columns. Introduce an additional
property of an index,
has_computed_column_depending_on_base_non_primary_key.
2022-08-14 10:29:14 +03:00
Michał Radwański
ebc4ad4713 column_computation.hh, schema.cc: collection_column_computation
This type of column computation will be used for creating updates to
materialized views that are indexes over collections.

This type features an additional function, compute_values_with_action,
which, depending on an (optional) old row and the new row (the update to the
base table), returns multiple bytes_with_action - a vector of
(computed value, action) pairs, where the action signifies whether the
row with a specific key needs to be deleted or created.
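
A standalone sketch of the result shape (toy types, with a plain map standing in for the collection): comparing the old and new collection value yields multiple (computed value, action) pairs, where the action says whether a view row must be created or deleted:

```
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class action { add_row, delete_row };

// Toy version: the "collection" is a map<key, value>; each entry that appears
// or disappears between the old and the new row produces one pair.
std::vector<std::pair<std::string, action>>
compute_values_with_action(const std::map<std::string, std::string>& old_coll,
                           const std::map<std::string, std::string>& new_coll) {
    std::vector<std::pair<std::string, action>> out;
    for (const auto& [k, v] : new_coll) {
        if (!old_coll.count(k)) {
            out.emplace_back(k, action::add_row);      // new entry -> new view row
        }
    }
    for (const auto& [k, v] : old_coll) {
        if (!new_coll.count(k)) {
            out.emplace_back(k, action::delete_row);   // removed entry -> delete view row
        }
    }
    return out;
}
```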
2022-08-14 10:29:13 +03:00
Michał Radwański
2babee2cdc column_computation.hh, schema.cc: compute_value interface refactor
The compute_value function of column_computation previously had the
following signature:
    virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;

This is superfluous: never in the history of Scylla was the last
parameter (row) used in any implementation, and it never actually
returned a null bytes_opt. The absurdity of this interface is especially
visible at call sites like the following, where a dummy empty
row was created:
```
    token_column.get_computation().compute_value(
            *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty()));
```
2022-08-14 10:29:13 +03:00
Piotr Sarna
fe617ed198 Merge 'db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column' from Piotr Dulikowski
Previously, the `system.local`'s `rpc_address` column kept local node's
`rpc_address` from the scylla.yaml configuration. Although it sounds
like it makes sense, there are a few reasons to change it to the value
of scylla.yaml's `broadcast_rpc_address`:

- The `broadcast_rpc_address` is the address that the drivers are
  supposed to connect to. `rpc_address` is the address that the node
  binds to - it can be set for example to 0.0.0.0 so that Scylla listens
  on all addresses; however, this gives no useful information to the
  driver.
- The `system.peers` table also has the `rpc_address` column and it
  already keeps other nodes' `broadcast_rpc_address`es.
- Cassandra is going to make the same change in the upcoming version 4.1.

Fixes: #11201

Closes #11204

* github.com:scylladb/scylladb:
  db/system_keyspace: fix indentation after previous patch
  db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column
2022-08-12 16:24:28 +02:00
Piotr Sarna
1ab4c6aab3 Merge 'cql3: enable collections as UDA accumulators' from Wojciech Mitros
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values.

Instead of from_string()/to_string(), the CQL parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #11250

* github.com:scylladb/scylladb:
  cql3: enable collections as UDA accumulators
  cql3: extend implementation of to_bytes for raw_value
2022-08-12 12:51:17 +02:00
Benny Halevy
d295d8e280 everywhere: define locator::host_id as a strong tagged_uuid type
So it can be distinguished from other uuid-based
identifiers in the system.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11276
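
A minimal sketch of the strong-typedef idea (generic C++, not the actual utils::tagged_uuid): a tag parameter makes host_id a distinct type, so it cannot be mixed up with other uuid-based identifiers at compile time:

```
#include <array>
#include <cstdint>

// Stand-in for a UUID value.
using uuid = std::array<std::uint8_t, 16>;

// One wrapper template, many distinct identifier types.
template <typename Tag>
struct tagged_uuid {
    uuid id{};
    bool operator==(const tagged_uuid&) const = default;
};

using host_id = tagged_uuid<struct host_id_tag>;
using table_id = tagged_uuid<struct table_id_tag>;

void register_host(host_id) {}

int main() {
    host_id h{};
    table_id t{};
    register_host(h);
    // register_host(t);  // would not compile: different tag, different type
}
```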
2022-08-12 06:01:44 +03:00
Avi Kivity
46bd0b1e62 consistency_level: accept effective_replication_map as parameter, rather than keyspace
A keyspace is a mutable object that can change from time to time. An
effective_replication_map captures the state of a keyspace at a point in
time and can therefore be consistent (with care from the caller).

Change consistency_level's functions to accept an effective_replication_map.
This allows the caller to ensure that separate calls use the same
information and are consistent with each other.

Current callers are likely correct since they are called from one
continuation, but it's better to be sure.
2022-08-11 17:58:42 +03:00
Avi Kivity
1078d1bfda consistency_level: be more const when using replication_strategy
We don't modify the replication_strategy here, so use const. This
will help when the object we get is const itself, as it will be in
the next patches.
2022-08-11 17:58:42 +03:00
Wojciech Mitros
48bd752971 cql3: enable collections as UDA accumulators
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values. For example, a list with string elements:
'a, b', 'c' would be represented as a string 'a, b, c', while now
it is represented as "['a, b', 'c']".
Some types that were parsable are now represented in a different
way. For example, a tuple ('a', null, 0) was represented as
"a:\@:0", and now it is "('a', null, 0)".

Instead of from_string()/to_string(), the CQL parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
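
A self-contained illustration of the round-trip problem (plain C++, not the CQL code): joining values with a separator loses information when a value itself contains the separator, which is why the old string format could not be parsed back:

```
#include <iostream>
#include <string>
#include <vector>

// Join a list with ", " the way the old to_string()-style format did.
std::string join(const std::vector<std::string>& v) {
    std::string out;
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) out += ", ";
        out += v[i];
    }
    return out;
}

int main() {
    // The two-element list ['a, b', 'c'] ...
    std::vector<std::string> list = {"a, b", "c"};
    // ... flattens to "a, b, c": indistinguishable from a three-element list,
    // so no from_string() can recover the original. A parsable representation
    // like "['a, b', 'c']" keeps the quoting and avoids the ambiguity.
    std::cout << join(list) << "\n";  // prints: a, b, c
}
```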
2022-08-11 16:23:57 +02:00
Calle Wilund
a729c2438e commitlog: Make get_segments_to_replay on-demand
Refs #11237

Don't store the segments found on the init scan in all shard instances;
instead, retrieve them (based on the low time-pos for the current gen) when
required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and we can also potentially
ask for the list twice. More to the point, it goes better hand-in-hand
with the semantics of "delete_segments", where any file sent in is
considered a candidate for recycling and included in the footprint.
2022-08-11 06:41:23 +00:00
Calle Wilund
8116c56807 commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fixes #11184
Fixes #11237

In the previous (broken) fix for #11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight
limits. Since we nowadays do quite a few inserts _before_ commitlog
replay (system.local, but...) we can end up in a situation where we
deadlock at start because we cannot get to the actual replay that will
eventually free things.
Another, not thought through, consequence is that we add a single
footprint to _all_ commitlog shard instances - even though only
shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.

The simplest fix is to add the footprint in the delete call instead. This
will lock out segment creation until the delete call is done, but this is
fast. It also ensures that only the replay shard is involved.
2022-08-10 08:04:03 +00:00
Botond Dénes
7730419f5c query-result-writer: stop when tombstone-limit is reached
The query result writer now counts tombstones and cuts the page (marking
it as a short one) when the tombstone limit is reached. This is to avoid
timing out on a large span of tombstones, especially prefixes.
In the case of unpaged queries, we fail the read instead, similarly to
how we do with max result size.
If the limit is 0, the previous behaviour is used: tombstones are not
taken into consideration at all.
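
A rough standalone model of the behaviour (not the actual query result writer): tombstones are counted while building the page, and the page is cut short once the limit is reached; a limit of 0 disables the accounting:

```
#include <cstdint>
#include <vector>

struct row { bool is_tombstone; };

struct page {
    std::vector<row> rows;
    bool is_short = false;   // cut before the requested row limit was reached
};

page build_page(const std::vector<row>& input, uint64_t tombstone_limit) {
    page p;
    uint64_t tombstones = 0;
    for (const row& r : input) {
        if (r.is_tombstone && tombstone_limit != 0 && ++tombstones >= tombstone_limit) {
            p.is_short = true;   // stop here instead of timing out on a long run
            break;
        }
        if (!r.is_tombstone) {
            p.rows.push_back(r);
        }
    }
    return p;
}
```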
2022-08-10 06:03:38 +03:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from the coordinator to the replicas, to make
sure all of them use the same limit.
2022-08-10 06:01:47 +03:00
Botond Dénes
33f0447ba0 db/config: add config item for query tombstone limit
This will be the value used to break pages, after processing the
specified number of tombstones. The page will be cut even if empty.
We could maybe use the already existing tombstone_{warn,fail}_threshold
instead and use them as a soft/hard limit pair, like we did with page
sizes.
2022-08-09 10:00:40 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
so it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity, since the same logic is currently open-coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
6e77ad9392 system_keyspace: get_truncation_record: delete unused lambda capture
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:28 +03:00
Benny Halevy
1fda686f96 idl: make idl headers self-sufficient
Add include statements to satisfy dependencies.

Delete the now-unneeded include directives from the upper-level
source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
cfc7e9aa59 db: hints: sync_point: do not include idl definition file
idl definition files are not intended for direct
inclusion in .cc files.

The data types they represent are supposed to be defined
in a regular C++ header, so define them in db/hints/sync_point.hh
and include it rather than idl/hinted_handoff.idl.hh.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
82fa205723 db/per_partition_rate_limit: tidy up headers self-sufficiency
include what's needed where needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
37b7a9cce2 utils: get rid of joinpoint
Now that it is no longer used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.

Since we now synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
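
A small sketch of the API change (hypothetical signatures, std::chrono instead of Scylla's clock types): a plain optional time point replaces a callback that returned the same value on every shard anyway:

```
#include <chrono>
#include <functional>
#include <optional>

using time_point = std::chrono::system_clock::time_point;

// Before: every shard invoked a callback that always produced the same time
// point and existed only to coordinate the shards.
void truncate_table_on_all_shards_old(std::function<time_point()> timestamp_func) { /* ... */ }

// After: the caller resolves the timestamp once (or leaves it empty), and the
// internal execution phases are synchronized inside the function itself.
void truncate_table_on_all_shards_new(std::optional<time_point> truncated_at) { /* ... */ }
```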
2022-08-07 12:53:05 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5e8c05f1a8 database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func

Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until drop started.

Fixes #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:52:51 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there's no other
place to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have a keyspace at hand may get the topology via
the ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints(),
because both have just an inet_address at hand. Those are patched to be
methods of topology itself; all their callers already deal with
token metadata and can get the topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consistency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection sugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
c3718b7a6e consistency_level: Remove is_local() and count_local_endpoints()
No code uses them now -- callers switched to use the topology -- so these two
can be dropped together with their calls to the global snitch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
0da8caba1d consistency_level: Use topology local_dc_filter
The filter_for_query() helper has the keyspace at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Pavel Emelyanov
de58b33eee consistency-level: Call count_local_endpoints from topology
Similar to the previous patch: in those places with a keyspace object at
hand, the topology can be obtained from the ks's replication map.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Pavel Emelyanov
f84ee8f0fb consistency_level: Get datacenter from topology
In some of the db/consistency_level.cc helpers, the topology can be
obtained from the keyspace's effective replication map.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Calle Wilund
fac2bc41ba commitlog: Include "segments_to_replay" in initial footprint
Fixes #11184

Not including it here can cause our estimate of "delete or not" after replay
to be skewed in favour of retaining segments as (new) recycles (or even flip
a counter), and if we have repeated crash+restarts we could be accumulating
an effectively ever-increasing segment footprint.

Closes #11205
2022-08-05 12:16:53 +03:00
Pavel Emelyanov
527b345079 Merge 'storage_proxy: introduce a remote "subservice"' from Kamil Braun
Introduce a `remote` class that handles all remote communication in `storage_proxy`: sending and receiving RPCs, checking the state of other nodes by accessing the gossiper, and fetching schema.

The `remote` object lives inside `storage_proxy` and right now it's initialized and destroyed together with `storage_proxy`.

The long game here is to split the initialization of `storage_proxy` into two steps:
- the first step, which constructs `storage_proxy`, initializes it "locally" and does not require references to `messaging_service` and `gossiper`.
- the second step will take those references and add the `remote` part to `storage_proxy`.

This will allow us to remove some cycles from the service (de)initialization order and in general clean it up a bit. We'll be able to start `storage_proxy` right after the `database` (without messaging/gossiper). Similar refactors are planned for `query_processor`.

Closes #11088
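
A bare-bones sketch of the two-step construction (hypothetical types, not the real services): the proxy is first usable locally, and the remote part is attached later, once messaging and gossip exist:

```
#include <memory>

struct messaging_service {};
struct gossiper {};

class storage_proxy {
    // Everything needed for remote communication lives here.
    struct remote {
        messaging_service& ms;
        gossiper& gs;
    };
    std::unique_ptr<remote> _remote;   // empty until the second init step
public:
    storage_proxy() = default;         // step 1: local-only construction

    // Step 2: wire up the remote part once messaging/gossiper are available.
    void start_remote(messaging_service& ms, gossiper& gs) {
        _remote = std::make_unique<remote>(remote{ms, gs});
    }

    bool only_local() const { return !_remote; }
};
```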

* github.com:scylladb/scylladb:
  service: storage_proxy: pass `migration_manager*` to `init_messaging_service`
  service: storage_proxy: `remote`: make `_gossiper` a const reference
  gms: gossiper: mark some member functions const
  db: consistency_level: `filter_for_query`: take `const gossiper&`
  replica: table: `get_hit_rate`: take `const gossiper&`
  gms: gossiper: move `endpoint_filter` to `storage_proxy` module
  service: storage_proxy: pass `shared_ptr<gossiper>` to `start_hints_manager`
  service: storage_proxy: establish private section in `remote`
  service: storage_proxy: remove `migration_manager` pointer
  service: storage_proxy: remove calls to `storage_proxy::remote()` from `remote`
  service: storage_proxy: remove `_gossiper` field
  alternator: ttl: pass `gossiper&` to `expiration_service`
  service: storage_proxy: move `truncate_blocking` implementation to `remote`
  service: storage_proxy: introduce `is_alive` helper
  service: storage_proxy: remove `_messaging` reference
  service: storage_proxy: move `connection_dropped` to `remote`
  service: storage_proxy: make `encode_replica_exception_for_rpc` a static function
  service: storage_proxy: move `handle_write` to `remote`
  service: storage_proxy: move `handle_paxos_prune` to `remote`
  service: storage_proxy: move `handle_paxos_accept` to `remote`
  service: storage_proxy: move `handle_paxos_prepare` to `remote`
  service: storage_proxy: move `handle_truncate` to `remote`
  service: storage_proxy: move `handle_read_digest` to `remote`
  service: storage_proxy: move `handle_read_mutation_data` to `remote`
  service: storage_proxy: move `handle_read_data` to `remote`
  service: storage_proxy: move `handle_mutation_failed` to `remote`
  service: storage_proxy: move `handle_mutation_done` to `remote`
  service: storage_proxy: move `handle_paxos_learn` to `remote`
  service: storage_proxy: move `receive_mutation_handler` to `remote`
  service: storage_proxy: move `handle_counter_mutation` to `remote`
  service: storage_proxy: remove `get_local_shared_storage_proxy`
  service: storage_proxy: (de)register RPC handlers in `remote`
  service: storage_proxy: introduce `remote`
2022-08-04 17:50:20 +03:00
Kamil Braun
a9fd156a1b db: consistency_level: filter_for_query: take const gossiper& 2022-08-04 12:19:38 +02:00
Botond Dénes
df203a48af Merge "Remove reconnectable_snitch_helper" from Pavel Emelyanov
"
The helper is in charge of receiving INTERNAL_IP app state from
gossiper join/change notifications, updating system.peers with it,
and kicking the messaging service to update its preferred IP cache
along with initiating client reconnection.

Effectively this helper duplicates the topology tracking code in
storage-service notifiers. Removing it means less code and drops
a bunch of unwanted cross-component dependencies, in particular:

- one qctx call is gone
- snitch (almost) no longer needs to get messaging from gossiper
- public:private IP cache becomes local to messaging and can be
  moved to topology at low cost

A nice minor side effect -- this helper was never unsubscribed
from the gossiper on stop or snitch rename. Now it's all gone.
"

* 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla:
  snitch: Remove reconnectable snitch helper
  snitch, storage_service: Move reconnect to internal_ip kick
  snitch, storage_service: Move system.peers preferred_ip update
  snitch: Export prefer-local
2022-08-04 13:06:05 +03:00
Piotr Dulikowski
4f2adc14de db/system_keyspace: fix indentation after previous patch 2022-08-03 13:19:19 +02:00
Piotr Dulikowski
eff8a6368c db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column
Previously, the `system.local`'s `rpc_address` column kept the local node's
`rpc_address` from the scylla.yaml configuration. Although it sounds
like it makes sense, there are a few reasons to change it to the value
of scylla.yaml's `broadcast_rpc_address`:

- The `broadcast_rpc_address` is the address that the drivers are
  supposed to connect to. `rpc_address` is the address that the node
  binds to - it can be set for example to 0.0.0.0 so that Scylla listens
  on all addresses; however, this gives no useful information to the
  driver.
- The `system.peers` table also has the `rpc_address` column and it
  already keeps other nodes' `broadcast_rpc_address`es.
- Cassandra is going to make the same change in the upcoming version 4.1.

Fixes: #11201
2022-08-03 13:19:03 +02:00
Avi Kivity
268e4abe77 Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros
Calling WebAssembly UDFs requires a wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using an LRU strategy, and their size is based on the size of their wasm memories.

The instances stored in the cache are also dropped when the UDF itself is dropped. For that reason,
the first patch modifies the current implementation of UDF dropping, so that the instance dropping can be added
later. The patch also removes the need to compile the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh`, and the main ways of using it are the `run_script` methods from `wasm.hh`.

The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm instance reuse, size limits, and instance eviction after timeout and after dropping the UDF.
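
A rough standalone sketch of the caching idea (hypothetical types, not the actual lang/wasm_instance_cache.hh): one entry per (UDF, scheduling group), evicted in LRU order when the total memory size exceeds a budget, and removed when the UDF is dropped:

```
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

struct wasm_instance { std::size_t memory_size = 0; };  // stand-in for a wasmtime instance

class instance_cache {
    using key = std::pair<std::string, int>;             // (UDF name, scheduling group)
    std::list<std::pair<key, wasm_instance>> _lru;        // front = most recently used
    std::map<key, std::list<std::pair<key, wasm_instance>>::iterator> _index;
    std::size_t _size = 0, _max_size;
public:
    explicit instance_cache(std::size_t max_size) : _max_size(max_size) {}

    // Reuse a cached instance if present; otherwise the caller creates one
    // (expensive) and inserts it with put().
    wasm_instance* get(const key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) return nullptr;
        _lru.splice(_lru.begin(), _lru, it->second);      // mark as most recently used
        return &it->second->second;
    }

    void put(const key& k, wasm_instance inst) {
        remove(k);                                        // replace any stale entry
        _size += inst.memory_size;
        _lru.emplace_front(k, std::move(inst));
        _index[k] = _lru.begin();
        while (_size > _max_size && !_lru.empty()) {      // size-based LRU eviction
            _size -= _lru.back().second.memory_size;
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }

    // Dropping the UDF also drops its cached instances.
    void remove(const key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) return;
        _size -= it->second->second.memory_size;
        _lru.erase(it->second);
        _index.erase(it);
    }
};
```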

Closes #10306

* github.com:scylladb/scylladb:
  wasm: test instances reuse
  wasm: reuse UDF instances
  schema_tables: simplify merge_functions and avoid extra compilation
2022-08-02 13:51:16 +03:00