Commit Graph

2699 Commits

Author SHA1 Message Date
Avi Kivity
421557b40a Merge "Provide DC/RACK when populating topology" from Pavel E
"
The topology object maintains all sorts of node/DC/RACK mappings on
board. When new entries are added to it, the DC and RACK are taken
from the global snitch instance which, in turn, checks gossiper,
system keyspace and its local caches.

This set makes the topology population API require DC and RACK via a
call argument. In most cases the populating code is the
storage service, which knows exactly where to get those from.

After this set it will be possible to remove the dependency knot
consisting of the snitch, gossiper, system keyspace and messaging.
"

* 'br-topology-dc-rack-info' of https://github.com/xemul/scylla:
  topology: Use the provided dc/rack info
  test: Provide testing dc/rack infos
  storage_service: Provide dc/rack for snitch reconfiguration
  storage_service: Provide dc/rack from system ks on start
  storage_service: Provide dc/rack from gossiper for replacement
  storage_service: Provide dc/rack from gossiper for remotes
  storage_service,dht,repair: Provide local dc/rack from system ks
  system_keyspace: Cache local dc-rack on .start()
  topology: Some renames after previous patch
  topology: Require entry in the map for update_normal_tokens()
  topology: Make update_endpoint() accept dc-rack info
  replication_strategy: Accept dc-rack as get_pending_address_ranges argument
  dht: Carry dc-rack over boot_strapper and range_streamer
  storage_service: Make replacement info a real struct
2022-08-31 12:53:06 +03:00
Tomasz Grabiec
ae8d2a550d db: schema_tables: Make table creation shadow earlier concurrent changes
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:

scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.

The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.

If the schemas were the same, or did not conflict in a way that leads
to an abort, the current behavior would be to drop the older table as if
the last CREATE TABLE had been preceded by a DROP TABLE.

The proposed fix is to make the CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different: the older table will be dropped.

Fixes #11396
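
A toy model of the shadowing semantics (plain C++ with hypothetical structures, not the real schema-merge code): each CREATE carries a tombstone timestamp, and merged column entries older than that tombstone are dropped instead of coexisting at the same partition-key position:

```
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Toy model of one table's schema rows: partition-key column definitions,
// keyed by their position in the partition key, each with a write timestamp.
struct column_entry { std::string name; int64_t ts; };

struct table_defs {
    int64_t tombstone_ts = 0;                 // shadows anything strictly older
    std::map<int, column_entry> pk_columns;   // position -> definition
};

// A CREATE TABLE at time `ts` also writes a tombstone covering all earlier
// schema changes of the table, so two concurrent creations cannot leave two
// live entries for the same partition-key position after merging.
void apply_create(table_defs& t, int64_t ts, const std::map<int, std::string>& cols) {
    t.tombstone_ts = std::max(t.tombstone_ts, ts);
    for (const auto& [pos, name] : cols) {
        t.pk_columns[pos] = {name, ts};
    }
}

// Merge/compaction step: anything older than the tombstone is dropped.
void purge_shadowed(table_defs& t) {
    std::erase_if(t.pk_columns,
                  [&](const auto& e) { return e.second.ts < t.tombstone_ts; });
}
```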
2022-08-29 12:06:02 +02:00
Tomasz Grabiec
661db2706f db: schema_tables: Fix formatting 2022-08-26 17:37:48 +02:00
Pavel Emelyanov
a03d6f7751 system_keyspace: Cache local dc-rack on .start()
There's an endpoint:{dc,rack} cache in the system keyspace, but the
local node is not there, because this data is populated from the peers
table, while the local node's dc/rack is in the snitch (or the system.local table).

At the same time, storage_service::join_cluster() and whoever it calls
(e.g. repair) will need this info on start, and it's convenient
to have this data in the sys-ks cache.

It's not in the peers part of the cache because the next branch removes this
map, and it would be very clumsy to have a whole container with just
one entry in it.

There's code in system_keyspace::setup() that gets the local node's
dc/rack and commits it into the system.local table. However, putting
the data into the cache is done in .start(). This is because cql-test-env
needs this data cached too, but it doesn't call sys_ks.setup(). This will
be cleaned up some other day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:47:30 +03:00
Avi Kivity
0dbcd13a0f config: change logging::settings constructor call to use designated initializer
Safer wrt reordering, and more readable too.

Closes #11382
2022-08-26 06:14:01 +03:00
Wojciech Mitros
49dba4f0c1 functions: fix dropping of a keyspace with an aggregate in it
Currently, if a keyspace has an aggregate and the keyspace
is dropped, the keyspace becomes corrupted and another keyspace
with the same name cannot be created again.

This is caused by the fact that when removing an aggregate, we
call create_aggregate() to get values for its name and signature.
In the create_aggregate(), we check whether the row and final
functions for the aggregate exist.
Normally, that's not an issue, because when dropping an existing
aggregate alone, we know that its UDFs also exist. But when dropping
an entire keyspace, we first drop the UDFs, making us unable to drop
the aggregate afterwards.

This patch fixes this behavior by removing the create_aggregate() call
from the aggregate dropping implementation and replacing it with
specific calls that get the aggregate name and signature.

Additionally, a test that would previously fail is added to
cql-pytest/test_uda.py where we drop a keyspace with an aggregate.

Fixes #11327

Closes #11375
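
A small standalone sketch of the difference (hypothetical types, not the real UDA code): reading the stored name/signature directly works even after the aggregate's functions are gone, while reconstructing the aggregate first does not:

```
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct stored_aggregate {
    std::string name;
    std::vector<std::string> signature;
    std::string state_func;   // a UDF the aggregate depends on
};

std::set<std::string> existing_functions;   // stand-in for the functions table

// Old approach: rebuild the full aggregate object, which validates that its
// functions still exist -- this throws while dropping a whole keyspace,
// because the UDFs were already dropped first.
stored_aggregate create_aggregate_from_row(const stored_aggregate& row) {
    if (!existing_functions.count(row.state_func)) {
        throw std::runtime_error("state function does not exist");
    }
    return row;
}

// Fixed approach: take the name and signature straight from the row, without
// validating the (possibly already dropped) functions.
std::pair<std::string, std::vector<std::string>>
aggregate_drop_key(const stored_aggregate& row) {
    return {row.name, row.signature};
}
```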
2022-08-25 16:28:57 +02:00
Kamil Braun
7e56251aea service/raft: introduce group0_upgrade_state
Define an enum class, `group0_upgrade_state`, describing the state of
the upgrade procedure (implemented in later commits).

Provide IDL definitions for (de)serialization.

The node will have its current upgrade state stored on disk in
`system.scylla_local` under the `group0_upgrade_state` key. If the key
is not present we assume `use_pre_raft_procedures` (meaning we haven't
started upgrading yet or we're at the beginning of upgrade).

Introduce `system_keyspace` accessor methods for storing and retrieving
the on-disk state.
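
A minimal sketch of the on-disk handling described above (hypothetical helper and a map standing in for system.scylla_local): the state is a string stored under a key, and a missing key maps to `use_pre_raft_procedures`:

```
#include <map>
#include <string>

// Only `use_pre_raft_procedures` is named here; the further states of the
// upgrade procedure are defined in later commits.
enum class group0_upgrade_state {
    use_pre_raft_procedures,
};

// Toy stand-in for the system.scylla_local key/value table.
std::map<std::string, std::string> scylla_local;

std::string load_group0_upgrade_state_str() {
    auto it = scylla_local.find("group0_upgrade_state");
    if (it == scylla_local.end()) {
        // Missing key: we haven't started upgrading yet, or are at the very
        // beginning of the upgrade.
        return "use_pre_raft_procedures";
    }
    return it->second;
}
```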
2022-08-19 19:15:19 +02:00
Kamil Braun
547134faf4 db: system_keyspace: introduce load_peers
Load the addresses of our peers from `system.peers`.

Will be used by the Raft upgrade procedure to obtain the set of all
peers.
2022-08-19 19:15:18 +02:00
Piotr Sarna
cf30d4cbcf Merge 'Secondary index of collection columns' from Nadav Har'El
This pull request introduces global secondary-indexing for non-frozen collections.

The intent is to enable such queries:

```
CREATE TABLE test(id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));

-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);

SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```

We use the all-familiar materialized views (MVs) here. Scylla treats all the
collections the same way - they're a list of (key, value) pairs. In the case
of sets, the value type is a dummy one. In the case of lists, the key type is
TIMEUUID. When describing the design, I will ignore the fact that there is more
than one collection type. Suppose that the columns in the base table
were as follows:

```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```

The MV schema is as follows (the column names might differ from those
in the base table). All the columns here form the primary
key.

```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```

The reason for the last additional column is that the values from a collection might not be unique.

Fixes #2962
Fixes #8745
Fixes #10707

This patch does not implement **local** secondary indexes for collection columns: Refs #10713.

Closes #10841

* github.com:scylladb/scylladb:
  test/cql-pytest: un-xfail yet another passing collection-indexing test
  secondary index: fix paging in map value indexing
  test/cql-pytest: test for paging with collection values index
  cql, view: rename and explain bytes_with_action
  cql, index: make collection indexing a cluster feature
  test/cql-pytest: failing tests for oversized key values in MV and SI
  cql: fix secondary index "target" when column name has special characters
  cql, index: improve error messages
  cql, index: fix default index name for collection index
  test/cql-pytest: un-xfail several collecting indexing tests
  test/cql-pytest/test_secondary_index: verify that local index on collection fails.
  docs/design-notes/secondary_index: add `VALUES` to index target list
  test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
  cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
  cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
  test/boost/secondary_index_test: update for non-frozen indexes on collections
  test/cql-pytest: Uncomment collection indexes tests that should be working now
  cql, index: don't use IS NOT NULL on collection column
  cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
  cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accommodate indexes over collections
  cql3, index: Use entries() indexes on collections for queries
  cql3, index: Use keys() and values() indexes on collections for queries.
  types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
  cql3/statements/index_target: throw exception to signal that we didn't miss returning from function
  db/view/view.cc: compute view_updates for views over collections
  view info: has_computed_column_depending_on_base_non_primary_key
  column_computation: depends_on_non_primary_key_column
  schema, index/secondary_index_manager: make schema for index-induced mv
  index/secondary_index_manager: extract keys, values, entries types from collection
  cql3/statements/: validate CREATE INDEX for index over a collection
  cql3/statements/create_index_statement,index_target: rewrite index target for collection
  column_computation.hh, schema.cc: collection_column_computation
  column_computation.hh, schema.cc: compute_value interface refactor
  Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
2022-08-16 14:18:51 +02:00
Botond Dénes
d56dcb842c db/virtual_table: add virtual destructor to virtual_table
It should have had one: derived instances are stored and destroyed via
the base class. The only reason this hasn't caused bugs yet is that
derived instances happen not to have any non-trivial members yet.

Closes #11293
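
A self-contained reminder of why the base class needs one (generic C++, not the Scylla code): deleting a derived object through a base pointer without a virtual destructor is undefined behaviour, and would bite as soon as a derived table gains non-trivial members:

```
#include <memory>
#include <string>

struct virtual_table_base {
    virtual ~virtual_table_base() = default;  // without this, the delete below is UB
};

struct some_virtual_table : virtual_table_base {
    std::string name = "system.some_table";   // non-trivial member
};

int main() {
    // Instances are stored and destroyed via the base class, as described above.
    std::unique_ptr<virtual_table_base> t = std::make_unique<some_virtual_table>();
    // unique_ptr<base> invokes ~virtual_table_base(); it must be virtual so that
    // ~some_virtual_table() (and the string's destructor) actually run.
}
```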
2022-08-15 16:58:05 +03:00
Botond Dénes
a9573b84c5 Merge 'commitlog: Revert/modify fac2bc4 - do footprint add in delete' from Calle Wilund
Fixes #11184
Fixes #11237

In the previous (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.

This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a few inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock at start because we cannot get to the actual replay that will eventually free things.

Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.

The simplest fix is to add the footprint in the delete call instead. This will lock out segment creation until the delete call is done, but this is fast. It also ensures that only the replay shard is involved.

To further emphasize this, don't store the segments found on the init scan in all shard instances;
instead, retrieve them (based on the low time-pos for the current gen) when required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and we can also potentially ask for the list twice.
More to the point, it goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is
considered a candidate for recycling and included in the footprint.

Closes #11251

* github.com:scylladb/scylladb:
  commitlog: Make get_segments_to_replay on-demand
  commitlog: Revert/modify fac2bc4 - do footprint add in delete
2022-08-15 09:10:32 +03:00
Kamil Braun
b4c5b79f5e db: system_distributed_keyspace: don't call on_internal_error in check_exists
The function `check_exists` checks whether a given table exists, giving
an error otherwise. It previously used `on_internal_error`.

`check_exists` is used in some old functions that insert CDC metadata into
CDC tables. These tables are no longer used in newer Scylla versions
(they were replaced with other tables with a different schema), and this
function is no longer called. The table definitions were removed and
these tables are no longer created. They will only exist in clusters
that were upgraded from old versions of Scylla (4.3) through a sequence
of upgrades.

If you tried to upgrade from a very old version of Scylla which had
neither the old nor the new tables to a modern version, say from 4.2 to
5.0, you would get `on_internal_error` from this `check_exists`
function. Fortunately:
1. we don't support such upgrade paths
2. `on_internal_error` in production clusters does not crash the system,
   only throws. The exception would be caught, printed, and the system
   would keep running (just without CDC - until you finished the upgrade
   and called the proper nodetool command to fix the CDC module).

Unfortunately, there is a dtest (`partitioner_tests.py`) which performs
an unsupported upgrade scenario - it starts Scylla from Cassandra (!)
work directories, which is like upgrading from a very old version of
Scylla.

This dtest was not failing because another bug masked the problem.
When we try to fix the bug - see #11225 - the dtest starts hitting the
assertion in `check_exists`. Because it's a test, we configure
`on_internal_error` to crash the system.

The point of this commit is to not crash the system in this rare
scenario which happens only in some weird tests. We now throw
`std::runtime_error` instead of calling `on_internal_error`. In the
dtest, we already ignore the resulting CDC error appearing in the logs
(see scylladb/scylla-dtest#2804). Together with this change, we'll be
able to fix the #11225 bug and pass this test.

Closes #11287
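
A minimal stand-in for the change (hypothetical signature, not the actual helper): the missing-table case is reported with a regular exception rather than `on_internal_error`, so a legacy/unsupported cluster logs an error instead of aborting in test configurations:

```
#include <set>
#include <stdexcept>
#include <string>

std::set<std::string> existing_tables;   // stand-in for the known schema

void check_exists(const std::string& ks, const std::string& table) {
    if (!existing_tables.count(ks + "." + table)) {
        // Previously: on_internal_error(...), which aborts when Scylla is
        // configured to crash on internal errors (as tests are).
        throw std::runtime_error("table " + ks + "." + table + " does not exist");
    }
}
```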
2022-08-14 13:12:03 +03:00
Nadav Har'El
5d556115a1 cql, view: rename and explain bytes_with_action
The structure "bytes_with_action" was very hard to understand because of
its mysterious and general-sounding name, and the lack of comments.

In this patch I add a large comment explaining its purpose, and rename
it to a more suitable name, view_key_and_action, which suggests that
each such object is about one view key (where to add a view row), and
an additional "action" that we need to take beyond adding the view row.

This is the best I can do to make this code easier to understand without
completely reorganizing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
32289d681f db/view/view.cc: compute view_updates for views over collections
For collection indexes, the logic of computing values for each of the columns
needed to change, since a single column might produce more
than one value as a result.

The liveness info from individual cells of the collection impacts the
liveness info of the resulting rows. Therefore the control flow needed to be
rewritten - instead of functions getting a row from get_view_row and
later computing and applying row markers, they compute these values
themselves.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:49 +03:00
Michał Radwański
112086767c view info: has_computed_column_depending_on_base_non_primary_key
In the case of secondary indexes, if an index does not contain any base
column outside the base's primary key, then it is assumed that
during an update, a change to some cells of the base table cannot cause
us to be dealing with a different row in the view. This, however,
doesn't take into account the possibility of computed columns which in
fact do depend on some non-primary-key columns. Introduce an additional
property of an index,
has_computed_column_depending_on_base_non_primary_key.
2022-08-14 10:29:14 +03:00
Michał Radwański
ebc4ad4713 column_computation.hh, schema.cc: collection_column_computation
This type of column computation will be used for creating updates to
materialized views that are indexes over collections.

This type features an additional function, compute_values_with_action,
which, depending on an (optional) old row and the new row (the update to the
base table), returns multiple bytes_with_action - a vector of
(computed value, action) pairs, where the action signifies whether the
row with a specific key needs to be deleted or created.
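
A standalone sketch of the result shape (toy types, with a plain map standing in for the collection): comparing the old and new collection value yields multiple (computed value, action) pairs, where the action says whether a view row must be created or deleted:

```
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class action { add_row, delete_row };

// Toy version: the "collection" is a map<key, value>; each entry that appears
// or disappears between the old and the new row produces one pair.
std::vector<std::pair<std::string, action>>
compute_values_with_action(const std::map<std::string, std::string>& old_coll,
                           const std::map<std::string, std::string>& new_coll) {
    std::vector<std::pair<std::string, action>> out;
    for (const auto& [k, v] : new_coll) {
        if (!old_coll.count(k)) {
            out.emplace_back(k, action::add_row);      // new entry -> new view row
        }
    }
    for (const auto& [k, v] : old_coll) {
        if (!new_coll.count(k)) {
            out.emplace_back(k, action::delete_row);   // removed entry -> delete view row
        }
    }
    return out;
}
```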
2022-08-14 10:29:13 +03:00
Michał Radwański
2babee2cdc column_computation.hh, schema.cc: compute_value interface refactor
The compute_value function of column_computation previously had the
following signature:
    virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;

This is superfluous: never in the history of Scylla was the last
parameter (row) used in any implementation, and it never actually
returned a null bytes_opt. The absurdity of this interface is especially
visible at call sites like the following, where a dummy empty
row was created:
```
    token_column.get_computation().compute_value(
            *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty()));
```
2022-08-14 10:29:13 +03:00
Piotr Sarna
fe617ed198 Merge 'db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column' from Piotr Dulikowski
Previously, the `system.local`'s `rpc_address` column kept local node's
`rpc_address` from the scylla.yaml configuration. Although it sounds
like it makes sense, there are a few reasons to change it to the value
of scylla.yaml's `broadcast_rpc_address`:

- The `broadcast_rpc_address` is the address that the drivers are
  supposed to connect to. `rpc_address` is the address that the node
  binds to - it can be set for example to 0.0.0.0 so that Scylla listens
  on all addresses; however, this gives no useful information to the
  driver.
- The `system.peers` table also has the `rpc_address` column and it
  already keeps other nodes' `broadcast_rpc_address`es.
- Cassandra is going to make the same change in the upcoming version 4.1.

Fixes: #11201

Closes #11204

* github.com:scylladb/scylladb:
  db/system_keyspace: fix indentation after previous patch
  db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column
2022-08-12 16:24:28 +02:00
Piotr Sarna
1ab4c6aab3 Merge 'cql3: enable collections as UDA accumulators' from Wojciech Mitros
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values.

Instead of from_string()/to_string(), the CQL parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #11250

* github.com:scylladb/scylladb:
  cql3: enable collections as UDA accumulators
  cql3: extend implementation of to_bytes for raw_value
2022-08-12 12:51:17 +02:00
Benny Halevy
d295d8e280 everywhere: define locator::host_id as a strong tagged_uuid type
So it can be distinguished from other uuid-based
identifiers in the system.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11276
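
A minimal sketch of the strong-typedef idea (generic C++, not the actual utils::tagged_uuid): a tag parameter makes host_id a distinct type, so it cannot be mixed up with other uuid-based identifiers at compile time:

```
#include <array>
#include <cstdint>

// Stand-in for a UUID value.
using uuid = std::array<std::uint8_t, 16>;

// One wrapper template, many distinct identifier types.
template <typename Tag>
struct tagged_uuid {
    uuid id{};
    bool operator==(const tagged_uuid&) const = default;
};

using host_id = tagged_uuid<struct host_id_tag>;
using table_id = tagged_uuid<struct table_id_tag>;

void register_host(host_id) {}

int main() {
    host_id h{};
    table_id t{};
    register_host(h);
    // register_host(t);  // would not compile: different tag, different type
}
```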
2022-08-12 06:01:44 +03:00
Avi Kivity
46bd0b1e62 consistency_level: accept effective_replication_map as parameter, rather than keyspace
A keyspace is a mutable object that can change from time to time. An
effective_replication_map captures the state of a keyspace at a point in
time and can therefore be consistent (with care from the caller).

Change consistency_level's functions to accept an effective_replication_map.
This allows the caller to ensure that separate calls use the same
information and are consistent with each other.

Current callers are likely correct since they are called from one
continuation, but it's better to be sure.
2022-08-11 17:58:42 +03:00
Avi Kivity
1078d1bfda consistency_level: be more const when using replication_strategy
We don't modify the replication_strategy here, so use const. This
will help when the object we get is const itself, as it will be in
the next patches.
2022-08-11 17:58:42 +03:00
Wojciech Mitros
48bd752971 cql3: enable collections as UDA accumulators
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values. For example, a list with string elements:
'a, b', 'c' would be represented as a string 'a, b, c', while now
it is represented as "['a, b', 'c']".
Some types that were parsable are now represented in a different
way. For example, a tuple ('a', null, 0) was represented as
"a:\@:0", and now it is "('a', null, 0)".

Instead of from_string()/to_string(), the CQL parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.

A test using a list as an accumulator is added to
cql-pytest/test_uda.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
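
A self-contained illustration of the round-trip problem (plain C++, not the CQL code): joining values with a separator loses information when a value itself contains the separator, which is why the old string format could not be parsed back:

```
#include <iostream>
#include <string>
#include <vector>

// Join a list with ", " the way the old to_string()-style format did.
std::string join(const std::vector<std::string>& v) {
    std::string out;
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) out += ", ";
        out += v[i];
    }
    return out;
}

int main() {
    // The two-element list ['a, b', 'c'] ...
    std::vector<std::string> list = {"a, b", "c"};
    // ... flattens to "a, b, c": indistinguishable from a three-element list,
    // so no from_string() can recover the original. A parsable representation
    // like "['a, b', 'c']" keeps the quoting and avoids the ambiguity.
    std::cout << join(list) << "\n";  // prints: a, b, c
}
```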
2022-08-11 16:23:57 +02:00
Calle Wilund
a729c2438e commitlog: Make get_segments_to_replay on-demand
Refs #11237

Don't store the segments found on the init scan in all shard instances;
instead, retrieve them (based on the low time-pos for the current gen) when
required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and we can also potentially
ask for the list twice. More to the point, it goes better hand-in-hand
with the semantics of "delete_segments", where any file sent in is
considered a candidate for recycling and included in the footprint.
2022-08-11 06:41:23 +00:00
Calle Wilund
8116c56807 commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fixes #11184
Fixes #11237

In the previous (broken) fix for #11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight
limits. Since we nowadays do quite a few inserts _before_ commitlog
replay (system.local, but...) we can end up in a situation where we
deadlock at start because we cannot get to the actual replay that will
eventually free things.
Another, not thought through, consequence is that we add a single
footprint to _all_ commitlog shard instances - even though only
shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.

The simplest fix is to add the footprint in the delete call instead. This
will lock out segment creation until the delete call is done, but this is
fast. It also ensures that only the replay shard is involved.
2022-08-10 08:04:03 +00:00
Botond Dénes
7730419f5c query-result-writer: stop when tombstone-limit is reached
The query result writer now counts tombstones and cuts the page (marking
it as a short one) when the tombstone limit is reached. This is to avoid
timing out on a large span of tombstones, especially prefixes.
In the case of unpaged queries, we fail the read instead, similarly to
how we do with max result size.
If the limit is 0, the previous behaviour is used: tombstones are not
taken into consideration at all.
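
A rough standalone model of the behaviour (not the actual query result writer): tombstones are counted while building the page, and the page is cut short once the limit is reached; a limit of 0 disables the accounting:

```
#include <cstdint>
#include <vector>

struct row { bool is_tombstone; };

struct page {
    std::vector<row> rows;
    bool is_short = false;   // cut before the requested row limit was reached
};

page build_page(const std::vector<row>& input, uint64_t tombstone_limit) {
    page p;
    uint64_t tombstones = 0;
    for (const row& r : input) {
        if (r.is_tombstone && tombstone_limit != 0 && ++tombstones >= tombstone_limit) {
            p.is_short = true;   // stop here instead of timing out on a long run
            break;
        }
        if (!r.is_tombstone) {
            p.rows.push_back(r);
        }
    }
    return p;
}
```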
2022-08-10 06:03:38 +03:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from the coordinator to the replicas, to make
sure all of them use the same limit.
2022-08-10 06:01:47 +03:00
Botond Dénes
33f0447ba0 db/config: add config item for query tombstone limit
This will be the value used to break pages, after processing the
specified number of tombstones. The page will be cut even if empty.
We could maybe use the already existing tombstone_{warn,fail}_threshold
instead and use them as a soft/hard limit pair, like we did with page
sizes.
2022-08-09 10:00:40 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
so it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity, since the same logic is currently open-coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
6e77ad9392 system_keyspace: get_truncation_record: delete unused lambda capture
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:28 +03:00
Benny Halevy
1fda686f96 idl: make idl headers self-sufficient
Add include statements to satisfy dependencies.

Delete the now-unneeded include directives from the upper-level
source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
cfc7e9aa59 db: hints: sync_point: do not include idl definition file
idl definition files are not intended for direct
inclusion in .cc files.

The data types they represent are supposed to be defined
in a regular C++ header, so define them in db/hints/sync_point.hh
and include it rather than idl/hinted_handoff.idl.hh.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
82fa205723 db/per_partition_rate_limit: tidy up headers self-sufficiency
include what's needed where needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
37b7a9cce2 utils: get rid of joinpoint
Now that it is no longer used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.

Since we now synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
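
A small sketch of the API change (hypothetical signatures, std::chrono instead of Scylla's clock types): a plain optional time point replaces a callback that returned the same value on every shard anyway:

```
#include <chrono>
#include <functional>
#include <optional>

using time_point = std::chrono::system_clock::time_point;

// Before: every shard invoked a callback that always produced the same time
// point and existed only to coordinate the shards.
void truncate_table_on_all_shards_old(std::function<time_point()> timestamp_func) { /* ... */ }

// After: the caller resolves the timestamp once (or leaves it empty), and the
// internal execution phases are synchronized inside the function itself.
void truncate_table_on_all_shards_new(std::optional<time_point> truncated_at) { /* ... */ }
```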
2022-08-07 12:53:05 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5e8c05f1a8 database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func

Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until drop started.

Fixes #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:52:51 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there's no other
place to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have a keyspace at hand may get the topology via
the ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints(),
because both have just an inet_address at hand. Those are patched to be
methods of topology itself; all their callers already deal with
token metadata and can get the topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consistency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection sugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
c3718b7a6e consistency_level: Remove is_local() and count_local_endpoints()
No code uses them now -- callers switched to use the topology -- so these two
can be dropped together with their calls to the global snitch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
0da8caba1d consistency_level: Use topology local_dc_filter
The filter_for_query() helper has the keyspace at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Pavel Emelyanov
de58b33eee consistency-level: Call count_local_endpoints from topology
Similar to the previous patch: in those places with a keyspace object at
hand, the topology can be obtained from the ks's replication map.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Pavel Emelyanov
f84ee8f0fb consistency_level: Get datacenter from topology
In some of the db/consistency_level.cc helpers, the topology can be
obtained from the keyspace's effective replication map.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00
Calle Wilund
fac2bc41ba commitlog: Include "segments_to_replay" in initial footprint
Fixes #11184

Not including it here can cause our estimate of "delete or not" after replay
to be skewed in favour of retaining segments as (new) recycles (or even flip
a counter), and if we have repeated crash+restarts we could be accumulating
an effectively ever-increasing segment footprint.

Closes #11205
2022-08-05 12:16:53 +03:00
Pavel Emelyanov
527b345079 Merge 'storage_proxy: introduce a remote "subservice"' from Kamil Braun
Introduce a `remote` class that handles all remote communication in `storage_proxy`: sending and receiving RPCs, checking the state of other nodes by accessing the gossiper, and fetching schema.

The `remote` object lives inside `storage_proxy` and right now it's initialized and destroyed together with `storage_proxy`.

The long game here is to split the initialization of `storage_proxy` into two steps:
- the first step, which constructs `storage_proxy`, initializes it "locally" and does not require references to `messaging_service` and `gossiper`.
- the second step will take those references and add the `remote` part to `storage_proxy`.

This will allow us to remove some cycles from the service (de)initialization order and in general clean it up a bit. We'll be able to start `storage_proxy` right after the `database` (without messaging/gossiper). Similar refactors are planned for `query_processor`.

Closes #11088
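
A bare-bones sketch of the two-step construction (hypothetical types, not the real services): the proxy is first usable locally, and the remote part is attached later, once messaging and gossip exist:

```
#include <memory>

struct messaging_service {};
struct gossiper {};

class storage_proxy {
    // Everything needed for remote communication lives here.
    struct remote {
        messaging_service& ms;
        gossiper& gs;
    };
    std::unique_ptr<remote> _remote;   // empty until the second init step
public:
    storage_proxy() = default;         // step 1: local-only construction

    // Step 2: wire up the remote part once messaging/gossiper are available.
    void start_remote(messaging_service& ms, gossiper& gs) {
        _remote = std::make_unique<remote>(remote{ms, gs});
    }

    bool only_local() const { return !_remote; }
};
```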

* github.com:scylladb/scylladb:
  service: storage_proxy: pass `migration_manager*` to `init_messaging_service`
  service: storage_proxy: `remote`: make `_gossiper` a const reference
  gms: gossiper: mark some member functions const
  db: consistency_level: `filter_for_query`: take `const gossiper&`
  replica: table: `get_hit_rate`: take `const gossiper&`
  gms: gossiper: move `endpoint_filter` to `storage_proxy` module
  service: storage_proxy: pass `shared_ptr<gossiper>` to `start_hints_manager`
  service: storage_proxy: establish private section in `remote`
  service: storage_proxy: remove `migration_manager` pointer
  service: storage_proxy: remove calls to `storage_proxy::remote()` from `remote`
  service: storage_proxy: remove `_gossiper` field
  alternator: ttl: pass `gossiper&` to `expiration_service`
  service: storage_proxy: move `truncate_blocking` implementation to `remote`
  service: storage_proxy: introduce `is_alive` helper
  service: storage_proxy: remove `_messaging` reference
  service: storage_proxy: move `connection_dropped` to `remote`
  service: storage_proxy: make `encode_replica_exception_for_rpc` a static function
  service: storage_proxy: move `handle_write` to `remote`
  service: storage_proxy: move `handle_paxos_prune` to `remote`
  service: storage_proxy: move `handle_paxos_accept` to `remote`
  service: storage_proxy: move `handle_paxos_prepare` to `remote`
  service: storage_proxy: move `handle_truncate` to `remote`
  service: storage_proxy: move `handle_read_digest` to `remote`
  service: storage_proxy: move `handle_read_mutation_data` to `remote`
  service: storage_proxy: move `handle_read_data` to `remote`
  service: storage_proxy: move `handle_mutation_failed` to `remote`
  service: storage_proxy: move `handle_mutation_done` to `remote`
  service: storage_proxy: move `handle_paxos_learn` to `remote`
  service: storage_proxy: move `receive_mutation_handler` to `remote`
  service: storage_proxy: move `handle_counter_mutation` to `remote`
  service: storage_proxy: remove `get_local_shared_storage_proxy`
  service: storage_proxy: (de)register RPC handlers in `remote`
  service: storage_proxy: introduce `remote`
2022-08-04 17:50:20 +03:00
Kamil Braun
a9fd156a1b db: consistency_level: filter_for_query: take const gossiper& 2022-08-04 12:19:38 +02:00
Botond Dénes
df203a48af Merge "Remove reconnectable_snitch_helper" from Pavel Emelyanov
"
The helper is in charge of receiving INTERNAL_IP app state from
gossiper join/change notifications, updating system.peers with it,
and kicking the messaging service to update its preferred IP cache
along with initiating client reconnection.

Effectively this helper duplicates the topology tracking code in
storage-service notifiers. Removing it means less code and drops
a bunch of unwanted cross-component dependencies, in particular:

- one qctx call is gone
- snitch (almost) no longer needs to get messaging from gossiper
- public:private IP cache becomes local to messaging and can be
  moved to topology at low cost

A nice minor side effect -- this helper was never unsubscribed
from the gossiper on stop or snitch rename. Now it's all gone.
"

* 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla:
  snitch: Remove reconnectable snitch helper
  snitch, storage_service: Move reconnect to internal_ip kick
  snitch, storage_service: Move system.peers preferred_ip update
  snitch: Export prefer-local
2022-08-04 13:06:05 +03:00
Piotr Dulikowski
4f2adc14de db/system_keyspace: fix indentation after previous patch 2022-08-03 13:19:19 +02:00
Piotr Dulikowski
eff8a6368c db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column
Previously, the `system.local`'s `rpc_address` column kept the local node's
`rpc_address` from the scylla.yaml configuration. Although it sounds
like it makes sense, there are a few reasons to change it to the value
of scylla.yaml's `broadcast_rpc_address`:

- The `broadcast_rpc_address` is the address that the drivers are
  supposed to connect to. `rpc_address` is the address that the node
  binds to - it can be set for example to 0.0.0.0 so that Scylla listens
  on all addresses; however, this gives no useful information to the
  driver.
- The `system.peers` table also has the `rpc_address` column and it
  already keeps other nodes' `broadcast_rpc_address`es.
- Cassandra is going to make the same change in the upcoming version 4.1.

Fixes: #11201
2022-08-03 13:19:03 +02:00
Avi Kivity
268e4abe77 Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros
Calling WebAssembly UDFs requires a wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using an LRU strategy, and their size is based on the size of their wasm memories.

The instances stored in the cache are also dropped when the UDF itself is dropped. For that reason,
the first patch modifies the current implementation of UDF dropping, so that the instance dropping can be added
later. The patch also removes the need to compile the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh`, and the main ways of using it are the `run_script` methods from `wasm.hh`.

The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm instance reuse, size limits, and instance eviction after timeout and after dropping the UDF.
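
A rough standalone sketch of the caching idea (hypothetical types, not the actual lang/wasm_instance_cache.hh): one entry per (UDF, scheduling group), evicted in LRU order when the total memory size exceeds a budget, and removed when the UDF is dropped:

```
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

struct wasm_instance { std::size_t memory_size = 0; };  // stand-in for a wasmtime instance

class instance_cache {
    using key = std::pair<std::string, int>;             // (UDF name, scheduling group)
    std::list<std::pair<key, wasm_instance>> _lru;        // front = most recently used
    std::map<key, std::list<std::pair<key, wasm_instance>>::iterator> _index;
    std::size_t _size = 0, _max_size;
public:
    explicit instance_cache(std::size_t max_size) : _max_size(max_size) {}

    // Reuse a cached instance if present; otherwise the caller creates one
    // (expensive) and inserts it with put().
    wasm_instance* get(const key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) return nullptr;
        _lru.splice(_lru.begin(), _lru, it->second);      // mark as most recently used
        return &it->second->second;
    }

    void put(const key& k, wasm_instance inst) {
        remove(k);                                        // replace any stale entry
        _size += inst.memory_size;
        _lru.emplace_front(k, std::move(inst));
        _index[k] = _lru.begin();
        while (_size > _max_size && !_lru.empty()) {      // size-based LRU eviction
            _size -= _lru.back().second.memory_size;
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }

    // Dropping the UDF also drops its cached instances.
    void remove(const key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) return;
        _size -= it->second->second.memory_size;
        _lru.erase(it->second);
        _index.erase(it);
    }
};
```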

Closes #10306

* github.com:scylladb/scylladb:
  wasm: test instances reuse
  wasm: reuse UDF instances
  schema_tables: simplify merge_functions and avoid extra compilation
2022-08-02 13:51:16 +03:00