Compare commits


2800 Commits

Author SHA1 Message Date
Kefu Chai
2583a025fc s3/test: collect log on exit
the temporary directory holding the log file that collects the scylla
subprocess's output is specified by the test itself: it is
`test_tempdir`. unfortunately, cql-pytest/run.py is not aware
of this, so `cleanup_all()` is not able to print out the logging
messages at exit. please note, cql-pytest/run.py always
collects the "log" file under the directory created using `pid_to_dir()`,
where pid is that of the spawned subprocess, but `object_store/run` uses
the main process's pid for its reusable tempdir.

so, with this change, we also register a cleanup func to print out
the logging messages when the test exits.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13647
2023-04-24 13:53:25 +03:00
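A minimal sketch of the "register a cleanup func" idea in Python, assuming a hypothetical `register_log_dump()` helper and a log path chosen by the test (neither is taken from cql-pytest/run.py or object_store/run). The standard `atexit` module runs registered callbacks on normal interpreter exit, so the log gets printed even though `cleanup_all()` never learns about `test_tempdir`.

```python
import atexit
import sys


def register_log_dump(log_path: str):
    """Register a callback that prints the subprocess log when Python exits.

    Hypothetical helper for illustration; it returns the callback so callers
    (or tests) can run it directly or unregister it.
    """
    def dump_log() -> None:
        try:
            with open(log_path, encoding="utf-8", errors="replace") as log:
                sys.stdout.write(log.read())
        except FileNotFoundError:
            # the tempdir may already be gone; nothing to print then
            pass

    atexit.register(dump_log)
    return dump_log
```

Note that `atexit` callbacks do not run if the process is killed by a signal or leaves via `os._exit()`.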
Pavel Emelyanov
28a01c9e60 Merge 'test: object_store: fix various pylint warnings' from Kefu Chai
when reading this source code, my flycheck plugin reported a handful of issues. none of them is critical, but it's better to fix them.

Closes #13612

* github.com:scylladb/scylladb:
  test: object_store: specify timeout
  test: object_store: s/exit/sys.exit/
  test: object_store: do not declare a global variable for read
  test: object_store: remove unused imports
2023-04-24 13:45:01 +03:00
Benny Halevy
87d9c4d7f8 sstables: filesystem_storage::change_state: simplify log message
When moving to the base directory, the printout currently looks broken:
```
INFO  2023-04-16 09:15:58,631 [shard 0] sstable - Moving sstable .../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/upload/md-1-big-Data.db to  in ".../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/"
```

Since `path` already contains `to`, the message can be simplified
and `to` need not be printed explicitly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13525
2023-04-24 13:43:48 +03:00
Kefu Chai
4f21755c98 timeout_config: correct the misconfigured {truncate, other}_timeout
this change fixes the regression introduced by
ebf5e138e8, which

* initialized `truncate_timeout_in_ms` with
  `counter_write_request_timeout_in_ms`,
* returned `cas_timeout_in_ms` in place of
  `other_timeout_in_ms`.

in this change, these two misconfigurations are fixed.

Fixes #13633
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13639
2023-04-24 12:26:14 +03:00
Kefu Chai
2c91728d8a auth: do not include unused header
in 5a9b4c02e3, the iostream-based
formatter was dropped, so there is no need to include `<iostream>`
or `<iosfwd>` in these source files anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13643
2023-04-24 12:24:29 +03:00
Kefu Chai
642854f36f test: s/os.P_NOWAIT/os.WNOHANG/
`os.P_NOWAIT` is meant to be used in spawn calls, while `os.WNOHANG`
is meant for the options parameter passed to wait calls. fortunately,
`P_NOWAIT` is defined as "1" in CPython, and `os.WNOHANG` is defined
as "1" in the linux kernel. that's why the existing implementation works.

but we should not rely on this coincidence. so, in this change,
`os.P_NOWAIT` is replaced with `os.WNOHANG` for correctness and for
better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13646
2023-04-24 11:42:34 +03:00
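A short sketch of the intended usage, assuming a POSIX system: `os.WNOHANG` goes into the options argument of `os.waitpid()`, which then returns `(0, 0)` immediately if the child has not exited yet. The helper name `poll_child` is hypothetical, not taken from the test code.

```python
import os


def poll_child(pid: int):
    """Reap `pid` if it has exited; return its wait status, or None if still running.

    os.WNOHANG belongs in the options argument of os.waitpid(); os.P_NOWAIT is
    a mode for the os.spawn*() family, even if both happen to equal 1 here.
    """
    waited_pid, status = os.waitpid(pid, os.WNOHANG)
    if waited_pid == 0:
        # WNOHANG: the child exists but has not changed state yet
        return None
    return status
```

The returned status can then be decoded with `os.WIFEXITED()` / `os.WEXITSTATUS()` or `os.WIFSIGNALED()`.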
Kefu Chai
a573a89128 keys: print "non-utf8-key" when clustering_key is not UTF-8
before this change, we did not check whether the clustering_key to be
formatted is UTF-8 encoded before printing it, but we do perform this
validation when printing partition_keys. the clustering_key is no
different from the partition_key when it comes to encoding; both are
parts of a primary key. so let's validate the encoding of the
clustering_key as well when formatting it. this change is a follow-up
of 85b21ba049.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13641
2023-04-24 10:40:23 +03:00
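The idea can be illustrated with a small Python sketch (a hypothetical helper, not Scylla's C++ formatter): decode the raw key bytes strictly as UTF-8 and fall back to the "non-utf8-key" placeholder when decoding fails.

```python
NON_UTF8_PLACEHOLDER = "non-utf8-key"


def format_key_component(raw: bytes) -> str:
    """Render a key component for logging, or a placeholder if it is not UTF-8.

    Hypothetical helper mirroring the validate-before-printing idea.
    """
    try:
        return raw.decode("utf-8")  # strict mode raises on invalid sequences
    except UnicodeDecodeError:
        return NON_UTF8_PLACEHOLDER
```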
Botond Dénes
864d27f9af Merge 'clear_gently: handle null unique_ptr and optional values' from Benny Halevy
This series adds handling of null std::unique_ptr to utils::clear_gently
and handling of std::optional and seastar::optimized_optional (both engaged and disengaged cases).

Also, unit tests were added to test the above cases.

Fixes #13636

Closes #13638

* github.com:scylladb/scylladb:
  utils: clear_gently: add variants for optional values
  utils: clear_gently: do not clear null unique_ptr
2023-04-24 10:27:32 +03:00
Kefu Chai
c06b20431e cdc: generation: use default-generated operator==
now that C++20 generates operator== for us, there is no need to
handcraft it. also, since C++17, the standard library has offered
operator== for `std::variant<>`, so there is no need to implement
it ourselves.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13625
2023-04-24 10:13:28 +03:00
Botond Dénes
2d8d8043be Merge 'Coroutinize system_keyspace::get_compaction_history' from Pavel Emelyanov
Closes #13620

* github.com:scylladb/scylladb:
  system_keyspace: Fix indentation after previous patch
  system_keyspace: Coroutinize get_compaction_history()
2023-04-24 09:48:01 +03:00
Botond Dénes
9e757d9c6d Merge 'De-globalize storage proxy' from Pavel Emelyanov
All users of the global proxy are gone (*), so the proxy can be made fully local to main/cql_test_env.

(*) one test case still needs it, but can get it via cql_test_env

Closes #13616

* github.com:scylladb/scylladb:
  code: Remove global proxy
  schema_change_test: Use proxy from cql_test_env
  test: Carry proxy reference on cql_test_env
2023-04-24 09:38:00 +03:00
Botond Dénes
1750bb34b7 Merge 'sstables, replica: add generation generator' from Kefu Chai
this is the first step towards the uuid-based generation identifier. the goal is to encapsulate the generation-related logic in a generator, so its consumers do not have to understand the difference between the int64_t-based generation and the UUID v1-based generation.

this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier.

Closes #13073

* github.com:scylladb/scylladb:
  replica, test: create generation id using generator
  sstables: add generation_generator
  test: sstables: use generate_n for generating ids for testing
2023-04-24 09:31:08 +03:00
Botond Dénes
85abece927 Merge 'Restrict logging of current_backtrace to log_level' from Benny Halevy
`seastar::current_backtrace()` can be quite heavy.
When we pass it to a log message at a relatively verbose log_level
(debug/trace), we pay the price of `current_backtrace` every time,
even though the message is rarely printed.

Closes #13527

* github.com:scylladb/scylladb:
  locator/topology: call seastar::current_backtrace only when log_level is enabled
  schema_tables: call seastar::current_backtrace only when log_level is enabled
2023-04-24 08:50:32 +03:00
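The fix is C++ against the seastar logger; as a rough Python analogue of the same pattern (standard `logging` module, with `traceback.format_stack()` standing in for `current_backtrace()`, and an illustrative logger name), the expensive call is only made when the log level would actually emit the message.

```python
import logging
import traceback

logger = logging.getLogger("schema_tables")  # name is illustrative only


def log_schema_change(table: str) -> None:
    # format_stack() is costly, like current_backtrace(): only pay for it
    # when a DEBUG record would really be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("schema change for %s at:\n%s",
                     table, "".join(traceback.format_stack()))
```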
Botond Dénes
7f04d8231d Merge 'gms: define and use generation and version types' from Benny Halevy
This series cleans up the generation and version types used in gms / gossiper.
Currently we use a blend of int, int32_t, and int64_t around messaging.
This change defines gms::generation_type and gms::version_type as int32_t
and adds a check in non-release modes that the respective int64 values passed over messaging do not overflow 32 bits.

Closes #12966

* github.com:scylladb/scylladb:
  gossiper: version_generator: add {debug_,}validate_gossip_generation
  gms: gossip_digest: use generation_type and version_type
  gms: heart_beat_state: use generation_type and version_type
  gms: versioned_value: use version_type
  gms: version_generator: define version_type and generation_type strong types
  utils: move generation-number to gms
  utils: add tagged_integer
  gms: versioned_value: make members private
  scylla-gdb: add get_gms_versioned_value
  gms: versioned_value: delete unused compare_to function
  gms: gossip_digest: delete unused compare_to function
2023-04-24 08:44:48 +03:00
Maxim Korolyov
002bdd7ae7 doc: add jaeger integration docs
Closes #13490
2023-04-24 08:26:53 +03:00
Chang Chen Chien
c25a718008 docs: fix typo in using-scylla/local-secondary-indexes.rst
Closes #13607
2023-04-24 06:56:19 +03:00
Benny Halevy
002865018f utils: clear_gently: add variants for optional values
Implement clear_gently for std::optional<T>
and seastar::optimized_optional<T>, with respective
unit tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 21:34:02 +03:00
Benny Halevy
12877ad026 utils: clear_gently: do not clear null unique_ptr
Otherwise the null pointer is dereferenced.

Add a unit test reproducing the issue
and testing this fix.

Fixes #13636

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 21:33:11 +03:00
Pavel Emelyanov
5e201b9120 database: Remove compaction_manager.hh inclusion into database.hh
The only reason it's there (right next to compaction_fwd.hh) is
that the database::table_truncate_state subclass needs the definition
of the compaction_manager::compaction_reenabler subclass.

However, the former subclass is not used outside of database.cc and can be
defined in the .cc file. Keeping it out of the header allows dropping
compaction_manager.hh from database.hh, thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13622
2023-04-23 16:27:11 +03:00
Benny Halevy
5520d3a8e3 gossiper: version_generator: add {debug_,}validate_gossip_generation
Make sure that the int64_t generation we get over rpc
fits in the int32_t generation_type we keep locally.

Restrict this assertion to non-release builds.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:48:01 +03:00
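The check boils down to a range test on the received value. A hypothetical Python rendering (in Scylla it is a C++ assertion restricted to non-release builds; the function name here only echoes the commit subject):

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def validate_gossip_generation(generation: int) -> int:
    """Check that an int64 generation received over RPC fits in int32."""
    if not INT32_MIN <= generation <= INT32_MAX:
        raise OverflowError(f"gossip generation {generation} does not fit in int32")
    return generation
```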
Benny Halevy
5dc7b7811c gms: gossip_digest: use generation_type and version_type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:48:01 +03:00
Benny Halevy
4cdad8bc8b gms: heart_beat_state: use generation_type and version_type
Define default constructor as heart_beat_state(gms::generation_type(0))

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:48:01 +03:00
Benny Halevy
b638571cb0 gms: versioned_value: use version_type
Adjust scylla-gdb.get_gms_versioned_value
to get the versioned_value version as version_type
(utils::tagged_integer).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:48:01 +03:00
Benny Halevy
2d20ee7d61 gms: version_generator: define version_type and generation_type strong types
Derived from utils::tagged_integer, using different tags,
the types are incompatible with each other and require explicit
typecasting to- and from- their value type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:47:17 +03:00
Benny Halevy
d1817e9e1b utils: move generation-number to gms
Although get_generation_number implementation is
completely generic, it is used exclusively to seed
the gossip generation number.

Following patches will define a strong gms::generation_id
type and this function should return it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Benny Halevy
f5f566bdd8 utils: add tagged_integer
A generic template for defining strongly typed
integer types.

Use it here to replace raft::internal::tagged_uint64.
Will be used for defining gms generation and version
as strong and distinguishable types in following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
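In C++ the tag is a template parameter of utils::tagged_integer. As a rough Python analogue (hypothetical classes, not part of the codebase), each subclass of a tagged base acts as a distinct type: values of the same tag compare and accept plain-int deltas, while mixing tags is rejected.

```python
from functools import total_ordering


@total_ordering
class TaggedInteger:
    """Python analogue of a strongly typed integer; subclasses act as tags."""

    def __init__(self, value: int) -> None:
        self.value = int(value)

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented  # different tags never compare equal
        return self.value == other.value

    def __lt__(self, other):
        if type(other) is not type(self):
            return NotImplemented  # ordering across tags raises TypeError
        return self.value < other.value

    def __hash__(self):
        return hash((type(self), self.value))

    def __add__(self, delta):
        # a plain int keeps the tag; another tagged value is rejected
        if type(delta) is not int:
            return NotImplemented
        return type(self)(self.value + delta)

    def __repr__(self):
        return f"{type(self).__name__}({self.value})"


class GenerationType(TaggedInteger):
    pass


class VersionType(TaggedInteger):
    pass
```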
Benny Halevy
c5d819ce60 gms: versioned_value: make members private
and provide accessor functions to get them.

1. So they can't be modified by mistake, as the versioned value is
   immutable. A new value must have a higher version.
2. In preparation for making the version a strong gms::version_type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Benny Halevy
5aaec73612 scylla-gdb: add get_gms_versioned_value
Prepare for next patch that makes gms::versioned_value
members private, and provides methods by the same name
as the current members.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Benny Halevy
44a8db016a gms: versioned_value: delete unused compare_to function
Not only is it unused, it is also wrong, since
it doesn't compare the value, only its version.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Benny Halevy
59e771be5c gms: gossip_digest: delete unused compare_to function
Not only is it unused, it is also wrong, since
it doesn't compare the digest endpoint member.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Kefu Chai
c2488fc516 test: object_store: specify timeout
just in case scylla does not behave as expected: with a timeout we can
identify the issue and error out sooner, instead of hanging until the
whole test times out. this issue was identified by pylint,
see https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/missing-timeout.html

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-22 00:38:37 +08:00
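The HTTP client used by the object_store test is not shown here, so this sketch uses the standard library's `urllib.request` to stay self-contained; `fetch` is a hypothetical helper. With an explicit `timeout`, a server that accepts the connection but never answers makes the call fail with an `OSError` (a timeout) instead of hanging.

```python
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """GET `url`, failing after `timeout` seconds instead of hanging forever."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```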
Tomasz Grabiec
bd0b299322 Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun
Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`).
The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The
difference is that V2 lives in `system_distributed_keyspace` and writes to it
are distributed using regular `storage_proxy` replication mechanisms based on
the token ring.  The V3 table lives in `system_keyspace` and any mutations
written to it will go through group 0.

Extend the `TOPOLOGY` schema with new columns:
- `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping
  node's `ring_slice`, it stores UUID of a newly introduced CDC
  generation which is used as partition key for the `CDC_GENERATIONS_V3`
  table to access this new generation's data. It's a regular column,
  meaning that every row (corresponding to a node) will have its own.
- `current_cdc_generation_uuid` and `current_cdc_generation_timestamp`
  together form the ID of the newest CDC generation in the cluster.
  (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is
  when the CDC generation starts operating). Those are static columns
  since there's a single newest CDC generation.

When the topology coordinator handles a request for a node to join, calculate a new
CDC generation using the bootstrapping node's tokens, translate it to mutation
format, and insert this mutation into the CDC_GENERATIONS_V3 table through group 0
at the same time as we assign tokens to the node in Raft topology. The partition
key for this data is stored in the bootstrapping node's `ring_slice`.

After inserting new CDC generation data, we need to pick a timestamp for this
generation and commit it, telling all nodes in the cluster to start using the
generation for CDC log writes once their clocks cross that timestamp.

We introduce a separate step to the bootstrap saga, before
`write_both_read_old`, called `commit_cdc_generation`. In this step, the
coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping
node's `ring_slice` - which serves as the key to the table where the CDC
generation data is stored - and combines it with a timestamp which it generates
a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by
default 1 minute). This gives us a CDC generation ID which we commit into the
topology state as the `current_cdc_generation_id` while switching the saga to
the next step, `write_both_read_old`.

Once a new CDC generation is committed to the cluster by the topology
coordinator, we also need to publish it to the user-facing description tables so
CDC applications know which streams to read from.

This uses regular distributed table writes underneath (tables living in the
`system_distributed` keyspace) so it requires `token_metadata` to be nonempty.
We need a hack for the case of bootstrapping the first node in the cluster -
turning the tokens into normal tokens earlier in the procedure in
`token_metadata`, but this is fine for the single-node case since no streaming
is happening.

When a node notices that a new CDC generation was introduced in
`storage_service::topology_state_load`, it updates its internal data structures
that are used when coordinating writes to CDC log tables.

We include the current CDC generation data in topology snapshot transfers.

Some fixes and refactors included.

Closes #13385

* github.com:scylladb/scylladb:
  docs: cdc: describe generation changes using group 0 topology coordinator
  cdc: generation_service: add a FIXME
  cdc: generation_service: add legacy_ prefix for gossiper-based functions
  storage_service: include current CDC generation data in topology snapshots
  db: system_keyspace: introduce `query_mutations` with range/slice
  storage_service: hold group 0 apply mutex when reading topology snapshot
  service: raft_group0_client: introduce `hold_read_apply_mutex`
  storage_service: use CDC generations introduced by Raft topology
  raft topology: publish new CDC generation to the user description tables
  raft topology: commit a new CDC generation on node bootstrap
  raft topology: create new CDC generation data during node bootstrap
  service: topology_state_machine: make topology::find const
  db: system_keyspace: small refactor of `load_topology_state`
  cdc: generation: extract pure parts of `make_new_generation` outside
  db: system_keyspace: add storage for CDC generations managed by group 0
  service: topology_state_machine: better error checking for state name (de)serialization
  service: raft: plumbing `cdc::generation_service&`
  cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter
  cdc: generation: make `topology_description_generator::get_sharding_info` a parameter
  sys_dist_ks: make `get_cdc_generation_mutations` public
  sys_dist_ks: move find_schema outside `get_cdc_generation_mutations`
  sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations`
  service/raft: group0_state_machine: signal topology state machine in `load_snapshot`
2023-04-21 18:11:27 +02:00
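The `commit_cdc_generation` step described above pairs the generation's data UUID with a timestamp placed 2 * ring_delay into the future. A minimal Python sketch of that arithmetic, assuming ring_delay defaults to 30 s (implied by "2 * ring_delay, by default 1 minute"); the function name and the `(uuid, timestamp)` tuple are hypothetical stand-ins for the real ID type.

```python
import uuid
from datetime import datetime, timedelta

# Assumption: 2 * ring_delay is 1 minute by default, so ring_delay = 30 s.
DEFAULT_RING_DELAY = timedelta(seconds=30)


def make_cdc_generation_id(data_uuid: uuid.UUID, now: datetime,
                           ring_delay: timedelta = DEFAULT_RING_DELAY):
    """Combine the generation's data key with a start timestamp.

    The start time is a bit into the future (2 * ring_delay), presumably so
    every node learns about the generation before its clock crosses it.
    """
    return data_uuid, now + 2 * ring_delay
```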
Kefu Chai
f85da1bd30 test: object_store: s/exit/sys.exit/
the former is expected to be used in an interactive session, not
in an application.

see also:
https://docs.python.org/3/library/constants.html#constants-added-by-the-site-module
and
https://docs.python.org/3/library/sys.html#sys.exit

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 23:25:59 +08:00
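A minimal sketch of the recommended pattern, with hypothetical `main()`/`run()` functions: the entry point returns a status and `sys.exit()` turns it into the process exit code. Unlike the site-module `exit()`, `sys.exit()` is available even when Python runs with `-S`.

```python
import sys


def main(argv: list[str]) -> int:
    """Hypothetical entry point: return an exit status instead of exiting."""
    if len(argv) < 2:
        print(f"usage: {argv[0]} TEST", file=sys.stderr)
        return 2
    return 0


def run(argv: list[str]) -> None:
    # sys.exit() raises SystemExit carrying the status code
    sys.exit(main(argv))
```

A script would typically call `run(sys.argv)` under an `if __name__ == "__main__":` guard.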
Kefu Chai
c7b62fbf81 test: object_store: do not declare a global variable for read
we only need to declare a variable with `global` when we need to
write to it. if we just want to read it, there is no need to
declare it, because the way python looks up a variable
when reading it enables python to find global variables
(and, indeed, functions!). but when we assign to a variable in
python, the interpreter has to decide which scope the
variable lives in. by default the local scope is used, and a new
variable is added to `locals()`.

but in this case, we just read from it, so there is no need to add the
`global` statement.

see also https://docs.python.org/3/reference/simple_stmts.html#global

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 23:25:59 +08:00
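A small illustration of the scoping rule described above (hypothetical names, not from the object_store test): reading a module-level name needs no `global`, rebinding it does, and an assignment without `global` makes the name local to the whole function.

```python
counter = 0


def read_counter() -> int:
    # reading: lookup falls back from the local scope to the module scope
    return counter


def bump_counter() -> None:
    # rebinding a module-level name requires `global`
    global counter
    counter += 1


def broken_bump() -> None:
    # no `global`: the assignment makes `counter` local to this function,
    # so reading it on the right-hand side raises UnboundLocalError
    counter += 1
```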
Kefu Chai
4989a59a0b test: object_store: remove unused imports
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 23:25:59 +08:00
Pavel Emelyanov
2aabaada9e system_keyspace: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 17:32:57 +03:00
Pavel Emelyanov
6290849f11 system_keyspace: Coroutinize get_compaction_history()
In order not to copy the rvalue consumer arg -- instantly convert it
into value. No other tricks.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 17:32:02 +03:00
Kefu Chai
576adbdbc5 replica, test: create generation id using generator
reuse generation_generator for generating generation identifiers to
reduce repetition. also, allow the generator to update its
latest known generation id.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 22:02:30 +08:00
Kefu Chai
6e82aa42d5 sstables: add generation_generator
to prepare for the uuid-based generation identifier, where we
will generate a uuid-based generation identifier if the corresponding
option is enabled, and an integer-based id otherwise. to reduce
repetition, generation_generator is extracted out so it can be reused.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 21:51:13 +08:00
Anna Stuchlik
a68b976c91 doc: document tombstone_gc as not experimental
The tombstone_gc was documented as experimental in version 5.0.
It is no longer experimental in version 5.2.
This commit updates the information about the option.

Closes #13469
2023-04-21 14:43:25 +02:00
Botond Dénes
fcd7f6ac5f Update tools/java submodule
* tools/java c9be8583...eb3c43f8 (1):
  > Use EstimatedHistogram in metricPercentilesAsArray
2023-04-21 14:31:38 +03:00
Kefu Chai
a2aa133822 treewide: use std::lexicographical_compare_threeway
the standard library offers
`std::lexicographical_compare_three_way()`, and we never use the
two additional parameters of our own version, which are not provided by
`std::lexicographical_compare_three_way()`. so there is no need to keep
the homebrew version of the three-way compare function.

in this change,

* all occurrences of `lexicographical_tri_compare()` are replaced
  with `std::lexicographical_compare_three_way()`.
* `lexicographical_tri_compare()` is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13615
2023-04-21 14:28:18 +03:00
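To make the semantics concrete, here is a Python sketch of what a three-way lexicographical compare does (Python's built-in sequence comparison already behaves this way; this only spells out the algorithm). It returns a negative, zero or positive int rather than a C++ ordering object, and the names are illustrative.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def default_compare(a, b) -> int:
    return (a > b) - (a < b)


def lexicographical_compare_three_way(
        lhs: Sequence[T], rhs: Sequence[T],
        comp: Callable[[T, T], int] = default_compare) -> int:
    """Return <0, 0 or >0: the first non-equal pair decides; otherwise
    the shorter sequence (a prefix of the other) orders first."""
    for a, b in zip(lhs, rhs):
        c = comp(a, b)
        if c != 0:
            return c
    return (len(lhs) > len(rhs)) - (len(lhs) < len(rhs))
```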
Kefu Chai
51fc0bc698 sstables: use default generated operator==
a C++20 compiler is able to generate defaulted operator== and
operator!=, and the generated operators behave exactly
the same as the ones we crafted. so let it do its job.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13614
2023-04-21 14:25:39 +03:00
Pavel Emelyanov
739455c3aa code: Remove global proxy
No code needs the global proxy anymore. Keep on-stack values in main and
cql_test_env and keep the pointer in the debug:: namespace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 14:18:59 +03:00
Pavel Emelyanov
f953fb2f52 schema_change_test: Use proxy from cql_test_env
There's one place where a test case calls for storage proxy, and currently
it does so via a global reference. Time to switch it to cql_test_env's one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 14:18:00 +03:00
Pavel Emelyanov
681a19f54c test: Carry proxy reference on cql_test_env
All sharded<> services are created by cql_test_env on the stack. The
cql_test_env() is then used to keep references to some of them and to
export them to test cases via its methods. Proxy is missing from that
exportable list, but will be needed, so add it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 14:16:54 +03:00
Botond Dénes
10c1f1dc80 Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun
in `make_group0_history_state_id_mutation`, when adding a new entry to
the group 0 history table, if the parameter `gc_older_than` is engaged,
we create a range tombstone in the mutation which deletes entries older
than the new one by `gc_older_than`. In particular if
`gc_older_than = 0`, we want to delete all older entries.

There was a subtle bug there: we were using millisecond resolution when
generating the tombstone, while the provided state IDs used microsecond
resolution. On a super fast machine it could happen that we managed to
perform two schema changes in a single millisecond; this happened
sometimes in `group0_test.test_group0_history_clearing_old_entries`
on our new CI/promotion machines, causing the test to fail because the
tombstone didn't clear the entry correspodning to the previous schema
change when performing the next schema change (since they happened in
the same millisecond).

Use microsecond resolution to fix that. The consecutive state IDs used
in group 0 mutations are guaranteed to be strictly monotonic at
microsecond resolution (see `generate_group0_state_id` in
service/raft/raft_group0_client.cc).

Fixes #13594

Closes #13604

* github.com:scylladb/scylladb:
  db: system_keyspace: use microsecond resolution for group0_history range tombstone
  utils: UUID_gen: accept decimicroseconds in min_time_UUID
2023-04-21 14:08:56 +03:00
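A simplified model of the bug, for illustration only: assume the `gc_older_than = 0` tombstone deletes entries strictly older than its bound, and that the bound is the new entry's time at the tombstone's resolution. Two state IDs 300 µs apart within the same millisecond show the difference; the function names are hypothetical.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop the sub-millisecond part of a timestamp."""
    return ts.replace(microsecond=ts.microsecond // 1000 * 1000)


def tombstone_deletes(prev_entry: datetime, new_entry: datetime,
                      millis_only: bool) -> bool:
    """Would the tombstone written with `new_entry` delete `prev_entry`?"""
    bound = truncate_to_millis(new_entry) if millis_only else new_entry
    return prev_entry < bound
```

With entries at `.000100` and `.000400`, the millisecond bound truncates to `.000000`, so the older entry is not covered and survives; at microsecond resolution the bound is `.000400` and the older entry is deleted.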
Kamil Braun
55f43e532c Merge 'get rid of gms/failure_detector' from Benny Halevy
Move gms::arrival_window to api/failure_detector, which is its only user,
and get rid of the rest, which is unused now that we use direct_failure_detector instead.

TODO: integrate direct_failure_detector with failure_detector api.

Closes #13576

* github.com:scylladb/scylladb:
  gms: get rid of unused failure_detector
  api: failure_detector: remove false dependency on failure_detector::arrival_window
  test: rest_api: add test_failure_detector
2023-04-21 11:47:44 +02:00
Kamil Braun
f7408130c9 Merge 'Fix topology management when raft-based topology is enabled' from Tomasz Grabiec
Fixes a problem when raft-based topology is enabled, which loads
topology from storage. It starts by clearing topology and then adding
nodes one by one. Before this patch, this violated the internal invariant
of the topology object which keeps the local node as the first node. This
would manifest as an assert in topology::pop_node(), which throws
when popping the node at index 0, in order to keep the information
about the local node around. This is normally prevented by a check in
topology::remove_node(), which avoids calling pop_node() when removing the
local node. But since no node is marked as local, this
check allows the first node to be popped.

To fix the problem I lift the invariant that local node is always in
_nodes. We still have information about local node in config. Instead
of keeping it in _nodes, we recognize it as part of indexing. We also
allow removing the local node like a regular node.

The path which reloads topology works correctly after this, the local
node will be recognized when (if) it is added to the topology.

Fixes #13495

Closes #13498

* github.com:scylladb/scylladb:
  locator: topology: Fix move assignment
  locator: topology: Add printer
  tests: topology: Test that topology clearing preserves information about local node
  locator: topology: Recognize local node as part of indexing it
  locator: topology: Fix get_location(ep) for local node
  locator: topology: Fix typo
  locator: topology: Preserve config when cloning
2023-04-21 11:45:08 +02:00
Alejo Sanchez
ce87aedd30 test: topology smp test with custom cluster
Instead of decommission of initial cluster, use custom cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13589
2023-04-21 10:43:54 +02:00
Kamil Braun
f9d8118c8d db: system_keyspace: use microsecond resolution for group0_history range tombstone
in `make_group0_history_state_id_mutation`, when adding a new entry to
the group 0 history table, if the parameter `gc_older_than` is engaged,
we create a range tombstone in the mutation which deletes entries older
than the new one by `gc_older_than`. In particular if
`gc_older_than = 0`, we want to delete all older entries.

There was a subtle bug there: we were using millisecond resolution when
generating the tombstone, while the provided state IDs used microsecond
resolution. On a super fast machine it could happen that we managed to
perform two schema changes in a single millisecond; this happened
sometimes in `group0_test.test_group0_history_clearing_old_entries`
on our new CI/promotion machines, causing the test to fail because the
tombstone didn't clear the entry corresponding to the previous schema
change when performing the next schema change (since they happened in
the same millisecond).

Use microsecond resolution to fix that. The consecutive state IDs used
in group 0 mutations are guaranteed to be strictly monotonic at
microsecond resolution (see `generate_group0_state_id` in
service/raft/raft_group0_client.cc).

Fixes #13594
2023-04-21 10:33:05 +02:00
Kamil Braun
218a056825 utils: UUID_gen: accept decimicroseconds in min_time_UUID
The function now accepts higher-resolution duration types, such as
microsecond resolution timestamps. Will be used by the next commit.
2023-04-21 10:33:02 +02:00
Kefu Chai
b0ef053552 test: sstables: use generate_n for generating ids for testing
so we don't need to keep a `prev_gen` around. this also prepares for
the upcoming change to use the generation generator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 15:45:16 +08:00
Kefu Chai
ca6ebbd1f0 cql3, db: sstable: specialize fmt::formatter<function_name>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `function_name` without the help of `operator<<`.

the corresponding `operator<<()` is dropped in this change,
as all its callers now use fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13608
2023-04-21 10:07:28 +03:00
Botond Dénes
d74f3598f4 Merge 'dht: specialize fmt::formatter<dht::token>' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `dht::token` without the help of `operator<<`.

the corresponding `operator<<()` is preserved in this change, as it has lots of users in this project, we will tackle them case-by-case in follow-up changes.

also, the forward declaration of `operator<<(ostream&, const dht::token&)` in `dht/i_partitioner.hh` is removed, as it is not necessary.

Refs https://github.com/scylladb/scylladb/issues/13245

Closes #13610

* github.com:scylladb/scylladb:
  dht: remove unnecessarily forward declaration
  dht: specialize fmt::formatter<dht::token>
2023-04-21 09:51:25 +03:00
Kefu Chai
c5fa1ac9f7 sstable: specialize fmt::formatter<component_type>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `component_type` without the help of `operator<<`.

the corresponding `operator<<()` is dropped in this change,
as all its callers now use fmtlib for formatting.

also, please note, to enable fmtlib to format `std::set<component_type>`
in `test/boost/sstable_3_x_test.cc`, we need to include
`<fmt/ranges.h>` in that source file.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13598
2023-04-21 09:49:24 +03:00
Kefu Chai
9215adee46 streaming: specialize fmt::formatter<stream_reason>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `stream_reason` without the help of `operator<<`.

please note, because we still cannot use the generic formatter for
std::unordered_map provided by fmtlib, `fmt::join()` is used as a
temporary solution, so that we can drop `operator<<` for `stream_reason`
and still print `unordered_map<stream_reason>`. we will audit all
`fmt::join()` calls after removing the homebrew formatter of
`std::unordered_map`.

the corresponding `operator<<()` is dropped in this change,
as all its callers now use fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13609
2023-04-21 09:44:23 +03:00
Kefu Chai
ecb5380638 treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/
this change replaces all occurrences of `boost::lexical_cast<std::string>`
in the source tree with `fmt::to_string()`, for a couple of reasons:

* `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`,
  so the latter is easier to parse and read.
* `boost::lexical_cast<std::string>` creates a stringstream under the
  hood, so it can use the `operator<<` to stringify the given object.
  but stringstream is known to be less performant than fmtlib.
* we are migrating to fmtlib based formatting, see #13245. so
  using `fmt::to_string()` helps us to remove yet another dependency
  on `operator<<`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13611
2023-04-21 09:43:53 +03:00
Benny Halevy
3f1ac846d8 gms: get rid of unused failure_detector
The legacy failure_detector is now unused and can be removed.

TODO: integrate direct_failure_detector with failure_detector api.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:08:27 +03:00
Benny Halevy
d546b92685 api: failure_detector: remove false dependency on failure_detector::arrival_window
Up until 0ef33b71ba
get_endpoint_phi_values retrieved arrival samples
from gms::get_arrival_samples().  That function was
removed since it returned a constant empty map.

This patch returns empty results without relying
on failure_detector::arrival_window, so the latter can
be retired altogether.

As Tomasz Grabiec <tgrabiec@scylladb.com> said:
> I don't think the logic of arrival_window belongs to api,
> it belongs to the failure detector. If there is no longers
> a failure detector, there should be no arrival_window.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:08:25 +03:00
Benny Halevy
35de60670c test: rest_api: add test_failure_detector
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:06:15 +03:00
Nadav Har'El
9c3907bb3c test/cql-pytest: reproducers for incorrect AVG of "decimal" type
This patch contains tests reproducing issue #13601 and the corresponding
Cassandra issue CASSANDRA-18470. These issues are about what the AVG
aggregation does for arbitrary-precision "decimal" numbers - the tests
we add here show examples where the current behavior doesn't make sense:

The problem is that "decimal" has arbitrary precision - so, should an
average of 1/3 be returned as 0.3 or 0.33333333333333333? This is not
specified, so Scylla (and Cassandra) decided to pick the result precision
based on the input precision. In particular, the average of 1 and 2
is returned as 2 (zero digits after the decimal point, like in the
inputs) instead of the expected 1.5. Arguably this isn't useful behavior.

The patch also adds a second test, which fails on Cassandra but passes
on Scylla: for the average of 1, 2, 2, 3, Cassandra returns the integer 1,
whereas the correct average is 2 (and Scylla returns it correctly).
The reason why this bug is even worse on Cassandra is that Scylla's AVG
only loses precision when dividing the sum and count, but Cassandra
tries to maintain only the average, and loses precision at every step.
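The difference between the two strategies can be illustrated with a simplified model. This is a sketch, not Scylla's or Cassandra's code: it assumes scale-0 decimals represented as plain integers and assumes every division is rounded to the nearest integer with ties to even. Keeping an exact sum and dividing once yields 2 for 1, 2, 2, 3, while a running average that rounds after each input drifts to 1.

```cpp
#include <cstdint>
#include <vector>

// Divide num by den (den > 0), rounding to nearest with ties to even.
// Stands in for rounding a decimal quotient to the inputs' scale (0 here).
std::int64_t div_round_half_even(std::int64_t num, std::int64_t den) {
    std::int64_t q = num / den;
    std::int64_t r = num % den;
    if (r < 0) {  // normalize so that q == floor(num / den) and 0 <= r < den
        r += den;
        q -= 1;
    }
    if (2 * r > den || (2 * r == den && q % 2 != 0)) {
        q += 1;
    }
    return q;
}

// Strategy 1: exact sum and count, rounded once at the end (xs must be non-empty).
std::int64_t avg_sum_then_divide(const std::vector<std::int64_t>& xs) {
    std::int64_t sum = 0;
    for (auto x : xs) sum += x;
    return div_round_half_even(sum, static_cast<std::int64_t>(xs.size()));
}

// Strategy 2: keep only a running average, rounding after every input.
std::int64_t avg_running(const std::vector<std::int64_t>& xs) {
    std::int64_t avg = 0;
    std::int64_t n = 0;
    for (auto x : xs) {
        ++n;
        avg += div_round_half_even(x - avg, n);
    }
    return avg;
}
```

In the running model, after the first input the increments for 1, 2, 2, 3 are 1/2, 1/3 and 2/4, which all round to 0, so the average never moves off 1.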

Refs #13601

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13603
2023-04-21 08:32:30 +03:00
Kefu Chai
7b21bfd36e mutation: specialize fmt::formatter<apply_resume>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `apply_resume` without the help of `operator<<`.

the corresponding `operator<<()` is dropped in this change,
as all its callers now use fmtlib for formatting.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13584
2023-04-21 08:27:57 +03:00
Benny Halevy
77b70dbdb7 sstables: compressed_file_data_source_impl: get: throw malformed_sstable_exception on premature eof
Currently, the reader might dereference a null pointer
if the input stream reaches eof prematurely,
and read_exactly returns an empty temporary_buffer.

Detect this condition before dereferencing the buffer
and throw sstables::malformed_sstable_exception instead.

Fixes #13599

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13600
2023-04-21 07:56:58 +03:00
Botond Dénes
d828cfcb23 Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity
Database functions currently receive their arguments as an std::vector. This
is inflexible (for example, one cannot use small_vector to reduce allocations).

This series adapts the function signature to accept parameters using std::span.
Some changes in the keys interface are needed to support this. Lastly, one call
site is migrated to small_vector.

This is in support of changing selectors to use expressions.

Closes #13581

* github.com:scylladb/scylladb:
  cql3: abstract_function_selector: use small_vector for argument buffer
  db, cql3: functions: pass function parameters as a span instead of a vector
  keys: change from_optional_exploded to accept a span instead of a vector
2023-04-21 06:49:07 +03:00
Kefu Chai
fe9f41bd84 dht: remove unnecessary forward declaration
it turns out the declaration of `operator<<(ostream&, const
dht::token&)` is unnecessary, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Kefu Chai
53dedca8cd dht: specialize fmt::formatter<dht::token>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `dht::token` without the help of `operator<<`.

the corresponding `operator<<()` is preserved in this change, as it
has lots of users in this project; we will tackle them case-by-case in
follow-up changes.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Avi Kivity
0c64dd12b1 test: raft_server_test: fix string compare for clang 15
Clang 15 rejects string compares where the left-hand-side is a C
string, so help it along by converting it ourselves.

Closes #13582
2023-04-21 06:38:10 +03:00
Tomasz Grabiec
0ec700cd00 locator: topology: Fix move assignment
Defaulted assignment doesn't update node::_topology.
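Why a defaulted move assignment falls short when elements hold a back-pointer to their container can be shown with a small sketch. `topology_like` and `node_like` are hypothetical stand-ins for `locator::topology` and its nodes, not the real classes; the point is that after moving the node list, each node's owner pointer must be re-seated to the new object.

```cpp
#include <memory>
#include <vector>

struct topology_like;

// Hypothetical node that remembers which container owns it.
struct node_like {
    const topology_like* owner = nullptr;
    int id = 0;
};

struct topology_like {
    std::vector<std::unique_ptr<node_like>> nodes;

    topology_like() = default;
    topology_like(topology_like&& o) noexcept : nodes(std::move(o.nodes)) { reseat(); }

    // A defaulted operator= would move the vector but leave every
    // node's owner pointing at the moved-from object.
    topology_like& operator=(topology_like&& o) noexcept {
        if (this != &o) {
            nodes = std::move(o.nodes);
            reseat();
        }
        return *this;
    }

    void add(int id) {
        auto n = std::make_unique<node_like>();
        n->owner = this;
        n->id = id;
        nodes.push_back(std::move(n));
    }

private:
    void reseat() {
        for (auto& n : nodes) n->owner = this;
    }
};
```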
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
6ed841b8d7 locator: topology: Add printer 2023-04-20 23:39:18 +02:00
Tomasz Grabiec
3dfd49fe62 tests: topology: Test that topology clearing preserves information about local node 2023-04-20 23:39:18 +02:00
Tomasz Grabiec
7d3384089a locator: topology: Recognize local node as part of indexing it
Fixes a problem when raft-based topology is enabled, which loads
topology from storage. It starts by clearing topology and then adding
nodes one by one. Before this patch, this violated an internal invariant
of the topology object, which keeps the local node as the first node. This
would manifest as an assert triggered in topology::pop_node(), which
throws if the node at index 0 is popped, in order to keep the information
about the local node around. This is normally prevented by a check in
topology::remove_node(), which avoids calling pop_node() when removing the
local node. But since no node is marked as local, this
check allows the first node to be popped.

To fix the problem, I lift the invariant that the local node is always in
_nodes. We still have information about the local node in config. Instead
of keeping it in _nodes, we recognize it as part of indexing. We also
allow removing the local node like a regular node.

The path which reloads topology works correctly after this, the local
node will be recognized when (if) it is added to the topology.

Fixes #13495
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
eb9d6df8bf locator: topology: Fix get_location(ep) for local node
topology config may designate a different node than
get_broadcast_address() as the local node. In particular, some tests don't
designate any node as the local node, which leads to logic errors:
the current get_location(ep), for an ep whose address happens to be
127.0.0.1, returns the location of the first node in _nodes rather
than that of ep.

Fix by looking up ep in _nodes first and falling back to the local node
if ep equals the configured local node (if any).
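The lookup order described above can be sketched as follows. `topology_sketch`, `location` and the string endpoints are hypothetical simplifications; throwing `std::out_of_range` for an unknown endpoint is an assumption of this sketch, not necessarily what the real get_location() does.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

struct location {
    std::string dc;
    std::string rack;
};

class topology_sketch {
public:
    topology_sketch(std::optional<std::string> local_ep, location local_loc)
        : _local_ep(std::move(local_ep)), _local_loc(std::move(local_loc)) {}

    void add_node(const std::string& ep, location loc) { _nodes[ep] = std::move(loc); }

    const location& get_location(const std::string& ep) const {
        // 1. Known nodes win, even if ep happens to be 127.0.0.1.
        if (auto it = _nodes.find(ep); it != _nodes.end()) {
            return it->second;
        }
        // 2. Fall back to the configured local node, if one was configured.
        if (_local_ep && *_local_ep == ep) {
            return _local_loc;
        }
        throw std::out_of_range("unknown endpoint: " + ep);
    }

private:
    std::map<std::string, location> _nodes;
    std::optional<std::string> _local_ep;
    location _local_loc;
};
```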
2023-04-20 23:39:18 +02:00
Tomasz Grabiec
0a675291dd locator: topology: Fix typo 2023-04-20 23:39:18 +02:00
Tomasz Grabiec
0b1dfb2683 locator: topology: Preserve config when cloning
Config is separate from state of the topology (nodes it
contains). Preserving the config will make it easier in later patches
to maintain invariants for cloned instances.
2023-04-20 23:39:18 +02:00
Botond Dénes
1426c623eb Merge 'Tune up S3 unit tests environment usage (and a bit more)' from Pavel Emelyanov
The tests in question use the MINIO_SERVER_ADDRESS environment variable to export the minio server address from pylib to test cases. They also use a hard-coded public bucket name. Both play badly with AWS S3: the former due to MINIO_... in its name, and the latter because the public bucket name can be anything.

So this PR puts the address and public bucket name into S3_..._FOR_TEST environment variables and, while at it, fixes output stream closure on failure.

Detached from #13493

Closes #13546

* github.com:scylladb/scylladb:
  s3/test: Rename MINIO_SERVER_ADDRESS environment variable
  s3/test: Keep public bucket name in environment
  s3/test: Fix upload stream closure
  test/lib: Add getenv_safe() helper
2023-04-20 18:01:12 +03:00
Kamil Braun
88aff50e8b docs: cdc: describe generation changes using group 0 topology coordinator
Update the `Generation switching` section: most of the existing
description landed in `Gossiper-based topology changes` subsection, and
a new subsection was added to describe Raft group 0 based topology
changes. Marked as WIP - we expect further development in this area
soon.

The existing gossiper-based description was also updated a bit.
2023-04-20 16:36:41 +02:00
Kamil Braun
1688001585 cdc: generation_service: add a FIXME 2023-04-20 16:36:41 +02:00
Kamil Braun
d13a0b1930 cdc: generation_service: add legacy_ prefix for gossiper-based functions
Most of the code in the service exists to handle gossiper-based topology
changes. Name the functions appropriately and add a note in the
comments.
2023-04-20 16:36:41 +02:00
Kamil Braun
8afb15700b storage_service: include current CDC generation data in topology snapshots
Note that we don't need to include earlier CDC generations, just the
current (i.e. latest) one.

We might observe a problem when nodes are being bootstrapped in quick
succession - I left a FIXME describing the problem and possible
solutions.
2023-04-20 16:36:41 +02:00
Kamil Braun
3d96bc5dba db: system_keyspace: introduce query_mutations with range/slice
There is a `query_mutations` function which loads the entire contents of
a given table into memory. There was no function for e.g. loading just a
single partition in the form of mutations. Introduce one.
2023-04-20 16:36:41 +02:00
Kamil Braun
3b26135227 storage_service: hold group 0 apply mutex when reading topology snapshot
This is a bugfix: we need to hold the mutex when loading topology data
from tables, otherwise they might be concurrently modified by
`group0_state_machine::apply` and the snapshot that we send won't make
any sense.

Also specify in comments that the lock must be held during
`topology_transition`, `topology_state_load`, `merge_topology_snapshot`.
2023-04-20 16:36:41 +02:00
Kamil Braun
f081de7cc5 service: raft_group0_client: introduce hold_read_apply_mutex
We'll use it in `storage_service` topology snapshot request handler.
2023-04-20 16:36:41 +02:00
Kamil Braun
4c99b4004b storage_service: use CDC generations introduced by Raft topology
When a node notices that a new CDC generation was introduced in
`storage_service::topology_state_load`, it updates its internal data
structures that are used when coordinating writes to CDC log tables.
2023-04-20 16:36:41 +02:00
Kamil Braun
5f2b297f99 raft topology: publish new CDC generation to the user description tables
Once a new CDC generation is committed to the cluster by the topology
coordinator, we also need to publish it to the user-facing description
tables so CDC applications know which streams to read from.

This uses regular distributed table writes underneath (tables living
in the `system_distributed` keyspace) so it requires `token_metadata`
to be nonempty. We need a hack for the case of bootstrapping the
first node in the cluster - turning the tokens into normal tokens
earlier in the procedure in `token_metadata`, but this is fine for the
single-node case since no streaming is happening.
2023-04-20 16:36:41 +02:00
Kamil Braun
58baf998c1 raft topology: commit a new CDC generation on node bootstrap
After inserting new CDC generation data (see previous commit), we need
to pick a timestamp for this generation and commit it, telling all nodes
in the cluster to start using the generation for CDC log writes once
their clocks cross that timestamp.

We introduce a separate step to the bootstrap saga, before
`write_both_read_old`, called `commit_cdc_generation`. In this step, the
coordinator takes the `new_cdc_generation_data_uuid` stored in a
bootstrapping node's `ring_slice` - which serves as the key to the table
where the CDC generation data is stored - and combines it with a
timestamp which it generates a bit into the future (as in old
gossiper-based code, we use 2 * ring_delay, by default 1 minute). This
gives us a CDC generation ID which we commit into the topology state as
the `current_cdc_generation_id` while switching the saga to the next
step, `write_both_read_old`.
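A minimal sketch of how such an ID could be formed, assuming ring_delay defaults to 30 s (so 2 * ring_delay is the 1-minute default mentioned above). `cdc_generation_id_sketch` and `make_generation_id` are hypothetical names, and a string stands in for the UUID type.

```cpp
#include <chrono>
#include <string>

// Hypothetical stand-in for a CDC generation ID.
struct cdc_generation_id_sketch {
    std::string data_uuid;                            // key of the generation's data
    std::chrono::system_clock::time_point timestamp;  // when the generation starts operating
};

// Pair the data key with a timestamp 2 * ring_delay in the future, giving
// nodes time to learn about the generation before their clocks reach it.
cdc_generation_id_sketch make_generation_id(std::string data_uuid,
                                            std::chrono::system_clock::time_point now,
                                            std::chrono::milliseconds ring_delay) {
    return {std::move(data_uuid), now + 2 * ring_delay};
}
```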

`system_keyspace::load_topology_state` is extended to load
`current_cdc_generation_id`.

For now, nodes don't react to `current_cdc_generation_id`. In a later
commit we'll extend `storage_service::topology_state_load` to start
using the current CDC generation for CDC log table writes.

The solution with specifying a timestamp into the future is the same as
it is for gossip-based topology changes and it has the same consistency
problem - if some node is temporarily partitioned away from the quorum,
it might not learn about the new CDC generation before its clock crosses
the generation's timestamp, causing it to temporarily send writes to the
wrong CDC streams (until it learns about the new timestamp). I left a
FIXME which describes an alternative solution which wasn't viable for
gossiper-based topology changes, but it is viable when we have a
fault-tolerant topology coordinator.
2023-04-20 16:36:41 +02:00
Kamil Braun
5942237a79 raft topology: create new CDC generation data during node bootstrap
Calculate a new CDC generation using the bootstrapping node's tokens,
translate it to mutation format, and insert this mutation to the
CDC_GENERATIONS_V3 table through group 0 at the same time we assign
tokens to the node in Raft topology. The partition key for this data is
stored in the bootstrapping node's `ring_slice`.

The data is inserted, but it's not used for anything yet, we'll do it in
later commits.

Two FIXMEs are left for follow-ups:
- in `get_sharding_info` we shouldn't have to use the token owner's IP,
  but get the host ID directly from token metadata (#12279),
- splitting the CDC generation data write into multiple commands. The
  comment elaborates.
2023-04-20 16:35:37 +02:00
Pavel Emelyanov
30b6f34a0b s3/client: Explicitly set _upload_id empty when completing
The upload_sink::_upload_id remains empty until upload starts, remains
non-empty while it proceeds, then becomes empty again after it
completes. The upload_started() method checks that, and on .close() a
started upload is aborted.

The final switch to empty is done by std::move()ing the upload id into
the completion request, but it's better to use std::exchange() to emphasize
the fact that the _upload_id becomes empty at that point for a reason.
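The difference can be sketched with a hypothetical `upload_sink_sketch` (not the real s3 client class). A moved-from std::string is only guaranteed to be in a valid but unspecified state, whereas std::exchange() leaves the member explicitly empty, so upload_started() reliably returns false afterwards.

```cpp
#include <string>
#include <utility>

class upload_sink_sketch {
public:
    void start(std::string id) { _upload_id = std::move(id); }
    bool upload_started() const { return !_upload_id.empty(); }

    // Hand the ID to the completion request and leave _upload_id empty.
    std::string complete() { return std::exchange(_upload_id, {}); }

private:
    std::string _upload_id;
};
```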

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13570
2023-04-20 17:32:08 +03:00
Kamil Braun
4e7628fa16 service: topology_state_machine: make topology::find const 2023-04-20 16:16:36 +02:00
Kamil Braun
22094f1509 db: system_keyspace: small refactor of load_topology_state
The variables necessary for constructing a `ring_slice` are now living in
a local block of code. This makes it easier to see which data is part of
the `ring_slice` and will make it easier to add more data to
`ring_slice` in following commits.

Also add some more sanity checking.
2023-04-20 15:40:23 +02:00
Avi Kivity
1cd6d59578 Merge 'Remove global proxy usage from view_info::select_statement()' from Pavel Emelyanov
The method needs the proxy to get a data_dictionary::database to pass down to select_statement::prepare(), plus a legacy bit that can come with data_dictionary::database as well. Fortunately, all the call traces that end up at select_statement() start inside table:: methods that have view_update_generator, or at view_builder::consumer, which has a reference to view_builder. Both services can share the database reference. However, the call traces in question pass through several code layers, so the PR adds data_dictionary::database to those layers one by one.

Closes #13591

* github.com:scylladb/scylladb:
  view_info: Drop calls to get_local_storage_proxy()
  view_info: Add data_dictionary argument to select_statement()
  view_info: Add data_dictionary argument to partition_slice() method
  view_filter_checking_visitor: Construct with data_dictionary
  view: Carry data_dictionary arg through standalone helpers
  view_updates: Carry data_dictionary argument through methods
  view_update_builder: Construct with data dictionary
  table: Push view_update_generator arg to affected_views()
  view: Add database getters to v._update_generator and v._builder
2023-04-20 16:40:06 +03:00
Kamil Braun
3abe0f0ad6 cdc: generation: extract pure parts of make_new_generation outside
`cdc::generation_service::make_new_cdc_generation` would create a new
CDC generation and insert it into the `CDC_GENERATIONS_V2` table these
days. For Raft-based topology changes we'll do the data insertion
somewhere else - in topology coordinator code. So extract the parts for
calculating the CDC generation to free-standing functions (these are
almost pure calculations, modulo accessing RNG).
2023-04-20 15:38:59 +02:00
Kamil Braun
2233d8f54d db: system_keyspace: add storage for CDC generations managed by group 0
The `CDC_GENERATIONS_V3` table schema is a copy-paste of the
`CDC_GENERATIONS_V2` schema. The difference is that V2 lives in
`system_distributed_keyspace` and writes to it are distributed using
regular `storage_proxy` replication mechanisms based on the token ring.
The V3 table lives in `system_keyspace` and any mutations written to it
will go through group 0.

Also extend the `TOPOLOGY` schema with new columns:
- `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping
  node's `ring_slice`, it stores UUID of a newly introduced CDC
  generation which is used as partition key for the `CDC_GENERATIONS_V3`
  table to access this new generation's data. It's a regular column,
  meaning that every row (corresponding to a node) will have its own.
- `current_cdc_generation_uuid` and `current_cdc_generation_timestamp`
  together form the ID of the newest CDC generation in the cluster.
  (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is
  when the CDC generation starts operating). Those are static columns
  since there's a single newest CDC generation.
2023-04-20 15:38:58 +02:00
Kamil Braun
07382d634a service: topology_state_machine: better error checking for state name (de)serialization
For example:
```
 std::ostream& operator<<(std::ostream& os, ring_slice::replication_state s) {
    os << replication_state_to_name_map[s];
    return os;
 }
```
this would print an empty string if the state was missing from
`replication_state_to_name_map` (because `operator[]` default-constructs
a value if it's missing).

Use `find` instead and make it an error if the state is missing.

Also turn `throw std::runtime_error` into `on_internal_error` in
deserialization functions because failure to deserialize a state name is
an internal error, not user error.
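The pattern can be sketched in isolation. The enum, map and `state_name()` below are hypothetical, with `std::logic_error` standing in for `on_internal_error`. Unlike `operator[]`, which on a non-const map silently inserts and returns an empty string for a missing key (and cannot be used on a const map at all), `find()` lets the lookup fail loudly.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum class state_sketch { first, second, missing };

// `missing` is deliberately absent from the map.
const std::map<state_sketch, std::string> state_names = {
    {state_sketch::first, "first"},
    {state_sketch::second, "second"},
};

const std::string& state_name(state_sketch s) {
    auto it = state_names.find(s);
    if (it == state_names.end()) {
        throw std::logic_error("no name for state " + std::to_string(static_cast<int>(s)));
    }
    return it->second;
}
```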
2023-04-20 15:38:37 +02:00
Kamil Braun
59b692e799 service: raft: plumbing cdc::generation_service&
Pass a reference to the service into places. It shall be used later, by
the group 0 state machine and topology coordinator.
2023-04-20 15:38:37 +02:00
Kamil Braun
1e9cf3badd cdc: generation: get_cdc_generation_mutations: take timestamp as parameter
The function would generate a mutation timestamp for itself, take it as
parameter instead. We'll use timestamps provided by Group 0 APIs when
creating CDC generations during Group 0-based topology changes.
2023-04-20 15:38:37 +02:00
Kamil Braun
85f4f1830b cdc: generation: make topology_description_generator::get_sharding_info a parameter
The function used to obtain the sharding info for a given node (its
number of shards and ignore_msb_bits) was using gossiper application
states.

We want to reuse `topology_description_generator` to build CDC
generations when doing Raft Group 0-based topology changes, so make
`get_sharding_info` a parameter.
2023-04-20 15:38:37 +02:00
Kamil Braun
3e863d0e58 sys_dist_ks: make get_cdc_generation_mutations public
It was a `static` function inside system_distributed_keyspace. Later it
will be used for another table living in system_keyspace, so move it
outside, to the CDC generations module, and make it accessible from
other places.
2023-04-20 15:38:37 +02:00
Kamil Braun
ed133db709 sys_dist_ks: move find_schema outside get_cdc_generation_mutations
The function will be reused for a different table.
2023-04-20 15:38:37 +02:00
Kamil Braun
0e84662910 sys_dist_ks: move mutation size threshold calculation outside get_cdc_generation_mutations
The function turns a `cdc::topology_description` into a vector of
mutations. It decides when to push_back a new mutation (instead of
extending an existing one) based on certain parameters. This calculation
is specific to where we insert the mutation later.

Move the calculation outside, to the function which does the insertion.
`get_cdc_generation_mutations` will be used outside this function later.
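The idea of letting the caller supply the threshold can be sketched generically. `split_by_threshold` is a hypothetical helper that works on plain item sizes rather than real mutations: a new batch starts whenever adding the next item would push the current batch over `threshold`, and an item larger than the threshold gets a batch of its own.

```cpp
#include <cstddef>
#include <vector>

// Group item sizes into batches without exceeding `threshold` per batch
// (except for single items that are larger than the threshold).
std::vector<std::vector<std::size_t>> split_by_threshold(const std::vector<std::size_t>& sizes,
                                                         std::size_t threshold) {
    std::vector<std::vector<std::size_t>> batches;
    std::size_t current = 0;
    for (auto s : sizes) {
        if (batches.empty() || current + s > threshold) {
            batches.emplace_back();
            current = 0;
        }
        batches.back().push_back(s);
        current += s;
    }
    return batches;
}
```

Because the threshold is a parameter, the same splitting code can serve tables with different size limits.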
2023-04-20 15:38:37 +02:00
Kamil Braun
52366f33e5 service/raft: group0_state_machine: signal topology state machine in load_snapshot
The `_topology_state_machine.event` condition variable should be
signalled whenever the topology state is updated, including on snapshot
load.
2023-04-20 15:38:37 +02:00
Avi Kivity
43a0b40082 Merge 'Remove global proxy usage from API handlers' from Pavel Emelyanov
There are a few places in the API handlers that call the global proxy for their needs. Most of those places are easy to patch, because the proxy is available via http_ctx right inside the handler code. There's also handler code in view_builder that needs the proxy, but it really needs topology, not the proxy, and can get it elsewhere (the handler is coroutinized while at it).

Closes #13593

* github.com:scylladb/scylladb:
  view: Get topology via database tokens
  view: Indentation fix after previous patch
  view: Coroutinize view_builder::view_build_statuses()
  api: Use ctx.sp in storage service handler
  api,main: Unset storage_proxy API on stop
  api: Use ctx.sp in set_storage_proxy() routes
2023-04-20 16:31:31 +03:00
Botond Dénes
66ee73641e test/cql-pytest/nodetool.py: no_autocompaction_context: use the correct API
This `with` context is supposed to disable, then re-enable
autocompaction for the given keyspaces, but it used the wrong API for
it, it used the column_family/autocompaction API, which operates on
column families, not keyspaces. This oversight led to a silent failure
because the code didn't check the result of the request.
Both are fixed in this patch:
* switch to use `storage_service/auto_compaction/{keyspace}` endpoint
* check the result of the API calls and report errors as exceptions

Fixes: #13553

Closes #13568
2023-04-20 16:21:16 +03:00
Kamil Braun
8d7b5f1710 Merge 'test/pylib: topology fix asyncio fixture and fix logger' from Alecco
Remove unnecessary asyncio marker and re-introduce top level logger instance.

Closes #13561

* github.com:scylladb/scylladb:
  test/pylib: add missing logger
  test/pylib: remove unnecessary asyncio marker
2023-04-20 14:23:05 +02:00
Alejo Sanchez
11561a73cb test/pylib: ManagerClient helpers to wait for...
server to see other servers after start/restart

When starting/restarting a server, provide a way to wait for the server
to see at least n other servers.

Also leave the implementation methods available for manual use and
update previous tests, one to wait for a specific server to be seen, and
one to wait for a specific server to not be seen (down).

Fixes #13147

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13438
2023-04-20 14:22:31 +02:00
Avi Kivity
342cdb2a63 Update tools/jmx submodule (split Depends line)
* tools/jmx 15fd4ca...fdd0474 (1):
  > dist/debian: split Depends into multiple lines
2023-04-20 15:11:33 +03:00
Pavel Emelyanov
bda2aea5be view: Get topology via database tokens
The view_builder::view_build_statuses() needs topology to walk its
nodes. Now it gets one from global proxy via its token metadata, but
database also has tokens and view_builder has reference to database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:18:14 +03:00
Pavel Emelyanov
403463d7eb view: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:18:14 +03:00
Pavel Emelyanov
257814f443 view: Coroutinize view_builder::view_build_statuses()
Easier to patch it this way further.
Indentation is deliberately left broken until next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:17:59 +03:00
Pavel Emelyanov
ece731301c api: Use ctx.sp in storage service handler
Similar to the previous patch, but for another routes group. The storage
service API calls mainly use the storage service, but one place needs the
proxy to pass it to recalculate_schema_version().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:14:52 +03:00
Pavel Emelyanov
21136058bd api,main: Unset storage_proxy API on stop
So that the routes referencing and using ctx.sp don't step on a proxy
that's going to be removed (not now, but some time later) from under
them on shutdown.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:14:04 +03:00
Pavel Emelyanov
8d490d20dc api: Use ctx.sp in set_storage_proxy() routes
It's already used in many other places, few methods still stick to
global proxy usage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 13:12:49 +03:00
Alejo Sanchez
2c1ba377bf test/pylib: add missing logger
The logger instance was removed in a previous commit but it is used in
the wrapper helper. Add it back.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-04-20 10:36:02 +02:00
Alejo Sanchez
05338a6cd7 test/pylib: remove unnecessary asyncio marker
Remove the unnecessary asyncio marker from the fixture, as it is only
needed for tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-04-20 10:36:02 +02:00
Pavel Emelyanov
edcce7d8dd view_info: Drop calls to get_local_storage_proxy()
In both cases the proxy is called to get the data_dictionary from. Now it's
available as the call argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
3e4fb7cad6 view_info: Add data_dictionary argument to select_statement()
This method needs data_dictionary to work. Fortunately, all callers of
it already have the dictionary at hand and can just pass it as argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
4375835cdd view_info: Add data_dictionary argument to partition_slice() method
The caller is calculate_affected_clustering_ranges() with dictionary
arg, the method needs dictionary to call view_info::select_statement()
later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
0aff55cdb2 view_filter_checking_visitor: Construct with data_dictionary
The visitor is a wait-free helper for matches_view_filter(), which has the
dictionary as its argument. Later the visitor will pass the dictionary
to view_info::select_statement().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
837fde84b1 view: Carry data_dictionary arg through standalone helpers
There's a bunch of functions in view.{hh|cc} that don't belong to any
class and perform view-related calculations for view updates. Lots of
them eventually call view_info::select_statement() which will later need
the dictionary.

By now all those methods' callers have data dictionary at hand and can
share it via argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
1301a99ba3 view_updates: Carry data_dictionary argument through methods
The goal is to have the dictionary at places that later wrap calls to
view_info::select_statement(). This graph of calls starts at the only
public view_updates::generate_update() method which, in turn, is called
from view_update_builder that already has data dictionary at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
9d3d533561 view_update_builder: Construct with data dictionary
The caller is the table, which has the view_update_generator at hand (it
calls mutate_MV() on it). The builder here is used as a temporary object
that is destroyed once the caller coroutine co_return-s, so keeping the
database obtained from the view_update_generator is safe.

Later the v.u.b. object will propagate its data dictionary down the
callstacks.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:38 +03:00
Pavel Emelyanov
4a16ab3bd4 table: Push view_update_generator arg to affected_views()
Caller already has it to call mutate_MV() on. The method in question
will need the generator in one of the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 10:42:31 +03:00
Pavel Emelyanov
7ddcd0c918 view: Add database getters to v._update_generator and v._builder
Both services carry database which will be used by auxiliary objects
like view_updates, view_update_builder, consumer, etc in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 10:41:16 +03:00
Warren Krewenki
73eaebe338 Remove visible :orphan:
The text `:orphan:` was showing up in the scylla.yaml documentation with no context.

Closes #13524
2023-04-20 08:24:48 +03:00
Avi Kivity
9fb5443f87 cql3: abstract_function_selector: use small_vector for argument buffer
abstract_function_selector uses a preallocated vector to store the
arguments to aggregate functions, to prevent an allocation for every row.
Use small_vector to prevent an allocation per query, if the number of
arguments happens to be small.

This isn't expected to make a significant performance difference.
2023-04-19 20:42:25 +03:00
Avi Kivity
3e0aacc8b5 db, cql3: functions: pass function parameters as a span instead of a vector
Spans are more flexible and can be constructed from any contiguous
container (such as small_vector), or a subrange of such a container.
This can save allocations, so change the signature to accept a span.

Spans cannot be constructed from std::initializer_list, so one such
call site is changed to construct a span directly from the single
argument.
2023-04-19 20:38:55 +03:00
Avi Kivity
9072763a52 keys: change from_optional_exploded to accept a span instead of a vector
A span is more generic than a vector, and can be constructed
from any contiguous container (like small_vector), or a subset
of a container.

To support this, helpers in compound.hh need to use make_iterator_range,
because a span doesn't fit the container concept (spans don't own
their contents).

This is needed to make a similar change to function evaluation, as
the token function passes its parameters to from_optional_exploded().
2023-04-19 20:18:50 +03:00
Avi Kivity
6ca1b14488 Update tools/jmx submodule (drop java 8 on debian)
* tools/jmx 3316f7a...15fd4ca (1):
  > dist/debian: drop dependencies on jdk-8
2023-04-19 19:51:03 +03:00
Botond Dénes
0c430c01e9 Merge 'cql: allow SUM() aggregations which result in a NaN' from Nadav Har'El
This short PR fixes a bug in SUM() aggregation where if the data contains +Inf and -Inf the returned sum should be NaN but we returned an error instead. This is a recent regression uncovered by a dtest (see issue #13551), but in the first patch we add additional tests in the cql-pytest framework which reproduce this bug and explore various other areas (wrongly) implicated by the failing dtest.

Fixes #13551

Closes #13564

* github.com:scylladb/scylladb:
  cql3: allow SUM() aggregation to result in a NaN
  test/cql-pytest: add tests for data casts and inf in sums
2023-04-19 13:50:23 +03:00
Pavel Emelyanov
a77ca69360 s3/test: Rename MINIO_SERVER_ADDRESS environment variable
The pylib minio code uses this variable to export the minio address to
tests. The name causes needless confusion when running the tests over
AWS S3, so rename the variable so that it does not mention MINIO at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
12c4e7d605 s3/test: Keep public bucket name in environment
Local test.py runs minio with the public 'testbucket' bucket and all
test cases know that. This series adds the ability to run tests over real
S3, so the bucket name should be configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
91674da982 s3/test: Fix upload stream closure
If a multipart upload fails for some reason, the output stream remains
open and the resulting assertion masks the original failure.
Fix that by closing the stream in all cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
b239e0d368 test/lib: Add getenv_safe() helper
The helper is like ::getenv(), but it checks whether the variable exists
and throws a descriptive exception if it does not. So instead of

    fatal error: in "...": std::logic_error: basic_string: construction from null is not valid

one gets something like

    fatal error: in "...": std::logic_error: Environment variable ... not set
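A minimal sketch of such a helper, assuming it simply wraps std::getenv(). The name `getenv_safe_sketch` is hypothetical; the real helper in test/lib may have a different signature:

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch: fail with a descriptive message instead of letting a
// null pointer from getenv() reach a std::string constructor later on.
std::string getenv_safe_sketch(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        throw std::logic_error(std::string("Environment variable ") + name + " not set");
    }
    return value;
}
```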

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:49:26 +03:00
Botond Dénes
ad065aaa62 Update tools/jmx submodule
* tools/jmx e9bfaabd...3316f7a9 (2):
  > select-java: avoid exec multiple paths
  > select-java: extract function out
2023-04-19 11:18:19 +03:00
Nadav Har'El
81e0f5b581 cql3: allow SUM() aggregation to result in a NaN
When floating-point data contains +Inf and -Inf, the sum is NaN.

Our SUM() aggregation calculated this sum correctly, but then instead
of returning it, complained that the sum overflowed by narrowing.
This was a false positive: The sum() finalizer wanted to test that no
precision was lost when casting the accumulator to the result type,
so checked that the result before and after the cast are the same.
But specifically for NaN, it is never equal to anything - not even
to itself. This check is wrong for floating point, and moreover it
isn't even necessary when the two types (accumulator type and result
type) are identical, so in this patch we skip it in that case.

Note that in the current code, a different accumulator and result type
is only used for integer types; when accumulating floating-point
sums, the same type is used, so the broken check will be avoided.
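A minimal sketch of a finalizer that skips the narrowing check when the accumulator and result types are identical. The names are hypothetical and do not come from the Scylla code:

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical SUM() finalizer. The accumulator may be wider than the result
// (e.g. for integer sums), in which case narrowing must be detected.
template <typename Result, typename Accumulator>
Result finalize_sum(Accumulator acc) {
    if constexpr (std::is_same_v<Result, Accumulator>) {
        // Same type: no narrowing is possible, so no check. This also avoids
        // the false positive for NaN, which never compares equal to itself.
        return acc;
    } else {
        Result r = static_cast<Result>(acc);
        if (static_cast<Accumulator>(r) != acc) {
            throw std::overflow_error("sum overflowed by narrowing");
        }
        return r;
    }
}

// Shows why a round-trip equality check misfires: +Inf + -Inf is NaN,
// and NaN != NaN.
bool nan_is_self_unequal() {
    double inf = std::numeric_limits<double>::infinity();
    double nan = inf + -inf;
    return std::isnan(nan) && nan != nan;
}
```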

The test for this issue starts to pass with this patch, so the xfail
tag is removed.

Fixes #13551

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-04-19 09:31:41 +03:00
Nadav Har'El
5b792dde68 Merge 'Extend aws_sigv4 code to suit S3 client needs' from Pavel Emelyanov
The AWS signature-generating code was moved from alternator some time ago as is. Now it is clear in which places it should be extended to work for the S3 client as well. The enhancements are:

- Support UNSIGNED-PAYLOAD to omit calculating checksums for request body
- Include full URL path into the signature, not just hard-coded "/" string
- Don't check datestamp expiration if not asked to

This is a part of #13493

Closes #13535

* github.com:scylladb/scylladb:
  utils/aws: Brush up the aws_sigv4.hh header
  utils/aws: Export timepoint formatter
  utils/aws: Omit datestamp expiration checks when not needed
  utils/aws: Add canonical-uri argument
  utils/aws: Support unsigned-payload signatures
2023-04-18 16:33:52 +03:00
Pavel Emelyanov
9628d07adb Put storage_service.hh on a diet
Remove unneeded header inclusions, at the cost of a few more forward
declarations and a couple of extra includes in other .cc files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13552
2023-04-18 14:53:17 +03:00
Nadav Har'El
78555ba7f1 test/cql-pytest: add tests for data casts and inf in sums
This patch adds tests to reproduce issue #13551. The issue, discovered
by a dtest (cql_cast_test.py), claimed that either cast() or sum(cast())
from varint type broke. So we add two tests in cql-pytest:

1. A new test file, test_cast_data.py, for testing data casts (a
   CAST (...) as ... in a SELECT), starting with testing casts from
   varint to other types.

   The test uncovers a lot of interesting cases (it is heavily
   commented to explain these cases) but nothing there is wrong
   and all tests pass on Scylla.

2. An xfailing test for sum() aggregate of +Inf and -Inf. It turns out
   that this caused #13551. In Cassandra and older Scylla, the sum
   returned a NaN. In Scylla today, it generates a misleading
   error message.

As usual, the tests were run on both Cassandra (4.1.1) and Scylla.

Refs #13551.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-04-18 13:38:42 +03:00
Anna Stuchlik
3d25edf539 doc: remove the sequential repair option from docs
Fixes https://github.com/scylladb/scylladb/issues/12132

The sequential repair mode is not supported. This commit
removes the incorrect information from the documentation.

Closes #13544
2023-04-18 09:45:48 +03:00
Tomasz Grabiec
a8f8f9f0ea Merge 'raft topology: store shard_count and ignore_msb in topology' from Kamil Braun
Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different from the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values).

An additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed.

An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations.

Fixes: #13508

Closes #13521

* github.com:scylladb/scylladb:
  raft topology: update `release_version` in topology on restart
  raft topology: store `shard_count` and `ignore_msb` in topology
2023-04-18 01:18:50 +02:00
Anna Stuchlik
da7a75fe7e doc: remove in-memory tables from OSS docs
Related: https://github.com/scylladb/scylladb/issues/13119

This commit removes the information about in-memory tables
from the Open Source documentation, as it is an Enterprise-only
feature.

Closes #13496
2023-04-17 16:00:09 +03:00
Botond Dénes
de67978211 Update tools/jmx submodule
* tools/jmx 826da61d...e9bfaabd (1):
  > metrics: revert 'metrics: EstimatedHistogram::getValues() returns bucketOffsets'
2023-04-17 15:42:11 +03:00
Avi Kivity
7724223134 Merge 'utils: big_decimal: optimize big_decimal::compare() and use <=> operator' from Kefu Chai
in this series, we use the <=> operator to replace `big_decimal::compare()` for better readability. also, we replace the chained ternary expression with a more verbose if-else statement for better performance and readability.

Closes #13478

* github.com:scylladb/scylladb:
  utils: big_decimal: replace compare() with <=> operator
  utils: big_decimal: optimize big_decimal::compare()
2023-04-17 14:33:53 +03:00
Avi Kivity
7a42927a3d treewide: stop using 'using namespace std' in namespace scope
Such namespace-wide imports can create conflicts between names that
are the same in seastar and std, such as {std,seastar}::future and
{std,seastar}::format, since we also have 'using namespace seastar'.

Replace the namespace imports with explicit qualification, or with
specific name imports.

Closes #13528
2023-04-17 14:08:37 +03:00
Botond Dénes
38c14a556a Merge 'A couple of s3/client fixes found when testing over AWS S3' from Pavel Emelyanov
This is a part of PR #13493 that contains the fixes found for the client code itself. The original PR has some open questions to resolve, so it's worth merging the fixes separately.

Closes #13534

* github.com:scylladb/scylladb:
  s3/client: Add comments about multipart upload completion message
  s3/client: Fix succeeded/failed part upload final checking
  s3/client: Fix parts to start from 1
2023-04-17 13:33:12 +03:00
Botond Dénes
b8e47569e6 Merge 'doc: extend the information about the recommended RF on the Tracing page' from Anna Stuchlik
Fixes https://github.com/scylladb/scylla-doc-issues/issues/823.
This PR extends the note on the Tracing page to explain what is meant by setting the RF to ALL and adds a link for reference.

Closes #12418

* github.com:scylladb/scylladb:
  docs: add an explanation to recommendation in the Note box
  doc: extend the information about the recommended RF on the Tracing page
2023-04-17 13:28:19 +03:00
Anna Stuchlik
2d2d92cf18 docs: add an explanation to recommendation in the Note box 2023-04-17 11:39:06 +02:00
Kamil Braun
a4159cc281 raft topology: update release_version in topology on restart
On node start, check whether the local value of `release_version` has
changed. If it did, update it in `system.topology` like we do with
`shard_count` and `ignore_msb`.
2023-04-17 10:52:05 +02:00
Kamil Braun
f9051dccaa raft topology: store shard_count and ignore_msb in topology
Add new columns to the `system.topology` table: `shard_count` and
`ignore_msb`. When a node bootstraps or restarts and observes that the
values stored in `topology` are different from the local values, it
updates them. This is done in the `update_topology_with_local_metadata`
function (the 'metadata' here being the two values).

An additional flag persisted in `system.scylla_local` is used to safely
avoid performing read barriers when the values didn't change on node
restart. A comment in `update_topology_with_local_metadata` explains why
this flag is needed.

An example use case where `shard_count` and `ignore_msb` are needed is
creating CDC generations.

Fixes: #13508
2023-04-17 10:45:30 +02:00
Pavel Emelyanov
d09d6adbf4 utils/aws: Brush up the aws_sigv4.hh header
Add the missing pragma-once directive.

Remove the hashers.hh inclusion. It was carried in when the whole code
was detached from alternator (f5de0582c8), but it is not needed
in the header, only in the .cc file which uses sha256_hasher.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:16:45 +03:00
Pavel Emelyanov
792490e095 utils/aws: Export timepoint formatter
The format of the timestamp for AWS requests is defined in the
documentation, and there is already code that prepares it in this form.
This patch exports this method so that the S3 client can use it in the
next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:14:45 +03:00
Pavel Emelyanov
706b60a0b0 utils/aws: Omit datestamp expiration checks when not needed
The signing code is used in two ways: by alternator to verify the
signature of an arrived request, and by the S3 client to prepare a signed
request. In the former case a date expiration check is performed, but for
the latter it is not required, because the date stamp is most likely now
(or close to it).

So this patch makes the orig_datestamp argument optional, meaning that
expiration checks can be omitted.
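A minimal sketch of making an expiration check optional via std::optional. The function name, the time-point types and the 15-minute tolerance are all assumptions for illustration; the real aws_sigv4 code works with its own datestamp representation:

```cpp
#include <chrono>
#include <optional>
#include <stdexcept>

using clock_type = std::chrono::system_clock;

// Hypothetical sketch: a verifier (like alternator) passes the date stamp to
// check against; a signer (like the S3 client) passes std::nullopt and the
// expiration check is skipped entirely.
void check_datestamp(clock_type::time_point request_time,
                     std::optional<clock_type::time_point> orig_datestamp,
                     std::chrono::minutes tolerance = std::chrono::minutes(15)) {
    if (!orig_datestamp) {
        return; // signing our own request: the date stamp is "now"
    }
    auto diff = *orig_datestamp > request_time ? *orig_datestamp - request_time
                                               : request_time - *orig_datestamp;
    if (diff > tolerance) {
        throw std::runtime_error("request signature date stamp expired");
    }
}
```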

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:14:45 +03:00
Pavel Emelyanov
c5ccef078a utils/aws: Add canonical-uri argument
The current signing code hard-codes "/" as the URL, which likely just
works for alternator. For the S3 client the URL includes the bucket and
object name, so it should become an argument, not a constant.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:14:45 +03:00
Pavel Emelyanov
8eabe9c4ef utils/aws: Support unsigned-payload signatures
For S3, signing the whole request payload can be too resource-consuming.
Fortunately, payload signing is only enforced when used with plain http,
and with real S3 we're going to use signed requests over https only (see
the next patch for why).

That said, the patch turns body-content into an optional reference (i.e.
a pointer) so that the signing code can inject the UNSIGNED-PAYLOAD
mark instead of the payload signature and skip the heavy payload signing.
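A minimal sketch of how a SigV4 canonical request could take both a canonical URI argument and an optional (pointer) body that falls back to UNSIGNED-PAYLOAD. `make_canonical_request()` and the caller-supplied `sha256_hex` hasher are hypothetical, not the aws_sigv4.hh API; the field order follows the AWS SigV4 canonical request layout:

```cpp
#include <functional>
#include <string>

// Literal AWS accepts in place of the payload hash when the body isn't signed.
constexpr const char* unsigned_payload = "UNSIGNED-PAYLOAD";

// Hypothetical helper. `body == nullptr` means "don't sign the payload".
// `canonical_headers` holds entries that are each already terminated by '\n'.
std::string make_canonical_request(const std::string& method,
                                   const std::string& canonical_uri,
                                   const std::string& canonical_query,
                                   const std::string& canonical_headers,
                                   const std::string& signed_headers,
                                   const std::string* body,
                                   const std::function<std::string(const std::string&)>& sha256_hex) {
    std::string payload_hash = body ? sha256_hex(*body) : std::string(unsigned_payload);
    return method + "\n" + canonical_uri + "\n" + canonical_query + "\n" +
           canonical_headers + "\n" + signed_headers + "\n" + payload_hash;
}
```

For S3 the canonical URI would be something like "/bucket/object" rather than the hard-coded "/".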

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:14:45 +03:00
Pavel Emelyanov
7c7a3416c5 s3/client: Add comments about multipart upload completion message
The message length is pre-calculated to provide the correct
content-length request header. This math is not obvious and deserves a
comment.

Also, the final message preparation code implicitly checks whether
any part failed to upload. There's a comment in the upload_sink's
upload_part() method about it, but the finalization place deserves one
too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:08:34 +03:00
Pavel Emelyanov
3f86bed600 s3/client: Fix succeeded/failed part upload final checking
When all part uploads complete, the final message is prepared and sent
to the server. The preparation code is also responsible for checking
that all parts uploaded OK, by checking that each part's etag is
non-empty. A typo crept into that check: the whole list is checked to
be empty, not the individual etag itself.
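A minimal sketch of building an S3 CompleteMultipartUpload body that checks each part's etag individually and numbers parts from 1 (see the part-numbering fix further below). The function name and the "empty etag means failed part" convention are assumptions for illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch. etags[i] holds the ETag returned for the i-th part;
// an empty string marks a part whose upload failed.
std::string make_complete_upload_body(const std::vector<std::string>& etags) {
    std::string body = "<CompleteMultipartUpload>";
    for (std::size_t i = 0; i < etags.size(); ++i) {
        // Check the individual etag, not whether the whole list is empty.
        if (etags[i].empty()) {
            throw std::runtime_error("part " + std::to_string(i + 1) + " failed to upload");
        }
        // S3 part numbers start from 1, not 0.
        body += "<Part><PartNumber>" + std::to_string(i + 1) + "</PartNumber><ETag>" +
                etags[i] + "</ETag></Part>";
    }
    body += "</CompleteMultipartUpload>";
    return body;
}
```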

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 11:08:15 +03:00
Botond Dénes
6c889213bf Merge 'Topology add node exception safety' from Benny Halevy
Currently, if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.

Instead, with this change, unindex_node looks up
the node by its pointer and removes it from the
index map only if it's found there, so that it cleans up
safely after index_node throws (at any stage).

Add a unit test to verify that.

In addition, added a unit test to reproduce #13502 and test the fix.

Closes #13512

* github.com:scylladb/scylladb:
  test: locator_topology: add test_update_node
  topology: add_node, unindex_node: make exception safe
2023-04-17 11:02:15 +03:00
Pavel Emelyanov
79379760e6 s3/client: Fix parts to start from 1
The docs say that part numbers should start from 1, while the code follows
tradition and starts from 0. Minio is conveniently lenient in this
respect, so the tests had been passing so far. On real S3, part number 0
results in a failed request.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-17 10:43:12 +03:00
Botond Dénes
4c37dc5507 Merge 'keys: specialize fmt::formatter<partition_key> and friends' from Kefu Chai
this is a part of a series to migrate from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()` are dropped in this change,
as all their callers now use fmtlib for formatting. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Closes #13513

* github.com:scylladb/scylladb:
  keys: consolidate the formatter for partition_keys
  keys: specialize fmt::formatter<partition_key> and friends
2023-04-17 10:27:31 +03:00
Benny Halevy
58129fad92 locator/topology: call seastar::current_backtrace only when log_level is enabled
`seastar::current_backtrace()` can be quite heavy.
When we pass it to a log message at a relatively detailed log level
(debug/trace), we pay the price of `current_backtrace` every time,
even though the message is rarely printed.
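A minimal sketch of the pattern: check whether the level is enabled before computing an expensive argument, because arguments are evaluated before the logger gets a chance to filter the message. `logger_sketch` and `expensive_backtrace()` are hypothetical stand-ins for the Seastar logger and `seastar::current_backtrace()`:

```cpp
#include <iostream>
#include <string>

enum class log_level { error, warn, info, debug, trace };

// Hypothetical minimal logger that prints messages at or below its threshold.
struct logger_sketch {
    log_level threshold = log_level::info;
    bool is_enabled(log_level l) const { return l <= threshold; }
    void log(log_level l, const std::string& msg) const {
        if (is_enabled(l)) {
            std::clog << msg << '\n';
        }
    }
};

int backtrace_calls = 0;

// Stand-in for an expensive call such as collecting a backtrace.
std::string expensive_backtrace() {
    ++backtrace_calls;
    return "<backtrace>";
}

void on_topology_change(const logger_sketch& log) {
    // Calling log.log(log_level::debug, "... " + expensive_backtrace())
    // unconditionally would pay for the backtrace even when debug is off.
    if (log.is_enabled(log_level::debug)) {
        log.log(log_level::debug, "topology changed at " + expensive_backtrace());
    }
}
```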

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-16 14:22:06 +03:00
Benny Halevy
490a0ae89b schema_tables: call seastar::current_backtrace only when log_level is enabled
`seastar::current_backtrace()` can be quite heavy.
When we pass it to a log message at a relatively detailed log level
(debug/trace), we pay the price of `current_backtrace` every time,
even though the message is rarely printed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-16 14:22:06 +03:00
Kefu Chai
6bb32efac0 utils: big_decimal: replace compare() with <=> operator
now that we are using C++20, it'd be more convenient to use
the <=> operator for comparing. if the <=> operator is defined, the
compiler rewrites the relational operators (<, <=, >, >=) in terms of
it, so the code is more compact.

in this change, `big_decimal::compare()` is replaced with `operator<=>`,
and its caller is updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-15 12:52:30 +08:00
Kefu Chai
e991e6087e utils: big_decimal: optimize big_decimal::compare()
before this change, in the worst case, the underlying
`number::compare()` gets called twice, as it is used by Boost.Multiprecision
to implement the comparison operators of `number`. but since we can
get the result in one go, there is no need to perform the
comparison multiple times.

so, in this change, we just call `number::compare()` explicitly,
and use it to implement `compare()`. this should save a call of
`number::compare()`. also, the chained ternary expression is
replaced using if-else statement for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-15 12:52:30 +08:00
Pavel Emelyanov
c501163f95 Merge 'reader_permit: give better names to active* states' from Botond Dénes
The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflect their true meaning and leave less room for misinterpretation. The changes are:
* active/unused  -> active
* active/used    -> active/need_cpu
* active/blocked -> active/await

Hopefully the new names do a better job at conveying what these states really mean:
* active - a regular admitted permit, which is active (as opposed to an inactive permit).
* active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state.
* active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability).

Closes #13482

* github.com:scylladb/scylladb:
  docs/dev/reader-concurrency-semaphore.md: expand on how the semaphore works
  reader_permit: give better names to active* states
2023-04-14 20:39:05 +03:00
Pavel Emelyanov
4e7f4b9303 Merge 'scripts/open-coredump.sh: allow user to plug in scylla package' from Botond Dénes
Lately we have observed that some builds are missing the package_url in the build metadata. This is usually caused by changes in how build metadata is stored on the servers and the s3 reloc server failing to dig them out of the metadata files. A user can usually still obtain the package url but currently there is no way to plug in user-obtained scylla package into the script's workflow.
This PR fixes this by allowing the user to provide the package as `$ARTIFACT_DIR/scylla.package` (in unpacked form).

Closes #13519

* github.com:scylladb/scylladb:
  scripts/open-coredump.sh: allow bypassing the package downloading
  scripts/open-coredump.sh: check presence of mandatory field in build json object
  scripts/open-coredump.sh: more consistent error messaging
2023-04-14 20:35:06 +03:00
Benny Halevy
e18eb71fa3 test: locator_topology: add test_update_node
Reproduces issue fixed in PR #13502

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-14 17:51:07 +03:00
Benny Halevy
e29994b2aa topology: add_node, unindex_node: make exception safe
Currently, if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.

Instead, with this change, unindex_node looks up
the node by its pointer and removes it from the
index map only if it's found there, so that it cleans up
safely after index_node throws (at any stage).
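A minimal sketch of the idea with two index maps: index_node() may throw after the first stage, and unindex_node() erases an entry only if it points at the node being removed, so the pre-existing node with the same key stays indexed. `topology_sketch` and its members are hypothetical, not locator::topology:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct node {
    std::string id;
    std::string endpoint;
};

// Hypothetical sketch; not Scylla's locator::topology.
class topology_sketch {
    std::vector<std::unique_ptr<node>> _nodes;
    std::map<std::string, node*> _by_endpoint;
    std::map<std::string, node*> _by_id;

    static void erase_if_points_to(std::map<std::string, node*>& index,
                                   const std::string& key, const node* n) {
        auto it = index.find(key);
        if (it != index.end() && it->second == n) { // only this very node
            index.erase(it);
        }
    }

    // May throw at any stage, leaving earlier stages indexed.
    void index_node(node* n) {
        if (!_by_endpoint.emplace(n->endpoint, n).second) {
            throw std::runtime_error("endpoint already indexed: " + n->endpoint);
        }
        if (!_by_id.emplace(n->id, n).second) {
            throw std::runtime_error("id already indexed: " + n->id);
        }
    }

    void unindex_node(const node* n) {
        erase_if_points_to(_by_id, n->id, n);
        erase_if_points_to(_by_endpoint, n->endpoint, n);
    }

public:
    void add_node(std::string id, std::string endpoint) {
        _nodes.push_back(std::make_unique<node>(node{std::move(id), std::move(endpoint)}));
        node* n = _nodes.back().get();
        try {
            index_node(n);
        } catch (...) {
            unindex_node(n);   // removes only entries pointing at the new node
            _nodes.pop_back(); // then drops the new node itself
            throw;
        }
    }

    const node* find_by_id(const std::string& id) const {
        auto it = _by_id.find(id);
        return it == _by_id.end() ? nullptr : it->second;
    }

    const node* find_by_endpoint(const std::string& ep) const {
        auto it = _by_endpoint.find(ep);
        return it == _by_endpoint.end() ? nullptr : it->second;
    }

    std::size_t size() const { return _nodes.size(); }
};
```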

Add a unit test to verify that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-14 17:51:05 +03:00
Tomasz Grabiec
952b455310 Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
scylla-sstable currently has two ways to obtain the schema:

    * via a `schema.cql` file.
    * load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere; it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir and the scylla configuration file, and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, and the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:

```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```

As seen above, subdirectories like quarantine, staging etc. are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13448

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add tests for schema loading
  test/cql-pytest: add no_autocompaction_context
  docs: scylla-sstable.rst: remove accidentally added copy-pasta
  docs: scylla-sstable.rst: remove paragraph with schema limitations
  docs: scylla-sstable.rst: update schema section
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema
2023-04-14 16:46:26 +02:00
Botond Dénes
edc75f51ff docs/dev/reader-concurrency-semaphore.md: expand on how the semaphore works
Greatly expand on the details of how the semaphore works.
Organize the content into thematic chapters to improve navigation.
Improve formatting while at it.
2023-04-14 08:51:24 -04:00
Botond Dénes
943ae7fc69 reader_permit: give better names to active* states
The names of these states have been the source of confusion ever since
they were introduced. Give them names which better reflect their true
meaning and leave less room for misinterpretation. The changes are:
* active/unused  -> active
* active/used    -> active/need_cpu
* active/blocked -> active/await

Hopefully the new names do a better job at conveying what these states
really mean:
* active - a regular admitted permit, which is active (as opposed to
  an inactive permit).
* active/need_cpu - an active permit which was marked as needing CPU for
  the read to make progress. This permit prevents admission of new
  permits while it is in this state.
* active/await - a former active/need_cpu permit, which has to wait on
  I/O or a remote shard. While in this state, it doesn't block the
  admission of new permits (pending other criteria such as resource
  availability).
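A minimal sketch of the renamed states and the admission rule described above. `permit_state`, `blocks_admission()` and `on_start_waiting()` are hypothetical; the real reader_permit has more states. Per the description, active/need_cpu blocks admission and active/await does not; this sketch assumes plain active does not either:

```cpp
#include <string_view>

// Hypothetical subset of permit states, named after the new scheme.
enum class permit_state {
    inactive,        // as opposed to an active (admitted) permit
    active,          // formerly active/unused
    active_need_cpu, // formerly active/used
    active_await,    // formerly active/blocked
};

// Only permits needing CPU prevent admission of new permits in this sketch.
constexpr bool blocks_admission(permit_state s) {
    return s == permit_state::active_need_cpu;
}

// A need_cpu permit that starts waiting on I/O or a remote shard becomes
// active/await; other states are left unchanged here.
constexpr permit_state on_start_waiting(permit_state s) {
    return s == permit_state::active_need_cpu ? permit_state::active_await : s;
}

constexpr std::string_view state_name(permit_state s) {
    switch (s) {
    case permit_state::inactive: return "inactive";
    case permit_state::active: return "active";
    case permit_state::active_need_cpu: return "active/need_cpu";
    case permit_state::active_await: return "active/await";
    }
    return "unknown";
}
```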
2023-04-14 08:40:46 -04:00
Botond Dénes
cae79ef2c3 scripts/open-coredump.sh: allow bypassing the package downloading
By allowing the user to plug in a manually downloaded package. Consequently
the "package_url" field of the build metadata is checked only if there
is no user-provided extracted package.
This allows working around builds for which the metadata server returns
no "package_url", by allowing the user to locate and download the
package themselves, providing it to the script by simply extracting it
as $ARTIFACT_DIR/scylla.package.
2023-04-14 07:48:21 -04:00
Kamil Braun
200123624f Merge 'test: reproducers for store mutation with schema change and host down' from Alecco
Reproducers for https://github.com/scylladb/scylladb/issues/10770.

(Already fixed in 15ebd59071)

Includes necessary improvements and fixes to `pylib`.

Closes #12699

* github.com:scylladb/scylladb:
  test/pytest: reproducers for store mutation...
  test: pylib: Add a way to create cql connections with particular coordinators
  test/pylib: get gossiper alive endpoints
  test/topology: default replication factor 3
  test/pylib: configurable replication factor
2023-04-14 13:47:51 +02:00
Botond Dénes
45fbdbe5f7 scripts/open-coredump.sh: check presence of mandatory field in build json object
Mandatory fields missing in the build json object lead to obscure,
unrelated error messages down the road. Avoid this by checking that all
required fields are present and print an error message if any is
missing.
2023-04-14 07:33:46 -04:00
Botond Dénes
4df5ec4080 scripts/open-coredump.sh: more consistent error messaging
Start all error messages with "error: ..." and log them to stderr.
2023-04-14 07:24:14 -04:00
Botond Dénes
38d6635afd Update tools/java submodule
* tools/java eddef023...c9be8583 (1):
  > README.md: drop cqlsh from README.md
2023-04-14 11:53:16 +03:00
Botond Dénes
7586491e1e Update tools/jmx/ submodule
* tools/jmx/ 57c16938...826da61d (4):
  > install.sh: do not create /usr/scylla/jmx in nonroot mode
  > install.sh: remove "echo done"
  > reloc-pkg: rename symlinks/scylla-jmx to select-java
  > install.sh: select java executable at runtime
2023-04-14 11:47:54 +03:00
Kefu Chai
c580e30ec7 cql3: expr: return more accurate error message for invalidated token() args
before this change, we just printed out the addresses of the elements
in `column_defs` if the arguments passed to the `token()` function were
not valid. this is not very helpful from the user's perspective, as
the user would be more interested in the values. also, we can print
a more accurate error message for each kind of error.

in this change, following Cassandra 4.1's behavior, three cases are
identified, and corresponding errors are returned respectively:

* duplicated partition keys
* wrong order of partition key
* missing keys

where, if the partition key order is wrong, instead of printing the
keys specified by user, the correct order is printed in the error
message for helping user to correct the `token()` function.

for better performance, the checks are performed only if the keys
do not match, based on the assumption that the error handling path
is not likely to be executed.
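A minimal sketch of the three checks, run only after the fast-path equality test fails. `validate_token_args()` and its messages are hypothetical and illustrative, not the exact Scylla or Cassandra wording; it assumes every argument names a partition key column:

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of validating token() arguments against the partition key.
void validate_token_args(const std::vector<std::string>& partition_key,
                         const std::vector<std::string>& args) {
    if (args == partition_key) {
        return; // common, valid case: a single comparison
    }
    // Error path only: figure out what exactly is wrong.
    std::set<std::string> seen;
    for (const auto& a : args) {
        if (!seen.insert(a).second) {
            throw std::invalid_argument("token() contains duplicate partition key components");
        }
    }
    for (const auto& k : partition_key) {
        if (seen.count(k) == 0) {
            throw std::invalid_argument("token() must contain all partition key components");
        }
    }
    // No duplicates, nothing missing: the order is wrong. Print the correct one.
    std::string expected;
    for (const auto& k : partition_key) {
        if (!expected.empty()) {
            expected += ", ";
        }
        expected += k;
    }
    throw std::invalid_argument("token() arguments must be in partition key order: " + expected);
}
```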

tests are added accordingly. they were also tested with Cassandra 4.1.1.

Fixes #13468
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13470
2023-04-14 11:46:18 +03:00
Botond Dénes
4eb1bb460a Update tools/python3 submodule
* tools/python3 d2f57dd9...30b8fc21 (1):
  > create-relocatable-package.py: fix timestamp of executable files
2023-04-14 11:39:17 +03:00
Raphael S. Carvalho
47b2a0a1f6 data_directory: Describe storage options of a keyspace
Description of storage options is important for S3, as one
needs to know whether the underlying storage is local or
remote, and if the latter, details about it.

This relies on server-side desc statement.

$ ./bin/cqlsh.py -e "describe keyspace1;"

CREATE KEYSPACE keyspace1 WITH replication = { ... } AND
	storage = {'type': 'S3', 'bucket': 'sstables',
		   'endpoint': '127.0.0.1:9000'} AND
	durable_writes = true;

Fixes #13507.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13510
2023-04-14 11:34:35 +03:00
Benny Halevy
054667d5b6 storage_service: node_ops_ctl: send_to_all: print correct set of nodes in nodes_down error message
nodes_failed are printed by mistake, instead of nodes_down

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13509
2023-04-14 11:31:20 +03:00
Botond Dénes
289ff821c9 Merge 'Remove global proxy usage from view builder's value_getter' from Pavel Emelyanov
There's a legacy safety check in view code that needs to find a base table from its schema ID. To do so, it uses the global storage proxy instance. The comment says that this code can be removed once the computes_column feature is known by everyone. I'm not sure if that's the case, so here is a more complicated, yet less incompatible, way to stop using the global proxy instance.

Closes #13504

* github.com:scylladb/scylladb:
  view: Remove unused view_ptr reference
  view: Carry backing-secondary-index bit via view builder
  view: Keep backing-seconday-index bool on value_getter
  table: Add const index manager sgetter
2023-04-14 11:23:23 +03:00
Kefu Chai
60ff230d54 create-relocatable-package.py: use f-string
in dcce0c96a9, we should have used
f-string for printing the return code of gzip subprocess. but the
"f" prefix was missed. so, in this change, it is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13500
2023-04-14 08:29:33 +03:00
Raphael S. Carvalho
a47bac931c Move TWCS option from table into TWCS itself
enable_optimized_twcs_queries is specific to TWCS, therefore it
belongs to TWCS, not replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13489
2023-04-14 08:28:16 +03:00
Anna Stuchlik
989a75b2f7 doc: update the metrics between 5.2 and 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2794

This commit adds the information about the metric changes
in version 2023.1 compared to version 5.2.

This commit is part of the 5.2-to-2023.1 upgrade guide and
must be backported to branch-5.2.

Closes #13506
2023-04-14 08:23:53 +03:00
Kefu Chai
85b21ba049 keys: consolidate the formatter for partition_keys
since there are two places formatting `with_schema_wrapper`, it'd
be desirable if we can consolidate them. so, in this change, the
formatting code is extracted into a helper, so we only have a single
place for formatting the `with_schema_wrapper`s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-14 13:21:30 +08:00
Kefu Chai
3738fcbe05 keys: specialize fmt::formatter<partition_key> and friends
this is a part of a series to migrate from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()`s are dropped in this change,
as all their callers now use fmtlib for formatting. the helper
`print_key()` is removed, as its only caller was
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function, `key_to_str()`, in `db/large_data_handler.cc`.
this template function is the caller of the operator<< of both
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<s, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-14 13:21:30 +08:00
Botond Dénes
1da02706dd Merge 'Discard SSTable bloom filter on load-and-stream' from Raphael "Raph" Carvalho
Load-and-stream reads the entire content from SSTables, therefore it can
afford to discard the bloom filter that might otherwise consume a significant
amount of memory. Bloom filters are only needed by compaction and other
replica::table operations that might want to check the presence of keys
in the SSTable files, like single-partition reads.

It's not uncommon to see a Data:Filter ratio of less than 100:1, meaning
that for ~300G of data, filters will take ~3G.

In addition to reducing the memory footprint, it also reduces operation time,
as load-and-stream no longer has to read, parse and build the filters
from disk into memory.

Closes #13486

* github.com:scylladb/scylladb:
  sstable_loader: Discard SSTable bloom filter on load-and-stream
  sstables: Allow SSTable loading to discard bloom filter
  sstables: Allow sstable_directory user to feed custom sstable open config
  sstables: Move sstable_open_info into open_info.hh
2023-04-14 06:18:54 +03:00
Alejo Sanchez
9597822214 test/pytest: reproducers for store mutation...
with schema change and host down

Reproducers for a failure during an LWT operation due to a missing
column mapping in the schema history table.

Issue #10770
2023-04-13 21:23:03 +02:00
Tomasz Grabiec
041ee3ffdd test: pylib: Add a way to create cql connections with particular coordinators
Usage:

  await manager.driver_connect(server=servers[0])
  manager.cql.execute(f"...", execution_profile='whitelist')
2023-04-13 21:23:03 +02:00
Alejo Sanchez
62a945ccd5 test/pylib: get gossiper alive endpoints
Helper to get the list of gossiper alive endpoints from the REST API.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-04-13 21:23:03 +02:00
Alejo Sanchez
08d754e13f test/topology: default replication factor 3
Most tests take some nodes down, so increase the replication factor to
3 to avoid problems with partitions belonging to the down nodes.

Use replication factor 1 for raft upgrade tests.
2023-04-13 21:23:02 +02:00
Alejo Sanchez
3508a4e41e test/pylib: configurable replication factor
Make replication factor configurable for the RandomTables helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-04-13 21:23:02 +02:00
Benny Halevy
b71f229fc2 topology: node: update_node: do not override internal changed flag by state option
Currently, opt_st overrides the internal `changed` flag
by assigning the opt_st changed status to it.
Instead, it should use `|=` to keep it true if it is already so.
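For illustration, here is a minimal Python sketch of the pattern. It assumes a hypothetical `update_node()` that applies several optional changes and reports whether any of them took effect. The names are illustrative, not the actual topology code.

```python
from typing import Optional


class Node:
    def __init__(self, dc: str, state: str) -> None:
        self.dc = dc
        self.state = state


def update_node(node: Node, opt_dc: Optional[str] = None,
                opt_st: Optional[str] = None) -> bool:
    """Apply the optional updates; return True if anything changed."""
    changed = False
    if opt_dc is not None and opt_dc != node.dc:
        node.dc = opt_dc
        changed = True
    if opt_st is not None:
        # The bug: `changed = opt_st != node.state` would overwrite a True
        # set above. `|=` keeps it true if it already is.
        changed |= opt_st != node.state
        node.state = opt_st
    return changed
```

With plain `=`, an unchanged state would reset `changed` to false, even though the DC was just updated.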

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13502
2023-04-13 17:46:59 +02:00
Raphael S. Carvalho
fe6df3d270 sstable_loader: Discard SSTable bloom filter on load-and-stream
Load-and-stream reads the entire content from SSTables, therefore it can
afford to discard the bloom filter that might otherwise consume a significant
amount of memory. Bloom filters are only needed by compaction and other
replica::table operations that might want to check the presence of keys
in the SSTable files, like single-partition reads.

It's not uncommon to see a Data:Filter ratio of less than 100:1, meaning
that for ~300G of data, filters will take ~3G.

In addition to reducing the memory footprint, it also reduces operation time,
as load-and-stream no longer has to read, parse and build the filters
from disk into memory.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:22 -03:00
Raphael S. Carvalho
17261369ea sstables: Allow SSTable loading to discard bloom filter
If the bloom filter is not loaded, an always-present filter is used
instead, which translates into the SSTable being opened on every
single read.
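A minimal Python sketch shows why the fallback costs reads. It assumes a simplified filter interface with a single `maybe_contains()` method, which is hypothetical and not the sstables API. A real Bloom filter can rule a key out, while the always-present fallback never can, so every lookup has to consult the SSTable.

```python
import hashlib
from typing import Iterable, List, Tuple, Union


class BloomFilter:
    """Tiny Bloom filter over a bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key: str) -> Iterable[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def maybe_contains(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class AlwaysPresentFilter:
    """Used when no filter was loaded: it can never rule a key out."""

    def maybe_contains(self, key: str) -> bool:
        return True


Filter = Union[BloomFilter, AlwaysPresentFilter]


def sstables_to_read(key: str, sstables: List[Tuple[str, Filter]]) -> List[str]:
    """Names of the SSTables whose filter cannot exclude `key`."""
    return [name for name, flt in sstables if flt.maybe_contains(key)]
```

This is acceptable for load-and-stream because it reads every partition sequentially and never relies on point lookups.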

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:22 -03:00
Raphael S. Carvalho
1427a5ce98 sstables: Allow sstable_directory user to feed custom sstable open config
This will be used by load-and-stream to load SSTables in its own
customized way.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:16 -03:00
Raphael S. Carvalho
86516f4cef sstables: Move sstable_open_info into open_info.hh
So sstable_directory can access its definition without having to
include sstables.hh.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:31:14 -03:00
Pavel Emelyanov
097cea11b2 view: Remove unused view_ptr reference
After the previous patch, value_getter::_view becomes unused and can be
dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:51:27 +03:00
Pavel Emelyanov
821c8b19a6 view: Carry backing-secondary-index bit via view builder
When the view builder is constructed, it populates itself with view updates.
Later the updates may instantiate the value_getter-s which, in turn,
would need to check if the view is backing a secondary index.

The good news is that when the view builder is constructed, it has all the
information needed to evaluate this "backing" bit at hand. It's then
propagated down to value_getter via the corresponding view_updates.

The getter's _view field becomes unused after this change and is
(void)-ed to make this patch compile.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:48:36 +03:00
Pavel Emelyanov
e8b5022343 view: Keep backing-seconday-index bool on value_getter
The getter needs to check if the view is backing a secondary index.
Currently it's done inside the handle_computed_column() method, but it's
more convenient if this bit is known during construction, so move it
there. Nothing can change this property between the moment the
getter is created and the moment the method in question is called.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:45:59 +03:00
Pavel Emelyanov
0d9da46428 table: Add const index manager sgetter
To be used by the next patch, which will call this helper inside a
non-mutable lambda.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:45:16 +03:00
Botond Dénes
bd57471e54 reader_concurrency_semaphore: don't evict inactive readers needlessly
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are blocked for any reason other
than resources is wasteful and leads to extra load later on, when
these evicted readers have to be recreated and requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
A unit test is also added, reproducing the overly aggressive eviction and
checking that it doesn't happen anymore.
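A minimal Python sketch of the rule follows. It assumes a hypothetical semaphore model where each waiter is blocked either on "resources" or on "other". This is a simplification for illustration, not the reader_concurrency_semaphore API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Waiter:
    name: str
    # Why this reader is not admitted yet: "resources", or "other" for any
    # other reason (a simplification of the real admission criteria).
    blocked_on: str


@dataclass
class Semaphore:
    inactive_readers: List[str] = field(default_factory=list)
    waiters: List[Waiter] = field(default_factory=list)
    evicted: List[str] = field(default_factory=list)

    def _waiting_on_resources(self) -> List[Waiter]:
        return [w for w in self.waiters if w.blocked_on == "resources"]

    def maybe_evict_inactive(self) -> None:
        """Evict inactive readers only while someone waits on resources."""
        while self.inactive_readers and self._waiting_on_resources():
            self.evicted.append(self.inactive_readers.pop(0))
            # Assume one eviction frees enough for one resource waiter.
            self.waiters.remove(self._waiting_on_resources()[0])
```

A waiter blocked on "other" does not trigger eviction, because freeing memory would not let it in anyway.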

Fixes: #11803

Closes #13286
2023-04-13 15:20:18 +03:00
Pavel Emelyanov
b1501d4261 s3/client: Don't use designated initialization of sys stat struct
It makes the compiler complain about mis-ordered initialization of st_nlink
vs st_mode on different arches. The current code (st_nlink before st_mode)
compiles fine on x86, but fails on ARM, which wants st_mode to come
before st_nlink. Changing the order would, apparently, break the x86 build
with a similar message.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13499
2023-04-13 15:13:56 +03:00
Kefu Chai
87170bf07a build: cmake: add more tests
this change should add the remaining tests under boost/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13494
2023-04-13 14:57:00 +03:00
Botond Dénes
e103ef3bcb Update seastar submodule
* seastar 1204efbc...ed7a0f54 (46):
  > gate: s/intgernal/internal/
  > reactor: set reactor::_stopping to true on all shards
  > condition-variable: replace the coroutine wakeup task with a promise
  > tutorial: explain the buffer_size_t param of generator coroutine
  > log: call log_level_map explicitly in constructor
  > future: de-variadicate make_ready_future() and similar helpers
  > timer-set,scollectd: remove unnecessary ";"
  > util/conversion: remove inclusion guards
  > foreign_ptr: destroy: use run_in_background
  > abort_source, abortable_fifo: use is_nothrow_invocable_r_v<>
  > alien: add type constraint for alien::run_on and alien::submit_to
  > alien: add noexcept specifier for lambda passed to run_on()
  > test: alien_test: test alien::run_on() also
  > test: alien_test: throw if unexpected things happens
  > future: make API level 6 mandatory
  > api-level: update IDE fallback
  > core/on_internal_error: always log error with backtrace
  > future: make API level 5 mandatory
  > websocket: fix frame parsing.
  > websocket: fix frame assembling.
  > when_all: drop code for API_LEVEL < 4
  > future: drop internal call_then_impl
  > future: when_all_succeed(): make API level 4 mandatory
  > reactor: trade comment for type constraints
  > sstring: s/is_invocable_r/is_invocable_r_v/
  > doc: compatibility: document API levels 5 and 6
  > demos: file_demo: pass a string_view to_open_file_dma()
  > TLS: Add issuer/subject info to verification error message
  > test: fstream_test: drop unnecessary API_LEVEL check
  > manual_clock: advance: use run_in_background to expire_timers
  > reactor: add run_in_backround and close
  > websocket: shutdown input first.
  > websocket: use gate to guard background tasks.
  > websocket: remove trailing spaces.
  > websocket_demo: ignore sleep_aborted exception.
  > websocket_demo: fix coredump.
  > fstream: drop API level 2 (make_file_output_stream() returning non-future)
  > core/sstring: do not use ostream_formatter
  > metrics: use fmt::to_string() when creating a label
  > backtrace: fix size calculation in dl_iterate_phdr
  > Downgrade expected stall detector warning to info
  > fix: Add missing inline code blocks
  > spawn_test: fix /bin/cat stuck in reading input.
  > reactor: pass fd opened in blocking mode to spawned process
  > reactor: skip sigaction if handler has been registered before.
  > reactor: allow registering handler multiple times for a signal.
2023-04-13 14:28:30 +03:00
Kefu Chai
29ca0009a2 dist/debian: do not Depend on ${shlibs:Depends}
the substvar `${shlibs:Depends}` is set by dh_shlibdeps, which
inspects the ELF images being packaged to figure out the shared
library dependencies of the packages. but since f3c3b9183c,
we override the `override_dh_shlibdeps` target in debian/rules
with a no-op, as we take care of the shared library dependencies
ourselves by vendoring the runtime dependencies in the relocatable
package. so this variable is never set. that's why `dpkg-gencontrol`
complains when processing `debian/control` and trying to materialize
the substvars.

in this change, the occurrences of `${shlibs:Depends}` are removed
to silence the warnings from `dpkg-gencontrol`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13457
2023-04-13 08:34:05 +03:00
Raphael S. Carvalho
9760149e8d compaction: Don't bump compaction shares during major execution
Commit 49892a0, back in 2018, bumped the compaction shares by 200 to
guarantee a minimum baseline.

However, after commit e3f561d, major compaction runs in the maintenance
group, meaning that bumping the shares became irrelevant and
only makes regular compaction more aggressive than necessary.

Fixes #13487.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13488
2023-04-13 08:20:25 +03:00
Botond Dénes
50ee4033a9 Update tools/jmx submodule
* tools/jmx 602329c9...57c16938 (1):
  > install.sh: replace tab with spaces
2023-04-12 13:28:23 +03:00
Botond Dénes
5d0c0ae0c4 Merge 'token_metadata: use topology nodes for endpoint_to_host_id map' from Benny Halevy
Currently, token_metadata_impl maintains a "shadow" endpoint to host_id map on top of the maps in topology.
This series first reimplements the functions that currently use this map to use topology instead.
Then the important users of `get_endpoint_to_host_id_map_for_reading`, node_ops_ctl and view_builder,
are converted to use a new `topology::for_each_node` function to process all nodes in topology directly, without going through `get_endpoint_to_host_id_map_for_reading`.

Closes #13476

* github.com:scylladb/scylladb:
  view_builder: view_build_statuses: use topology::for_each_node
  storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node
  topology: add for_each_node
  token_metadata: get endpoint to node map from topology
2023-04-12 10:33:02 +03:00
Botond Dénes
1440efa042 test/cql-pytest: test_tools.py: add tests for schema loading
A set of comprehensive tests covering all the supported ways of providing
the schema to scylla-sstable, either explicitly or implicitly
(auto-detect).
2023-04-12 03:14:43 -04:00
Botond Dénes
76a7d3448f test/cql-pytest: add no_autocompaction_context 2023-04-12 03:14:43 -04:00
Botond Dénes
b7a4304b69 docs: scylla-sstable.rst: remove accidentally added copy-pasta 2023-04-12 03:14:43 -04:00
Botond Dénes
1673f10f7a docs: scylla-sstable.rst: remove paragraph with schema limitations
The above file contained a paragraph explaining the limitations of
`scylla-sstable.rst` w.r.t. automatically finding the schema. This no
longer applies so remove it.
2023-04-12 03:14:43 -04:00
Botond Dénes
9f9beef8fd docs: scylla-sstable.rst: update schema section
Reflect the recent changes to the ways the schema can be provided to the tool.
2023-04-12 03:14:43 -04:00
Botond Dénes
222f624757 test/cql-pytest: nodetool.py: add flush_keyspace()
It would have been better if `flush()` could have been called with a
keyspace and optional table param, but changing it now is too much
churn, so we add a dedicated method to flush a keyspace instead.
2023-04-12 03:14:43 -04:00
Botond Dénes
ffec1e5415 tools/scylla-sstable: reform schema loading mechanism
So far, the schema had to be provided via a schema.cql file, a file which
contains the CQL definition of the table. This is flexible but annoying
at the same time. Often, the sstables the tool operates on are located
in their table directory in a scylla data directory, where the schema
tables are also available. To mitigate this, an alternative method to
load the schema from memory was added, which works for system tables.
In this commit we extend this to work for all kinds of tables, by
auto-detecting where the scylla data directory is and loading the
schema tables from disk.
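A minimal Python sketch of the auto-detection idea follows. It assumes the usual `<data_dir>/<keyspace>/<table>-<uuid>/` layout and that `upload/`, `staging/` and `quarantine/` are subdirectories of the table directory. The `detect_location()` helper is hypothetical, not the tool's actual code.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

HEX = set("0123456789abcdef")


class SSTableLocation(NamedTuple):
    data_dir: str
    keyspace: str
    table: str


def detect_location(sstable_path: str) -> Optional[SSTableLocation]:
    """Guess data dir, keyspace and table from an sstable component path.

    Assumes <data_dir>/<keyspace>/<table>-<32 hex uuid>/[subdir/]<component>.
    """
    table_dir = PurePosixPath(sstable_path).parent
    # Skip known state subdirectories below the table directory.
    if table_dir.name in ("upload", "staging", "quarantine"):
        table_dir = table_dir.parent
    table, sep, uuid = table_dir.name.rpartition("-")
    if not sep or len(uuid) != 32 or not set(uuid) <= HEX:
        return None
    keyspace_dir = table_dir.parent
    return SSTableLocation(str(keyspace_dir.parent), keyspace_dir.name, table)
```

Once the data directory is known, the schema tables under its `system_schema/` keyspace directory can be read.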
2023-04-12 03:14:43 -04:00
Botond Dénes
fd4c2f2077 tools/schema_loader: add load_schema_from_schema_tables()
Allows loading the schema for the designated keyspace and table from
the system table sstables located on disk. The sstable files are opened
read-only.
2023-04-12 03:14:43 -04:00
Botond Dénes
63b266a988 db/schema_tables: expose types schema 2023-04-12 02:43:53 -04:00
Botond Dénes
0c51f72ad6 Merge 'utils, mutation: replace operator<<(..) with fmt formatter' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `tombstone` and `shadowable_tombstone` without the help of fmt::ostream. their `operator<<(ostream,..)`s are dropped, as they no longer have any users.

Refs #13245

Closes #13474

* github.com:scylladb/scylladb:
  mutation: specialize fmt::formatter<tombstone> and fmt::formatter<shadowable_tombstone>
  utils: specialize fmt::formatter<optional<>>
2023-04-12 09:32:56 +03:00
Kefu Chai
ff202723c6 utils: big_decimal: specialize fmt::formatter<big_decimal>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `big_decimal` without the help of `operator<<`. this operator
is dropped in this change, as all its callers now use fmtlib
for formatting. we might need to use fmtlib to implement `big_decimal::to_string()`,
and use `fmt::to_string()` instead, but let's leave it for a follow-up
change.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13479
2023-04-12 09:20:50 +03:00
Botond Dénes
f82287a9af Update tools/jmx/ submodule
* tools/jmx/ b7ae52bc...602329c9 (1):
  > metrics: EstimatedHistogram::getValues() returns bucketOffsets
2023-04-12 09:17:57 +03:00
Botond Dénes
525b21042f Merge 'Rewrite sstables keyspace compaction task' from Aleksandra Martyniuk
Task manager task implementations of classes that cover
rewrite sstables keyspace compaction, which can be started
through the /storage_service/keyspace_compaction/ api.

Top level task covers the whole compaction and creates child
tasks on each shard.

Closes #12714

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test rewrite sstables compaction
  compaction: create task manager's task for rewrite sstables keyspace compaction on one shard
  compaction: create task manager's task for rewrite sstables keyspace compaction
  compaction: create rewrite_sstables_compaction_task_impl
2023-04-12 08:38:59 +03:00
Aleksandra Martyniuk
25cfffc3ae compaction: rename local_offstrategy_keyspace_compaction_task_impl to shard_offstrategy_keyspace_compaction_task_impl
Closes #13475
2023-04-12 08:38:25 +03:00
Kefu Chai
1cb95b8cff mutation: specialize fmt::formatter<tombstone> and fmt::formatter<shadowable_tombstone>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `tombstone` and `shadowable_tombstone` without the
help of `operator<<`.

in this change, only `operator<<(ostream&, const shadowable_tombstone&)`
is dropped, and all its callers now use fmtlib for formatting
instances of `shadowable_tombstone`.
`operator<<(ostream&, const tombstone&)` is preserved, as it is still
used by Boost.Test for printing the operands when comparison tests
fail.

please note, before this change we were using a literal string
for indentation. after this change, some of those places
use fmtlib for indentation instead.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-12 10:57:03 +08:00
Kefu Chai
c980bd54ad utils: specialize fmt::formatter<optional<>>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `optional<T>` without the help of `operator<<()`.

this change also enables us to ditch more `operator<<()`s in the future.
we are relying on `operator<<(ostream&, const optional<T>&)` for
printing instances of `optional<T>`, and `operator<<(ostream&, const optional<T>&)`
in turn uses `operator<<(ostream&, const T&)`. so, the new
specialization of `fmt::formatter<optional<>>` removes yet
another caller of these operators.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-12 10:57:03 +08:00
Benny Halevy
535b71eba3 view_builder: view_build_statuses: use topology::for_each_node
Instead of tmptr->get_endpoint_to_host_id_map_for_reading.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 18:14:51 +03:00
Benny Halevy
d89fb02d24 storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node
Instead of tmptr->get_endpoint_to_host_id_map_for_reading.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 18:14:47 +03:00
Kefu Chai
59579d5876 utils: fragment_range: specialize fmt::formatter<FragmentedView>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print classes fulfilling the requirements of the `FragmentedView` concept
without the help of the template function `to_hex()`. this function is
dropped in this change, as all its callers now use fmtlib
for formatting. the helper `fragment_to_hex()` is dropped
as well, as its only caller was `to_hex()`.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13471
2023-04-11 16:09:38 +03:00
Benny Halevy
7b76369ffc topology: add for_each_node
To eventually replace token_metadata::get_endpoint_to_host_id_map_for_reading

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 15:55:39 +03:00
Benny Halevy
e635aa30d6 token_metadata: get endpoint to node map from topology
Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl.

Instead, get the nodes_by_endpoint map from topology
and use it to build the endpoint_to_host_id_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-11 15:48:30 +03:00
Botond Dénes
f1bbf705f9 Merge 'Cleanup sstables in resharding and other compaction types' from Benny Halevy
This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so as to:
* cleanup uploaded sstables (#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such and added for tracking to table_state::cleanup_sstable_set. They are removed from that set once released by compaction.
Along with that sstable set, we keep the owned_ranges_ptr used by cleanup in the table_state, to allow other compaction types (offstrategy, major, or regular) to clean up those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding uses a more straightforward mechanism: the owned token ranges are passed when resharding uploaded sstables and are used to detect sstables that require cleanup, which is now piggybacked on resharding compaction.
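A minimal Python sketch of the bookkeeping described above follows. It assumes each sstable is summarized by its first and last token, and that owned ranges are closed integer intervals. The class and method names loosely echo compaction_state and perform_cleanup but are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Range = Tuple[int, int]  # closed [start, end] token interval


@dataclass(frozen=True)
class SSTable:
    name: str
    first_token: int
    last_token: int


def needs_cleanup(sst: SSTable, owned: List[Range]) -> bool:
    """Conservative: True unless the sstable fits inside one owned range."""
    return not any(s <= sst.first_token and sst.last_token <= e for s, e in owned)


@dataclass
class CompactionState:
    owned_ranges: List[Range] = field(default_factory=list)
    sstables_requiring_cleanup: Set[str] = field(default_factory=set)

    def perform_cleanup(self, sstables: List[SSTable], owned: List[Range]) -> None:
        # Keep the owned ranges so other compaction types can filter later.
        self.owned_ranges = owned
        for sst in sstables:
            if needs_cleanup(sst, owned):
                self.sstables_requiring_cleanup.add(sst.name)

    def on_released_by_compaction(self, sst: SSTable) -> None:
        # Once any compaction rewrites the sstable, it no longer needs cleanup.
        self.sstables_requiring_cleanup.discard(sst.name)
```

Because `owned_ranges` is kept alongside the set, an offstrategy or regular compaction that later picks up a flagged sstable can apply the same filter.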

Closes #12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor
2023-04-11 14:52:29 +03:00
Botond Dénes
38c98b370f Update tools/jmx/ submodule
* tools/jmx/ 48e16998...b7ae52bc (1):
  > install.sh: do not fail if jre-11 is not installed
2023-04-11 14:51:31 +03:00
Kefu Chai
dcce0c96a9 create-relocatable-package.py: error out if pigz fails
before this change, we didn't error out even if pigz failed. but
there is a chance that pigz fails to create the gzip'ed relocatable
tarball, either due to environmental issues or some other problems,
and we are not aware of this until packaging scripts like
`reloc/build_rpm.sh` try to ungzip this corrupted gzip file.

in this change, if pigz's status code is not 0, the status code
is printed, and create-relocatable-package.py will return 1.
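A minimal Python sketch of such a check follows. It assumes the compressor is run through `subprocess.run()`. The `compress()` helper and its command list are hypothetical, not the script's actual code. The `f` prefix on the message matters: without it, the braces would be printed literally.

```python
import subprocess
import sys
from typing import List


def compress(cmd: List[str]) -> int:
    """Run the compressor; return 0 on success, 1 if it exited non-zero."""
    proc = subprocess.run(cmd)
    if proc.returncode != 0:
        # Report the status so a broken tarball is noticed right away,
        # not later when packaging tries to ungzip it.
        print(f"pigz failed with return code {proc.returncode}", file=sys.stderr)
        return 1
    return 0
```

The script would then end with something like `sys.exit(compress(["pigz", ...]))`, with the real pigz arguments.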

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13459
2023-04-11 14:29:25 +03:00
Aleksandra Martyniuk
e170fa1c99 test: extend test_compaction_task.py to test rewrite sstables compaction 2023-04-11 13:07:22 +02:00
Aleksandra Martyniuk
a93f044efa compaction: create task manager's task for rewrite sstables keyspace compaction on one shard
Implementation of task_manager's task that covers rewrite sstables keyspace
compaction on one shard.
2023-04-11 13:07:17 +02:00
Botond Dénes
a8e59d9fb2 Merge 'Metrics relabel from file' from Amnon Heiman
This series adds an option to read the relabel config from file.

Most of Scylla's metrics are reported per-shard, and sometimes they are also reported per scheduling
group or per table. With modern hardware, this can quickly grow to a large number of metrics that
overload Scylla and the collecting server.

One of the main issues around metrics reduction is that many of the metrics are only
helpful in certain situations.

For example, Scylla monitoring only looks at a subset of the metrics. So in large deployments
it would be helpful to scrape only those.

One way to do that would be to mark all dashboard-related metrics with a label value, and then Prometheus
would request only metrics with that label value.

There are two main limitations to scraping by label values:
1. some of the metrics we want to report are in seastar, so we'll need to label them somehow (we cannot just add random labels to seastar metrics)
2. things change, new metrics are introduced and we may want them; it's not practical to re-compile and wait
for a new release whenever we want to change a label just for monitoring.

It would be best to have the option to add metrics freely and choose at runtime what to report.

This series makes use of the Seastar API to manipulate metrics dynamically. This includes adding, removing, and changing labels, as well as enabling and disabling metrics and the skip_when_empty option.

After this series, the configuration can be loaded with:
```--relabel-config-file conf.yaml```

The general logic and format follow Prometheus' `metric_relabel_configs` configuration.

The configuration file looks like:
```
$ cat conf.yaml
relabel_configs:
  - source_labels: [shard]
    action: drop
    target_label: shard
    regex: (2)
  - source_labels: [shard]
    action: replace
    target_label: level
    replacement: $1
    regex: (.*3)

```
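The Python sketch below shows how the two rules in the example above would apply to one series' labels. It assumes Prometheus-style semantics: the values of `source_labels` are joined with `;`, `regex` must match the whole value, `$1` refers to the first capture group, and `target_label` is ignored by `drop`. The `relabel()` function is hypothetical, not the Seastar implementation.

```python
import re
from typing import Dict, List, Optional


def relabel(labels: Dict[str, str], configs: List[dict]) -> Optional[Dict[str, str]]:
    """Apply relabel rules in order; return None if the series is dropped."""
    labels = dict(labels)
    for cfg in configs:
        value = ";".join(labels.get(name, "") for name in cfg["source_labels"])
        match = re.fullmatch(cfg.get("regex", "(.*)"), value)  # anchored match
        action = cfg.get("action", "replace")
        if action == "drop":
            if match:
                return None
        elif action == "replace":
            if match:
                # Translate $1-style references into Python's \g<1> syntax.
                template = re.sub(r"\$(\d+)", r"\\g<\1>", cfg.get("replacement", "$1"))
                labels[cfg["target_label"]] = match.expand(template)
        else:
            raise ValueError(f"unsupported action: {action}")
    return labels
```

With this config, shard 2 is dropped, shards ending in 3 get a `level` label, and shard 12 is kept because the regex must match the whole value.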

Closes #12687

* github.com:scylladb/scylladb:
  main: Load metrics relabel config from a file if it exists
  Add relabel from file support.
2023-04-11 12:47:09 +03:00
Aleksandra Martyniuk
c4098df4ec compaction: create task manager's task for rewrite sstables keyspace compaction
Implementation of task_manager's task covering rewrite sstables keyspace
compaction that can be started through storage_service api.
2023-04-11 11:04:21 +02:00
Aleksandra Martyniuk
814254adfd compaction: create rewrite_sstables_compaction_task_impl
rewrite_sstables_compaction_task_impl serves as a base class of all
concrete rewrite sstables compaction task classes.
2023-04-11 11:03:09 +02:00
Botond Dénes
dba1d36aa6 Merge 'alternator: fix isolation of concurrent modifications to tags' from Nadav Har'El
Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed.

The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()`, which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected.
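A minimal Python sketch of the difference follows. It assumes tags live in one shared dict and that each request applies a single change. The `TagStore` class and the helper names are hypothetical, not Alternator's code. A read followed by a separate write can lose a concurrent update. Doing the whole read-modify-write under one lock serializes the operations.

```python
import threading
from typing import Callable, Dict


class TagStore:
    def __init__(self) -> None:
        self._tags: Dict[str, str] = {}
        self._lock = threading.Lock()

    def modify_tags(self, fn: Callable[[Dict[str, str]], None]) -> None:
        """Read, modify and write back the tags as one atomic step."""
        with self._lock:
            tags = dict(self._tags)  # read
            fn(tags)                 # modify
            self._tags = tags        # write

    def snapshot(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._tags)


def tag_resource(store: TagStore, key: str, value: str) -> None:
    store.modify_tags(lambda tags: tags.update({key: value}))


def untag_resource(store: TagStore, key: str) -> None:
    store.modify_tags(lambda tags: tags.pop(key, None))
```

If `tag_resource()` instead called a read function and then a separate write function, two concurrent callers could both read the same old dict, and the second write would discard the first one's tag.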

Fixes #6389.

Closes #13150

* github.com:scylladb/scylladb:
  test/alternator: test concurrent TagResource / UntagResource
  db/tags: drop unsafe update_tags() utility function
  alternator: isolate concurrent modification to tags
  db/tags: add safe modify_tags() utility functions
  migration_manager: expose access to storage_proxy
2023-04-11 11:17:23 +03:00
Anna Stuchlik
2921059ebb doc: add a disclaimer about unsupported upgrade
Fixes https://github.com/scylladb/scylla-enterprise/issues/2805

This commit adds the disclaimer that an upgrade by replacing
the cluster nodes with nodes with a different release
is not supported.

Closes #13445
2023-04-11 10:47:39 +03:00
Kefu Chai
86b66a9875 build: cmake: drop test_table.CC
this change mirrors the corresponding change in `configure.py` in
4b5b6a9010.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13461
2023-04-11 09:42:58 +03:00
Nadav Har'El
79114c5030 cql-pytest: translate Cassandra's tests for DELETE operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/DeleteTest.java into our cql-pytest framework.

There are 51 tests, and they did not reproduce any previously-unknown
bug, but did provide additional reproducers for three known issues:

Refs  #4244 Add support for mixing token, multi- and single-column
            restrictions

Refs #12474 DELETE prints misleading error message suggesting ALLOW
            FILTERING would work

Refs #13250 one-element multi-column restriction should be handled like
            a single-column restriction

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13436
2023-04-11 09:10:11 +03:00
Botond Dénes
355583066e Merge 'Reduce memory footprint of SSTable index summary' from Raphael "Raph" Carvalho
The SSTable summary is one of the components fully loaded into memory, and it may have a significant footprint.

This series reduces the summary footprint by reducing the amount of token information that we need to keep
in memory for each summary entry.

Of course, the benefit of this size optimization is proportional to the number of summary entries, which
in turn is proportional to the number of partitions in an SSTable.

Therefore, tables with tons of small partitions benefit the most from this optimization,
since they result in big summaries.

Results:

```
BEFORE

[1000000  pkeys]		 data size: 	4035888890,  summary -> memory footprint: 	5843232,  entries: 88158
[10000000 pkeys]		 data size: 	40368888890, summary -> memory footprint: 	55787128, entries: 844925

AFTER

[1000000  pkeys]		 data size: 	4035888890,  summary -> memory footprint: 	4351536,  entries: 88158
[10000000 pkeys]		 data size: 	40368888890, summary -> memory footprint: 	42211984, entries: 844925
```

That shows a reduction of roughly 25% in footprint, for both 1 and 10 million pkeys.
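Per-entry numbers make the reduction easier to see. The short Python sketch below recomputes them from the results table above. The figures are copied from that table, and nothing else is assumed.

```python
# (footprint before, footprint after, summary entries), copied from the
# results above.
RESULTS = {
    "1M pkeys": (5843232, 4351536, 88158),
    "10M pkeys": (55787128, 42211984, 844925),
}


def reduction_pct(before: int, after: int) -> float:
    """Relative reduction of the summary footprint, in percent."""
    return 100 * (before - after) / before


for name, (before, after, entries) in RESULTS.items():
    print(f"{name}: {before / entries:.1f} -> {after / entries:.1f} bytes/entry, "
          f"{reduction_pct(before, after):.1f}% smaller")
```

This prints 66.3 -> 49.4 bytes/entry (25.5% smaller) for 1M pkeys and 66.0 -> 50.0 bytes/entry (24.3% smaller) for 10M pkeys. That is roughly 16 to 17 bytes saved per summary entry.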

Closes #13447

* github.com:scylladb/scylladb:
  sstables: Store raw token into summary entries
  sstables: Don't store token data into summary's memory pool
2023-04-11 08:29:11 +03:00
Botond Dénes
05b381bfa2 Merge 'Simple S3 storage for sstables' from Pavel Emelyanov
The PR adds an sstables storage backend that keeps all component files as S3 objects, and a system.sstables_registry ownership table that keeps track of which sstable objects belong to the local node and what their names are.

When a keyspace is configured with 'STORAGE = { 'type': 'S3' }', the respective `table` object eventually gets the storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation, which maintains the components' files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state (that is, moving between the normal, staging and quarantine states) is not yet implemented, but will eventually happen by updating entries in the sstables registry.

To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names and to maintain sstable state, the storage driver keeps records in the system.sstables_registry ownership table. The table maps the sstable location and generation to the object format, version, status-state (*) and (!) unique identifier (soon this identifier is supposed to be replaced with UUID sstable generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot: the distributed loader picks up sstables from all the tables found in the schema, and for S3-backed keyspaces it lists entries in the registry to a) identify those and b) get their unique S3-side identifiers to open them by name.

(*) About sstable's status and state.

The state field is the part of today's sstable path on disk -- staging, quarantine, normal (root table data dir), etc. Since S3 doesn't have a renaming facility, moving an sstable between those states is only possible by updating its entry in the registry. This is not yet implemented in this set (#13017).

The status field tracks the sstable's transition from creation to deletion. It starts with the 'creating' status, which corresponds to today's TemporaryTOC file. After being created and written, the sstable moves into the 'sealed' state, which corresponds to today's normal sstable with its TOC file. To delete an sstable atomically, it first moves into the 'removing' state, which is equivalent to being in the deletion log for an on-disk sstable. Once removed from the bucket, the entry is removed from the registry.

To play with:

1. Start minio (installed by install-dependencies.sh)
```
export MINIO_ROOT_USER=${root_user}
export MINIO_ROOT_PASSWORD=${root_pass}
mkdir -p ${root_directory}
minio server ${root_directory}
```

2. Configure minio CLI, create anonymous bucket
```
mc config host rm local
mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass}
mc mb local/sstables
mc anonymous set public local/sstables
```

3. Start Scylla with object-storage feature enabled
```
scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}
```

4. Create KS with S3 storage
```
create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };
```

The S3 client has a logger named "s3"; it's useful to enable it with `trace` verbosity.

Closes #12523

* github.com:scylladb/scylladb:
  test: Add object-storage test
  distributed_loader: Print storage type when populating
  sstable_directory: Add ownership table components lister
  sstable_directory: Make components_lister and API
  sstable_directory: Create components lister based on storage options
  sstables: Add S3 storage implementation
  system_keyspace: Add ownership table
  system_keyspace: Plug to user sstables manager too
  sstable: Make storage instance based on storage options
  sstable_directory: Keep storage_options aboard
  sstable: Virtualize the helper that gets on-disk stats for sstable
  sstable, storage: Virtualize data sink making for small components
  sstable, storage: Virtualize data sink making for Data and Index
  sstable/writer: Shuffle writer::init_file_writers()
  sstable: Make storage an API
  utils: Add S3 readable file impl for random reads
  utils: Add S3 data sink for multipart upload
  utils: Add S3 client with basic ops
  cql-pytest: Add option to run scylla over stable directory
  test.py: Equip it with minio server
  sstables: Detach write_toc() helper
2023-04-11 08:17:25 +03:00
Benny Halevy
96660b2ef7 table: discard_sstables: update_sstable_cleanup_state when deleting sstables
We need to remove the deleted sstables from the cleanup state
with update_sstable_cleanup_state, otherwise their data and index
files will remain open and their storage space won't be reclaimed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:37:56 +03:00
Benny Halevy
4db961ecac compaction_manager: compact_sstables: retrieve owned ranges if required
If any of the sstables to be compacted requires cleanup,
retrieve the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

Refs #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:36:10 +03:00
Benny Halevy
9105f9800c sstables: add a printer for shared_sstable
Refactor the printing logic in compaction::formatted_sstables_list
out to sstables::to_string(const shared_sstable&, bool include_origin)
and operator<<(const shared_sstable) on top of it.

This lets us easily print std::vector<shared_sstable>
from compaction_manager in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:31:35 +03:00
Benny Halevy
d87925d9fc compaction_manager: keep owned_ranges_ptr in compaction_state
When perform_cleanup adds sstables to sstables_requiring_cleanup,
also save the owned_ranges_ptr in the compaction_state so
it could be used by other compaction types like
regular, reshape, or major compaction.

When the exhausted sstables are released, check
if sstables_requiring_cleanup is empty, and if it is,
also clear the owned_ranges_ptr.
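A minimal sketch of the bookkeeping described above, assuming sstables are identified by generation and owned ranges are a plain vector of token pairs (compaction_state_sketch, mark_for_cleanup, release_exhausted and token_ranges are hypothetical names, not the real compaction_state API):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the owned token ranges.
using token_ranges = std::vector<std::pair<std::int64_t, std::int64_t>>;

struct compaction_state_sketch {
    std::set<std::int64_t> sstables_requiring_cleanup;     // by generation
    std::shared_ptr<const token_ranges> owned_ranges_ptr;  // shared, not copied

    // perform_cleanup: remember which sstables need cleanup and the ranges to use.
    void mark_for_cleanup(const std::vector<std::int64_t>& generations,
                          std::shared_ptr<const token_ranges> ranges) {
        sstables_requiring_cleanup.insert(generations.begin(), generations.end());
        owned_ranges_ptr = std::move(ranges);
    }

    // Releasing exhausted sstables: once nothing requires cleanup, drop the ranges too.
    void release_exhausted(const std::vector<std::int64_t>& generations) {
        for (auto gen : generations) {
            sstables_requiring_cleanup.erase(gen);
        }
        if (sstables_requiring_cleanup.empty()) {
            owned_ranges_ptr.reset();
        }
    }
};
```

In this sketch, holding the ranges through a shared pointer lets other compaction types reuse the same owned ranges while any sstable still requires cleanup.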

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:30:53 +03:00
Benny Halevy
c2bf0e0b72 compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
As a first step towards parallel cleanup by
(regular) compaction and cleanup compaction,
filter all sstables in perform_cleanup
and keep the set of sstables in the compaction_state.

Erase from that set when the sstables are unregistered
from compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:30:39 +03:00
Benny Halevy
b3192b9f16 compaction: refactor compaction_state out of compaction_manager
To use it both from compaction_manager and compaction_descriptor
in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:28:16 +03:00
Benny Halevy
73280c0a15 compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
So it can be used in the next patch that will refactor
compaction_state out of class compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:19:04 +03:00
Benny Halevy
690697961c compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
To be used for managing sstables requiring cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:18:02 +03:00
Benny Halevy
cac60a09ac compaction_manager: refactor get_candidates
Allow getting candidates for compaction
from an arbitrary range of sstables, not only
the in_strategy_sstables.

To be used by perform_cleanup to mark all sstables
that require cleanup, even if they can't be
compacted at this time.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:16:57 +03:00
Benny Halevy
bbfe839a73 compaction_manager: get_candidates: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:16:12 +03:00
Benny Halevy
6ebafe74b9 table, compaction_manager: add requires_cleanup
Returns true iff any of the sstables in the set
requires cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:14:36 +03:00
Benny Halevy
d765686491 sstable_set: add for_each_sstable_until
Calls a function on all sstables or until the
function returns stop_iteration::yes.

Change the sstable_set_impl interface to expose
only for_each_sstable_until and let
sstable_set::for_each_sstable use that, wrapping
the void-returning function passed to it.
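A minimal sketch of that shape, using stand-in types (sstable_stub, sstable_set_sketch and a local stop_iteration enum in place of Seastar's stop_iteration are all hypothetical): the early-exit walk is the only primitive, and the void-returning variant wraps the caller's function so that it always continues. A requires_cleanup-style query can then stop at the first match.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Local stand-in for Seastar's stop_iteration.
enum class stop_iteration { no, yes };

struct sstable_stub {
    std::int64_t generation;
    bool needs_cleanup;
};

class sstable_set_sketch {
public:
    explicit sstable_set_sketch(std::vector<sstable_stub> ssts) : _ssts(std::move(ssts)) {}

    // The single primitive: visit sstables until func returns stop_iteration::yes.
    void for_each_sstable_until(const std::function<stop_iteration(const sstable_stub&)>& func) const {
        for (const auto& sst : _ssts) {
            if (func(sst) == stop_iteration::yes) {
                return;
            }
        }
    }

    // The void-returning variant wraps func so the walk never stops early.
    void for_each_sstable(const std::function<void(const sstable_stub&)>& func) const {
        for_each_sstable_until([&func](const sstable_stub& sst) {
            func(sst);
            return stop_iteration::no;
        });
    }

private:
    std::vector<sstable_stub> _ssts;
};

// Early exit pays off for "any"-style queries.
bool requires_cleanup(const sstable_set_sketch& set) {
    bool found = false;
    set.for_each_sstable_until([&found](const sstable_stub& sst) {
        found = sst.needs_cleanup;
        return found ? stop_iteration::yes : stop_iteration::no;
    });
    return found;
}
```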

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:11:58 +03:00
Benny Halevy
db7fa9f3be distributed_loader: reshard: update sstable cleanup state
Since the sstables are loaded from foreign open info
we should mark them for cleanup if needed (and owned_ranges_ptr is provided).

This will allow a later patch to enable filtering
for cleanup only for sstable sets containing
sstables that require cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:11:00 +03:00
Benny Halevy
d0690b64c1 table, compaction_manager: add update_sstable_cleanup_state
update_sstable_cleanup_state calls needs_cleanup and
inserts (or erases) the sstable into the respective
compaction_state.sstables_requiring_cleanup set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:10:55 +03:00
Benny Halevy
1baca96de1 compaction_manager: needs_cleanup: delete unused schema param
It isn't needed.  The sstable already has a schema.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:03:53 +03:00
Benny Halevy
ac9f8486ba compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
I'm not sure why this was originally supported,
maybe for upgrade sstables where we may want to
rewrite the sstables without filtering any tokens,
but perform_sstable_upgrade now follows a
different code path and uses `rewrite_sstables`
directly, without piggybacking on cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:03:03 +03:00
Benny Halevy
ecbd112979 distributed_loader: reshard: consider sstables for cleanup
When called from `process_upload_dir` we pass a list
of owned tokens to `reshard`.  When they are available,
run resharding, with implicit cleanup, also on unshared
sstables that need cleanup.

Fixes #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:01:38 +03:00
Benny Halevy
3ccbb28f2a distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
To facilitate implicit cleanup of sstables via resharding.

Refs #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:59:38 +03:00
Benny Halevy
aa4b18f8fb distributed_loader: reshard: add optional owned_ranges_ptr param
For passing owned_ranges_ptr from
distributed_loader::process_upload_dir.

Refs #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:57:41 +03:00
Benny Halevy
f540af930b distributed_loader: reshard: get a ref to table_state
We don't reference the table itself, only as_table_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:57:11 +03:00
Benny Halevy
c6b7fcc26f distributed_loader: reshard: capture creator by ref
Now that reshard is a coroutine, creator is preserved
in the coroutine frame until completion so we can
simply capture it by reference now.

Note that previously it was moved into the compaction
descriptor, but the capture wasn't mutable so it was
copied anyhow, and this change doesn't introduce a
regression.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:56:07 +03:00
Benny Halevy
7c9d16ff96 distributed_loader: reshard: reserve num_jobs buckets
We know in advance how many buckets we need.
We still need to emplace the first bucket upfront.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:55:35 +03:00
Benny Halevy
0c6ce5af74 compaction: move owned ranges filtering to base class
Move the token filtering logic down from cleanup_compaction
to regular_compaction and class compaction so it can be
reused by other compaction types.

Create an _owned_ranges_checker in class compaction
when _owned_ranges is engaged, and use it in
compaction::setup to filter partitions based on the owned ranges.
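One way such a checker can work, sketched under the assumption that owned ranges are sorted, non-overlapping, inclusive token intervals and that partitions reach compaction in token order (owned_ranges_checker_sketch and is_owned are hypothetical, not the real checker): because the queried tokens only grow, the current range index only moves forward, so filtering a whole compaction costs a single pass over the ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inclusive token interval [start, end].
struct token_range {
    std::int64_t start;
    std::int64_t end;
};

class owned_ranges_checker_sketch {
public:
    // sorted_ranges must be sorted, non-overlapping, and outlive the checker.
    explicit owned_ranges_checker_sketch(const std::vector<token_range>& sorted_ranges)
        : _ranges(sorted_ranges) {}

    // Tokens must be passed in non-decreasing order, so _idx never moves back.
    bool is_owned(std::int64_t token) {
        while (_idx < _ranges.size() && _ranges[_idx].end < token) {
            ++_idx;
        }
        return _idx < _ranges.size() && _ranges[_idx].start <= token;
    }

private:
    const std::vector<token_range>& _ranges;
    std::size_t _idx = 0;
};
```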

Ref scylladb/scylladb#12998

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:55:09 +03:00
Benny Halevy
09df04c919 compaction: move owned_ranges into descriptor
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so that we can apply cleanup in other
compaction types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:52:12 +03:00
Pavel Emelyanov
fd817e199c Merge 'auth: replace operator<<(..) with fmt formatter' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `auth::auth_authentication_options` and `auth::resource_kind`
without the help of fmt::ostream. and their `operator<<(ostream,..)` are
dropped, as there are no users of them anymore.

Refs #13245

Closes #13460

* github.com:scylladb/scylladb:
  auth: remove unused operator<<(.., resource_kind)
  auth: specialize fmt::formatter<resource_kind>
  auth: remove unused operator<<(.., authentication_option)
  auth: specialize fmt::formatter<authentication_option>
2023-04-10 17:05:09 +03:00
Pavel Emelyanov
21ef5bcc22 test: Add object-storage test
The test:

- starts scylla (over a stable directory)
- creates an S3-backed keyspace (minio is already up and running,
  started by test.py)
- creates a table in that keyspace and populates it with several rows
- flushes the keyspace to make sstables hit the storage
- checks that the ownership table is populated properly
- restarts scylla
- makes sure the old entries still exist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
8b9e9671de distributed_loader: Print storage type when populating
On boot it's very useful to know which storage a table comes from, so
add the respective info to existing log messages.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
f04c6cdf9a sstable_directory: Add ownership table components lister
When sstables are stored on object storage, they are "registered" in the
system.sstables_registry ownership table. The sstable_directory is
supposed to list sstables from this table, so here's the respective
components lister.

The lister is created by sstables_manager; by the time it's requested,
the system keyspace is already plugged in. The lister only handles
"sealed" sstables. Dangling ones are still ignored; this is to be fixed
later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
8bd9f7accf sstable_directory: Make components_lister and API
Now the lister is filesystem-specific. There will soon come another one
for S3, so the sstable_directory should be prepared for that by making
the lister an abstract class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
5f7f0117e1 sstable_directory: Create components lister based on storage options
The directory's lister is storage-specific and should be created
differently for different storage options.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
950ee0efe8 sstables: Add S3 storage implementation
The driver puts all components into

s3://bucket/uuid/component_name

objects where 'bucket' is the keyspace options configuration parameter,
and the 'uuid' is the value obtained from the ownership table.

E.g.

s3://test_bucket/d0a743b0-ad38-11ed-85b5-39b6b0998182/Data.db

The life-time is straightforward. Until sealed, the sstable has
'creating' status in the table, then it's updated to be 'sealed'. Prior
to removing the objects, the status is set to 'deleting', thus allowing
the distributed loader to pick up the dangling objects on re-load (not
yet implemented). Finally, the entry is deleted from the table.
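A minimal model of this life-time, using the status names from this commit (the merge description above calls the last one 'removing'). It encodes only the two transitions described here and treats anything other than 'sealed' as dangling on re-load; the enum and function names are hypothetical.

```cpp
#include <cassert>

// Status names as used in this commit message.
enum class sstable_status { creating, sealed, deleting };

// Only the forward transitions described above: creating -> sealed -> deleting.
// After 'deleting', the objects are removed and the registry entry is dropped.
constexpr bool is_valid_transition(sstable_status from, sstable_status to) {
    return (from == sstable_status::creating && to == sstable_status::sealed)
        || (from == sstable_status::sealed && to == sstable_status::deleting);
}

// On re-load, only 'sealed' entries describe complete sstables; the others are
// leftovers of an interrupted create or delete.
constexpr bool is_dangling(sstable_status s) {
    return s != sstable_status::sealed;
}
```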

It needs the PR #12648 not to generate empty ks/cf directories on the
local filesystem.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:29 +03:00
Pavel Emelyanov
08e9046d07 system_keyspace: Add ownership table
The schema is

CREATE TABLE system.sstables (
    location text,
    generation bigint,
    format text,
    status text,
    uuid uuid,
    version text,
    PRIMARY KEY (location, generation)
)

A sample entry looks like:

 location                                                            | generation | format | status | uuid                                 | version
---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+---------
 /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 |          2 |    big | sealed | d0a743b0-ad38-11ed-85b5-39b6b0998182 |      me

The uuid field points to the "folder" on the storage where the sstable
components are. Like this:

s3
`- test_bucket
   `- f7548f00-a64d-11ed-865a-0c1fbc116bb3
      `- Data.db
       - Index.db
       - Filter.db
       - ...
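To make the mapping concrete, a sketch with an in-memory map standing in for system.sstables (registry_entry, registry_sketch and component_object_name are hypothetical names): the (location, generation) primary key selects the row, and the row's uuid becomes the object "folder" under the bucket.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for system.sstables:
// PRIMARY KEY (location, generation) -> the uuid and status columns.
struct registry_entry {
    std::string uuid;
    std::string status;
};
using registry_sketch = std::map<std::pair<std::string, std::int64_t>, registry_entry>;

// Builds s3://<bucket>/<uuid>/<component> for a registered sstable, if any.
std::optional<std::string> component_object_name(const registry_sketch& registry,
                                                  const std::string& bucket,
                                                  const std::string& location,
                                                  std::int64_t generation,
                                                  const std::string& component) {
    auto it = registry.find({location, generation});
    if (it == registry.end()) {
        return std::nullopt;
    }
    return "s3://" + bucket + "/" + it->second.uuid + "/" + component;
}
```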

It's not very nice that the whole /var/lib/... path is in fact used as
location, it needs the PR #12707 to fix this place.

Also, the "status" part is not yet fully functional, it only supports
three options:

- creating -- the same as TemporaryTOC file exists on disk
- sealed -- default state
- deleting -- the analogy for the deletion log on disk

The latter needs support from the distributed_loader, which is not yet
there. In fact, distributed_loader also needs to be patched to actually
select entries from this table on load. It also needs the mentioned
PR #12707 to support staging and quarantine sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:28 +03:00
Pavel Emelyanov
e34b86dd61 system_keyspace: Plug to user sstables manager too
The sharded<sys_ks> instances are plugged to large data handler and
compaction manager to maintain the circular dependency between these
components via the interposing database instance. Do the same for user
sstables manager, because S3 driver will need to update the local
ownership table.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
4bb885b759 sstable: Make storage instance based on storage options
This patch adds storage options lw-ptr to sstables_manager::make_sstable
and makes the storage instance creation depend on the options. For local
it just creates the filesystem storage instance, for S3 -- throws, but
next patch will fix that.
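A minimal sketch of the dispatch, assuming storage options are modeled as a std::variant (local_options, s3_options, storage_sketch, filesystem_storage_sketch and make_storage are hypothetical stand-ins, not the real storage_options or sstables_manager API). As at this point in the series, the S3 branch throws.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-ins for the per-keyspace storage options.
struct local_options {};
struct s3_options {
    std::string endpoint;
    std::string bucket;
};
using storage_options_sketch = std::variant<local_options, s3_options>;

// The abstract storage interface an sstable talks to.
class storage_sketch {
public:
    virtual ~storage_sketch() = default;
    virtual std::string type() const = 0;
};

class filesystem_storage_sketch final : public storage_sketch {
public:
    std::string type() const override { return "filesystem"; }
};

// Helper to build a visitor from lambdas (C++17).
template <typename... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <typename... Ts> overloaded(Ts...) -> overloaded<Ts...>;

std::unique_ptr<storage_sketch> make_storage(const storage_options_sketch& opts) {
    return std::visit(overloaded{
        [](const local_options&) -> std::unique_ptr<storage_sketch> {
            return std::make_unique<filesystem_storage_sketch>();
        },
        [](const s3_options&) -> std::unique_ptr<storage_sketch> {
            throw std::runtime_error("S3 storage is not implemented yet");
        },
    }, opts);
}
```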

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
df026e2cb5 sstable_directory: Keep storage_options aboard
The class in question will need to know which storage to list the
table's sstables from. For that, construct it with the storage options
taken from the table.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
c060f3a52f sstable: Virtualize the helper that gets on-disk stats for sstable
When opening an existing (or just sealed) sstable its components are
stat()-ed to get the on-disk sizes and a bit more. Stat-ing a file by
name on S3 is not (yet) implemented and doing it file-by-file can be
quite terrible. So add a method to return sstable stats in a
storage-specific manner. For S3 this can be implemented by getting the
info from the ownership table (in the future).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
0ddd27cb29 sstable, storage: Virtualize data sink making for small components
This time sstable needs to create a data sink for a component without
having the file at hand. That's pretty much the same as in the previous
patch, but the method declaration differs slightly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
ac1e56c9d9 sstable, storage: Virtualize data sink making for Data and Index
Add the make_data_or_index_sink() virtual method and its implementation for
filesystem_storage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
1d4fcce5dd sstable/writer: Shuffle writer::init_file_writers()
The method needs to create two data sinks -- for Data and for Index
files -- and then wrap them with more stuff (compression, checksums,
streams, etc.). With the S3 backend, using file-output-stream won't work,
because S3 storage cannot provide the writable file API (it has data_sink
instead).

This patch extracts file_data_sink creation so that it could be
virtualized with storage API later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
525a261a4e sstable: Make storage an API
Currently sstable carries a filesystem_storage instance on board. Next
patches will make it possible to use some other storage with different
data accessing methods. This patch makes sstable carry an abstract storage
interface and makes the existing filesystem_storage implement it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
033fa107f8 utils: Add S3 readable file impl for random reads
Sometimes an sstable is used for random read, sometimes -- for streamed
read using the input stream. For both cases the file API can be
provided, because S3 API allows random reads of arbitrary lengths.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
a4a64149a6 utils: Add S3 data sink for multipart upload
Putting a large object into S3 using a plain PUT is a bad choice: one needs
to collect the whole object in memory, then send it as a content-length
request with a plain body. Multipart upload puts less stress on memory,
but it has its own limitation: each part should be
at least 5MB in size. For that reason using the file API doesn't work:
the file IO API operates with external memory buffers, and the file impl
would only have raw pointers to them. In order to collect a 5MB chunk in
RAM, the impl would have to copy the memory, which is not good. Unlike the
file API, the data_sink API is more flexible, as it has temporary buffers at
hand and can cache them in a zero-copy manner.

Having said that, the S3 data_sink implementation works like this:

* put(buffer):
  move the buffer into the local cache; once the local cache grows above 5MB,
  send out the part

* flush:
  send out whatever is in the cache, then send the upload completion request

* close:
  check that the upload finished (in flush), abort the upload otherwise

User of the API may (actually should) wrap the sink with output_stream
and use it as any other output_stream.
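A simplified, synchronous sketch of the put/flush/close logic above (multipart_sink_sketch and fake_multipart_upload are hypothetical; the real sink is an asynchronous Seastar data_sink that talks to S3). Buffers are moved into the cache rather than copied, a part goes out once at least 5 MiB is cached (the S3 minimum for every part except the last), flush() sends the remainder and completes, and close() aborts an upload that was never completed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical recorder standing in for the S3 multipart upload requests.
struct fake_multipart_upload {
    std::vector<std::size_t> part_sizes;  // size of every part sent so far
    bool completed = false;
    bool aborted = false;
};

class multipart_sink_sketch {
public:
    // S3 requires every part except the last one to be at least 5 MiB.
    static constexpr std::size_t min_part_size = 5 * 1024 * 1024;

    explicit multipart_sink_sketch(fake_multipart_upload& upload) : _upload(upload) {}

    // put(): move the buffer into the cache instead of copying it, and send
    // a part once the cached amount reaches the minimum part size.
    void put(std::string buf) {
        _cached_bytes += buf.size();
        _cache.push_back(std::move(buf));
        if (_cached_bytes >= min_part_size) {
            send_part();
        }
    }

    // flush(): send whatever is cached (the last part may be small), then complete.
    void flush() {
        if (_cached_bytes > 0) {
            send_part();
        }
        _upload.completed = true;
    }

    // close(): abort the upload unless flush() has completed it.
    void close() {
        if (!_upload.completed) {
            _upload.aborted = true;
        }
    }

private:
    void send_part() {
        // A real implementation would upload the cached buffers here.
        _upload.part_sizes.push_back(_cached_bytes);
        _cache.clear();
        _cached_bytes = 0;
    }

    fake_multipart_upload& _upload;
    std::vector<std::string> _cache;
    std::size_t _cached_bytes = 0;
};
```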

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
3745b5c715 utils: Add S3 client with basic ops
Those include -- HEAD to get size, PUT to upload object in one go, GET
to read the object as a contiguous buffer and DELETE to drop one.

The client uses http client from seastar and just implements the S3
protocol using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
ced8a07d09 cql-pytest: Add option to run scylla over stable directory
The facilities in the run.py script allow launching scylla over a temporary
directory, waiting for it to come alive, killing it, etc. The limitation of
those is that the work-dir created for scylla is tightly coupled with its
pid. The object-storage test in the next patches will need to check that the
sstables are preserved on scylla restart, and this hard binding of
workdir to pid won't work.

This patch generalizes the scylla run/abort helpers to accept an
external directory to work on and adds a call to restart scylla process
over existing directory.

And one small related change here -- log file is opened in O_APPEND mode
so that restarted scylla process continues writing into the old file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
6dbe41d277 test.py: Equip it with minio server
When test.py starts it activates a minio server inside test-dir and
configures an anonymous bucket for test cases to run on

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
93c8b4b46b sstables: Detach write_toc() helper
When an sstable is opened, it generates certain content into the TOC file. In
filesystem storage this first gets into the TemporaryTOC one. The future S3
driver will need the same to put into the TOC object. To avoid producing
duplicate code, detach the content generation into a helper. Next patches
will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:00 +03:00
Raphael S. Carvalho
01466be7b9 sstables: Store raw token into summary entries
Scylla stores a dht::token into each summary entry, for convenience.

But that costs us 16 bytes for each summary entry. That's because
dht::token has a kind field in addition to data, both 64 bits.

With 1M partitions, each averaging 4k bytes, summary may end up
with ~90k summary entries. So dht::token alone will add ~1.5MB to the
memory footprint of summary.

We know summary samples index keys, therefore all tokens in all
summary entries cannot have any token kind other than 'key'.
Therefore, we can save 8 bytes for each summary entry by storing
a 64-bit raw token and converting it back into token whenever
needed.

Memory footprint of summary entries in a summary goes from
	sizeof(summary_entry) * entries.size(): 1771520
to
	sizeof(summary_entry) * entries.size(): 1417216

which is explained by the 8 bytes reduction per summary entry.
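A sketch of the layout change with stand-in types, assuming an entry holds a token, a key view and an index position (token_stub, token_kind and the summary_entry_* structs are hypothetical, not the real dht::token or summary_entry). On a 64-bit build the entries come out at 40 and 32 bytes, which matches the figures above: 1771520 / 40 = 1417216 / 32 = 44288 entries. The kind is restored as 'key' on access, since summary entries can only hold key tokens.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Stand-in for dht::token: a 64-bit kind plus 64-bit data, 16 bytes in total.
enum class token_kind : std::uint64_t { before_all_keys, key, after_all_keys };
struct token_stub {
    token_kind kind;
    std::int64_t data;
};

// Before: the entry carries a full token.
struct summary_entry_before {
    token_stub token;
    std::string_view key;    // points into the summary's memory pool
    std::uint64_t position;  // position in the index
};

// After: only the raw 64-bit value is kept, and the kind is implied.
struct summary_entry_after {
    std::int64_t raw_token;
    std::string_view key;
    std::uint64_t position;

    token_stub get_token() const { return {token_kind::key, raw_token}; }
};
```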

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-10 10:26:04 -03:00
Raphael S. Carvalho
6b5cd9ac7b sstables: Don't store token data into summary's memory pool
summary has a memory pool, which is implemented as a set of contiguous
buffers of exponentially increasing size, with the max size of 128k.

This pool served to store both the keys of summary entries and their
respective tokens. The summary entry itself just stores a string_view,
which points to the actual data in the memory pool.

Since this series 31593e1451, which removed token_view, summary_entry
stores the actual token, not just the view.

Therefore, memory is being wasted, as SSTable loader / writer is
unnecessarily storing the token data into the pool.

With 11k summary entries, the footprint drops from 756004 to 624932,
a ~17% reduction. Of course, the reduction depends on factors like key
size, as a large key size can significantly outweigh this waste.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-10 09:59:11 -03:00
Tomasz Grabiec
64a87f4257 Merge 'Standardize node ops sync_nodes selection' from Benny Halevy
Use token_metadata get_endpoint_to_host_id_map_for_reading
to get all normal token owners for all node operations,
rather than using gossip for some operation and
token_metadata for others.

Fixes #12862

Closes #13256

* github.com:scylladb/scylladb:
  storage_service: node ops: standardize sync_nodes selection
  storage_service: get_ignore_dead_nodes_for_replace: make static and rename to parse_node_list
2023-04-10 13:14:55 +02:00
Benny Halevy
cc42f00232 view: view_builder: start: demote sleep_aborted log error
This is not really an error, so print it in debug log_level
rather than error log_level.

Fixes #13374

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13462
2023-04-09 22:49:06 +03:00
Nadav Har'El
d26bb8c12d Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes
This covers all uses except where usage of `std::regex` is required by 3rd party library interfaces.
As demonstrated countless times, std::regex's practice of using recursion for pattern matching can result in stack overflow, especially on AARCH64. The most recent incident happened after merging https://github.com/scylladb/scylladb/pull/13075, which (indirectly) uses `sstables::make_entry_descriptor()` to test whether a certain path is a valid scylla table path in a trial-and-error manner. This resulted in stacks blowing up on AARCH64.
To prevent this, use the already tried and tested method of switching from `std::regex` to `boost::regex`. Don't wait until each of the `std::regex` sites explodes; replace them all preemptively.

Refs: https://github.com/scylladb/scylladb/issues/13404

Closes #13452

* github.com:scylladb/scylladb:
  test: s/std::regex/boost::regex/
  utils: s/std::regex/boost::regex/
  db/commitlog: s/std::regex/boost::regex/
  types: s/std::regex/boost::regex/
  index: s/std::regex/boost::regex/
  duration.cc: s/std::regex/boost::regex/
  cql3: s/std::regex/boost::regex/
  thrift: s/std::regex/boost::regex/
  sstables: use s/std::regex/boost::regex/
2023-04-09 18:47:41 +03:00
Kefu Chai
7a05cc3a06 thrift: initiaize _config first to avoid dangling reference
in c642ca9e73, a reference to the
parameter `config` passed to the `thrift_server`'s constructor is
passed down to `create_handler_factory()`, which keeps it so it can
create connection handlers on demand. but unfortunately,

- the `config` parameter is a temporary variable
- the `config` parameter is moved away in the constructor after
  `create_handler_factory()` is called

hence we have a dangling reference when the factory created by
`create_handler_factory()` tries to dereference the reference when
handling a new incoming connection.

in this change,

- the definitions of `_config` and `_handler_factory` member
  variables are transposed, so that the former is initialized
  first.
- `_handler_factory` now keeps a reference to `_config`'s member
  variable, so that the weak reference it holds is always valid.
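The rule behind the fix is that C++ initializes non-static data members in declaration order, whatever the order of the constructor's mem-initializer list (GCC's -Wreorder flags mismatches). A minimal sketch with stand-in types (config_stub, handler_factory_stub and server_sketch are hypothetical, not the real thrift_server, config or handler factory): _config is declared first, so it is built from the moved-in parameter before _handler_factory takes a reference into it, and the factory never refers to the constructor parameter. Copying is deleted because a copy would still reference the original object's config.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct config_stub {
    std::string listen_address;
    int max_request_size;
};

// Keeps a reference into the owning server's config, not into a parameter.
struct handler_factory_stub {
    const int& max_request_size;
    int make_handler() const { return max_request_size; }
};

class server_sketch {
    // Declaration order decides initialization order: _config comes first.
    config_stub _config;
    handler_factory_stub _handler_factory;

public:
    explicit server_sketch(config_stub cfg)
        : _config(std::move(cfg))
        , _handler_factory{_config.max_request_size} {}

    server_sketch(const server_sketch&) = delete;
    server_sketch& operator=(const server_sketch&) = delete;

    int new_connection() const { return _handler_factory.make_handler(); }
};
```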

Fixes #13455
Branches: none
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13456
2023-04-09 11:34:34 +03:00
Amnon Heiman
928727a57d main: Load metrics relabel config from a file if it exists
This patch reads the relabel config from a file if it exists.  A problem
with the file or metrics would stop Scylla from starting. This is on
purpose, as it's a configuration problem that should be addressed.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2023-04-09 09:10:07 +03:00
Amnon Heiman
990545f616 Add relabel from file support.
This patch adds a configuration with an optional file name for
relabeling metrics.  It also adds a function that accepts a file name
and loads the relabel config from a file.

An example for such a file:
```
$cat conf.yml
relabel_configs:
  - source_labels: [shard]
    action: drop
    target_label: shard
    regex: (2)
  - source_labels: [shard]
    action: replace
    target_label: level
    replacement: $1
    regex: (.*3)
```

update_relabel_config_from_file throws an exception on failure; it's up
to the caller to decide what to do in such cases.
2023-04-09 09:10:02 +03:00
Kefu Chai
9d5fbe226e auth: remove unused operator<<(.., resource_kind)
since the only user of operator<<(..., resource_kind) is now
`auth_resource_test`, let's just move it into this test. and
there is no need to keep this operator in the header file where
`resource_kind` is defined.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-07 20:32:28 +08:00
Kefu Chai
ca50a8d6c7 auth: specialize fmt::formatter<resource_kind>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `auth::resource_kind`
without the help of fmt::ostream. its `operator<<(ostream,..)` is
reimplemented using fmtlib accordingly to ease the review.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-07 18:59:13 +08:00
Kefu Chai
ca0ca92e68 auth: remove unused operator<<(.., authentication_option)
since we already have fmt::formatter<authentication_option>, and
there are no existing users of `operator<<(ostream&,
authentication_option)`, let's just drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-07 18:15:35 +08:00
Kefu Chai
ba0f9036ec auth: specialize fmt::formatter<authentication_option>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `auth::auth_authentication_options`
without the help of fmt::ostream. its `operator<<(ostream,..)` is
reimplemented using fmtlib accordingly to ease the review.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-07 18:15:25 +08:00
Botond Dénes
452cb1a712 test: s/std::regex/boost::regex/
The former is prone to producing stack-overflow as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:51:32 -04:00
Botond Dénes
985e33a768 utils: s/std::regex/boost::regex/
The former is prone to producing stack-overflow as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:51:28 -04:00
Botond Dénes
52e66e38e7 db/commitlog: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:51:24 -04:00
Botond Dénes
712889c99f types: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is mechanical for the most part.
escape() needs some special treatment; it looks like boost::regex wants
a double-escaped backslash.
2023-04-06 09:50:45 -04:00
Botond Dénes
cf188f40b9 index: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:41 -04:00
Botond Dénes
4a0188ea6a duration.cc: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:37 -04:00
Botond Dénes
de402878e4 cql3: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:32 -04:00
Botond Dénes
c0b72f70d4 thrift: s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:27 -04:00
Botond Dénes
ba031ad181 sstables: use s/std::regex/boost::regex/
The former is prone to stack overflows, as it uses recursion in
its match implementation.

The migration is entirely mechanical.
2023-04-06 09:50:12 -04:00
Botond Dénes
c65bd01174 Merge 'Debloat system_keyspace.hh (and a bit of .cc)' from Pavel Emelyanov
The system_keyspace.hh now includes raft stuff, topology changes stuff, task_manager stuff, etc. It's going to include tablets.hh (but maybe not). Anything that deals with system keyspace, and includes system_keyspace.hh, would transitively pull these too. This header is becoming a central hub for all the features.

This PR removes all the headers from system_keyspace.hh that correspond to other "subsystems" keeping only generic mutations/querying and seastar ones.

Closes #13450

* github.com:scylladb/scylladb:
  system_keyspace.hh: Remove unneeded headers
  system_keyspace: Move topology_mutation_builder to storage_service
  system_keyspace: Move group0_upgrade_state conversions to group0 code
2023-04-06 16:39:20 +03:00
Kamil Braun
c2a2996c2b docs: cleaning up after failed membership change
After a failed topology operation, like bootstrap / decommission /
removenode, the cluster might contain a garbage entry in either token
ring or group 0. This entry can be cleaned-up by executing removenode on
any other node, pointing to the node that failed to bootstrap or leave
the cluster.

Document this procedure, including a method of finding the host ID of a
garbage entry.

Add references in other documents.

Fixes: #13122

Closes #13186
2023-04-06 13:48:37 +02:00
Botond Dénes
0a46a574e6 Merge 'Topology: introduce nodes' from Benny Halevy
As a first step towards using host_id to identify nodes instead of ip addresses
this series introduces a node abstraction, kept in topology,
indexed by both host_id and endpoint.

The revised interface also allows callers to handle cases where nodes
are not found in the topology more gracefully by introducing `find_node()` functions
that look up nodes by host_id or inet_address and also take a `must_exist` parameter.
If it is false (the default), `find_node` returns nullptr when the node is not found.
If it is true, `find_node` throws an internal error, since this indicates a violation of an internal
assumption that the node must exist in the topology.

Callers that can handle missing nodes should use the more permissive flavor
and handle the !find_node() case gracefully.
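
A minimal sketch of the lookup contract described above, assuming simplified stand-ins: `host_id` is a plain integer, `inet_address` is a string, and the "internal error" is modelled as `std::logic_error`. This is not the actual locator::topology code; it only illustrates the nullptr-versus-throw behaviour of `must_exist` with one node indexed by both keys.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real identifier types.
using host_id = std::uint64_t;
using inet_address = std::string;

struct node {
    host_id id;
    inet_address endpoint;
};

// Nodes are owned by the host_id index; the endpoint index points into it
// (std::map never relocates its elements, so the pointers stay valid).
class topology {
    std::map<host_id, node> _by_host_id;
    std::map<inet_address, const node*> _by_endpoint;

    static const node* missing(bool must_exist) {
        if (must_exist) {
            // Stand-in for an internal error: the caller asserted existence.
            throw std::logic_error("node not found in topology");
        }
        return nullptr;
    }
public:
    void add_node(host_id id, const inet_address& ep) {
        auto [it, inserted] = _by_host_id.emplace(id, node{id, ep});
        if (inserted) {
            _by_endpoint.emplace(ep, &it->second);
        }
    }

    const node* find_node(host_id id, bool must_exist = false) const {
        auto it = _by_host_id.find(id);
        return it != _by_host_id.end() ? &it->second : missing(must_exist);
    }

    const node* find_node(const inet_address& ep, bool must_exist = false) const {
        auto it = _by_endpoint.find(ep);
        return it != _by_endpoint.end() ? it->second : missing(must_exist);
    }
};
```

Keeping a single owning index and pointing into it from the other means both lookups always agree on the same node object.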

Closes #11987

* github.com:scylladb/scylladb:
  topology: add node state
  topology: remove dead code
  locator: add class node
  topology: rename update_endpoint to add_or_update_endpoint
  topology: define get_{rack,datacenter} inline
  shared_token_metadata: mutate_token_metadata: replicate to all shards
  locator: endpoint_dc_rack: refactor default_location
  locator: endpoint_dc_rack: define default operator==
  test: storage_proxy_test: provide valid endpoint_dc_rack
2023-04-06 13:47:22 +03:00
Pavel Emelyanov
18333b4225 system_keyspace.hh: Remove unneeded headers
Now this header can replace lots of used types with plain forward
declarations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:37:00 +03:00
Pavel Emelyanov
1af373cf0a system_keyspace: Move topology_mutation_builder to storage_service
The latter is the only user of the class. This keeps system keyspace
code free from unrelated logic and from raft::server_id type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:36:02 +03:00
Pavel Emelyanov
45de375126 system_keyspace: Move group0_upgrade_state conversions to group0 code
This keeps the system keyspace free from group0 logic and from the
service::group0_upgrade_state type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-06 12:35:07 +03:00
Kefu Chai
0d4ffe1d69 scripts/refresh-submodules.sh: include all commits in summary
before this change, we used `git submodule summary ${submodule}`
for collecting the titles of commits in between current HEAD and
origin/master. normally, this works just fine. but it fails to
collect all commits if the origin/master happens to reference
a merge commit. for instance, if we have following history like:

1. merge foo
2. bar
3. foo
4. baz  <--- submodule is pointing here.

`git submodule summary` would just print out the titles of commits
of 1 and 3.

so, in this change, instead of relying on `git submodule summary`,
we just collect the commits using `git log`. but we preserve the
output format used by `git submodule summary` to be consistent with
the previous commits bumping up the submodules. please note, in
this change, instead of matching the output of `git submodule summary`,
we use `git merge-base --is-ancestor HEAD origin/master` to check
whether we are going to create a fast-forward change; this is less fragile.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13366
2023-04-06 11:27:14 +03:00
Botond Dénes
9a02315c6b Merge 'Compaction reevaluation bug fixes' from Raphael "Raph" Carvalho
A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for an indefinite amount of time, potentially causing space and read amplification to be suboptimal.

Two reevaluation problems are being fixed: one after off-strategy compaction ends, and another in the compaction manager, which intends to periodically reevaluate the need for compaction.

Fixes https://github.com/scylladb/scylladb/issues/13429.
Fixes https://github.com/scylladb/scylladb/issues/13430.

Closes #13431

* github.com:scylladb/scylladb:
  compaction: Make compaction reevaluation actually periodic
  replica: Reevaluate regular compaction on off-strategy completion
2023-04-05 13:51:21 +03:00
Tomasz Grabiec
9802bb6564 Merge 'Remove explicit flush() from sstable component writer' from Pavel Emelyanov
Writing into an sstable component output stream should be done with care. In particular, flushing can happen only once, right before closing the stream. Flushing the stream in between several writes is not going to work, because the file stream would run into unaligned IO and the S3 upload stream would send a completion message to the server and lose any subsequent write.

Most of the file_writer users already obey that and flush the writer once right before closing it. do_write_simple() is extra careful about exception handling, but that is overkill (see the first patch).

It's better to make the file_writer API explicitly lack the ability to flush itself, by flushing the stream when closing the writer.
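
A minimal sketch of that API shape, assuming a hypothetical in-memory `sink` in place of the real file or S3 upload stream: the writer exposes `write()` and `close()` but no `flush()`, and the single flush happens inside `close()`. It is an illustration of the idea, not the sstables::file_writer implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical sink standing in for a file or S3 upload stream: a flush
// finalizes the object, so any write after it would be lost or misaligned.
struct sink {
    std::string data;
    int flushes = 0;
    bool closed = false;
};

// Writer without a public flush(): buffered bytes reach the sink exactly
// once, as part of close().
class file_writer {
    sink& _out;
    std::string _buf;
    bool _closed = false;
public:
    explicit file_writer(sink& out) : _out(out) {}

    void write(const std::string& bytes) {
        assert(!_closed);  // writing after close is a caller bug
        _buf += bytes;
    }

    void close() {
        if (_closed) {
            return;  // idempotent: never flush twice
        }
        _out.data += _buf;  // the one and only flush, right before closing
        _out.flushes++;
        _out.closed = true;
        _closed = true;
    }
};
```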

Closes #13338

* github.com:scylladb/scylladb:
  sstables: Move writer flush into close (and remove it)
  sstables: Relax exception handling in do_write_simple
2023-04-05 12:09:31 +02:00
Tomasz Grabiec
bbabf07f69 Merge 'test/boost/multishard_mutation_query: use random schema' from Botond Dénes
This test currently uses `test/lib/test_table.hh` to generate data for its test cases. This data generation facility is used by no other tests. Worse, it is redundant as we already have a random data generator with fixed schema, in `test/lib/mutation_source_test.hh`. So in this series, we migrate the test cases in said test file to random schema and its random data generation facilities. These are used by several other test cases and using random schema allows us to cover a wider (quasi-infinite) number of possibilities.
After migrating all tests away from it, `test/lib/test_table.hh` is removed.
This series also reduces the runtime of `fuzzy_test` drastically. It should now run in a few minutes or even in seconds (depending on the machine).

Fixes: #12944

Closes #12574

* github.com:scylladb/scylladb:
  test/lib: rm test_table.hh
  test/boos/multishard_mutation_query_test: migrate other tests to random schema
  test/boost/multishard_mutation_query_test: use ks keyspace
  test/boost/multishard_mutation_query_test: improve test pager
  test/boost/multishard_mutation_query_test: refactor fuzzy_test
  test/boost: add multishard_mutation_query_test more memory
  types/user: add get_name() accessor
  test/lib/random_schema: add create_with_cql()
  test/lib/random_schema: fix udt handling
  test/lib/random_schema: type_generator(): also generate frozen types
  test/lib/random_schema: type_generator(): make static column generation conditional
  test/lib/random_schema: type_generator(): don't generate duration_type for keys
  test/lib/random_schema: generate_random_mutations(): add overload with seed
  test/lib/random_schema: generate_random_mutations(): respect range tombstone count param
  test/lib/random_schema: generate_random_mutations(): add yields
  test/lib/random_schema: generate_random_mutations(): fix indentation
  test/lib/random_schema: generate_random_mutations(): coroutinize method
  test/lib/random_schema: generate_random_mutations(): expand comment
2023-04-05 10:32:58 +02:00
Michał Chojnowski
df0905357e mutation_partition_v2: add sentinel to the tracker *after* adding it to the tree
Every tracker insertion has to have a corresponding removal or eviction
(otherwise the number of rows in the tracker will be misaccounted).

If we add the row to the tracker before adding it to the tree,
and the tree insertion fails (with bad_alloc), this contract will be violated.
Fix that.
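
The ordering can be sketched as follows, assuming a hypothetical `tracker` that only counts rows and a caller-supplied callable standing in for the tree insertion. Because the throwing step runs first, a bad_alloc leaves the tracker untouched.

```cpp
#include <cassert>
#include <new>
#include <set>

// Hypothetical stand-in for the row tracker: it only counts rows.
struct tracker {
    long rows = 0;
    void insert() noexcept { ++rows; }
};

// The tree insertion, which may throw std::bad_alloc, runs first; the
// noexcept tracker registration runs only after it succeeded, so a failed
// insertion never leaves the tracker counting a row that no removal or
// eviction will ever account for.
template <typename TreeInsert>
void add_sentinel(TreeInsert&& insert_into_tree, tracker& t) {
    insert_into_tree();  // may throw
    t.insert();          // reached only on success
}
```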

Note: the problem is currently irrelevant because an exception during
sentinel insertion will abort the program anyway.

Closes #13336
2023-04-05 09:52:44 +02:00
Raphael S. Carvalho
457c772c9c replica: Make compaction_group responsible for deleting off-strategy compaction input
Compaction group is responsible for deleting SSTables of "in-strategy"
compactions, i.e. regular, major, cleanup, etc.

Both in-strategy and off-strategy compaction have their completion
handled using the same compaction group interface, which is
compaction_group::table_state::on_compaction_completion(...,
				sstables::offstrategy offstrategy)

So it's important to bring symmetry there, by moving the responsibility
of deleting off-strategy input, from manager to group.

Another important advantage is that off-strategy deletion is now throttled
and gated, allowing for better control, e.g. table waiting for deletion
on shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13432
2023-04-05 08:37:48 +03:00
Botond Dénes
f7421aab2c Merge 'cmake: sync with configure.py (16/n)' from Kefu Chai
this is the 15th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:

- to enable developers to use CMake for building scylla, so they can use tools (CLion, for instance) with CMake integration for a better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.

also, i just found that the scylla executable built with the cmake building system segfaults on master HEAD, like:
```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3974496==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd48549f70 sp 0x7ffd48549728 T0)
==3974496==Hint: pc points to the zero page.
==3974496==The signal is caused by a READ memory access.
==3974496==Hint: address points to the zero page.
    #0 0x0  (<unknown module>)
    #1 0x14e785a5 in wasmtime_runtime::traphandlers::unix::trap_handler::h1f510afc2968497f /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159:9
    #2 0x7f3462e5eb9f  (/lib64/libc.so.6+0x3db9f) (BuildId: 6107835fa7d4725691b2b7f6aaee7abe09f493b2)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>)
==3974496==ABORTING
Aborting on shard 0.
Backtrace:
  0xd16c38a
  0x13c5aab0
  0x13b9821e
  0x13c2fdc7
  /lib64/libc.so.6+0x3db9f
  /lib64/libc.so.6+0x8eb93
  /lib64/libc.so.6+0x3daed
  /lib64/libc.so.6+0x2687e
  0xd1e5f8a
  0xd1e3d34
  0xd1ca059
  0xd1c5e29
  0xd1c5605
  0x14e785a5
  /lib64/libc.so.6+0x3db9f
```
decoded:
```
__interceptor_backtrace at ??:?
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/kefu/dev/scylladb/seastar/include/seastar/util/backtrace.hh:60
seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:778
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:808
seastar::print_with_backtrace(char const*, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:820
 (inlined by) seastar::sigabrt_action() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3882
 (inlined by) operator() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3858
 (inlined by) __invoke at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3854
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6107835fa7d4725691b2b7f6aaee7abe09f493b2, for GNU/Linux 3.2.0, not stripped

__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
__sanitizer::Abort() at ??:?
__sanitizer::Die() at ??:?
__asan::ScopedInErrorReport::~ScopedInErrorReport() at ??:?
__asan::ReportDeadlySignal(__sanitizer::SignalContext const&) at ??:?
__asan::AsanOnDeadlySignal(int, void*, void*) at ??:?
wasmtime_runtime::traphandlers::unix::trap_handler at /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159
__GI___sigaction at :?
```

this led me to this change. but unfortunately, this changeset does not address the segfault. i will continue the investigation in my free cycles.

Closes #13434

* github.com:scylladb/scylladb:
  build: cmake: include cxx.h with relative path
  build: cmake: set stack frame limits
  build: cmake: pass -fvisibility=hidden to compiler
  build: cmake: use -O0 on aarch64, otherwise -Og
2023-04-05 06:57:23 +03:00
Yaron Kaikov
c80ab78741 doc: update supported os for 2022.1
ubuntu22.04 is already supported on both `5.0` and `2022.1`

updating the table

Closes #13340
2023-04-05 06:43:58 +03:00
Pavel Emelyanov
f5de0582c8 alternator,util: Move aws4-hmac-sha256 signature generator to util
The S3 client cannot perform anonymous multipart uploads into any real S3
buckets regardless of their configuration. Since multipart upload is
an essential part of the sstables backend, we need to implement
authorisation support for the client early.

(side note): with minio anonymous multipart upload works, with aws s3
anonymous PUT and DELETE can be configured, it's exactly the combination
of aws + multipart upload that does need authorization.

Fortunately, the signature generation and signature checking code is
symmetrical and we have the checking option already in alternator :) So
what this patch does is just move the alternator::get_signature()
helper into utils/. A sad side effect of that is that all tests now need to
link with gnutls :( which is used to compute the hash value itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13428
2023-04-04 18:24:48 +03:00
Nadav Har'El
aeabfcb93f Merge 'Revert scylla sstable schema improvements' from Botond Dénes
This PR reverts the scylla sstable schema loading improvements as they fail in CI every other run. I am already working on fixes for these but I am not sure I understand all the failures so it is best to revert and re-post the series later.

Fixes: #13404
Fixes: #13410

Closes #13419

* github.com:scylladb/scylladb:
  Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
  Revert "tools/schema_loader: don't require results from optional schema tables"
2023-04-04 18:22:14 +03:00
Anna Stuchlik
447ce58da5 doc: update Raft doc for versions 5.2 and 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13345
Fixes https://github.com/scylladb/scylladb/issues/13421

This commit updates the Raft documentation page to be up to date in versions 5.2 and 2023.1.

- Irrelevant information about previous releases is removed.
- Some information is clarified.
- Mentions of version 5.2 are either removed (if possible) or version 2023.1 is added.

Closes #13426
2023-04-04 15:15:56 +02:00
Raphael S. Carvalho
156ac0a67a compaction: Make compaction reevaluation actually periodic
The manager intended to periodically reevaluate the need for compaction
of each registered table, but it's not working as intended:
the reevaluation is one-off.

This means that compaction was not kicking in later for a table with
little to no write activity that had data expiring, say, 1 hour from now.

Also make sure that reevaluation happens within the compaction
scheduling group.

Fixes #13430.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-04 09:16:19 -03:00
Raphael S. Carvalho
2652b41606 replica: Reevaluate regular compaction on off-strategy completion
When off-strategy compaction completes, regular compaction is not triggered.

If off-strategy output causes the table's SSTable set to not conform to the strategy
goal, read and space amplification will be suboptimal until the next
compaction kicks in, which can take an indefinite amount of time (e.g. until the active
memtable is flushed).

Let's reevaluate compaction on main SSTable set when off-strategy ends.

Fixes #13429.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-04 09:16:16 -03:00
Kefu Chai
dceb364c5c build: cmake: include cxx.h with relative path
before this change, the wasm binding source files include the
cxxbridge header file of `cxx.h` with its full path.
to better mirror the behavior of configure.py, let's just
include this header file with relative path.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-04 15:33:20 +08:00
Kefu Chai
ecd5bf98d9 build: cmake: set stack frame limits
* transpose include(mode.common) and include (mode.${build_mode}),
  so the former can reference the value defined by the latter.
* set stack_usage_threshold for supported build modes.

please note, this compiler option (-Wstack-usage=<bytes>) is only
supported by GCC so far.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-04 15:33:20 +08:00
Kefu Chai
6cc8800c85 build: cmake: pass -fvisibility=hidden to compiler
this mirrors the behavior of `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-04 15:33:20 +08:00
Kefu Chai
066e9567ee build: cmake: use -O0 on aarch64, otherwise -Og
this addresses an oversight in b234c839e4,
which is supposed to mirror the behavior of `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-04 15:33:20 +08:00
Anna Stuchlik
595325c11b doc: add upgrade guide from 5.2 to 2023.1
Related: https://github.com/scylladb/scylla-enterprise/issues/2770

This commit adds the upgrade guide from ScyllaDB Open Source 5.2
to ScyllaDB Enterprise 2023.1.
This commit does not cover metric updates (the metrics file has no
content, which needs to be added in another PR).

As this is an upgrade guide, this commit must be merged to master and
backported to branch-5.2 and branch-2023.1 in scylla-enterprise.git.

Closes #13294
2023-04-04 08:24:00 +03:00
Botond Dénes
8167f11a23 Merge 'Move compaction manager tasks out of compaction manager' from Aleksandra Martyniuk
Task manager compaction tasks that cover compaction group
compaction need access to compaction_manager::tasks.

To avoid circular dependency and be able to rely on forward
declaration, task needs to be moved out of compaction manager.

To avoid naming confusion compaction_manager::task is renamed.

Closes #13226

* github.com:scylladb/scylladb:
  compaction: use compaction namespace in compaction_manager.cc
  compaction: rename compaction::task
  compaction: move compaction_manager::task out of compaction manager
  compaction: move sstable_task definition to source file
2023-04-03 15:40:42 +03:00
Botond Dénes
54c0a387a2 Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes"
This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.

This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
2023-04-03 13:54:00 +03:00
Botond Dénes
04b1219694 Revert "tools/schema_loader: don't require results from optional schema tables"
This reverts commit c15f53f971.

Said commit is based on a commit which we want to revert because its
unit test is flaky.
2023-04-03 13:53:06 +03:00
Petr Gusev
09636b20f3 scylla_cluster.py: optimize node logs reading
There are two occasions in scylla_cluster
where we read the node logs, and in both of
them we read the entire file in memory.
This is not efficient and may cause an OOM.

In the first case we need the last line of the
log file, so we seek at the end and move backwards
looking for a new line symbol.

In the second case we look through the
log file to find the expected_error.
The readlines() method returns a Python
list object, which means it reads the entire
file in memory. It's sufficient to just remove
it since iterating over the file instance
already yields lines lazily one by one.

This is a follow-up for #13134.

Closes #13399
2023-04-03 12:28:08 +02:00
Marcin Maliszkiewicz
99f8d7dcbe db: view: use deferred_close for closing staging_sstable_reader
When consume_in_thread throws, the reader should still be closed.
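
The idea behind deferred_close can be sketched with a plain RAII guard. This is an assumption-laden illustration, not Seastar's deferred_close (which waits on an asynchronous close() inside a seastar thread); `reader`, `close_guard` and `consume` are hypothetical names standing in for the staging reader and consume_in_thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical reader that must be closed before it goes away.
struct reader {
    bool closed = false;
    void close() noexcept { closed = true; }
};

// RAII guard in the spirit of deferred_close: close() runs on every exit
// path of the enclosing scope, including stack unwinding after a throw.
class close_guard {
    reader& _r;
public:
    explicit close_guard(reader& r) : _r(r) {}
    close_guard(const close_guard&) = delete;
    close_guard& operator=(const close_guard&) = delete;
    ~close_guard() { _r.close(); }
};

// Stand-in for consume_in_thread(): may throw part-way through.
void consume(reader& r, bool fail) {
    close_guard guard(r);
    if (fail) {
        throw std::runtime_error("consume failed");
    }
}
```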

Related https://github.com/scylladb/scylla-enterprise/issues/2661

Closes #13398
Refs: scylladb/scylla-enterprise#2661
Fixes: #13413
2023-04-03 09:02:55 +03:00
Botond Dénes
ca062d1fba Merge ' mutation: replace operator<<(..) with fmt formatter' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `position_in_partition` and `partition_region` without using ostream<<. also, this change removes `operator<<(ostream, const position_in_partition_view&)` and `operator<<(ostream, const partition_region&)` along with their callers.

Refs #13245

Closes #13391

* github.com:scylladb/scylladb:
  mutation: drop operator<< for position_in_partition and friends
  partition_snapshot_row_cursor: do not use operator<< when printing position
  mutation: specialize fmt::formatter<position_in_partition>
  mutation: specialize fmt::formatter<partition_region>
2023-04-03 08:34:55 +03:00
Kefu Chai
6c37829224 wasm: add noexcept specifier for alien::run_on()
as alien::run_on() requires the function to be noexcept, let's
make this explicit. also, this paves the way for the type constraint
added to `alien::run_on()`. the type constraint will enforce this
requirement on the function passed to `alien::run_on()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13375
2023-04-03 08:19:00 +03:00
Botond Dénes
36e53d571c Merge 'Treewide use-after-move bug fixes' from Raphael "Raph" Carvalho
That's courtesy of 153813d3b8, which annotates Seastar smart pointer classes with Clang's consumed attributes, to help Clang statically spot use-after-move bugs.

Closes #13386

* github.com:scylladb/scylladb:
  replica: Fix use-after-move in table::make_streaming_reader
  index/built_indexes_virtual_reader.hh: Fix use-after-move
  db/view/build_progress_virtual_reader: Fix use-after-move
  sstables: Fix use-after-move when making reader in reverse mode
2023-04-03 06:57:54 +03:00
Benny Halevy
c17df1759e topology: add node state
Add a simple node state model with
`joining`, `normal`, `leaving`, and `left` states
to help manage nodes during replace
with the same ip address.

Later on, this could also help prevent nodes
that were decommissioned, removed, or replaced
from rejoining the cluster.
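
A minimal sketch of the four-state model. The enumerator names come from this message; `to_string` and `may_rejoin` are hypothetical helpers added only to illustrate the "prevent rejoining" idea mentioned above, not functions from the patch.

```cpp
#include <cassert>
#include <string_view>

enum class node_state { joining, normal, leaving, left };

constexpr std::string_view to_string(node_state s) noexcept {
    switch (s) {
    case node_state::joining: return "joining";
    case node_state::normal:  return "normal";
    case node_state::leaving: return "leaving";
    case node_state::left:    return "left";
    }
    return "unknown";
}

// Hypothetical policy: a node that has left the cluster may not rejoin.
constexpr bool may_rejoin(node_state s) noexcept {
    return s != node_state::left;
}
```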

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:18:31 +03:00
Benny Halevy
027f188a97 topology: remove dead code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:04 +03:00
Benny Halevy
f3d5df5448 locator: add class node
And keep per node information (idx, host_id, endpoint, dc_rack, is_pending)
in node objects, indexed by topology on several indices like:
idx, host_id, endpoint, current/pending, per dc, per dc/rack.

The node index is a shorthand identifier for the node.

node* and index are valid while the respective topology instance is valid.
To use them, the caller must hold on to the topology / token_metadata object
(e.g. via a token_metadata_ptr or effective_replication_map).

Refs #6403

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

topology: add node idx

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:13:02 +03:00
Benny Halevy
006e02410f topology: rename update_endpoint to add_or_update_endpoint
To reflect what it does.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:08:03 +03:00
Benny Halevy
df1c92649e topology: define get_{rack,datacenter} inline
Define get_location() that gets the location
for the local node, and use either this entry point
or get_location(inet_address) to get the respective
dc or rack.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:07:49 +03:00
Benny Halevy
fd1a2591b5 shared_token_metadata: mutate_token_metadata: replicate to all shards
storage_service::replicate_to_all_cores has a sophisticated way
to mutate the token_metadata and effective_replication_map
on shard 0 and clone those to all other shards, applying
the changes only if mutate and clone succeeded on all shards,
so we don't end up with only some of the shards holding the mutated
copy if an error happened mid-way (and then we would need to
roll back the change for exception safety).

shared_token_metadata::mutate_token_metadata is currently only called from
a unit test that needs to mutate the token metadata only on shard 0,
but a following patch will require doing that on all shards.

This change adds this capability by enforcing the call to be
on shard 0, mutating the token_metadata into a temporary pending copy
and cloning it on all other shards. Only then, when all shards
succeeded, set the modified token_metadata on all shards.
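
The prepare-then-commit pattern can be sketched in a single thread, assuming a hypothetical `token_metadata` reduced to a version counter and a vector of pointers standing in for the per-shard instances. The real code runs each step on its own shard; here the final publish is one non-throwing swap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: only a version counter.
struct token_metadata { int version = 0; };
using tm_ptr = std::shared_ptr<const token_metadata>;

void mutate_on_all_shards(std::vector<tm_ptr>& shards,
                          const std::function<void(token_metadata&)>& mutator) {
    // 1. On "shard 0", mutate a private copy, never the published object.
    auto pending0 = std::make_shared<token_metadata>(*shards.at(0));
    mutator(*pending0);  // may throw; nothing has been published yet
    // 2. Clone the pending copy for every other shard.
    std::vector<tm_ptr> pending(shards.size());
    pending[0] = pending0;
    for (std::size_t i = 1; i < shards.size(); ++i) {
        pending[i] = std::make_shared<token_metadata>(*pending0);  // may throw
    }
    // 3. Every copy exists: publish them all. vector::swap does not throw.
    shards.swap(pending);
}
```

If any step before the swap throws, every shard still sees the old metadata, so no rollback is needed.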

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:07:17 +03:00
Benny Halevy
9cce01a12c locator: endpoint_dc_rack: refactor default_location
Refactor the thread_local default_location out of
topology::get_location so it can be used elsewhere.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:06:53 +03:00
Benny Halevy
5ba5371631 locator: endpoint_dc_rack: define default operator==
and get rid of the ad-hoc implementation in network_topology_strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 20:06:52 +03:00
Benny Halevy
5874a0d0ca test: storage_proxy_test: provide valid endpoint_dc_rack
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 19:13:05 +03:00
Benny Halevy
ca61d88764 storage_service: node ops: standardize sync_nodes selection
Use token_metadata get_endpoint_to_host_id_map_for_reading
to get all normal token owners for all node operations,
rather than using gossip for some operation and
token_metadata for others.

Fixes #12862

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-02 09:17:07 +03:00
Raphael S. Carvalho
d2d151ae5b Fix use-after-move when initializing row cache with dummy entry
Courtesy of clang-tidy:
row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move]
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:60: note: move occurred here
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema});
^
row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
_partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema});

Whether the use-after-move (and the resulting UB) actually happens depends on evaluation order, which is unspecified.

We haven't hit it yet because clang evaluates left-to-right.
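
A reduced illustration of the hazard and the usual fix, using hypothetical `entry`, `put` and `insert_entry` names rather than the row_cache code. When a callee takes its argument by value, the move into that parameter and the evaluation of the other arguments are not ordered relative to each other, so anything read from the object must be computed in an earlier statement.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical entry whose key lives behind a pointer that a move empties.
struct entry {
    std::unique_ptr<long> token;
    long key() const { return *token; }
};

// Callee that consumes its entry by value.
void put(std::map<long, entry>& m, long key, entry e) {
    m.emplace(key, std::move(e));
}

// Unsafe shape: put(m, e.key(), std::move(e));
// If the by-value parameter is move-constructed first, e.key()
// dereferences a null unique_ptr.

// Fixed shape: read the key in its own statement, which is sequenced
// before the move.
void insert_entry(std::map<long, entry>& m, entry e) {
    const long k = e.key();
    put(m, k, std::move(e));
}
```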

Fixes #13400.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13401
2023-03-31 19:46:53 +03:00
Botond Dénes
c15f53f971 tools/schema_loader: don't require results from optional schema tables
When loading a schema from disk, only the `tables` and `columns` tables
are required to have an entry for the loaded schema. All the others are
optional. Yet the schema loader expects all the tables to have a
corresponding entry, which leads to errors when trying to load a schema
which doesn't. Relax the loader to only require existing entries in the
two mandatory tables and not the others.

Closes #13393
2023-03-31 16:35:42 +02:00
Kefu Chai
c24a9600af docs: dev: correct a typo
s/By expending/By expanding/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13392
2023-03-31 17:19:08 +03:00
Raphael S. Carvalho
04932a66d3 replica: Fix use-after-move in table::make_streaming_reader
Variant used by
streaming/stream_transfer_task.cc:        , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs))

As the full slice is retrieved after the schema is moved (clang evaluates
left-to-right), the stream transfer task can potentially be working
on a stale slice for a particular set of partitions.

static report:
In file included from replica/dirty_memory_manager.cc:6:
replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
        return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice());

Fixes #13397.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-31 08:44:46 -03:00
Raphael S. Carvalho
f8df3c72d4 index/built_indexes_virtual_reader.hh: Fix use-after-move
static report:
./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS),

Fixes #13396.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-31 08:41:44 -03:00
Raphael S. Carvalho
1ecba373d6 db/view/build_progress_virtual_reader: Fix use-after-move
use-after-move in the ctor, which can lead to a failure
when locating the table from the moved-from schema object.

static report:
In file included from db/system_keyspace.cc:51:
./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed]
                _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),

Fixes #13395.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-31 08:40:30 -03:00
Raphael S. Carvalho
213eaab246 sstables: Fix use-after-move when making reader in reverse mode
static report:
sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator*' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed]
            legacy_reverse_slice_to_native_reverse_slice(*schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor);

Fixes #13394.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-31 08:39:11 -03:00
Kefu Chai
6e956c5358 mutation: drop operator<< for position_in_partition and friends
now that all their callers are removed, let's just drop these
operators.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-31 19:03:14 +08:00
Kefu Chai
76dde9fd50 partition_snapshot_row_cursor: do not use operator<< when printing position
in order to prepare for dropping the `operator<<()` for `position_in_partition_view`,
let's use fmtlib to print `position()`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-31 19:03:14 +08:00
Kefu Chai
4ec4859179 mutation: specialize fmt::formatter<position_in_partition>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print

- position_in_partition
- position_in_partition_view
- position_in_partition_view::printer

without the help of fmt::ostream. their `operator<<(ostream,..)` are
reimplemented using fmtlib accordingly to ease the review.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-31 19:03:14 +08:00
Kefu Chai
500eeeb12c mutation: specialize fmt::formatter<partition_region>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `partition_region` with the help of fmt::ostream.

to help with the review process, the corresponding `to_string()` is
dropped, and its callers now switch over to `fmt::to_string()` in
this change as well. using `fmt::to_string()` helps consolidate
all printing/formatting on fmtlib.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-31 19:03:14 +08:00
Tomasz Grabiec
99cb948eac direct_failure_detector: Avoid throwing exceptions in the success path
sleep_abortable() is aborted on success, which causes sleep_aborted
exception to be thrown. This causes scylla to throw every 100ms for
each pinged node. Throwing may reduce performance if it happens often.

Also, it spams the logs if --logger-log-level exception=trace is enabled.

Avoid this by swallowing the exception on cancellation.
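For contrast, a sketch of a different, standard-library technique (an assumption for illustration, not Seastar's `sleep_abortable()` and not necessarily what this commit does): a cancellable sleep that reports cancellation through its return value, so the expected early wake-up never goes through an exception at all. `cancellable_sleeper` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical cancellable sleep built on std::condition_variable.
// Cancellation is an expected outcome, so it is returned as a value.
class cancellable_sleeper {
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _cancelled = false;
public:
    // Returns true if cancel() ended the sleep, false if the timeout expired.
    template <typename Rep, typename Period>
    bool sleep_for(std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        // wait_for with a predicate returns the predicate's final value.
        return _cv.wait_for(lock, timeout, [this] { return _cancelled; });
    }

    void cancel() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _cancelled = true;
        }
        _cv.notify_all();
    }
};
```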

Fixes #13278.

Closes #13279
2023-03-31 12:40:43 +02:00
Alejo Sanchez
81b40c10de test/pylib: RandomTables.add_column with value column
When adding extra columns in a test, make them value column. Name them
with the "v_" prefix and use the value column number counter.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13271
2023-03-31 11:19:49 +02:00
Alejo Sanchez
e3b462507d test/pylib: topology: support clusters of initial size 0
To allow tests with custom clusters, allow configuration of initial
cluster size of 0.

Add a proof-of-concept test to be removed later.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13342
2023-03-31 11:17:58 +02:00
Benny Halevy
56be654edc storage_service: get_ignore_dead_nodes_for_replace: make static and rename to parse_node_list
Let the caller pass the string to parse to the function,
rather than having the function itself get it via _db.local().get_config(),
so it can be used as a general-purpose function.

Make it static now that it doesn't require an instance.

Rename to `parse_node_list` as that's what the function does.
It doesn't care whether the nodes are to be ignored or something else
(e.g. removed); they only need to be in token_metadata.
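A minimal sketch of such a general-purpose parser, assuming the option value is a comma-separated list of node identifiers. The real function also resolves each entry against token_metadata, which this hypothetical version omits:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: split a comma-separated node list, trim whitespace
// and skip empty entries. The token_metadata lookup is intentionally omitted.
static std::vector<std::string> parse_node_list(const std::string& s) {
    std::vector<std::string> nodes;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        auto begin = item.find_first_not_of(" \t");
        if (begin == std::string::npos) {
            continue; // empty or whitespace-only entry
        }
        auto end = item.find_last_not_of(" \t");
        nodes.push_back(item.substr(begin, end - begin + 1));
    }
    return nodes;
}
```

Because the string is a parameter, the same helper works for any option holding a node list, whether the nodes are to be ignored or removed.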

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-31 10:20:17 +03:00
Kefu Chai
e107b31d23 test: sstable: remove unused class in sstable test
generation_for_sharded_test is not used by any of these sstable
tests, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13388
2023-03-31 08:02:22 +03:00
Botond Dénes
f777916055 Merge 'Offstrategy keyspace compaction task' from Aleksandra Martyniuk
Task manager task implementations of classes that cover
offstrategy keyspace compaction, which can be started through the
/storage_service/keyspace_compaction/ API.

Top level task covers the whole compaction and creates child
tasks on each shard.

Closes #12713

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test offstrategy compaction
  compaction: create task manager's task for offstrategy keyspace compaction on one shard
  compaction: create task manager's task for offstrategy keyspace compaction
  compaction: create offstrategy_compaction_task_impl
2023-03-31 07:09:17 +03:00
Pavel Emelyanov
7d6ab5c84d code: Remove some headers from query_processor.hh
The forward_service.hh and raft_group0_client.hh can be replaced with
forward declarations. A few other files need their previously indirectly
included headers back.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13384
2023-03-31 07:08:41 +03:00
Tomasz Grabiec
4d6443e030 Merge 'Schema commitlog separate dir' from Gusev Petr
The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors: if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged with the `WARN` level.

A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive.

This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here.

Fixes: #11867

Closes #13263

* github.com:scylladb/scylladb:
  commitlog: use separate directory for schema commitlog
  schema commitlog: fix commitlog_total_space_in_mb initialization
2023-03-30 23:48:58 +02:00
Petr Gusev
0152c000bb commitlog: use separate directory for schema commitlog
The commitlog api originally implied that
the commitlog_directory would contain files
from a single commitlog instance. This is
checked in segment_manager::list_descriptors:
if it encounters a file with an unknown
prefix, an exception occurs in
commitlog::descriptor::descriptor, which is
logged with the WARN level.

A new schema commitlog was added recently,
which shares the filesystem directory with
the main commitlog. This causes warnings
to be emitted on each boot. This patch
solves the warnings problem by moving
the schema commitlog to a separate directory.
In addition, the user can employ the new
schema_commitlog_directory parameter to move
the schema commitlog to another disk drive.

By default, the schema commitlog directory is
nested in the commitlog_directory. This can help
avoid problems during an upgrade if the
commitlog_directory in the custom scylla.yaml
is located on a separate disk partition.

This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog)
is also scheduled for 5.3, and it already
requires a clean rolling restart (no cl
segments to replay), we don't need to
specifically handle upgrade here.

Fixes: #11867
2023-03-30 21:55:50 +04:00
Petr Gusev
f31bd26971 schema commitlog: fix commitlog_total_space_in_mb initialization

It seems there was a typo here, which caused
commitlog_total_space_in_mb to always be zero
and the schema commitlog to be effectively
unlimited in size.
2023-03-30 21:55:50 +04:00
Botond Dénes
207dcbb8fa Merge 'sstables: prepare for uuid-based generation_type' from Benny Halevy
Preparing for #10459, this series defines sstables::generation_type::int_t
as `int64_t` at the moment and uses that instead of naked `int64_t` variables
so it can be changed in the future to hold e.g. a `std::variant<int64_t, sstables::generation_id>`.

sstables::new_generation was defined to generate new, unique generations.
Currently it is based on incrementing a counter, but it can be extended in the future
to manufacture UUIDs.

The unit tests are cleaned up in this series to minimize their dependency on numeric generations.
Basically, they should be used for loading sstables with hard coded generation numbers stored under `test/resource/sstables`.

For all the rest, the tests should use existing mechanisms and those introduced in this series, such as generation_factory, sst_factory and the smart make_sstable methods in sstable_test_env and table_for_tests, to generate new sstables with a unique generation, and use the abstract sst->generation() method to get their generation if needed, without resorting to the actual value it may hold.

Closes #12994

* github.com:scylladb/scylladb:
  everywhere: use sstables::generation_type
  test: sstable_test_env: use make_new_generation
  sstable_directory::components_lister::process: fixup indentation
  sstables: make highest_generation_seen return optional generation
  replica: table: add make_new_generation function
  replica: table: move sstable generation related functions out of line
  test: sstables: use generation_type::int_t
  sstables: generation_type: define int_t
2023-03-30 17:05:07 +03:00
Pavel Emelyanov
92318fdeae Merge 'Initialize Wasm together with query_processor' from Wojciech Mitros
The wasm engine is moved from replica::database to the query_processor.
The wasm instance cache and compilation thread runner were already there,
but now they're also initialized in the query_processor constructor.

By moving the initialization to the constructor, we can now
be certain that all wasm-related objects (wasm instance cache,
compilation thread runner, and wasm engine, which was already
passed in the constructor) are initialized when we try to use
them because we have to use the query processor to access them
anyway.

The change is also motivated by the fact that we're planning
to take Wasm UDFs out of experimental, after which they should
stop getting special treatment.

Closes #13311

* github.com:scylladb/scylladb:
  wasm: move wasm initialization to query_processor constructor
  wasm: return wasm instance cache as a reference instead of a pointer
  wasm: move wasm engine to query_processor
2023-03-30 14:30:23 +03:00
Nadav Har'El
59ab9aac44 Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity
Currently, aggregate functions are implemented in a stateful manner.
The accumulator is stored internally in an aggregate_function::aggregate,
requiring each query to instantiate new instances (see
aggregate_function_selector's constructor, and note how it's called
from selector::new_instance()).

This makes aggregates hard to use in expressions, since expressions
are stateless (with state only provided to evaluate()). To facilitate
migration towards stateless expressions, we define a
stateless_aggregate_function (modeled after user-defined aggregates,
which are already stateless). This new struct defines the aggregate
in terms of three scalar functions: one to aggregate a new input into
an accumulator (provided in the first parameter), one to finalize an
accumulator into a result, and one to reduce two accumulators for
parallelized aggregation.
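A minimal sketch of the three-scalar-function model described above, with hypothetical names (`stateless_aggregate`, `make_avg`, `fold`) rather than scylla's actual classes. The accumulator is passed in and returned, so the aggregate object holds no per-query state. For simplicity this avg() returns a double:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model: an aggregate defined by three scalar functions.
template <typename Acc, typename In, typename Out>
struct stateless_aggregate {
    Acc initial;
    std::function<Acc(Acc, const In&)> aggregate;  // fold one input into the accumulator
    std::function<Out(const Acc&)> finalize;       // accumulator -> result
    std::function<Acc(Acc, const Acc&)> reduce;    // merge two partial accumulators
};

// avg() over ints; the accumulator is (sum, count).
using avg_acc = std::pair<int64_t, int64_t>;

stateless_aggregate<avg_acc, int32_t, double> make_avg() {
    return {
        avg_acc{0, 0},
        [](avg_acc a, const int32_t& v) { a.first += v; a.second += 1; return a; },
        [](const avg_acc& a) { return a.second ? double(a.first) / a.second : 0.0; },
        [](avg_acc a, const avg_acc& b) { a.first += b.first; a.second += b.second; return a; },
    };
}

// Fold a batch of inputs (e.g. one shard's rows) into a fresh accumulator.
template <typename Acc, typename In, typename Out>
Acc fold(const stateless_aggregate<Acc, In, Out>& f, const std::vector<In>& rows) {
    Acc acc = f.initial;
    for (const auto& r : rows) {
        acc = f.aggregate(std::move(acc), r);
    }
    return acc;
}
```

The `reduce` step is what allows partial accumulators computed on different shards to be combined before a single `finalize`.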

All existing native aggregate functions are converted to the new model, and
the old interface is removed. This series does not yet convert selectors to
expressions, but it does remove one of the obstacles.

Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache; memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where CL>1 will cause non-coordinator code to run multiple times).

Closes #13105

* github.com:scylladb/scylladb:
  cql3/selection, forward_service: use use stateless_aggregate_function directly
  db: functions: fold stateless_aggregate_function_adapter into aggregate_function
  cql3: functions: simplify accumulator_for template
  cql3: functions: base user-defined aggregates on stateless aggregates
  cql3: functions: drop native_aggregate_function
  cql3: functions: reimplement count(column) statelessly
  cql3: functions: reimplement avg() statelessly
  cql3: functions: reimplement sum() statelessly
  cql3: functions: change wide accumulator type to varint
  cql3: functions: unreverse types for min/max
  cql3: functions: rename make_{min,max}_dynamic_function
  cql3: functions: reimplement min/max statelessly
  cql3: functions: reimplement count(*) statelessly
  cql3: functions: simplify creating native functions even more
  cql3: functions: add helpers for automating marshalling for scalar functions
  types: fix big_decimal constructor from literal 0
  cql3: functions: add helper class for internal scalar functions
  db: functions: add stateless aggregate functions
  db, cql3: move scalar_function from cql3/functions to db/functions
2023-03-30 13:58:47 +03:00
Aleksandra Martyniuk
306d44568f test: extend test_compaction_task.py to test offstrategy compaction 2023-03-30 10:52:27 +02:00
Aleksandra Martyniuk
8afa54d4f6 compaction: create task manager's task for offstrategy keyspace compaction on one shard
Implementation of task_manager's task that covers local offstrategy keyspace compaction.
2023-03-30 10:49:09 +02:00
Aleksandra Martyniuk
73860b7c9d compaction: create task manager's task for offstrategy keyspace compaction
Implementation of task_manager's task covering offstrategy keyspace compaction
that can be started through storage_service api.
2023-03-30 10:44:56 +02:00
Aleksandra Martyniuk
e8ef8a51d5 compaction: create offstrategy_compaction_task_impl
offstrategy_compaction_task_impl serves as a base class of all
concrete offstrategy compaction task classes.
2023-03-30 10:28:17 +02:00
Nadav Har'El
32fff17e19 Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible: the sstable can be inspected anywhere; it doesn't have to be on the same host where it originates from. Yet in many cases the sstable *is* inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk, and it is plain annoying to have to export it into a file just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir and the scylla configuration file, and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag and the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow, as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `quarantine`, `staging`, etc. are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13075

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
  test/cql-pytest: test_tools.py: add test for schema loading
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema
2023-03-30 09:35:59 +03:00
Pavel Emelyanov
886a1392a8 sstables: Move writer flush into close (and remove it)
Writing into sstable component output stream should be done with care.
In particular -- flushing can happen only once right before closing the
stream. Flushing the stream in between several writes is not going to
work, because the file stream would step on unaligned IO and the S3 upload
stream would send a completion message to the server and lose any
subsequent writes.

That said, it's better to remove the flush() ability from the
component writer so as not to tempt developers.

refs: #13320

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-30 09:34:04 +03:00
Pavel Emelyanov
77169e2647 sstables: Relax exception handling in do_write_simple
This effectively reverts 000514e7cc (sstable: close file_writer if an
exception in thrown) because it became obsoleted by 60873d2360 (sstable:
file_writer: auto-close in destructor).

The change is in fact idempotent.

Before the patch, the writer was closed regardless of whether write/flush
failed. After the patch, the writer will close itself in its destructor for sure.

Before the patch, an exception from write/flush was caught, then close
was called, and regardless of whether close failed, the former exception
was re-thrown. After the patch, an exception from write/flush will result
in writer destruction, which ignores the close exception (if any).

Before the patch, a throwing close after a successful write/flush re-threw
the close exception. After the patch, the writer will be closed "by hand" and
any exception will be reported.
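A minimal sketch of the writer behaviour described above, using a hypothetical `file_writer` and `do_write_simple` rather than the real sstable classes: an explicit close() reports errors, while the destructor closes as a fallback and swallows errors, so the caller needs no try/catch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical writer: records activity into *log and can be told to fail.
class file_writer {
    bool _closed = false;
    bool _fail_close;
    std::string* _log;
public:
    file_writer(std::string* log, bool fail_close) : _fail_close(fail_close), _log(log) {}

    void write(const std::string& data, bool fail = false) {
        if (fail) {
            throw std::runtime_error("write failed");
        }
        *_log += data;
    }

    void close() {
        _closed = true; // never retried by the destructor
        *_log += "|closed";
        if (_fail_close) {
            throw std::runtime_error("close failed");
        }
    }

    // Fallback close: destructors must not throw, so close errors are ignored
    // and the original write/flush exception keeps propagating.
    ~file_writer() {
        if (!_closed) {
            try { close(); } catch (...) {}
        }
    }
};

// Shape of the write path after the patch: no explicit exception handling.
void do_write_simple(std::string* log, bool fail_write, bool fail_close) {
    file_writer w(log, fail_close);
    w.write("data", fail_write);
    w.close(); // on the success path, close errors reach the caller
}
```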

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-30 09:32:56 +03:00
Botond Dénes
164afe14ad Merge 'compound_compat: replace operator<<(..) with fmt formatter ' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `composite` and `composite_view` without using ostream<<. also, this change removes `operator<<(ostream, const composite&)` and `operator<<(ostream, const composite_view&)` along with their callers.

Refs #13245

Closes #13360

* github.com:scylladb/scylladb:
  compound_compat: remove operator<<(ostream, composite)
  compound_compat: remove operator<<(ostream, composite_view)
  sstables: do not use operator<< to print composite_view
  compound_compat.hh: specialize fmt::formatter<composite>
  compound_compat.hh: specialize fmt::formatter<composite_view>
  compound_compat.hh: specialize fmt::formatter<component_view>
2023-03-30 08:47:17 +03:00
Botond Dénes
972b24a969 Merge 'Break the proxy -> database -> [views] -> proxy loop' from Pavel Emelyanov
... and drop usage of global storage proxy from several places of mutate_MV().

This is the last dependency loop around storage proxy left, and also the last user of the global storage proxy. The trouble is that while proxy naturally depends on database, the database SUDDENLY requires proxy to push view updates from the guts of database::do_apply().

Similar loop existed in a form of database -> { large_data_handler, compaction manager } -> system keyspace -> database and it was cut in 917fdb9e53 (Cut database-system_keyspace circular dependency) by introducing a soft dependency link from l. d. handler / compaction manager to system keyspace. The similar solution is proposed here.

The database instance gets a soft dependency (shared_ptr) on the view_update_generator instance. On start the link is nullptr, and pushing view updates is not possible until view_update_generator starts and plugs itself into the database. The plugging happens naturally, because v.u.generator needs proxy as an explicit dependency and, thus, can reach the database via proxy. This (seems to) work because tables that need view updates don't start being mutated until late enough, as late as v.u.generator starts.
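A minimal sketch of the soft-dependency idea, with hypothetical types and method names (`plug_view_update_generator`, `apply_with_views`) that are not scylla's API: the database can be constructed before the generator exists, and view updates only become possible once the generator plugs itself in.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical generator: records the "mutations" it was asked to push.
struct view_update_generator {
    std::vector<int> pushed;
    void push_view_updates(int mutation) { pushed.push_back(mutation); }
};

// The database holds only a nullable link, which breaks the construction
// cycle: database -> generator -> proxy -> database.
class database {
    std::shared_ptr<view_update_generator> _view_update_generator;
public:
    void plug_view_update_generator(std::shared_ptr<view_update_generator> g) {
        _view_update_generator = std::move(g);
    }
    void unplug_view_update_generator() { _view_update_generator.reset(); }

    void apply_with_views(int mutation) {
        if (!_view_update_generator) {
            throw std::runtime_error("view update generator is not plugged yet");
        }
        _view_update_generator->push_view_updates(mutation);
    }
};
```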

As a nice side effect, this allows removing a bunch of global storage proxy usages from mutate_MV(), which opens a pretty short path towards de-globalizing proxy (after it, only qctx, tracing and schema registry will be left).

Closes #13367

* github.com:scylladb/scylladb:
  view: Drop global storage_proxy usage from mutate_MV()
  view: Make mutate_MV() method of view_update_generator
  table: Carry v.u.generator down to populate_views()
  table: Carry v.u.generator down to do_push_view_replica_updates()
  view: Keep v.u.generator shared pointer on view_builder::consumer
  view: Capture v.u.generator on view_updating_consumer lambda
  view: Plug view update generator to database
  view: Add view_builder -> view_update_generator dependency
  view: Add view_update_generator -> sharded<storage_proxy> dependency
2023-03-30 08:29:29 +03:00
Takuya ASADA
160c184d0b scylla_kernel_check: suppress verbose iotune messages
Stop printing verbose iotune messages during the check; just print the
error message.

Fixes #13373.

Closes #13362
2023-03-30 07:30:07 +03:00
Pavel Emelyanov
9a66174a94 Merge 'config: make query timeouts live update-able' from Kefu Chai
in this change, the following query timeout config options are marked live update-able:

- range_request_timeout_in_ms
- read_request_timeout_in_ms
- counter_write_request_timeout_in_ms
- cas_contention_timeout_in_ms
- truncate_request_timeout_in_ms
- write_request_timeout_in_ms
- request_timeout_in_ms

as per https://github.com/scylladb/scylladb/issues/10172,

> Many users would like to set the driver timers based on server timers.
> For example: expire a read timeout before or after the server read time
> out.

with this change, we are able to set the timeouts on the fly. these timeout options specify how long the coordinator waits for the completion of different kinds of operations. but these options are cached by the servers consuming them, so in this series, helpers are added to update the cached values when the options get modified. also, since the observers are not copyable, sharded_parameter is used to initialize the config when creating these sharded servers.

Fixes #12232
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12531

* github.com:scylladb/scylladb:
  timeout_config: remove unused make_timeout_config()
  client_state: split the param list of ctor into multi lines
  redis,thrift,transport: make timeout_config live-updateable
  config: mark query timeouts live update-able
  transport: mark cql_server::timeout_config() const
  auth: remove unused forward declaration
  redis: drop unused member function
  transport: drop unused member function
  thrift: keep a reference of timeout_config in handler_factory
  redis,thrift,transport: initialize _config with std::move(config)
  redis,thrift,transport: pass config via sharded_parameter
  utils: config_file: add a space after `=`
2023-03-29 19:38:26 +03:00
Kefu Chai
4670ba90e5 scripts: remove git-archive-all
since we no longer build the rpm/deb packages from a source tarball,
but from a precompiled relocatable package instead, there is no need
to keep git-archive-all in the repo. in this
change, the git-archive-all script and its license file are removed.
they were added for building rpm packages from source tarball in
f87add31a7.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13372
2023-03-29 18:59:23 +03:00
Avi Kivity
472b155d76 Merge 'Allow each compaction group to have its own compaction strategy state' from Raphael "Raph" Carvalho
This is important for multiple compaction groups, as they cannot share state that must span a single SSTable set.

The solution is about:

1) Decoupling compaction strategy from its state; making compaction_strategy a pure stateless entity
2) Each compaction group storing its own compaction strategy state
3) Compaction group feeds its state into compaction strategy whenever needed
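A minimal sketch of points 1 to 3, with hypothetical state types far simpler than the real LCS/TWCS state: one stateless strategy object is shared, and each compaction group owns its own state and passes it in.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <variant>

// Hypothetical per-strategy state; the real structures hold much more.
struct leveled_state { std::size_t last_compacted_level = 0; };
struct time_window_state { std::map<long, std::size_t> recent_windows; };
using compaction_strategy_state =
    std::variant<std::monostate, leveled_state, time_window_state>;

// Stateless strategy: every method receives the caller's state.
class leveled_strategy {
public:
    compaction_strategy_state initial_state() const { return leveled_state{}; }
    void notify_completion(compaction_strategy_state& st, std::size_t level) const {
        std::get<leveled_state>(st).last_compacted_level = level;
    }
};

// Each compaction group stores its own strategy state.
struct compaction_group {
    compaction_strategy_state state;
};
```

Because the strategy itself carries no state, two groups sharing it cannot observe each other's progress.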

Closes #13351

* github.com:scylladb/scylladb:
  compaction: TWCS: wire up compaction_strategy_state
  compaction: LCS: wire up compaction_strategy_state
  compaction: Expose compaction_strategy_state through table_state
  replica: Add compaction_strategy_state to compaction group
  compaction: Introduce compaction_strategy_state
  compaction: add table_state param to compaction_strategy::notify_completion()
  compaction: LCS: extract state into a separate struct
  compaction: TWCS: prepare for stateless strategy
  compaction: TWCS: extract state into a separate struct
  compaction: add const-qualifier to a few compaction_strategy methods
2023-03-29 18:57:11 +03:00
Pavel Emelyanov
cc262d814b view: Drop global storage_proxy usage from mutate_MV()
Now the mutate_MV is the method of v.u.generator which has reference to
the sharded<storage_proxy>. A few helper static wrappers are patched to
get the needed proxy or database reference from the mutate_MV call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:14 +03:00
Pavel Emelyanov
7cabdc54a6 view: Make mutate_MV() method of view_update_generator
Nowadays it's a static helper, but internally it depends on storage
proxy, so it grabs its global instance. Making it a method of view
update generator makes it possible to use the proxy dependency from the
generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:14 +03:00
Pavel Emelyanov
e78e64a920 table: Carry v.u.generator down to populate_views()
The method is called by view_builder::consumer when building a view and
the consumer already has stable dependency reference on the view updates
generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 18:48:13 +03:00
Botond Dénes
bae62f899d mutation/mutation_compactor: consume_partition_end(): reset _stop
The purpose of `_stop` is to remember whether the consumption of the
last partition was interrupted or it was consumed fully. In the former
case, the compactor allows retreiving the compaction state for the given
partition, so that its compaction can be resumed at a later point in
time.
Currently, `_stop` is set to `stop_iteration::yes` whenever the return
value of any of the `consume()` methods is also `stop_iteration::yes`.
Meaning, if the consuming of the partition is interrupted, this is
remembered in `_stop`.
However, a partition whose consumption was interrupted is not always
continued later. Sometimes consumption of a partition is interrupted
because the partition is not interesting and the downstream consumer
wants to stop it. In these cases the compactor should not return an
engaged optional from `detach_state()`, because there is no state to
detach; the state should be thrown away. This was incorrectly handled so
far and is fixed in this patch by overwriting `_stop` in
`consume_partition_end()` with whatever the downstream consumer returns.
Meaning if they want to skip the partition, then `_stop` is reset to
`stop_iteration::no` and `detach_state()` will return a disengaged
optional as it should in this case.
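A heavily simplified, hypothetical model of the `_stop` handling described above (not the real mutation compactor): an interrupted partition stays resumable, but a partition the downstream consumer skips at partition end does not.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

enum class stop_iteration { no, yes };

class compactor {
    stop_iteration _stop = stop_iteration::no;
    std::string _partition_key;
public:
    void consume_new_partition(std::string key) {
        _partition_key = std::move(key);
        _stop = stop_iteration::no;
    }
    // Any consume() whose downstream result is "yes" marks the partition
    // as interrupted.
    stop_iteration consume_row(stop_iteration downstream) {
        if (downstream == stop_iteration::yes) {
            _stop = stop_iteration::yes;
        }
        return downstream;
    }
    // The fix: the downstream verdict at partition end overwrites `_stop`,
    // so a skipped partition is no longer treated as resumable.
    stop_iteration consume_partition_end(stop_iteration downstream) {
        _stop = downstream;
        return _stop;
    }
    // Engaged only if the partition was interrupted and may be resumed.
    std::optional<std::string> detach_state() const {
        if (_stop == stop_iteration::yes) {
            return _partition_key;
        }
        return std::nullopt;
    }
};
```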

Fixes: #12629

Closes #13365
2023-03-29 17:48:45 +03:00
Aleksandra Martyniuk
0ceee3e4b3 compaction: use compaction namespace in compaction_manager.cc 2023-03-29 15:28:14 +02:00
Takuya ASADA
497dd7380f create-relocatable-package.py: stop using filter function on tools
We introduced exclude_submodules in 19da4a5b8f
to exclude tools/java and tools/jmx since they have their own
relocatable packages, so we don't want to package same files twice.

However, most of the files under tools/ are not needed for installation,
we just need tools/scyllatop.
So what we really need to do is "ar.reloc_add('tools/scyllatop')", not
excluding files from tools/.

related with #13183

Closes #13215
2023-03-29 16:23:43 +03:00
Aleksandra Martyniuk
d7d570e39d compaction: rename compaction::task
To avoid confusion with task manager tasks compaction::task is renamed
to compaction::compaction_task_executor. All inheriting classes are
modified similarly.
2023-03-29 15:23:18 +02:00
Aleksandra Martyniuk
f24391fbe4 compaction: move compaction_manager::task out of compaction manager
compaction_manager::task needs to be accessed from task manager compaction
tasks. Thus, compaction_manager::task and all inheriting classes are moved
from compaction manager to compaction namespace.
2023-03-29 15:21:24 +02:00
Wojciech Mitros
cfd2a4588d wasm: move wasm initialization to query_processor constructor
By moving the initialization to the constructor, we can now
be certain that all wasm-related objects (wasm instance cache,
compilation thread runner, and wasm engine, which was already
passed in the constructor) are initialized when we try to use
them because we have to use the query processor to access them
anyway.

The change is also motivated by the fact that we're planning
to take Wasm UDFs out of experimental, after which they should
stop getting special treatment.
2023-03-29 14:55:36 +02:00
Aleksandra Martyniuk
37cafec9d5 compaction: move sstable_task definition to source file 2023-03-29 14:53:43 +02:00
Botond Dénes
72772d5072 Merge 'auth: replace operator<<(..) with fmt formatter' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `authenticated_user` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const authenticated_user&)`.

Refs #13245

Closes #13359

* github.com:scylladb/scylladb:
  auth: drop operator<<(ostream, authenticated_user)
  cql3: do not use operator<< to print authenticated_user
  auth: specialize fmt::formatter<authenticated_user>
2023-03-29 15:24:07 +03:00
Kefu Chai
0b7c345bec timeout_config: remove unused make_timeout_config()
it is replaced by the ctor of updateable_timeout_config, so it does not
have any callers now. let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:17:45 +08:00
Kefu Chai
98b9cbbc92 client_state: split the param list of ctor into multi lines
it is 215 chars long, so let's break it into multiple lines for
better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:17:45 +08:00
Kefu Chai
ebf5e138e8 redis,thrift,transport: make timeout_config live-updateable
* timeout_config
  - add `updateable_timeout_config`, which represents a set of always
    up-to-date options backed by `utils::updateable_value<>`. this class
    is used by servers that need to access the latest timeout-related
    options. the existing `timeout_config` is more like a snapshot
    of the `updateable_timeout_config`. it is used where we don't need
    the most up-to-date options, or where we update the options
    manually on demand.
* redis, thrift, transport: s/timeout_config/updateable_timeout_config/
  where appropriate. use the improved version of timeout_config where
  we need access to the most up-to-date version of the timeout
  options.
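
To make the difference between the two flavors concrete, here is a minimal sketch. It assumes a hypothetical `live_value<T>` that reads through a shared source (standing in for `utils::updateable_value<>`) and a hypothetical `current_values()` that takes a snapshot; it is an illustration, not Scylla's actual implementation.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <utility>

// Hypothetical stand-in for a live-updateable option; NOT utils::updateable_value<>.
template <typename T>
class live_value {
    std::shared_ptr<T> _source;
public:
    explicit live_value(std::shared_ptr<T> source) : _source(std::move(source)) {}
    // Every read goes to the shared source, so updates are observed immediately.
    const T& operator()() const { return *_source; }
};

using ms = std::chrono::milliseconds;

// Snapshot flavor: values are copied once and never change afterwards.
struct timeout_snapshot {
    ms read_timeout;
};

// Always-up-to-date flavor: reads through the live source.
struct live_timeouts {
    live_value<ms> read_timeout;
    // Hypothetical helper: take a snapshot when a fixed set of values is wanted.
    timeout_snapshot current_values() const { return {read_timeout()}; }
};
```

A server holding `live_timeouts` sees a runtime change to the option, while code that took a `timeout_snapshot` keeps the old value until it asks for a fresh one.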

Fixes #10172
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:17:45 +08:00
Kefu Chai
11cea36c12 docs: dev: write mathematical expressions in LaTeX
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13341
2023-03-29 15:07:14 +03:00
Kefu Chai
f789d8d3cd config: mark query timeouts live update-able
in this change, the following query timeout config options are marked
live-updateable:

- range_request_timeout_in_ms
- read_request_timeout_in_ms
- counter_write_request_timeout_in_ms
- cas_contention_timeout_in_ms
- truncate_request_timeout_in_ms
- write_request_timeout_in_ms
- request_timeout_in_ms

as per https://github.com/scylladb/scylladb/issues/10172,

> Many users would like to set the driver timers based on server timers.
> For example: expire a read timeout before or after the server read time
> out.

with this change, these options are *marked* live-updateable, but since
their consumers cache them locally, a follow-up commit will update the
local copies when these options get updated.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
1cc28679bc transport: mark cql_server::timeout_config() const
this function returns a const reference to a member variable, so we
can mark it with the `const` specifier for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
ca83dc0101 auth: remove unused forward declaration
`timeout_config` is not used by auth/common.hh. presumably, this
class is not a public interface exposed by auth, as it is not
inherently related to auth. timeout_config is a shared setting across
related services, specifically, redis_server, thrift and cql_server.
so, in this change, let's drop this forward declaration.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
9a159445f0 redis: drop unused member function
now that `redis_server::connection::timeout_config()` and
`redis_server::timeout_config()` are used nowhere, let's drop them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
d72ab78ffd transport: drop unused member function
since `cql_server::connection::timeout_config()` is used nowhere,
let's just drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
fec35b97ad thrift: keep a reference of timeout_config in handler_factory
this change should keep the timeout settings of handler_factory sync'ed
with the ones used by `thrift_server`. so far, the `timeout_config`
instance in `thrift_server` is not live-updateable, but in a follow-up
change, we will make it so. so, this change prepares the handler_factory
for a live-updateable timeout_config.

instead of keeping a snapshot of the timeout_config, keep a reference to
it in handler_factory. the reference points to `thrift_server::_config`.
so even though `thrift_server::_handler_factory` is a shared_ptr,
the member variable won't outlive its container, as the only reason to
have it as a shared_ptr is to appease the ctor of
`CassandraAsyncProcessorFactory`. and the constructed
`_processor_factory` is also a member variable of `thrift_server`, so we
won't risk a dangling reference held by `handler_factory`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
c642ca9e73 redis,thrift,transport: initialize _config with std::move(config)
instead of copying the `config` parameter, move from it.

this change also prepares for a non-copyable config. if the class
of `config` is not copyable, we will not be able to initialize
the member variable by copying from the given `config` parameter.
after the live-updateable config change, the `_config` member
variable will contain instances of utils::observer<>, which is
not copyable but is move-constructible. hence, in this change,
we just move from the given `config`.
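
A minimal sketch of why the move matters, assuming a hypothetical move-only `observer_handle` in place of `utils::observer<>` and a hypothetical `server`/`server_config` pair: copying such a config into the member would not compile, while moving from the by-value parameter does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical move-only member standing in for utils::observer<>.
struct observer_handle {
    std::unique_ptr<int> token;
};

// Because of observer_handle, this config is move-only.
struct server_config {
    std::string name;
    observer_handle observer;
};

class server {
    server_config _config;
public:
    // `_config(config)` would try to copy and fail to compile;
    // moving from the by-value parameter works for move-only types.
    explicit server(server_config config) : _config(std::move(config)) {}
    const server_config& config() const { return _config; }
};
```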

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:02 +08:00
Kefu Chai
e0ac2eb770 redis,thrift,transport: pass config via sharded_parameter
* pass config via sharded_parameter
* initialize config using designated initializer

this change paves the road to servers with live-updateable timeout
options.

before this change, the servers initialize a domain-specific combo
config, like `redis_server_config`, with the same instance of a
timeout_config, and pass the combo config as a ctor parameter to
construct each sharded service instance. but this design assumes
value semantics for the config class, i.e., that it is copyable.
if we want to use utils::updateable_value<> to get updated
option values, we would have to postpone the instantiation of the
config until the sharded service is about to be initialized.

so, in this change, instead of taking a domain-specific config created
beforehand, all services constructed with a `timeout_config` will
take a `sharded_parameter()` for creating the config. also, take
this opportunity to initialize the config using designated initializers,
for two reasons:

* less repetition this way. we don't have to repeat the variable
  name of the config being initialized for each member variable.
* prepare for some member variables which do not have a default
  constructor. this applies to the timeout_config's updater which
  will not have a default constructor, as it should be initialized
  by db::config and a reference to the timeout_config to be updated.

we will update the `timeout_config` side in a follow-up commit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 20:06:00 +08:00
Kefu Chai
99bf8bc0f4 bytes, gms: s/format_to/fmt::format_to/
to disambiguate `fmt::format_to()` from `std::format_to()`. it turns out
we have `using namespace std` somewhere in the source tree, and with
the libstdc++ shipped by GCC-13, we have `std::format_to()`, so without
specifying exactly which one to use, the compiler complains:

```
/optimized_clang/stage-1-X86/build/bin/clang++ -MD -MT build/dev/mutation/mutation.o -MF build/dev/mutation/mutation.o.d -I/optimized_clang/scylla-X86/seastar/include -I/optimized_clang/scylla-X86/build/dev/seastar/gen/include -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Werror=unused-result -fstack-clash-protection -DSEASTAR_API_LEVEL=6 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -I/usr/include/p11-kit-1   -ffile-prefix-map=/optimized_clang/scylla-X86=. -march=westmere -DDEVEL -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION -O2 -DSCYLLA_BUILD_MODE=dev -iquote. -iquote build/dev/gen --std=gnu++20  -ffile-prefix-map=/optimized_clang/scylla-X86=. -march=westmere  -DBOOST_TEST_DYN_LINK   -DNOMINMAX -DNOMINMAX -fvisibility=hidden  -Wall -Werror -Wno-mismatched-tags -Wno-tautological-compare -Wno-parentheses-equality -Wno-c++11-narrowing -Wno-missing-braces -Wno-ignored-attributes -Wno-overloaded-virtual -Wno-unused-command-line-argument -Wno-unsupported-friend -Wno-delete-non-abstract-non-virtual-dtor -Wno-braced-scalar-init -Wno-implicit-int-float-conversion -Wno-delete-abstract-non-virtual-dtor -Wno-psabi -Wno-narrowing -Wno-nonnull -Wno-uninitialized -Wno-error=deprecated-declarations -DXXH_PRIVATE_API -DSEASTAR_TESTING_MAIN -DFMT_DEPRECATED_OSTREAM  -c -o build/dev/mutation/mutation.o mutation/mutation.cc
In file included from mutation/mutation.cc:9:
In file included from mutation/mutation.hh:13:
In file included from mutation/mutation_partition.hh:21:
In file included from ./schema/schema_fwd.hh:13:
In file included from ./utils/UUID.hh:22:
./bytes.hh:116:21: error: call to 'format_to' is ambiguous
                    format_to(out, "{}{:02x}", _delimiter, std::byte(v[i]));
                    ^~~~~~~~~
./bytes.hh:134:43: note: in instantiation of function template specialization 'fmt::formatter<fmt_hex>::format<fmt::basic_format_context<fmt::appender, char>>' requested here
        return fmt::formatter<::fmt_hex>::format(::fmt_hex(bytes_view(s)), ctx);
                                          ^
/usr/include/fmt/core.h:813:64: note: in instantiation of function template specialization 'fmt::formatter<seastar::basic_sstring<signed char, unsigned int, 31, false>>::format<fmt::basic_format_context<fmt::appender, char>>' requested here
    -> decltype(typename Context::template formatter_type<T>().format(
                                                               ^
/usr/include/fmt/core.h:824:10: note: while substituting deduced template arguments into function template 'has_const_formatter_impl' [with Context = fmt::basic_format_context<fmt::appender, char>, T = seastar::basic_sstring<signed char, unsigned int, 31, false>]
  return has_const_formatter_impl<Context>(static_cast<T*>(nullptr));
```

to address this FTBFS, let's be more explicit by adding "fmt::" to
specify which `format_to()` to use.
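
A self-contained model of the ambiguity. It uses hypothetical namespaces `fmtlike` and `stdlike` in place of `fmt` and `std`, so it does not depend on either library: ordinary lookup finds one candidate through the using-directive, argument-dependent lookup finds the other, and only a qualified call picks one unambiguously.

```cpp
#include <cassert>
#include <string>

namespace fmtlike {  // plays the role of fmt
struct appender {
    std::string* out;
};
template <typename Out, typename T>
int format_to(Out out, const T&) {
    out.out->append("fmtlike");
    return 1;
}
}  // namespace fmtlike

namespace stdlike {  // plays the role of std
template <typename Out, typename T>
int format_to(Out, const T&) {
    return 2;
}
}  // namespace stdlike

// Analogous to a stray `using namespace std` somewhere in the tree.
using namespace stdlike;

int write_value(std::string& s) {
    fmtlike::appender out{&s};
    // An unqualified `format_to(out, 42)` would be ambiguous: the using-directive
    // brings in stdlike::format_to, ADL on `appender` finds fmtlike::format_to,
    // and neither template is more specialized than the other.
    return fmtlike::format_to(out, 42);  // qualified: a single candidate
}
```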

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13361
2023-03-29 14:47:28 +03:00
Kefu Chai
ea2badb25f utils: config_file: add a space after =
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 19:22:21 +08:00
Pavel Emelyanov
a95d3446fd table: Carry v.u.generator down to do_push_view_replica_updates()
The latter is the place where mutate_MV is called and it needs the
view updates generator nearby.

The call-stack starts at database::do_apply(). As was described in one
of the previous patches, applying mutations that need updating views
happen late enough, so if the view updates generator is not plugged to
the database yet, it's OK to bail out with exception. If it's plugged,
it's carried over thus keeping the generator instance alive and waited
for on its stop.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:12:01 +03:00
Pavel Emelyanov
ddc8c8b019 view: Keep v.u.generator shared pointer on view_builder::consumer
This is another mutations consumer that pushes view updates forward and
thus also needs the view updates generator pointer. It gets one from the
view builder that already has the dependency on generator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:11:30 +03:00
Pavel Emelyanov
2652dffd89 view: Capture v.u.generator on view_updating_consumer lambda
The consumer is in fact pushing the updates and _that_'s the component
that would really need the view_update_generator at hand. The consumer
is created from the generator itself so no troubles getting the pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:10:55 +03:00
Pavel Emelyanov
d5557ef0e2 view: Plug view update generator to database
The database is a low-level service, and the view update generator currently
depends on it implicitly via storage proxy. However, the database does need
to push view updates with the help of the mutate_MV helper, thus creating a
dependency loop.

This patch exploits the fact that view updates start being pushed late
enough: by that time all other services, including proxy and view update
generator, seem to be up and running. This allows a "weak dependency"
from database to view update generator, like there's one from database
to system keyspace already.

So in this patch the v.u.g. puts the shared-from-this pointer onto the
database at the time it starts. On stop it removes this pointer after
database is drained and (hopefully) all view updates are pushed.
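
The "weak dependency" can be sketched roughly as below. The method names `plug_view_update_generator()`, `unplug_view_update_generator()` and `apply_with_views()` are hypothetical, chosen for illustration, and the structs are simplified stand-ins for the real services.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified stand-ins; not Scylla's actual classes.
struct view_update_generator {
    int pushed = 0;
    void push_view_updates() { ++pushed; }
};

class database {
    std::shared_ptr<view_update_generator> _generator;  // empty until plugged
public:
    // Called by the generator when it starts.
    void plug_view_update_generator(std::shared_ptr<view_update_generator> g) {
        _generator = std::move(g);
    }
    // Called by the generator on stop, after the database is drained.
    void unplug_view_update_generator() { _generator.reset(); }

    void apply_with_views() {
        // Bail out with an exception if the generator is not plugged.
        if (!_generator) {
            throw std::runtime_error("view update generator is not plugged");
        }
        // Hold a copy so the generator stays alive while it is being used.
        auto g = _generator;
        g->push_view_updates();
    }
};
```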

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:09:49 +03:00
Pavel Emelyanov
3455b1aed8 view: Add view_builder -> view_update_generator dependency
The builder will need generator for view_builder::consumer in one of the
next patches.

The builder is a standalone service that is one of the last to start, and no
other service depends on the builder.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:08:47 +03:00
Pavel Emelyanov
3fd12d6a0e view: Add view_update_generator -> sharded<storage_proxy> dependency
The generator will be responsible for spreading view updates with the
help of mutate_MV helper. The latter needs storage proxy to operate, so
the generator gets this dependency in advance.

There's no need to change start/stop order at the moment, generator
already starts after and stops before proxy. Also, services that have
generator as dependency are not required by proxy (even indirectly) so
no circular dependency is produced at this point.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-29 14:08:47 +03:00
Kefu Chai
c307c60d04 scripts: correct a typo in comment
s/refreh/refresh/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13357
2023-03-29 13:44:47 +03:00
Kefu Chai
55a8b50bbd release: correct a typo in comment
s/to levels of indirection/two levels of indirection/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13358
2023-03-29 13:42:38 +03:00
Kefu Chai
dfb55975fc Update tools/jmx submodule
this lets us use OpenJDK 11 instead of OpenJDK 8 for running scylla-jmx,
in the hope of alleviating the pain of the crashes found in the JRE shipped
with OpenJDK 8, as it is old and now only receives security fixes.

* tools/jmx 88d9bdc...48e1699 (3):
  > Merge 'dist/redhat: support jre 11 instead of jre 8' from Kefu Chai
  > install.sh: point java to /usr/bin/java
  > Merge 'use OpenJDK 11 instead of OpenJDK 8' from Kefu Chai

Refs https://github.com/scylladb/scylla-jmx/issues/194

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13356
2023-03-29 13:00:40 +03:00
Kefu Chai
57f51603dc compound_compat: remove operator<<(ostream, composite)
since we don't have any callers of this operator, let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:59 +08:00
Kefu Chai
212641abda compound_compat: remove operator<<(ostream, composite_view)
since we don't have any callers of this operator, let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:59 +08:00
Kefu Chai
cdb972222e sstables: do not use operator<< to print composite_view
this change removes the last two callers of `operator<<(ostream&, const composite_view&)`,
which paves the way for removing this operator.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:59 +08:00
Kefu Chai
1ef8f63b4e compound_compat.hh: specialize fmt::formatter<composite>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `composite` with the help of fmt::ostream.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:59 +08:00
Kefu Chai
28cabd0a1f compound_compat.hh: specialize fmt::formatter<composite_view>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `composite::composite_view` with the help of fmt::ostream.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:59 +08:00
Kefu Chai
15eac8c4cd compound_compat.hh: specialize fmt::formatter<component_view>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `composite::component_view` with the help of fmt::ostream.

in this change, '#' is used to add the 0x prefix: fmtlib adds a '0x'
prefix when the '#' format specifier is combined with 'x' as the type
specifier when printing numbers. see https://fmt.dev/latest/syntax.html

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:13:10 +08:00
Kefu Chai
5a9b4c02e3 auth: drop operator<<(ostream, authenticated_user)
since we don't have any callers of this operator, let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:02:29 +08:00
Kefu Chai
85c89debe6 cql3: do not use operator<< to print authenticated_user
this change removes the last two callers of `operator<<(ostream&, const authenticated_user&)`,
which paves the way for removing this operator.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:02:29 +08:00
Kefu Chai
a7037ae0f4 auth: specialize fmt::formatter<authenticated_user>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `auth::authenticated_user` with the help of fmt::ostream.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-29 16:02:29 +08:00
David Garcia
f45c4983db docs: update theme 1.4
Closes #13346
2023-03-29 06:56:27 +03:00
Avi Kivity
6977df5539 cql3/selection, forward_service: use stateless_aggregate_function directly
Now that stateless_aggregate_function is directly exposed by
aggregate_function, we can use it directly, avoiding the intermediary
aggregate_function::aggregate, which is removed.
2023-03-28 23:49:34 +03:00
Avi Kivity
58eb21aa5d db: functions: fold stateless_aggregate_function_adapter into aggregate_function
Now that all aggregate functions are derived from
stateless_aggregate_function_adapter, we can just fold its functionality
into the base class. This exposes stateless_aggregate_function to
all users of aggregate_function, so they can begin to benefit from
the transformation, though this patch doesn't touch those users.

The aggregate_function base class is partially devirtualized since
there is just a single implementation now.
2023-03-28 23:47:11 +03:00
Avi Kivity
68529896aa cql3: functions: simplify accumulator_for template
The accumulator_for template is used to select the accumulator
type for aggregates. After refactoring, all that is needed from
it is to select the native type, so remove all the excess code.
2023-03-28 23:47:11 +03:00
Avi Kivity
4ea3136026 cql3: functions: base user-defined aggregates on stateless aggregates
Since the model for stateless aggregates was taken from user
defined aggregates, the conversion is trivial.
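
As a rough model of what "stateless" means here, the sketch below follows the shape of CQL user-defined aggregates (initial state, state function, final function). The aggregate object holds no per-query data, and the accumulator is passed in and out explicitly. `stateless_aggregate`, `avg_state` and `make_avg()` are hypothetical names, and returning an empty optional for zero rows is just this sketch's choice.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model, patterned after CQL UDAs; not Scylla's actual class.
template <typename State, typename In, typename Out>
struct stateless_aggregate {
    State initial;
    std::function<State(State, const In&)> state_fn;  // folds one row into the state
    std::function<Out(const State&)> final_fn;        // maps the state to the result

    Out run(const std::vector<In>& rows) const {
        State acc = initial;
        for (const auto& row : rows) {
            acc = state_fn(std::move(acc), row);
        }
        return final_fn(acc);
    }
};

// avg() carries (sum, count) in its state.
struct avg_state {
    std::int64_t sum = 0;
    std::int64_t count = 0;
};

inline stateless_aggregate<avg_state, std::int32_t, std::optional<std::int32_t>> make_avg() {
    return {
        avg_state{},
        [](avg_state s, const std::int32_t& v) { s.sum += v; ++s.count; return s; },
        [](const avg_state& s) -> std::optional<std::int32_t> {
            if (s.count == 0) {
                return std::nullopt;  // sketch's choice for an empty input
            }
            return static_cast<std::int32_t>(s.sum / s.count);
        },
    };
}
```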
2023-03-28 23:47:11 +03:00
Avi Kivity
f2715b289a cql3: functions: drop native_aggregate_function
Now that all aggregates are implemented statelessly,
native_aggregate_function no longer has subclasses, so drop it.
2023-03-28 23:47:11 +03:00
Avi Kivity
6bceb25982 cql3: functions: reimplement count(column) statelessly
Note that we don't use the automarshalling helper for the
aggregation function, since it doesn't work for compound
types.
2023-03-28 23:47:11 +03:00
Avi Kivity
4f2cdace9a cql3: functions: reimplement avg() statelessly 2023-03-28 23:47:11 +03:00
Avi Kivity
b0a8fd3287 cql3: functions: reimplement sum() statelessly 2023-03-28 23:47:11 +03:00
Avi Kivity
d21d11466a cql3: functions: change wide accumulator type to varint
Currently, we use __int128, but this has no direct counterpart
in CQL, so we can't express the accumulator type as part of a
CQL scalar function. Switch to varint, which is a superset, although
slower.
2023-03-28 23:47:11 +03:00
Avi Kivity
3252dc0172 cql3: functions: unreverse types for min/max
Currently it works without this, but later unreversing will
be removed from another part of the stack, causing min/max
on reversed types to return incorrect results. Anticipate
that and unreverse the types during construction.
2023-03-28 23:47:09 +03:00
Avi Kivity
ed466b7e68 cql3: functions: rename make_{min,max}_dynamic_function
There's no longer a statically-typed variant, so no need
to distinguish the dynamically-typed one.
2023-03-28 23:37:49 +03:00
Wojciech Mitros
c9b701b516 wasm: return wasm instance cache as a reference instead of a pointer
In an upcoming change, the wasm instance cache will be modified to be owned
by the query_processor - it will hold an optional instead of a raw
pointer to the cache, so we should stop returning the raw pointer
from the getter as well.
Consequently, the cache is also stored as a reference in wasm::cache,
as it gets the reference from the query_processor.
For consistency with the wasm engine and the wasm alien thread runner,
the name of the getter is also modified to follow the same pattern.
2023-03-28 18:18:48 +02:00
Wojciech Mitros
60c99b4c47 wasm: move wasm engine to query_processor
The wasm engine is used for compiling and executing Wasm UDFs, so
the query_processor is a more appropriate location for it than
replica::database, especially because the wasm instance cache
and the wasm alien thread runner are already there.

This patch also reduces the number of wasm engines to 1, shared by
all shards, as recommended by the wasmtime developers.
2023-03-28 17:41:30 +02:00
Calle Wilund
6525209983 alternator/rest api tests: Remove name assumption and rely on actual scylla info
Fixes #13332
The tests use the discriminator "system" as a prefix to assume keyspaces are marked
"internal" inside scylla. This is not true in the enterprise universe (replicated key
provider). It probably should be, but that train is sailing right now.

Fix by removing one (incorrect) assert and using actual API info in the alternator
test.

Closes #13333
2023-03-28 15:41:23 +03:00
Raphael S. Carvalho
989afbf83b compaction: TWCS: wire up compaction_strategy_state
TWCS no longer keeps internal state, and will now rely on state
managed by each compaction group through compaction::table_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-28 08:48:15 -03:00
Raphael S. Carvalho
233fe6d3dc compaction: LCS: wire up compaction_strategy_state
LCS no longer keeps internal state, and will now rely on state
managed by each compaction group through compaction::table_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-28 08:48:15 -03:00
Raphael S. Carvalho
2186a75e9b compaction: Expose compaction_strategy_state through table_state
That will allow compaction_strategy to access the compaction group state
through compaction::table_state, which is the interface at which replica
talks to the compaction layer.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-28 08:48:10 -03:00
Botond Dénes
b6c022a142 Merge 'cmake: sync with configure.py (15/n)' from Kefu Chai
this is the 15th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
    - to enable developers to use CMake for building scylla, so they can use tools (CLion for instance) with CMake integration for a better developer experience
    - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.

this changeset includes following changes:

 - build: cmake: add two missing tests
 - build: cmake: port more cxxflags from configure.py

Closes #13262

* github.com:scylladb/scylladb:
  build: cmake: add missing source files to idl and service
  build: cmake: port more cxxflags from configure.py
  build: cmake: add two missing tests
2023-03-28 09:16:38 +03:00
Botond Dénes
88c5b2618c Merge 'Get rid of global variable "load_prio_keyspaces" (step 1)' from Calle Wilund
The concept is needed by enterprise functionality, but in the hunt for globals it sticks out and should be removed.
This is also partially prompted by the need to handle the keyspaces in the above set specially on shutdown as well as startup, i.e., we need to ensure all user keyspaces are flushed/closed earlier than these, treating them as "system" keyspaces for this purpose.

These changes add an "extension internal" keyspace set instead, which for now (until enterprise branches are updated) also includes the "load_prio" set. They also change the distributed loader to use the extension API interface instead, and add special shutdown treatment to replica::database.

Closes #13335

* github.com:scylladb/scylladb:
  database: Flush/close "extension internal" keyspaces after other user ks
  distributed_loader: Use extensions set of "extension internal" keyspaces
  db::extensions: Add "extensions internal" keyspace set
2023-03-28 08:35:10 +03:00
Kefu Chai
fcee7f7ac9 reloc: silence warning from readelf
we've been seeing errors like

```
10:39:36  gdb-add-index: [Was there no debuginfo? Was there already an index?]
10:39:36  readelf: /jenkins/workspace/scylla-master/next/scylla/build/dist/debug/redhat/BUILDROOT/scylla-5.3.0~dev-0.20230321.0f97d464d32b.x86_64/usr/lib/debug/opt/scylladb/libreloc/libc.so.6-5.3.0~dev-0.20230321.0f97d464d32b.x86_64.debug: Error: Unable to find program interpreter name
```

when strip.sh is processing *.debug elf images. this is caused by a
known issue, see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012107 . this
error is not fatal, but it is very distracting when we are trying to
find errors in jenkins logging messages.

so, in this change, the stderr output from readelf is muted for a higher
signal-to-noise ratio in the build logging messages.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13267
2023-03-28 08:29:37 +03:00
Anna Stuchlik
4435b8b6f1 doc: elaborate on Scylla admin REST API - V2
This is V2 of https://github.com/scylladb/scylladb/pull/11849

This commit adds more information about ScyllaDB's REST API,
including an example for Docker and a screenshot of
the Swagger UI.

Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>

Closes #13331
2023-03-28 08:27:09 +03:00
Botond Dénes
9a024f72c4 Merge 'thrift: return address in listen_addresses() only after server is ready' from Marcin Maliszkiewicz
This is used by the readiness API (/storage_service/rpc_server), and the fix prevents it from returning 'true' prematurely.

Some improvement for readiness was added in a51529dd15, but the thrift implementation wasn't fully done.

Fixes https://github.com/scylladb/scylladb/issues/12376

Closes #13319

* github.com:scylladb/scylladb:
  thrift: return address in listen_addresses() only after server is ready
  thrift: simplify do_start_server() with seastar:async
2023-03-28 08:26:16 +03:00
Botond Dénes
60240e6d91 Merge 'bytes, gms: replace operator<<(..) with fmt formatter' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `bytes` and `gms::inet_address` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const bytes &)` and `operator<<(ostream, const gms::inet_address&)`.

`gms::inet_address`-related changes are included here in the hope of demonstrating the use of the delimiter specifier of `fmt_hex`'s formatter.

Refs #13245

Closes #13275

* github.com:scylladb/scylladb:
  gms/inet_address: implement operator<< using fmt::formatter
  treewide: use fmtlib to format gms::inet_address
  gms/inet_address: specialize fmt::formatter<gms::inet_address>
  bytes: implement formatting helpers using formatter
  bytes: specialize fmt::formatter<bytes>
  bytes: specialize fmt::formatter<fmt_hex>
  bytes: mark fmt_hex::v `const`
2023-03-28 08:25:41 +03:00
Botond Dénes
b22f8c6d13 Merge 'Adjust repair module to other task manager modules' conventions' from Aleksandra Martyniuk
Files with task manager repair module and related classes
are modified to be consistent with task manager compaction module.

Closes #13231

* github.com:scylladb/scylladb:
  repair: rename repair_module
  repair: add repair namespace to repair/task_manager_module.hh
  repair: rename repair_task.hh
2023-03-28 08:24:42 +03:00
Raphael S. Carvalho
ee89ff24f2 replica: Add compaction_strategy_state to compaction group
The state is not wired anywhere yet. It will replace the ones
stored in compaction strategies themselves, thereby allowing
each compaction group to have its own state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 15:46:14 -03:00
Raphael S. Carvalho
25f73a4181 compaction: Introduce compaction_strategy_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 15:46:11 -03:00
Raphael S. Carvalho
1ffe2f04ef compaction: add table_state param to compaction_strategy::notify_completion()
once compaction_strategy is made stateless, the state must be retrieved
in notify_completion() through table_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 13:40:02 -03:00
Raphael S. Carvalho
2ffaae97a4 compaction: LCS: extract state into a separate struct
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 13:40:02 -03:00
Raphael S. Carvalho
e2f38baa92 compaction: TWCS: prepare for stateless strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 13:40:01 -03:00
Raphael S. Carvalho
017f432b8f compaction: TWCS: extract state into a separate struct
This is a step towards decoupling compaction strategy (computation)
from its state, making the former stateless.
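
The direction of this series can be sketched as follows, assuming a hypothetical `strategy_state` with illustrative contents: the strategy keeps no data members, all of its methods are const, and the caller passes in the state of the compaction group being worked on. This only shows the pattern; it is not the real compaction_strategy interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-compaction-group state (contents are illustrative only).
struct strategy_state {
    std::map<std::string, int> last_compacted_level;
};

// Stateless strategy: no data members, only const methods that receive the state.
class stateless_strategy {
public:
    void notify_completion(strategy_state& state, const std::string& sstable, int level) const {
        state.last_compacted_level[sstable] = level;
    }
    int level_of(const strategy_state& state, const std::string& sstable) const {
        auto it = state.last_compacted_level.find(sstable);
        return it == state.last_compacted_level.end() ? 0 : it->second;
    }
};
```

One strategy instance can then serve several compaction groups, each with its own `strategy_state`, without the groups interfering with each other.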

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 13:38:47 -03:00
Calle Wilund
7af7c379a5 database: Flush/close "extension internal" keyspaces after other user ks
Refs #13334

Effectively treats keyspaces listed in "extension internal" as system keyspaces
w.r.t. shutdown/drain. This ensures all user keyspaces are fully flushed before
we disable these "internal" ones.
2023-03-27 15:15:49 +00:00
Calle Wilund
c3ec6a76c0 distributed_loader: Use extensions set of "extension internal" keyspaces
Refs #13334

Working towards removing load_prio_keyspaces. Use the extensions interface
to determine which keyspaces to initialize early.
2023-03-27 15:14:13 +00:00
Calle Wilund
7c8c020c0e db::extensions: Add "extensions internal" keyspace set
Refs #13334

To be populated early by extensions. Such a keyspace should be
1.) Started before user keyspaces
2.) Flushed/closed after user keyspaces
3.) In all other regards be considered "user".
2023-03-27 15:12:31 +00:00
Aleksandra Martyniuk
f10b862955 repair: rename repair_module 2023-03-27 16:33:39 +02:00
Aleksandra Martyniuk
8f935481cd repair: add repair namespace to repair/task_manager_module.hh 2023-03-27 16:32:51 +02:00
Aleksandra Martyniuk
17e0e05f42 repair: rename repair_task.hh 2023-03-27 16:31:51 +02:00
Raphael S. Carvalho
232e71f2cf compaction: add const-qualifier to a few compaction_strategy methods
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-27 11:13:10 -03:00
Botond Dénes
c7131a0574 Update tools/cqlsh/ submodule
* tools/cqlsh b9a606f...8769c4c (11):
  > dist: redhat: provide only a single version
  > pylib/setup, requirement.txt: remove Six
  > setup: do not support python2
  > install.sh: install files with correct permission in struct umask settings
  > Remove unneed LC_ALL=en_US.UTF-8
  > Support using other driver (datastax or older scylla ones)
  > Fix RPM based downgrade command on scylla-cqlsh
  > gitignore: ignore pylib/cqlshlib/__pycache__
  > dist/redhat: add a proper changelog entry
  > github actions: enable starting on tags
  > Add support for building docker image
2023-03-27 16:23:54 +03:00
Kefu Chai
a3cb5db542 gms/inet_address: implement operator<< using fmt::formatter
less repetition this way.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Kefu Chai
8dbaef676d treewide: use fmtlib to format gms::inet_address
the goal of this change is to reduce the dependency on
`operator<<(ostream&, const gms::inet_address&)`.

this is not an exhaustive search-and-replace change: some call sites
still depend on not-yet-converted ostream printers, so we cannot fix
them all. this change only updates some callers of
`operator<<(ostream&, const gms::inet_address&)`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Kefu Chai
4ea6e06cac gms/inet_address: specialize fmt::formatter<gms::inet_address>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `gms::inet_address` with the help of fmt::ostream.
please note, the ':' delimiter is specified when printing the IPv6 address.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Kefu Chai
a606606ac4 bytes: implement formatting helpers using formatter
some of these helpers print a byte array using `to_hex()`, which
materializes a string instance and then drops it on the floor after
printing it to the given ostream. this hurts performance, so
`fmt::print()` should be more performant than the
implementations based on `to_hex()`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Kefu Chai
36dc2e3f28 bytes: specialize fmt::formatter<bytes>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print bytes with the help of fmt::ostream.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Kefu Chai
2f9dfba800 bytes: specialize fmt::formatter<fmt_hex>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print bytes_view with the help of fmt::ostream. because fmtlib
has its own specialization for fmt::formatter<std::basic_string_view<T>>,
we cannot just create a full specialization for std::basic_string_view<int8_t>,
otherwise fmtlib would complain that

> Mixing character types is disallowed.

so we work around this by delegating to fmt_hex.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 20:06:45 +08:00
Tomasz Grabiec
79ee38181c Merge 'storage_service: wait for normal state handlers earlier in the boot procedure' from Kamil Braun
The `wait_for_normal_state_handled_on_boot` function waits until
`handle_state_normal` finishes for the given set of nodes. It was used
in `run_bootstrap_ops` and `run_replace_ops` to wait until NORMAL states
of existing nodes in the cluster are processed by the joining node
before continuing the joining process. One reason to do so is that at
the end of `handle_state_normal` the joining node might drop connections
to the NORMAL nodes in order to reestablish new connections using
correct encryption settings. In tests we observed that the connection
drop was happening in the middle of repair/streaming, causing
repair/streaming to abort.

Unfortunately, calling `wait_for_normal_state_handled_on_boot` in
`run_bootstrap_ops`/`run_replace_ops` is too late to fix all problems.
Before either of these two functions, we create a new CDC generation and
write the data to `system_distributed_everywhere.cdc_generation_descriptions_v2`.
In tests, the connections were sometimes dropped while this write was
in-flight. This would cause the write to never arrive at other nodes,
and the joining node would time out waiting for confirmations.

To fix this, call `wait_for_normal_state_handled_on_boot` earlier in the
boot procedure, before `make_new_generation` call which does the write.

Fixes: #13302

Closes #13317

* github.com:scylladb/scylladb:
  storage_service: wait for normal state handlers earlier in the boot procedure
  storage_service: bootstrap: wait for normal tokens to arrive in all cases
  storage_service: extract get_nodes_to_sync_with helper
  storage_service: return unordered_set from get_ignore_dead_nodes_for_replace
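The ordering fix can be sketched with asyncio, purely as an illustration: the joining node first waits until NORMAL state handling has finished for every node it syncs with, and only then performs the CDC generation write. `NormalStateTracker` and `boot()` are hypothetical stand-ins for `wait_for_normal_state_handled_on_boot` and `make_new_generation`, not Scylla APIs.

```python
import asyncio


class NormalStateTracker:
    """Hypothetical: tracks which nodes' NORMAL state has been handled."""

    def __init__(self):
        self._handled = set()
        self._cond = asyncio.Condition()

    async def mark_handled(self, node):
        # Called at the end of (the analogue of) handle_state_normal.
        async with self._cond:
            self._handled.add(node)
            self._cond.notify_all()

    async def wait_for(self, nodes):
        # Analogue of wait_for_normal_state_handled_on_boot.
        async with self._cond:
            await self._cond.wait_for(lambda: set(nodes) <= self._handled)


async def boot(tracker, nodes, log):
    # Wait until NORMAL handling (and any connection resets it triggers)
    # is done, and only then issue the CDC generation write.
    await tracker.wait_for(nodes)
    log.append("make_new_generation")


async def demo():
    tracker, log = NormalStateTracker(), []
    task = asyncio.create_task(boot(tracker, {"n1", "n2"}, log))
    await tracker.mark_handled("n1")
    await asyncio.sleep(0)
    assert log == []  # still waiting for n2
    await tracker.mark_handled("n2")
    await task
    return log


print(asyncio.run(demo()))  # ['make_new_generation']
```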
2023-03-27 13:56:47 +02:00
Kamil Braun
cd282cf0ab Merge 'Raft, use schema commit log' from Gusev Petr
We need this so that we can have multi-partition mutations which are applied atomically. If they live on different shards, we can't guarantee atomic write to the commitlog.

Fixes: #12642

Closes #13134

* github.com:scylladb/scylladb:
  test_raft_upgrade: add a test for schema commit log feature
  scylla_cluster.py: add start flag to server_add
  ServerInfo: drop host_id
  scylla_cluster.py: add config to server_add
  scylla_cluster.py: add expected_error to server_start
  scylla_cluster.py: ScyllaServer.start, refactor error reporting
  scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed
  raft: check if schema commitlog is initialized Refuse to boot if neither the schema commitlog feature nor force_schema_commit_log is set. For the upgrade procedure the user should wait until the schema commitlog feature is enabled before enabling consistent_cluster_management.
  raft: move raft initialization after init_system_keyspace
  database: rename before_schema_keyspace_init->maybe_init_schema_commitlog
  raft: use schema commitlog for raft tables
  init_system_keyspace: refactoring towards explicit load phases
2023-03-27 13:27:30 +02:00
Marcin Maliszkiewicz
339a8fe64d thrift: return address in listen_addresses() only after server is ready
listen_addresses() checks whether the _server variable is empty; after this
patch, we assign (move) the value only once the server is ready.

This is used by the readiness API (/storage_service/rpc_server), and the fix
prevents it from returning 'true' prematurely. Some readiness improvements
were added in a51529dd15, but the thrift implementation
wasn't fully done.
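The pattern can be sketched in Python (the class, methods and address are hypothetical, not the thrift controller's API): the server is built in a local variable and published to `_server` only after startup completes, so a readiness check never sees a half-started server.

```python
import asyncio


class RpcServerHolder:
    """Hypothetical holder mirroring the thrift controller's _server field."""

    def __init__(self):
        self._server = None

    def listen_addresses(self):
        # Readiness probe: empty until start() has fully completed.
        return [] if self._server is None else [self._server["address"]]

    async def start(self, address):
        server = {"address": address}  # keep it local while starting
        await asyncio.sleep(0)         # stands in for the slow listen/bind
        self._server = server          # publish only once it is ready


async def demo():
    holder = RpcServerHolder()
    task = asyncio.create_task(holder.start("127.0.0.1:9160"))
    before = holder.listen_addresses()
    await task
    return before, holder.listen_addresses()


print(asyncio.run(demo()))  # ([], ['127.0.0.1:9160'])
```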

Fixes #12376
2023-03-27 13:20:53 +02:00
Marcin Maliszkiewicz
a38701b9d4 thrift: simplify do_start_server() with seastar:async
This code typically runs only at startup, so the overhead is very limited.
Notably, using async avoids managing the tserver variable's lifetime.
2023-03-27 13:12:10 +02:00
David Garcia
70ce1b2002 docs: Separate conf.py
docs: update github actions

docs: fix Makefile tabs

Update docs-pr.yaml

Update Makefile

Closes #13323
2023-03-27 13:42:58 +03:00
Botond Dénes
89e58963ab Update tools/python3/ submodule
* tools/python3 279b6c1...d2f57dd (3):
  > dist: redhat: provide only a single version
  > SCYLLA-VERSION-GEN: use -gt when comparing values
  > SCYLLA-VERSION-GEN: remove unnecessary bashism
2023-03-27 12:00:27 +03:00
Botond Dénes
b5afdf56c3 Merge 'Cleanup keyspace compaction task' from Aleksandra Martyniuk
Task manager task implementations of classes that cover
cleanup keyspace compaction, which can be started through
the /storage_service/keyspace_compaction/ API.

The top-level task covers the whole compaction and creates child
tasks on each shard.

Closes #12712

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test cleanup compaction
  compaction: create task manager's task for cleanup keyspace compaction on one shard
  compaction: create task manager's task for cleanup keyspace compaction
  api: add get_table_ids to get table ids from table infos
  compaction: create cleanup_compaction_task_impl
2023-03-27 11:52:51 +03:00
Kefu Chai
ed347c5051 bytes: mark fmt_hex::v const
as fmt_hex is a helper class for formatting the underlying `bytes_view`,
it does not mutate it, so mark the member variable const and mark
the parameter in its constructor const. this change also lets us
use fmt_hex in cases where const semantics are expected.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-27 16:49:07 +08:00
Botond Dénes
ab61704c54 Merge 'mutation: replace operator<<(.., const range_tombstone&) with fmt formatter' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `range_tombstone` and `range_tombstone_change` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const range_tombstone &)` and `operator<<(ostream, const range_tombstone_change &)`, and then removes these two `operator<<`s.

Refs #13245

Closes #13260

* github.com:scylladb/scylladb:
  mutation: drop operator<<(ostream, const range_tombstone{_change,} &)
  mutation: use fmtlib to print range_tombstone{_change,}
  mutation: mutation_fragment_v2: specialize fmt::formatter<range_tombstone_change>
  mutation: range_tombstone: specialize fmt::formatter<range_tombstone>
2023-03-27 11:38:59 +03:00
Botond Dénes
bd42f5ee0b Merge 'raft: includes used header and use <path/to/header> for include boost headers' from Kefu Chai
at least, we need to access the declarations of exceptions, like `not_a_leader` and `dropped_entry`, so, instead of relying on other headers to do this job for us, we should include the header which declares them. so, in this change, "raft.h" is included explicitly. also, include boost headers using `<path/to/header>` instead of `"path/to/header"` for more consistency.

Closes #13326

* github.com:scylladb/scylladb:
  raft: include boost header using <path/to/header> not "path/to/header"
  raft: include used header
2023-03-27 10:11:45 +03:00
Kefu Chai
96ba88f621 dist/debian: add libexec/scylla to source/include-binaries
* scripts/create-relocatable-package.py: add a command to print out
  executables under libexec
* dist/debian/debian_files_gen.py: call create-relocatable-package.py
  for a list of files under libexec and create source/include-binaries
  with the list.

we repackage the precompiled binaries in the relocatable package into a debian source package using `./scylla/install.sh`, which edits the executable to use the specified dynamic library loader. but dpkg-source does not like this, as it wants to ensure that the files in the original tarball (*.orig.tar.gz) are identical to the files in the source package created by dpkg-source.

so we get the following failure when running reloc/build_deb.sh

```
dpkg-source: error: cannot represent change to scylla/libexec/scylla: binary file contents changed
dpkg-source: error: add scylla/libexec/scylla in debian/source/include-binaries if you want to store the modified binary in the debian tarball
dpkg-source: error: unrepresentable changes to source
dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 1
debuild: fatal error at line 1182:
dpkg-buildpackage -rfakeroot -us -uc -ui failed
```

in this change, to address the build failure, as proposed by dpkg, the
path to the patched/edited executable is added to
`debian/source/include-binaries`. see the "Building" section in https://manpages.debian.org/bullseye/dpkg-dev/dpkg-source.1.en.html for more details.
please search `adjust_bin()` in `scylladb/install.sh` for more details.
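A rough Python sketch of what generating `debian/source/include-binaries` amounts to: one path per line, relative to the source package root, as in the dpkg-source hint above. The helper name is hypothetical; the real `debian_files_gen.py` obtains the list from `create-relocatable-package.py`.

```python
from pathlib import Path


def write_include_binaries(executables, debian_dir):
    """Hypothetical helper: list the patched binaries for dpkg-source."""
    source_dir = Path(debian_dir) / "source"
    source_dir.mkdir(parents=True, exist_ok=True)
    out = source_dir / "include-binaries"
    # One path per line; sorted so the generated file is deterministic.
    out.write_text("".join(f"{path}\n" for path in sorted(executables)))
    return out
```

For example, `write_include_binaries(["scylla/libexec/scylla"], "debian")` writes the single line `scylla/libexec/scylla` to `debian/source/include-binaries`.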

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12722
2023-03-27 10:10:12 +03:00
Botond Dénes
4b5b6a9010 test/lib: rm test_table.hh
No users left.
2023-03-27 02:00:44 -04:00
Botond Dénes
3a43574b39 test/boost/multishard_mutation_query_test: migrate other tests to random schema
Create a local method called create_test_table that has the same
signature as test::create_test_table, but uses random schema behind the
scenes to generate the schema and the data, then migrate all the test
cases to use it instead.
To accommodate the randomness introduced by the random schema and
random data, the unreliable querier cache population checks were replaced
with more reliable lookup and miss checks, to prevent test flakiness.
Querier cache population checks worked well with a fixed and simple
schema and a fixed table population, they don't work that well with
random data.

With this, there are no more uses of test_table.hh in this test and the
include can be removed.
2023-03-27 02:00:44 -04:00
Botond Dénes
56a9968817 test/boost/multishard_mutation_query_test: use ks keyspace
This keyspace exists by default and thus we don't have to create a new
one for each test. Also use `get_name()` to pass the test case's name as
table name, instead of hard-coding it. We already had some copy-pasta
creep in: two tests used the same table name. This is not an error, as each
test runs in its own env, but it is confusing to see another test case's
name in the logs.
2023-03-27 02:00:44 -04:00
Botond Dénes
ad313d8eef test/boost/multishard_mutation_query_test: improve test pager
Propagate the page size to the result builder, so it can determine when
a page is short and thus it is the last page, instead of asking for more
pages until an empty one turns up. This will make tests more reliable
when dealing with random datasets.
Also change how the page counter is bumped: bump it after the current
page is executed, at which point we know whether there will be a next
page or not. This fixes an off-by-one seen in some cases.
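A simplified Python model of the improved pager (names are hypothetical; the real pager is C++ test code): a page shorter than the requested size ends the iteration, and the page counter is bumped only after a page has been executed.

```python
def fetch_all_pages(query_page, page_size):
    """Collect all rows; `query_page(cursor, size)` returns (rows, cursor)."""
    rows, pages, cursor = [], 0, None
    while True:
        page, cursor = query_page(cursor, page_size)
        rows.extend(page)
        pages += 1                 # bump only after the page was executed
        if len(page) < page_size:  # a short page must be the last one
            return rows, pages


def make_source(data):
    """Hypothetical in-memory data source for the sketch."""
    def query_page(cursor, size):
        start = cursor or 0
        return data[start:start + size], start + size
    return query_page


print(fetch_all_pages(make_source(list(range(10))), 4))
# ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
```

When the row count is an exact multiple of the page size, the final request still returns an empty (and therefore short) page, which also ends the loop.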
2023-03-27 02:00:44 -04:00
Botond Dénes
3df70a9f3b test/boost/multishard_mutation_query_test: refactor fuzzy_test
Use the random_schema and its facilities to generate the schema and the
dataset. This gives the test much better coverage than the
previous, fixed and simplistic schema did.
Also reduce the test table population and the number of scans run on it,
so the test runs in a more reasonable time-frame.
We run these tests all the time due to CI, so no need to try to do too
much in a single run.
2023-03-27 02:00:43 -04:00
Botond Dénes
2cdda562f7 test/boost: give multishard_mutation_query_test more memory
The tests in this file work with random schema and random data. Some
seeds can generate large partitions and rows, so give the test some
more headroom to work with.
2023-03-27 01:44:00 -04:00
Botond Dénes
00f06522c2 types/user: add get_name() accessor
For the raw name (bytes).
2023-03-27 01:44:00 -04:00
Botond Dénes
99c9a71d93 test/lib/random_schema: add create_with_cql()
Allow the generated schema to be created as a CQL table, so that
queries can be run against it.
2023-03-27 01:44:00 -04:00
Botond Dénes
10a44fee06 test/lib/random_schema: fix udt handling
* generate lowercase names (upper-case seems to cause problems);
* preserve dependency order between UDTs when dumping them from schema;
* use built-in describe() to dump to CQL string;
* drop single arg dump_udts() overload, which was not recursive, unlike
  the vector variant;
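Preserving dependency order means emitting every UDT after the UDTs its fields reference, i.e. a depth-first topological sort. A Python sketch, assuming a hypothetical input that maps each (lowercase) type name to the names of the UDTs it uses; the dependency graph is assumed to be acyclic, so no cycle handling is included.

```python
def udts_in_dependency_order(udts):
    """Return UDT names so that each one follows the UDTs it depends on."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in sorted(udts.get(name, ())):
            visit(dep)  # dependencies first (recursively)
        ordered.append(name)

    for name in sorted(udts):
        visit(name)
    return ordered


deps = {"address": set(), "person": {"address"}, "team": {"person"}}
print(udts_in_dependency_order(deps))  # ['address', 'person', 'team']
```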
2023-03-27 01:44:00 -04:00
Botond Dénes
b2ddc60c10 test/lib/random_schema: type_generator(): also generate frozen types
For regular and static columns, to introduce some further randomness.
So far frozen types were generated only for primary key members and
embedded types.
2023-03-27 01:44:00 -04:00
Botond Dénes
1cb4b1fc83 test/lib/random_schema: type_generator(): make static column generation conditional
Static columns are now generated only if the schema has clustering
columns; otherwise a static column is illegal.
2023-03-27 01:44:00 -04:00
Botond Dénes
2a7cccd1a8 test/lib/random_schema: type_generator(): don't generate duration_type for keys
Nor for any embedded type (collection, tuple members, etc.).
It's not allowed there, as I recently learned.
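The restriction can be pictured as filtering the candidate types before picking one. A hypothetical Python sketch; the type list, function names and parameters are illustrative and are not the random_schema API.

```python
import random

# Illustrative subset of CQL simple types.
SIMPLE_TYPES = ["int", "bigint", "text", "blob", "timestamp", "duration"]


def pick_simple_type(rng, *, for_key=False, embedded=False):
    """Pick a type; never `duration` for key columns or embedded types."""
    candidates = [t for t in SIMPLE_TYPES
                  if t != "duration" or not (for_key or embedded)]
    return rng.choice(candidates)


def pick_list_type(rng):
    # Collection members are embedded types, so `duration` is excluded.
    return f"list<{pick_simple_type(rng, embedded=True)}>"


rng = random.Random(42)
print(pick_simple_type(rng, for_key=True), pick_list_type(rng))
```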
2023-03-27 01:44:00 -04:00
Botond Dénes
c9f54e539d test/lib/random_schema: generate_random_mutations(): add overload with seed 2023-03-27 01:44:00 -04:00
Botond Dénes
394909869d test/lib/random_schema: generate_random_mutations(): respect range tombstone count param
Even though there is a parameter determining the number of range
tombstones to be generated, the method disregards it and generates
just 4. Fix that.
2023-03-27 01:43:59 -04:00
Botond Dénes
477b26f7af test/lib/random_schema: generate_random_mutations(): add yields 2023-03-27 01:43:59 -04:00
Botond Dénes
fd8a50035a test/lib/random_schema: generate_random_mutations(): fix indentation 2023-03-27 01:43:59 -04:00
Botond Dénes
71fdec7b42 test/lib/random_schema: generate_random_mutations(): coroutinize method 2023-03-27 01:43:59 -04:00
Botond Dénes
393aaddff0 test/lib/random_schema: generate_random_mutations(): expand comment
Add note about mutation order and deduplication.
2023-03-27 01:43:59 -04:00
Avi Kivity
cd0b167d6c Merge 'bloom_filter: cleanups' from Kefu Chai
this series applies some random cleanups to bloom_filter. these cleanups were side products of the author's work on #13314.

Closes #13315

* github.com:scylladb/scylladb:
  bloom_filter: mark internal helper function static
  bloom_filter: add more constness to false positive rate tables
  bloom_filter: use vector::back() when appropriate
2023-03-26 19:43:37 +03:00
Kefu Chai
33f4012eeb test: cql-pytest: test_describe: clamp bloom filter's fp rate
before this change, we used `round(random.random(), 5)` for
the value of the `bloom_filter_fp_chance` config option. there is a
chance that this expression returns a number lower than or equal
to 6.71e-05.

but this option has a minimum, which is defined by
`utils::bloom_calculations::probs`: the minimal false positive
rate is 6.71e-05.

we are observing test failures where we use 0 for
the option, and scylla rightly rejects it with the error message
```
bloom_filter_fp_chance must be larger than 6.71e-05 and less than or equal to 1.0 (got 0)
```

so, in this change, to address the test failure, we always use a number
greater than or equal to a value slightly above the minimum, to
ensure that the randomly picked number is in the range of supported
false positive rates.
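A sketch of the clamped value in Python, in the spirit of the cql-pytest code. The floor `1e-04` is an assumed "slightly greater" value, not necessarily the one the test uses; per the error message above, anything above 6.71e-05 and up to 1.0 is accepted.

```python
import random

MIN_FP_CHANCE = 6.71e-05  # exclusive minimum, per the error message above
FP_CHANCE_FLOOR = 1e-04   # assumed value slightly above the minimum


def random_fp_chance(rng=random):
    """Random bloom_filter_fp_chance in (MIN_FP_CHANCE, 1.0]."""
    # round(..., 5) may produce 0.0 or another value <= MIN_FP_CHANCE,
    # so clamp it from below; random() < 1.0, so the result stays <= 1.0.
    return max(round(rng.random(), 5), FP_CHANCE_FLOOR)
```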

Fixes #13313
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13314
2023-03-26 19:41:22 +03:00
Botond Dénes
d5488dba69 reader_permit: set_trace_state(): emit trace message linking to previous page
This method is called on the start of each page, updating the trace
state stored on the permit to that of the current page.
When doing so, emit a trace message, containing the session id of the
previous page, so the per-page sessions can be stitched together later.
Note that this message is only emitted if the cached read survived
between the pages.

Example:

    Tracing session: dcfc1570-ca3c-11ed-88d0-24443f03a8bb

     activity                                                                                                                              | timestamp                  | source    | source_elapsed | client
    ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
															Execute CQL3 query | 2023-03-24 08:10:27.271000 | 127.0.0.1 |              0 | 127.0.0.1
													     Parsing a statement [shard 0] | 2023-03-24 08:10:27.271864 | 127.0.0.1 |             -- | 127.0.0.1
													  Processing a statement [shard 0] | 2023-03-24 08:10:27.271958 | 127.0.0.1 |             94 | 127.0.0.1
	   Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] | 2023-03-24 08:10:27.271995 | 127.0.0.1 |            132 | 127.0.0.1
												     read_data: querying locally [shard 0] | 2023-03-24 08:10:27.271998 | 127.0.0.1 |            135 | 127.0.0.1
							     Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] | 2023-03-24 08:10:27.272003 | 127.0.0.1 |            140 | 127.0.0.1
									     [reader concurrency semaphore] admitted immediately [shard 0] | 2023-03-24 08:10:27.272006 | 127.0.0.1 |            143 | 127.0.0.1
										   [reader concurrency semaphore] executing read [shard 0] | 2023-03-24 08:10:27.272014 | 127.0.0.1 |            150 | 127.0.0.1
					 Querying cache for range {{3274692326281147944, pk{00026b73}}} and slice {(-inf, +inf)} [shard 0] | 2023-03-24 08:10:27.272022 | 127.0.0.1 |            159 | 127.0.0.1
     Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 3 clustering row(s) (3 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2023-03-24 08:10:27.272076 | 127.0.0.1 |            212 | 127.0.0.1
								   Caching querier with key ab928e0d-b815-46b7-9a02-1fa2d9549477 [shard 0] | 2023-03-24 08:10:27.272084 | 127.0.0.1 |            221 | 127.0.0.1
														Querying is done [shard 0] | 2023-03-24 08:10:27.272087 | 127.0.0.1 |            224 | 127.0.0.1
											    Done processing - preparing a result [shard 0] | 2023-03-24 08:10:27.272106 | 127.0.0.1 |            242 | 127.0.0.1
															  Request complete | 2023-03-24 08:10:27.271259 | 127.0.0.1 |            259 | 127.0.0.1

    Tracing session: dd3092f0-ca3c-11ed-88d0-24443f03a8bb

     activity                                                                                                                              | timestamp                  | source    | source_elapsed | client
    ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
															Execute CQL3 query | 2023-03-24 08:10:27.615000 | 127.0.0.1 |              0 | 127.0.0.1
													     Parsing a statement [shard 0] | 2023-03-24 08:10:27.615223 | 127.0.0.1 |             -- | 127.0.0.1
													  Processing a statement [shard 0] | 2023-03-24 08:10:27.615310 | 127.0.0.1 |             87 | 127.0.0.1
	   Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] | 2023-03-24 08:10:27.615346 | 127.0.0.1 |            124 | 127.0.0.1
												     read_data: querying locally [shard 0] | 2023-03-24 08:10:27.615349 | 127.0.0.1 |            126 | 127.0.0.1
							     Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] | 2023-03-24 08:10:27.615352 | 127.0.0.1 |            130 | 127.0.0.1
	  Found cached querier for key ab928e0d-b815-46b7-9a02-1fa2d9549477 and range(s) {{{3274692326281147944, pk{00026b73}}}} [shard 0] | 2023-03-24 08:10:27.615358 | 127.0.0.1 |            135 | 127.0.0.1
														 Reusing querier [shard 0] | 2023-03-24 08:10:27.615362 | 127.0.0.1 |            139 | 127.0.0.1
				   Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] | 2023-03-24 08:10:27.615364 | 127.0.0.1 |            141 | 127.0.0.1
										   [reader concurrency semaphore] executing read [shard 0] | 2023-03-24 08:10:27.615371 | 127.0.0.1 |            148 | 127.0.0.1
     Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2023-03-24 08:10:27.615385 | 127.0.0.1 |            163 | 127.0.0.1
														Querying is done [shard 0] | 2023-03-24 08:10:27.615583 | 127.0.0.1 |            360 | 127.0.0.1
											    Done processing - preparing a result [shard 0] | 2023-03-24 08:10:27.615730 | 127.0.0.1 |            507 | 127.0.0.1
															  Request complete | 2023-03-24 08:10:27.615518 | 127.0.0.1 |            518 | 127.0.0.1
See the message:

    Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] | 2023-03-24 08:10:27.615364 | 127.0.0.1 |            141 | 127.0.0.1
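Given the activity lists of each session (for example, read from `system_traces.events`), the paged sessions can be stitched back together by following this message. A Python sketch; the input shape and function name are hypothetical, only the message text comes from the trace above.

```python
import re

# Matches the message emitted at the start of each continued page.
PREV_SESSION = re.compile(
    r"previous page's trace session is "
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")


def stitch_sessions(sessions):
    """Group trace sessions into per-query chains, first page first.

    `sessions` maps session id -> list of activity strings.
    """
    prev_of = {}
    for sid, activities in sessions.items():
        for activity in activities:
            match = PREV_SESSION.search(activity)
            if match:
                prev_of[sid] = match.group(1)
    next_of = {prev: sid for sid, prev in prev_of.items()}
    chains = []
    for sid in sessions:
        if prev_of.get(sid) in sessions:
            continue  # not the first page we have a trace for
        chain = [sid]
        while chain[-1] in next_of:
            chain.append(next_of[chain[-1]])
        chains.append(chain)
    return chains
```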

This is a follow-up to #13255
Refs: #12781

Closes #13318
2023-03-26 18:41:21 +03:00
Avi Kivity
f937fad25a Merge 'readers/multishard: shard_reader: fast-forward created reader to current range' from Botond Dénes
When creating the reader, the lifecycle policy might return one that was saved on the last page and survived in the cache. This reader might have skipped some fast-forwarding ranges while sitting in the cache. To avoid using a reader reading a stale range (from the read's POV), check its read range and fast forward it if necessary.

Fixes: https://github.com/scylladb/scylladb/issues/12916

Closes #12932

* github.com:scylladb/scylladb:
  readers/multishard: shard_reader: fast-forward created reader to current range
  readers/multishard: reader_lifecycle_policy: add get_read_range()
  test/boost/multishard_mutation_query_test: paging: handle range becoming wrapping
2023-03-26 18:39:50 +03:00
Wojciech Mitros
f0aa540e00 cql: renice the wasm compilation alien thread
The Wasm compilation is a slow, low priority task, so it should
not compete with reactor threads or the networking core.
To achieve that, we increase the niceness of the thread by 10.

An alternative solution would be to set the priority using
pthread_setschedparam, but it's not currently feasible,
because as long as we're using the SCHED_OTHER policy for our
threads, we cannot select any other priority than 0.
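On Linux, niceness is a per-thread attribute, so a single thread can be reniced without affecting the rest of the process. A Python sketch of the idea; the helper is hypothetical and Linux-specific (Scylla does this in C++ for its alien thread): `os.setpriority(os.PRIO_PROCESS, tid, ...)` with a thread id affects only that thread, and raising niceness needs no privileges.

```python
import os
import threading


def run_reniced(fn, increment=10):
    """Run fn() on a new thread whose niceness is raised by `increment`.

    Linux-specific: PRIO_PROCESS with a thread id targets just that thread.
    """
    result = {}

    def worker():
        tid = threading.get_native_id()
        current = os.getpriority(os.PRIO_PROCESS, tid)
        # Niceness is capped at 19; increasing it is always permitted.
        os.setpriority(os.PRIO_PROCESS, tid, min(current + increment, 19))
        result["nice"] = os.getpriority(os.PRIO_PROCESS, tid)
        result["value"] = fn()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result
```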

Closes #13307
2023-03-26 18:38:23 +03:00
Anna Stuchlik
1cfea1f13c doc: remove incorrect info about BYPASS CACHE
Fixes https://github.com/scylladb/scylladb/issues/13106

This commit removes the information that BYPASS CACHE
is an Enterprise-only feature and replaces that info
with the link to the BYPASS CACHE description.

Closes #13316
2023-03-26 18:13:17 +03:00
Kefu Chai
e796525f23 types: remove unused header
<iterator> was introduced back in
1cf02cb9d8, but lexicographical_compare.hh
was extracted out in bdfc0aa748. since we
don't have any users of <iterator> in types.hh anymore, let's remove it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13327
2023-03-26 16:55:16 +03:00
Avi Kivity
eeff8cd075 Merge 'dist/redhat: enforce dependency on %{release} also' from Kefu Chai
s/%{version}/%{version}-%{release}/ in `Requires:` sections.

this enforces the runtime dependencies of exactly the same releases between scylla packages.

Fixes #13222
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13229

* github.com:scylladb/scylladb:
  dist/redhat: split Requires section into multiple lines
  dist/redhat: enforce dependency on %{release} also
2023-03-26 16:50:10 +03:00
Avi Kivity
bfd70c192e cql3: functions: reimplement min/max statelessly
min() and max() had two implementations: one static (for each type in
a select list) and one dynamic (for compound types). Since the
dynamic implementation is sufficient, we only reimplement that. This
means we don't use the automarshalling helpers, since we don't do any
arithmetic on values apart from comparison, which is conveniently
provided by abstract_type.
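The stateless shape can be sketched in Python: instead of a per-query aggregate object holding the running value, a pure step function takes the accumulator and the next value and returns the new accumulator, relying only on comparison. The function names are hypothetical, Python's `<`/`>` stands in for the `abstract_type` comparator, and NULLs (modelled as `None`) are assumed to be skipped.

```python
from functools import reduce


def max_step(acc, value):
    """Return the new accumulator for max(); None models NULL."""
    if value is None:
        return acc
    return value if acc is None or value > acc else acc


def min_step(acc, value):
    """Return the new accumulator for min(); None models NULL."""
    if value is None:
        return acc
    return value if acc is None or value < acc else acc


def aggregate(step, values):
    # The state lives only in the fold, not in a stateful aggregate object.
    return reduce(step, values, None)


print(aggregate(max_step, [3, None, 7, 1]))       # 7
print(aggregate(min_step, [(1, "b"), (1, "a")]))  # (1, 'a')
```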
2023-03-26 15:18:22 +03:00
Avi Kivity
e6342d476b cql3: functions: reimplement count(*) statelessly
Note we have to explicitly decay lambdas to functions using unary operator +.
2023-03-26 15:18:22 +03:00
Avi Kivity
9291ec5ed1 cql3: functions: simplify creating native functions even more
Add a helper function to consolidate the internal native function
class and the automatic marshalling introduced in previous patches.

Since a lambda has to be decayed into a function pointer (in order to
infer its signature), there are two overloads: one accepts a lambda
and decays it into a function pointer; the second accepts a function
pointer, infers its argument types, and constructs the function object.
2023-03-26 15:15:36 +03:00
Kefu Chai
3425184b2a raft: include boost header using <path/to/header> not "path/to/header"
for more consistency with the rest of the source tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:07:50 +08:00
Kefu Chai
0421d6d12f raft: include used header
at least, we need to access the declarations of exceptions, like
`not_a_leader` and `dropped_entry`, so, instead of relying on
other headers to do this job for us, we should include the header
which declares them. so, in this change, "raft.h" is
included explicitly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:07:50 +08:00
Kefu Chai
023e985a6c build: cmake: add missing source files to idl and service
they were added recently, but cmake failed to sync with configure.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:01:21 +08:00
Kefu Chai
e0ca80d21f build: cmake: port more cxxflags from configure.py
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:01:21 +08:00
Kefu Chai
a5547ea11b build: cmake: add two missing tests
they were left out in f113dac5bf

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:01:21 +08:00
Tzach Livyatan
46e6c639d9 docs: minor improvements to the Raft Handling Failures and recovery procedure sections
Closes #13292
2023-03-24 18:17:36 +01:00
Botond Dénes
b6682ad607 docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
With the recent changes to the ways schema can be provided to the tool.
2023-03-24 11:41:40 -04:00
Botond Dénes
bc9341b84a test/cql-pytest: test_tools.py: add test for schema loading
A comprehensive test covering all the supported ways of providing the
schema to scylla-sstable, either explicitly or implicitly
(auto-detect).
2023-03-24 11:41:40 -04:00
Botond Dénes
afdfe34ca7 test/cql-pytest: nodetool.py: add flush_keyspace()
It would have been better if `flush()` could have been called with a
keyspace and optional table param, but changing it now is too much
churn, so we add a dedicated method to flush a keyspace instead.
2023-03-24 11:41:40 -04:00
Botond Dénes
1f0ab699c3 tools/scylla-sstable: reform schema loading mechanism
So far, schema had to be provided via a schema.cql file, a file which
contains the CQL definition of the table. This is flexible but annoying
at the same time. Many times sstables the tool operates on are located
in their table directory in a scylla data directory, where the schema
tables are also available. To mitigate this, an alternative method to
load the schema from memory was added which works for system tables.
In this commit we extend this to work for all kinds of tables: by
auto-detecting where the scylla data directory is, and loading the
schema tables from disk.
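Auto-detection can rely on the on-disk layout `<data_dir>/<keyspace>/<table>-<uuid>/`, with the schema tables under `<data_dir>/system_schema/`. A hypothetical Python sketch of the first step, deriving the data directory, keyspace and table from an sstable path; the handling of `upload`/`staging` sub-directories is an assumption for illustration, not the tool's documented behaviour.

```python
from pathlib import PurePosixPath


def locate_table(sstable_path):
    """Return (data_dir, keyspace, table) for an sstable component path."""
    table_dir = PurePosixPath(sstable_path).parent
    if table_dir.name in ("upload", "staging"):  # assumed sub-directories
        table_dir = table_dir.parent
    table, sep, table_uuid = table_dir.name.rpartition("-")
    if not sep or len(table_uuid) != 32:
        raise ValueError(f"not a table directory: {table_dir}")
    keyspace_dir = table_dir.parent
    return str(keyspace_dir.parent), keyspace_dir.name, table


print(locate_table(
    "/var/lib/scylla/data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/md-1-big-Data.db"))
# ('/var/lib/scylla/data', 'ks', 'cf')
```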
2023-03-24 11:41:40 -04:00
Botond Dénes
c5b2fc2502 tools/schema_loader: add load_schema_from_schema_tables()
Allows loading the schema for the designated keyspace and table, from
the system table sstables located on disk. The sstable files are opened
read-only.
2023-03-24 11:41:40 -04:00
Botond Dénes
19560419d2 Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity
An assortment of patches that reduce our incompatibilities with the upcoming gcc 13.

Closes #13243

* github.com:scylladb/scylladb:
  transport: correctly format unknown opcode
  treewide: catch by reference
  test: raft: avoid confusing string compare
  utils, types, test: extract lexicographical compare utilities
  test: raft: fsm_test: disambiguate raft::configuration construction
  test: reader_concurrency_semaphore_test: handle all enum values
  repair: fix signed/unsigned compare
  repair: fix incorrect signed/unsigned compare
  treewide: avoid unused variables in if statements
  keys: disambiguate construction from initializer_list<bytes>
  cql3: expr: fix serialize_listlike() reference-to-temporary with gcc
  compaction: error on invalid scrub type
  treewide: prevent redefining names
  api: task_manager: fix signed/unsigned compare
  alternator: streams: fix signed/unsigned comparison
  test: fix some mismatched signed/unsigned comparisons
2023-03-24 15:16:05 +02:00
Botond Dénes
132d101dc7 db/schema_tables: expose types schema 2023-03-24 08:50:39 -04:00
Botond Dénes
14bff955e2 readers/multishard: shard_reader: fast-forward created reader to current range
When creating the reader, the lifecycle policy might return one that was
saved on the last page and survived in the cache. This reader might have
skipped some fast-forwarding ranges while sitting in the cache. To avoid
using a reader reading a stale range (from the read's POV), check its
read range and fast forward it if necessary.
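A Python model of the check (class and function names are hypothetical; the companion commit adds `reader_lifecycle_policy::get_read_range()` for retrieving the current range): a reader taken from the cache is compared against the current read range and fast-forwarded only when the two differ.

```python
class CachedReader:
    """Hypothetical stand-in for a reader saved in the querier cache."""

    def __init__(self, read_range):
        self.read_range = read_range
        self.forwarded = []

    def fast_forward_to(self, new_range):
        self.read_range = new_range
        self.forwarded.append(new_range)


def reuse_reader(reader, current_range):
    """Fast-forward a cached reader that still points at a stale range."""
    if reader.read_range != current_range:
        reader.fast_forward_to(current_range)
    return reader
```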
2023-03-24 08:43:03 -04:00
Botond Dénes
0aa03f85a3 readers/multishard: reader_lifecycle_policy: add get_read_range()
Allows retrieving the current read-range for the reader on the given
shard (where the method is called).
2023-03-24 08:40:11 -04:00
Botond Dénes
1c7a66cd2a test/boost/multishard_mutation_query_test: paging: handle range becoming wrapping
After each page, the read range is adjusted so it continues from/after
the last read partition. Sometimes this can result in the range becoming
wrapped like this: (pk, pk]. In this case, we can just drop this range
and continue with the rest of the ranges (if there are multiple ones).
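A rough sketch of that handling, assuming a hypothetical integer-keyed range type instead of Scylla's real partition range classes. A range of the form (pk, pk] holds no keys, but a ring-based range type may read it as wrapping around, so it is simply dropped:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical simplified bound: an integer key plus an inclusiveness flag.
struct bound {
    int key;
    bool inclusive;
};

struct key_range {
    bound start;
    bound end;
};

// True for a range of the form (pk, pk]: exclusive start and inclusive end on
// the same key. It contains nothing, yet may be interpreted as wrapping.
bool became_wrapping(const key_range& r) {
    return r.start.key == r.end.key && !r.start.inclusive && r.end.inclusive;
}

// Drop the exhausted range and continue with the remaining ones.
std::vector<key_range> drop_exhausted(std::vector<key_range> ranges) {
    ranges.erase(std::remove_if(ranges.begin(), ranges.end(), became_wrapping),
                 ranges.end());
    return ranges;
}
```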
2023-03-24 08:40:11 -04:00
Tomasz Grabiec
c54a3d9c10 Merge 'Clean enabled features manipulations in system keyspace' from Pavel Emelyanov
There was an attempt to cut the feature-service -> system-keyspace dependency (#13172), which turned out to require more changes. Here are some preparatory changes squeezed out of that future work.

This set
- leaves only batch-enabling API in feature service
- keeps the need for async context in feature service
- narrows down system keyspace features API to only load and store records
- relaxes features updating logic in sys.ks.
- cosmetic

Closes #13264

* github.com:scylladb/scylladb:
  feature_service: Indentation fix after previous patch
  feature_service: Move async context into enable()
  system_keyspace: Refactor local features load/save helpers
  feature_service: Mark supported_feature_set() const
  feature_service: Remove single feature enabling method
  boot: Enable features in batch
  gossiper: Enable features in batch
2023-03-24 13:12:49 +01:00
Petr Gusev
c1634ea5fa test_raft_upgrade: add a test for schema commit log feature
The test tries to start a node with
consistent_cluster_management but without
force_schema_commit_log. This is expected to fail,
since the schema commitlog feature should be enabled
by all the cluster nodes.
2023-03-24 16:08:17 +04:00
Petr Gusev
e407956e9f scylla_cluster.py: add start flag to server_add
Sometimes when creating a node it's useful
to just install it and not start. For example,
we may want to try to start it later with
expected error.

The ScyllaServer.install method has been made
exception safe: if an exception occurs, it
reverts to the original state. This avoids
duplicating the try/except logic
at its two call sites.
2023-03-24 16:08:17 +04:00
Petr Gusev
794d0e4000 ServerInfo: drop host_id
We are going to allow the
ScyllaCluster.add_server function not to
start the server if the caller has requested
that with a special parameter. The host_id
can only be obtained from a running node, so
add_server won't be able to return it in
this case. I've grepped the tests for host_id
and there doesn't seem to be any
reference to it in the code.
2023-03-24 16:08:17 +04:00
Petr Gusev
8e3392c64f scylla_cluster.py: add config to server_add
Sometimes when creating a node it's useful
to pass a custom node config.
2023-03-24 16:08:17 +04:00
Petr Gusev
c1d0ee2bce scylla_cluster.py: add expected_error to server_start
Sometimes it's useful to check that the node has failed
to start for a particular reason. If server_start can't
find expected_error in the node's log or if the
node has started without errors, it throws an exception.
2023-03-24 16:08:11 +04:00
Petr Gusev
a4411e9ec4 scylla_cluster.py: ScyllaServer.start, refactor error reporting
Extract the function that encapsulates all the error
reporting logic. We are going to use it in several
other places to implement expected_error feature.
2023-03-24 15:54:52 +04:00
Petr Gusev
21b505e67c scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed
The ScyllaServer expects cmd to be None if the
Scylla process is not running. Otherwise, if start failed
and the test called update_config, the latter will
try to send a signal to a non-existent process via cmd.
2023-03-24 15:54:52 +04:00
Petr Gusev
75a4ff2da9 raft: check if schema commitlog is initialized
Refuse to boot if neither the schema commitlog feature
nor force_schema_commit_log is set. For the upgrade
procedure the user should wait until
the schema commitlog feature is enabled before
enabling consistent_cluster_management.
2023-03-24 15:54:52 +04:00
Petr Gusev
d8997a4993 raft: move raft initialization after init_system_keyspace
Raft tables are loaded on the second call to
init_system_keyspace, so it seems more logical
to move initialization after it. This is not
necessary right now since raft tables are not used
in this initialization logic, but it may
change in the future and cause trouble.
2023-03-24 15:54:52 +04:00
Petr Gusev
769732d095 database: rename before_schema_keyspace_init->maybe_init_schema_commitlog
We are going to move the raft tables from the first
load phase to the second. This means the second
init_system_keyspace call will load raft tables along
with the schema, making the name of this function imprecise.
2023-03-24 15:54:52 +04:00
Petr Gusev
273e70e1f9 raft: use schema commitlog for raft tables
Fixes: #12642
2023-03-24 15:54:52 +04:00
Petr Gusev
5a5d664a5a init_system_keyspace: refactoring towards explicit load phases
We aim (#12642) to use the schema commit log
for raft tables. Now they are loaded at
the first call to init_system_keyspace in
main.cc, but the schema commitlog is only
initialized shortly before the second
call. This is important, since the schema
commitlog initialization
(database::before_schema_keyspace_init)
needs to access the schema commitlog feature,
which is loaded from system.scylla_local
and therefore is only available after the
first init_system_keyspace call.

So the idea is to defer the loading of the raft tables
until the second call to init_system_keyspace,
just as it works for schema tables.
For this we need a tool to mark which tables
should be loaded in the first or second phase.

To do this, in this patch we introduce system_table_load_phase
enum. It's set in the schema_static_props for schema tables.
It replaces the system_keyspace::table_selector in the
signature of init_system_keyspace.

The call site for populate_keyspace in init_system_keyspace
was changed: table_selector.contains_keyspace was replaced with
db.local().has_keyspace. This check prevents calling
populate_keyspace(system_schema) on phase1, but allows for
populate_keyspace(system) on phase2 (to init raft tables).
On this second call some tables from system keyspace
(e.g. system.local) may have already been populated on phase1.
This check protects from double-populating them, since every
populated cf is marked as ready_for_writes.
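The phase-selection idea can be sketched as below. The enum name comes from this commit; the `table_props` struct, the `tables_for_phase()` helper and the table names used in the example are illustrative assumptions, not the real schema_static_props code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Which init_system_keyspace() call loads a given table.
enum class system_table_load_phase { phase1, phase2 };

// Hypothetical per-table static property.
struct table_props {
    std::string name;
    system_table_load_phase load_phase;
};

// Select the tables that should be populated during `phase`.
std::vector<std::string> tables_for_phase(const std::vector<table_props>& tables,
                                          system_table_load_phase phase) {
    std::vector<std::string> out;
    for (const auto& t : tables) {
        if (t.load_phase == phase) {
            out.push_back(t.name);
        }
    }
    return out;
}
```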
2023-03-24 15:54:46 +04:00
Anna Stuchlik
9e27f6b4b7 doc: update the Ubuntu version used in the image
Starting from 5.2 and 2023.1 our images are based on Ubuntu:22.04.
See https://github.com/scylladb/scylladb/issues/13138#issuecomment-1467737084

This commit adds that information to the docs.
It should be merged and backported to branch-5.2.

Closes #13301
2023-03-24 13:50:51 +02:00
Kamil Braun
0b19a614fa storage_service: wait for normal state handlers earlier in the boot procedure
The `wait_for_normal_state_handled_on_boot` function waits until
`handle_state_normal` finishes for the given set of nodes. It was used
in `run_bootstrap_ops` and `run_replace_ops` to wait until NORMAL states
of existing nodes in the cluster are processed by the joining node
before continuing the joining process. One reason to do it is because at
the end of `handle_state_normal` the joining node might drop connections
to the NORMAL nodes in order to reestablish new connections using
correct encryption settings. In tests we observed that the connection
drop was happening in the middle of repair/streaming, causing
repair/streaming to abort.

Unfortunately, calling `wait_for_normal_state_handled_on_boot` in
`run_bootstrap_ops`/`run_replace_ops` is too late to fix all problems.
Before either of these two functions, we create a new CDC generation and
write the data to `system_distributed_everywhere.cdc_generation_descriptions_v2`.
In tests, the connections were sometimes dropped while this write was
in-flight. This would cause the write to never arrive at other nodes,
and the joining node would time out waiting for confirmations.

To fix this, call `wait_for_normal_state_handled_on_boot` earlier in the
boot procedure, before `make_new_generation` call which does the write.

Fixes: #13302
2023-03-24 12:45:07 +01:00
Kamil Braun
451389970b storage_service: bootstrap: wait for normal tokens to arrive in all cases
`storage_service::bootstrap` waits until it receives normal tokens of
other nodes before proceeding or it times out with an error. But it only
did that for bootstrap operation, not for replace operation. Do it for
replace as well.
2023-03-24 12:44:37 +01:00
Kamil Braun
c003b7017d storage_service: extract get_nodes_to_sync_with helper 2023-03-24 12:44:37 +01:00
Kamil Braun
599393dcba storage_service: return unordered_set from get_ignore_dead_nodes_for_replace 2023-03-24 12:44:37 +01:00
Anna Stuchlik
73b74e8cac doc: remove Enterprise upgrade guides from OSS doc
This commit removes the Enterprise upgrade guides from
the Open Source documentation. The Enterprise upgrade guides
should only be available in the Enterprise documentation,
with the source files stored in scylla-enterprise.git.

In addition, this commit:
- adds the links to the Enterprise user guides in the Enterprise
documentation at https://enterprise.docs.scylladb.com/
- adds the redirections for the removed pages to avoid
breaking any links.

This commit must be reverted in scylla-enterprise.git.

Closes #13298
2023-03-24 10:57:03 +02:00
Kefu Chai
a7b4f84b6a bloom_filter: mark internal help function static
as `initialize_opt_k()` is not used outside of the translation unit,
let's mark it `static`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-24 15:41:45 +08:00
Kefu Chai
1a82a7ac72 bloom_filter: add more constness to false positive rate tables
we never mutate them, so mark them const for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-24 15:41:45 +08:00
Kefu Chai
7f4a3fdac8 bloom_filter: use vector::back() when appropriate
no need to use `size - 1` for accessing the last element in a vector,
let's just use `vector::back()` for more compact code.
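For reference, the two spellings are equivalent for a non-empty vector; this standalone snippet (helper names made up for illustration) shows both:

```cpp
#include <cassert>
#include <vector>

// Index arithmetic: correct, but noisier. Both helpers require a non-empty vector.
double last_by_index(const std::vector<double>& v) {
    return v[v.size() - 1];
}

// vector::back() states the intent directly.
double last_by_back(const std::vector<double>& v) {
    return v.back();
}
```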

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-24 15:41:45 +08:00
Jan Ciolek
a1c86786ca db/view/view.cc: rate limit view update error messages
When propagating a view update to a paired view
replica fails, there is an error message.
This message is printed for every mutation,
which causes log spam when some node goes down.

This isn't a fatal error - it's normal that
a remote view replica goes down, it'll hopefully
receive the updates later through hints.

I'm unsure if the error message should
be printed at all, but for now we can
just rate limit it and that will improve
the situation with log spamming.
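A minimal standalone sketch of interval-based rate limiting for a log message. `log_rate_limiter` is a hypothetical class; the actual patch may rely on Seastar's own logging facilities instead, which this sketch does not reproduce:

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical limiter: allow at most one message per `interval` and count
// how many messages were suppressed in between.
class log_rate_limiter {
public:
    using clock = std::chrono::steady_clock;

    explicit log_rate_limiter(clock::duration interval) : _interval(interval) {}

    // Returns true if a message may be printed at time `now`.
    bool allow(clock::time_point now) {
        if (!_last || now - *_last >= _interval) {
            _last = now;
            return true;
        }
        ++_suppressed;
        return false;
    }

    unsigned suppressed() const { return _suppressed; }

private:
    clock::duration _interval;
    std::optional<clock::time_point> _last;
    unsigned _suppressed = 0;
};
```

In the error path, the message would only be printed when `allow(clock::now())` returns true, so a flood of failed view updates produces at most one line per interval.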

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13175
2023-03-24 08:59:39 +02:00
Pavel Emelyanov
b0a5769d92 validation: Avoid throwing schema lookup
The validate_column_family() tries to find a schema and throws if it
doesn't exist. The latter is determined by the exception thrown by the
database::find_schema(), but there's a throw-less way of doing it.
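A hypothetical sketch of the pattern. `schema_registry`, `has_schema` and this `validate_column_family` are simplified stand-ins, not the real database API: the existence check comes first, and the validation error is raised directly instead of being derived from the lookup's own exception:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical registry of table schemas keyed by (keyspace, table).
class schema_registry {
public:
    void add(const std::string& ks, const std::string& cf, int version) {
        _schemas[{ks, cf}] = version;
    }
    // Throwing lookup, in the spirit of a find_schema() that throws.
    int find_schema(const std::string& ks, const std::string& cf) const {
        auto it = _schemas.find({ks, cf});
        if (it == _schemas.end()) {
            throw std::out_of_range("no such table: " + ks + "." + cf);
        }
        return it->second;
    }
    // Throw-less existence check.
    bool has_schema(const std::string& ks, const std::string& cf) const {
        return _schemas.find({ks, cf}) != _schemas.end();
    }
private:
    std::map<std::pair<std::string, std::string>, int> _schemas;
};

// Validation that does not use an exception to detect the "missing" case.
void validate_column_family(const schema_registry& reg,
                            const std::string& ks, const std::string& cf) {
    if (!reg.has_schema(ks, cf)) {
        throw std::invalid_argument("no such table: " + ks + "." + cf);
    }
}
```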

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13295
2023-03-24 08:43:48 +02:00
Kamil Braun
e8fb718e4a Merge 'topology changes over raft' from Gleb Natapov
The patch series introduces linearisable topology changes using
raft protocol. The state machine driven by raft is described in
"service: Introduce topology state machine". Some explanations about
the implementation can be found in "storage_service: raft topology:
implement topology management through raft".

The code is not ready for production. There is not much in terms of error
handling, and integration with the rest of the system has not even started.
For full integration request fencing will need to be implemented and
token_metadata has to be extended to support not just "pending" nodes
but concepts of "read replica set" and "write replica set".

The code may be far from usable, but it is hidden behind the
"experimental raft" flag and having it in tree will relieve me from
constant rebase burden.

* 'raft-topology-v6' of github.com:scylladb/scylla-dev:
  storage_service: fix indentation from previous patch
  storage_service: raft topology: implement topology management through raft
  service: raft: make group0_guard move assignable
  service: raft: wire up apply() and snapshot transfer for topology in group0 state machine
  storage_service: raft topology: introduce a function that applies topology cmd to local state machine
  storage_service: raft topology: introduce a raft monitor and topology coordinator fibers
  storage_service: raft topology: introduce snapshot transfer code for the topology table
  raft topology: add RAFT_TOPOLOGY_CMD verb that will be used by topology coordinator to communicated with nodes
  bootstrapper: Add get_random_bootstrap_tokens function
  service: raft: add support for topology_change command into raft_group0_client
  service: raft: introduce topology_change group0 command
  system_keyspace: add a table to persist topology change state machine's state
  service: Introduce topology state machine data structures
  storage_proxy: not consult topology on local table write
2023-03-23 15:59:45 +01:00
Gleb Natapov
5a908c3f46 storage_service: fix indentation from previous patch 2023-03-23 16:29:56 +02:00
Gleb Natapov
f3bd7e9b8c storage_service: raft topology: implement topology management through raft
The code here implements the state machine described in "service:
Introduce topology state machine".  A topology operation is requested
by writing into topology_request field through raft. After that
topology_change_transition() function running on a leader is responsible
for driving the operation to completion. There is not much in terms of error
handling here yet. If something fails, the code will just keep retrying.

topology_change_state_load() which is (eventually) called on all nodes each
time the state machine's state changes is the glue between the raft view of
the topology and the rest of the "legacy" system. The code there creates
token_metadata object from the raft view and fills in peers table which
is needed for drivers. The gossiper is almost completely cut off from the
topology management, but the code still updates the node's state there to
'normal' and 'left' for some legacy functionality to continue working.
Note that handlers for those states are disabled in raft mode.

raft_topology_cmd_handler() is called by topology coordinator and this
is where the streaming happens. The kind of streaming depends on the
state the node is in. The function is "re-entrant". It can be called
more than once and will either start a new operation if it is the first
invocation or the previous one failed, or it will wait for the previous
operation to complete.
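The re-entrancy rule above can be modeled as a small state machine. This is a hypothetical, synchronous sketch (`reentrant_handler`, `op_state`), not the actual raft_topology_cmd_handler(), which is asynchronous and does real streaming:

```cpp
#include <cassert>

enum class op_state { idle, running, done, failed };

// Hypothetical model: a call starts a new operation only on the first
// invocation or after the previous one failed. Otherwise the caller attaches
// to the existing operation (waits for it, or reuses its completed result).
class reentrant_handler {
public:
    // Returns true if this invocation started a new operation.
    bool invoke() {
        if (_state == op_state::idle || _state == op_state::failed) {
            _state = op_state::running;
            ++_starts;
            return true;
        }
        return false;
    }
    void finish(bool ok) { _state = ok ? op_state::done : op_state::failed; }
    int starts() const { return _starts; }

private:
    op_state _state = op_state::idle;
    int _starts = 0;
};
```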

The new code is hidden behind "experimental raft" and should not change
how the system works if disabled.

Some indentation here is intentionally left wrong and will be fixed by
the next patch.
2023-03-23 16:29:56 +02:00
Gleb Natapov
8865d5cf13 service: raft: make group0_guard move assignable 2023-03-23 16:29:56 +02:00
Gleb Natapov
344b483425 service: raft: wire up apply() and snapshot transfer for topology in group0 state machine 2023-03-23 16:29:56 +02:00
Gleb Natapov
aca21d3318 storage_service: raft topology: introduce a function that applies topology cmd to local state machine
The function applies the command to persistent storage and calls the stub
function topology_change_state_load(), which will load the new state into
memory in later patches.
2023-03-23 16:29:56 +02:00
Gleb Natapov
284afd9255 storage_service: raft topology: introduce a raft monitor and topology coordinator fibers
The raft monitor fiber monitors the local server's raft state and starts the
topology coordinator fiber when the server becomes a leader. It stops the
coordinator when the server is no longer a leader.

The coordinator fiber waits for topology state changes, but there will
be none yet.
2023-03-23 16:29:56 +02:00
Gleb Natapov
d69a887366 storage_service: raft topology: introduce snapshot transfer code for the topology table 2023-03-23 16:29:56 +02:00
Gleb Natapov
6a4d773b7e raft topology: add RAFT_TOPOLOGY_CMD verb that will be used by topology coordinator to communicated with nodes
Empty for now. Will be used later by the topology coordinator to
communicate with other nodes to instruct them to start streaming,
or to start fencing reads/writes.
2023-03-23 16:29:56 +02:00
Nadav Har'El
4fdcee8415 test/alternator: increase CQL connection timeout
This patch increases the connection timeout in the get_cql_cluster()
function in test/cql-pytest/run.py. This function is used to test
that Scylla came up, and also test/alternator/run uses it to set
up the authentication - which can only be done through CQL.

The Python driver has 2-second and 5-second default timeouts that should
have been more than enough for everybody (TM), but in #13239 we saw
that in one case it apparently wasn't enough. So to be extra safe,
let's increase the default connection-related timeouts to 60 seconds.

Note this change only affects the Scylla *boot* in the test/*/run
scripts, and it does not affect the actual tests - those have different
code to connect to Scylla (see cql_session() in test/cql-pytest/util.py),
and we already increased the timeouts there in #11289.

Fixes #13239

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13291
2023-03-23 16:03:20 +02:00
Avi Kivity
afe6b0d8c9 Merge 'reader_concurrency_semaphore: add trace points for important events' from Botond Dénes
Currently we have no visibility into what happens to a read in the reader concurrency semaphore as far as tracing is concerned. This series fixes that, storing a trace state pointer on the reader permit and using it to add trace messages to important semaphore related events:
* admission decision
* execution (execution stage functionality)
* eviction

This allows for seeing if the read suffered any delay in the semaphore.

Example tracing (2 pages):
```
Tracing session: 8cc80d50-c72d-11ed-8427-14e21cc3ed56

 activity                                                                                                                                  | timestamp                  | source    | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                        Execute CQL3 query | 2023-03-20 10:43:16.773000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                             Parsing a statement [shard 0] | 2023-03-20 10:43:16.773754 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                          Processing a statement [shard 0] | 2023-03-20 10:43:16.773837 | 127.0.0.1 |             83 | 127.0.0.1
          Creating read executor for token -4911109968640856406 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] | 2023-03-20 10:43:16.773874 | 127.0.0.1 |            121 | 127.0.0.1
                                                                                                     read_data: querying locally [shard 0] | 2023-03-20 10:43:16.773877 | 127.0.0.1 |            123 | 127.0.0.1
                                      Start querying singular range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} [shard 0] | 2023-03-20 10:43:16.773881 | 127.0.0.1 |            128 | 127.0.0.1
                                                                             [reader concurrency semaphore] admitted immediately [shard 0] | 2023-03-20 10:43:16.773884 | 127.0.0.1 |            130 | 127.0.0.1
                                                                                   [reader concurrency semaphore] executing read [shard 0] | 2023-03-20 10:43:16.773890 | 127.0.0.1 |            137 | 127.0.0.1
                  Querying cache for range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} and slice {(-inf, +inf)} [shard 0] | 2023-03-20 10:43:16.773903 | 127.0.0.1 |            149 | 127.0.0.1
 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 100 clustering row(s) (100 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2023-03-20 10:43:16.774674 | 127.0.0.1 |            920 | 127.0.0.1
                                                                   Caching querier with key 5eff94d2-e47a-43b2-8e3a-2d80a9cc3b3e [shard 0] | 2023-03-20 10:43:16.774685 | 127.0.0.1 |            931 | 127.0.0.1
                                                                                                                Querying is done [shard 0] | 2023-03-20 10:43:16.774688 | 127.0.0.1 |            934 | 127.0.0.1
                                                                                            Done processing - preparing a result [shard 0] | 2023-03-20 10:43:16.774706 | 127.0.0.1 |            953 | 127.0.0.1
                                                                                                                          Request complete | 2023-03-20 10:43:16.774225 | 127.0.0.1 |           1225 | 127.0.0.1

Tracing session: 8d26f630-c72d-11ed-8427-14e21cc3ed56

 activity                                                                                                                                                | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                                      Execute CQL3 query | 2023-03-20 10:43:17.395000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                                           Parsing a statement [shard 0] | 2023-03-20 10:43:17.395498 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                                        Processing a statement [shard 0] | 2023-03-20 10:43:17.395558 | 127.0.0.1 |             60 | 127.0.0.1
                        Creating read executor for token -4911109968640856406 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] | 2023-03-20 10:43:17.395597 | 127.0.0.1 |             99 | 127.0.0.1
                                                                                                                   read_data: querying locally [shard 0] | 2023-03-20 10:43:17.395600 | 127.0.0.1 |            102 | 127.0.0.1
                                                    Start querying singular range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} [shard 0] | 2023-03-20 10:43:17.395604 | 127.0.0.1 |            106 | 127.0.0.1
 Found cached querier for key 5eff94d2-e47a-43b2-8e3a-2d80a9cc3b3e and range(s) {{{-4911109968640856406, pk{000d73797374656d5f736368656d61}}}} [shard 0] | 2023-03-20 10:43:17.395610 | 127.0.0.1 |            112 | 127.0.0.1
                                                                                                                               Reusing querier [shard 0] | 2023-03-20 10:43:17.395614 | 127.0.0.1 |            116 | 127.0.0.1
                                                                                                 [reader concurrency semaphore] executing read [shard 0] | 2023-03-20 10:43:17.395622 | 127.0.0.1 |            125 | 127.0.0.1
                 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 11 clustering row(s) (11 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2023-03-20 10:43:17.395711 | 127.0.0.1 |            213 | 127.0.0.1
                                                                                                                              Querying is done [shard 0] | 2023-03-20 10:43:17.395718 | 127.0.0.1 |            221 | 127.0.0.1
                                                                                                          Done processing - preparing a result [shard 0] | 2023-03-20 10:43:17.395734 | 127.0.0.1 |            236 | 127.0.0.1
                                                                                                                                        Request complete | 2023-03-20 10:43:17.395276 | 127.0.0.1 |            276 | 127.0.0.1

```
Fixes: https://github.com/scylladb/scylladb/issues/12781

Closes #13255

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: add trace points for important events
  reader_permit: refresh trace_state on new pages
  reader_permit: keep trace_state pointer on permit
  test/perf/perf_collection: give more unique names to key comparators
2023-03-23 15:37:33 +02:00
Botond Dénes
7699904c54 Revert "repair: Reduce repair reader eviction with diff shard count"
This reverts commit c6087cf3a0.

Said commit can cause a deadlock when 2 or more repairs compete for
locks on 2 or more nodes. Consider the following scenario:

Node n1 and n2 in the cluster, 1 shard per node, rf = 2, each shard has
1 available unit for the reader lock

    n1 starts repair r1
    r1-n1 (instance of r1 on node1) takes the reader lock on node1
    n2 starts repair r2
    r2-n2 (instance of r2 on node2) takes the reader lock on node2
    r1-n2 will fail to take the reader lock on node2
    r2-n1 will fail to take the reader lock on node1

As a result, r1 and r2 could not make progress and deadlock happens.
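The interleaving above can be replayed deterministically with a hypothetical single-unit lock per node. `reader_lock` and `replay_interleaving` are illustrative names, not repair code. Each repair needs the lock on both nodes, and each ends up holding exactly one:

```cpp
#include <cassert>

// Hypothetical reader lock on one node: 1 shard, 1 available unit.
struct reader_lock {
    int available = 1;
    bool try_acquire() {
        if (available == 0) {
            return false;
        }
        --available;
        return true;
    }
};

struct outcome {
    bool r1_progress;
    bool r2_progress;
};

// Replays the scenario step by step. A repair can only make progress once it
// holds the reader lock on *both* participating nodes.
outcome replay_interleaving() {
    reader_lock node1, node2;
    bool r1_n1 = node1.try_acquire(); // r1-n1 takes the lock on node1
    bool r2_n2 = node2.try_acquire(); // r2-n2 takes the lock on node2
    bool r1_n2 = node2.try_acquire(); // r1-n2 fails: node2's unit is held by r2
    bool r2_n1 = node1.try_acquire(); // r2-n1 fails: node1's unit is held by r1
    return {r1_n1 && r1_n2, r2_n2 && r2_n1};
}
```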

The complexity comes from the fact that a repair job needs lock on more
than one node. It is not guaranteed that all the participant nodes could
take the lock in one shot.

There is no simple solution to this, so we have to revert this locking
mechanism and look for another way to prevent reader thrashing when
repairing nodes with mismatching shard count.

Fixes: #12693

Closes #13266
2023-03-23 15:35:32 +02:00
Nadav Har'El
b5e61e1b83 test/cql-pytest, lwt: test for detection of contradicting batches
Cassandra detects when a batch has both an IF EXISTS and IF NOT EXISTS
on the same row, and complains this is not a useful request (after all,
it can never succeed, because the batch can only succeed if both conditions
are true, and that can't be if one checks IF EXISTS and the other
IF NOT EXISTS).

This patch adds a test, test_lwt_with_batch_conflict_1, which checks
that this case results in an error. It passes on Cassandra, but xfails
on Scylla which doesn't report an error in this case.

A second test, test_lwt_with_batch_conflict_2, shows that the detection
of the EXISTS / NOT EXISTS conflict is special, and other conflicts
such as having both "r=1" and "r=2" for the same row, are NOT detected
by Cassandra.
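The distinction the two tests draw can be illustrated with a hypothetical checker that flags only the EXISTS / NOT EXISTS combination on the same row and ignores value conflicts such as r=1 vs r=2, matching the Cassandra behavior described above. Names and types are made up for illustration; this is neither Cassandra's nor Scylla's code:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class row_condition { if_exists, if_not_exists, value_check };

struct conditional_stmt {
    std::string row_key;  // hypothetical flattened primary key
    row_condition cond;
};

// True if some row is guarded by both IF EXISTS and IF NOT EXISTS, which can
// never hold at the same time. Value conditions are deliberately not compared.
bool has_exists_conflict(const std::vector<conditional_stmt>& batch) {
    std::map<std::string, std::pair<bool, bool>> seen; // row -> (exists, not_exists)
    for (const auto& s : batch) {
        auto& [exists, not_exists] = seen[s.row_key];
        if (s.cond == row_condition::if_exists) exists = true;
        if (s.cond == row_condition::if_not_exists) not_exists = true;
        if (exists && not_exists) return true;
    }
    return false;
}
```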

Refs #13011.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13270
2023-03-23 13:35:21 +02:00
Pavel Emelyanov
b13ff5248c sstables: Mark continuous_data_consumer::reader_position() const
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13285
2023-03-23 13:27:33 +02:00
Pavel Emelyanov
bee5593ba1 storage_service: Move node_ops_meta_data to .cc file
It's declared in the header, but is not used outside of the .cc file. A
forward declaration in the header is enough.
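As a generic illustration, not the actual storage_service layout, a header can get away with a forward declaration when it only refers to the type through a pointer or reference. In this hypothetical single-file sketch, comments mark what would live in the header and what in the .cc file; `ops_service` and the `pending` field are made up:

```cpp
#include <cassert>
#include <memory>

// ---- would live in the header ----
struct node_ops_meta_data;  // forward declaration only

class ops_service {
public:
    ops_service();
    ~ops_service();  // defined below, where node_ops_meta_data is complete
    int pending_ops() const;

private:
    std::unique_ptr<node_ops_meta_data> _meta;
};

// ---- would live in the .cc file ----
struct node_ops_meta_data {
    int pending = 0;  // hypothetical field
};

ops_service::ops_service() : _meta(std::make_unique<node_ops_meta_data>()) {}
ops_service::~ops_service() = default;
int ops_service::pending_ops() const { return _meta->pending; }
```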

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13289
2023-03-23 13:22:39 +02:00
Tzach Livyatan
ea66c16818 Fix Enable Authorization doc page references a wrong CL used by a 'cassandra' user
Fix https://github.com/scylladb/scylladb/issues/11633

Closes #11637
2023-03-23 13:20:36 +02:00
Kefu Chai
0421a82821 sstables: add type constraits right in parameter list
for better readability.

also, add `#include <concepts>`, as we should include what we use
instead of relying on other headers to do this on our behalf.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13277
2023-03-23 13:57:22 +03:00
Anna Stuchlik
b54868c639 doc: disable the outdated banner
This commit disables the banner that advertises ScyllaDB
University Live event, which already took place.

Closes #13284
2023-03-23 08:57:45 +02:00
Kefu Chai
1197664f09 test: network_topology_strategy_test: silence warning
clang warns when the implicit conversion changes the precision of the
converted number. in this case, before being multiplied,
`std::numeric_limits<unsigned long>::max() >> 1` is implicitly
converted to double, the common type of double and
unsigned long, so the compiler warns:

```
/home/kefu/dev/scylladb/test/boost/network_topology_strategy_test.cc:129:84: error: implicit conversion from 'unsigned long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-const-int-float-conversion]
    return static_cast<unsigned long>(d*(std::numeric_limits<unsigned long>::max() >> 1)) << 1;
                                       ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
```
but

1. we don't really care about the precision here, we just want to map a
   double to a token represented by an int64_t
2. the maximum possible number being converted is less than
   9223372036854775807, which is the maximum number of int64_t, which
   is in general an alias of `long long`, not to mention that
   LONG_MAX is always 2147483647, after shifting right, the result
   would be 1073741823

so this is a false alarm. in order to silence it, we explicitly
cast the RHS of `*` operator to double.
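The shape of the fix as a standalone sketch. The helper name `double_to_token_bits` is hypothetical; the expression is reconstructed from the compiler output quoted above, with the explicit `static_cast<double>` on the right-hand side of `*`:

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper mirroring the expression in the warning. Writing the
// int -> double conversion explicitly marks it as intentional, so clang's
// -Wimplicit-const-int-float-conversion no longer fires.
unsigned long double_to_token_bits(double d) {
    return static_cast<unsigned long>(
               d * static_cast<double>(std::numeric_limits<unsigned long>::max() >> 1))
           << 1;
}
```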

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13221
2023-03-23 08:55:29 +02:00
Botond Dénes
aee5dfaa84 Merge 'docs: Add card logos' from David Garcia
Related issue https://github.com/scylladb/scylladb/issues/13119

Adds product logos to cards

**Preview:**

![Welcome-to-ScyllaDB-Documentation-ScyllaDB-Docs (1)](https://user-images.githubusercontent.com/9107969/224996621-6c93676d-1427-4a28-a529-fd3cd2bc2d61.png)

Closes #13167

* github.com:scylladb/scylladb:
  docs: Update custom styles
  docs: Update styles
  docs: Add card logos
2023-03-23 08:53:58 +02:00
Botond Dénes
0f5e845399 Merge 'docs: scylladb better php driver' from Daniel Reis
Hey y'all!

@malusev998 and I are maintaining an updated version of the [PHP Driver](https://github.com/he4rt/scylladb-php-driver) together with the @he4rt community, and it has had a bunch of improvements over the last months.

Previously it only worked on PHP 7.1 (DataStax branch); on our branch it works on PHP 8.1 and 8.2.

We are also using the ScyllaDB C++ Driver in this project, and I think it is a good idea to point new users to this project since it's the most up-to-date PHP driver maintained now.

What do y'all think about that?

Closes #13218

* github.com:scylladb/scylladb:
  fix: links to php driver
  fix: adding php versions into driver's description
  docs: scylladb better php driver
2023-03-23 08:53:30 +02:00
Tzach Livyatan
2d40952737 DOCS: remove invalid example from DML reference, WHERE clause section
Closes #12596
2023-03-22 18:37:20 +02:00
Nadav Har'El
d1e6d9103a Merge 'api: reference httpd::* symbols like 'httpd::*'' from Kefu Chai
this change is a leftover of 063b3be8a7, which failed to include the changes in the header files.

it turns out we have `using namespace httpd;` in seastar's `request_parser.rl`, and we should not rely on this statement to expose the symbols in `seastar::httpd` to the `seastar` namespace. so, in this change, these symbols are referenced with their `httpd::` qualifier.

also, since `get_name()`, previously a non-static member function of `seastar_test`, is now a static member function, we need to update the tests that capture `this` only to call this function, so they don't capture `this` anymore.

Closes #13202

* github.com:scylladb/scylladb:
  test: drop unused captured variables
  Update seastar submodule
2023-03-22 18:16:15 +02:00
Kefu Chai
596ea6d439 test: drop unused captured variables
this should silence the warning like:
```
test/boost/multishard_mutation_query_test.cc:493:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    do_with_cql_env_thread([this] (cql_test_env& env) -> future<> {
                            ^~~~
test/boost/multishard_mutation_query_test.cc:577:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    do_with_cql_env_thread([this] (cql_test_env& env) -> future<> {
                            ^~~~
2 errors generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-22 21:21:04 +08:00
Avi Kivity
4a18ee87eb Update seastar submodule
* seastar 9cbc1fe889...1204efbc5e (14):
  > http: Add lost pragma once into client.hh
  > prometheus, http: do not expose httpd::* in seastar
  > build: add haswell support
  > ci: fix configuration to build checkheaders target.
  > core: map_reduce: Fix use-after-free in variant with futurized reducer
  > Merge 'tests: support boost::test decorators and tolerate failures in test_spawn_input' from Kefu Chai
  > memory: support reallocing foreign (non-Seastar) memory on a reactor thread
  > test: futures: disable -Wself-move for GCC>=13
  > map_reduce: do not move a temporary object
  > doc/building-dpdk.md: drop extraneous '$'
  > http: url_decode: translate plus back into char
  > Merge 'seastar-json2code: cleanups' from Kefu Chai
  > Fix markdown formatting
  > Merge 'Minor abort on OOM changes' from Travis Downs
2023-03-22 21:21:04 +08:00
Benny Halevy
c09d0f6694 everywhere: use sstables::generation_type
Use generation_type rather than generation_type::int_t
where possible, and remove the deprecated
functions accepting int_t.

Ref #10459

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:59:47 +02:00
Benny Halevy
b597f41b8c test: sstable_test_env: use make_new_generation
Also, add a bunch of make_sstable variants that get a
generation_type param for this.
With that, the entry points for generation_type::int_t
are deprecated and their users will be converted
in following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:58:59 +02:00
Benny Halevy
a0e43af576 sstable_directory::components_lister::process: fixup indentation 2023-03-22 13:58:43 +02:00
Benny Halevy
a8dc2fda29 sstables: make highest_generation_seen return optional generation
It is possible to find no generation in an empty
table directory, and in the future, with uuid generations
it'd be possible to find no numeric generations in the
directory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:55:23 +02:00
Benny Halevy
ba680a7b96 replica: table: add make_new_generation function
make_new_generation generates a new generation
from an optional one.

If disengaged, it just generates a new generation
based on the shard_id. Otherwise, it generates
the next generation in sequence by adding
smp::count to the previous value, like we do today.
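A minimal sketch of the numeric scheme described above, in C++. The integer type, the parameter names, and returning the bare shard id for a disengaged input are assumptions for illustration only; this is not the actual replica::table code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for generation_type::int_t.
using generation_int = int64_t;

// Hypothetical sketch: derive a new numeric generation from an optional one.
generation_int make_new_generation(std::optional<generation_int> old_gen,
                                   unsigned shard_id, unsigned shard_count) {
    if (!old_gen) {
        // Nothing seen yet: start from the shard id (assumed starting point).
        return shard_id;
    }
    // Next in sequence: stepping by shard_count keeps every generation made on
    // a shard in the same residue class modulo shard_count, so two shards
    // never hand out the same number.
    return *old_gen + shard_count;
}
```

Stepping by the shard count is what lets each shard allocate generations without coordinating with the others.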

In the future, with uuid-based generations, the
function could be used to generate a new random
uuid based on the optional parameter.

It will be up to the caller, e.g. replica::table or
sstables manager to decide which kind of generation to
create.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:52:22 +02:00
Benny Halevy
b28eacce6f replica: table: move sstable generation related functions out of line
updating the highest generation happens only during
startup, and sstables are created rarely enough that
there is no reason to inline either function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:49:18 +02:00
Benny Halevy
d4d480a374 test: sstables: use generation_type::int_t
Convert all users to use sstables::generation_type::int_t.
Further patches will continue to convert most to
using sstables::generation_type instead so we can
abstract the value type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:48:50 +02:00
Benny Halevy
30cc0beb47 sstables: generation_type: define int_t
So it can be used everywhere to prepare for
uuid sstable generation support.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-22 13:36:52 +02:00
Vlad Zolotarov
f94bbc5b34 transport: add per-scheduling-group CQL opcode-specific metrics
This patch extends a previous patch that added these metrics globally:
 - cql_requests_count
 - cql_request_bytes
 - cql_response_bytes

This patch adds a "scheduling_group_name" label to these metrics and changes corresponding
counters to be accounted on a per-scheduling-group level.

As a bonus this patch also marks all 3 metrics as 'skip_when_empty'.

Ref #13061

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20230321201412.3004845-1-vladz@scylladb.com>
2023-03-22 13:27:48 +02:00
Botond Dénes
ff87f95a26 reader_concurrency_semaphore: add trace points for important events
Notably, to admission, execution and eviction. Registering/unregistering
the permit as inactive is not traced, as this happens on every
buffer-fill for range scans.
Semaphore trace messages have a "[reader_concurrency_semaphore]" prefix
to allow them to be clearly associated with the semaphore.
2023-03-22 04:58:18 -04:00
Botond Dénes
1f51f752cc reader_permit: refresh trace_state on new pages
To make sure all tracing done on a certain page will make its way into
the appropriate trace session.
This is a continuation of the previous patch (which added a trace pointer
to the permit).
2023-03-22 04:58:10 -04:00
Botond Dénes
156e5d346d reader_permit: keep trace_state pointer on permit
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
2023-03-22 04:58:01 -04:00
Botond Dénes
27a4c24522 test/perf/perf_collection: give more unique names to key comparators
perf.cc has two key comparators: key_compare and key_tri_compare. These
are very generic names; in fact, key_compare directly clashes with a
comparator of the same name in types.hh. Avoid the clash by renaming
both of these to more unique names.
2023-03-22 04:58:01 -04:00
Nadav Har'El
2038388268 cql-pytest: translate Cassandra's tests for multi-column relations
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectMultiColumnRelationTest.java into our
cql-pytest framework.

The tests reproduce four already-known Scylla bugs and three new bugs.
All tests pass on Cassandra. Because of these bugs 9 of the 22 tests
are marked xfail, and one is marked skip (it crashes Scylla).

Already known issues:

Refs    #64: CQL Multi column restrictions are allowed only on a clustering
             key prefix
Refs  #4178: Not covered corner case for key prefix optimization in filtering
Refs  #4244: Add support for mixing token, multi- and single-column
             restrictions
Refs  #8627: Cleanly reject updates with indexed values where value > 64k

New issues discovered by these tests:

Refs #13217: Internal server error when null is used in multi-column relation
Refs #13241: Multi-column IN restriction with tuples of different lengths
             crashes Scylla
Refs #13250: One-element multi-column restriction should be handled like a
             single-column restriction

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13265
2023-03-22 09:54:32 +02:00
Tzach Livyatan
083408723f doc: Add Murmur term to the glossary
Point out the difference between the official MurmurHash3 and the Scylla/Cassandra implementation

Update docs/glossary.rst

Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>

Closes #11369
2023-03-21 22:45:47 +02:00
Alejo Sanchez
da00052ad8 gms, service: replicate live endpoints on shard 0
Call replicate_live_endpoints on shard 0 to copy the live endpoints from
shard 0 to the rest of the shards, and get the list of live members from
shard 0.

Move the lock to the callers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13240
2023-03-21 15:46:12 +01:00
Gleb Natapov
fd6d45e178 bootstrapper: Add get_random_bootstrap_tokens function
Does the same as get_bootstrap_tokens() but does not consult
initial token config option. Will be used later.
2023-03-21 16:06:43 +02:00
Gleb Natapov
fc84c69b7e service: raft: add support for topology_change command into raft_group0_client
Extend raft_group0_client::prepare_command with support of
topology_change type of command.
2023-03-21 16:06:43 +02:00
Gleb Natapov
16d61e791f service: raft: introduce topology_change group0 command
Also extend group0_command to be able to send the new command type. The
command consists of an array of mutations.
2023-03-21 16:06:43 +02:00
Gleb Natapov
5e232ebee5 system_keyspace: add a table to persist topology change state machine's state
Add local table to store topology change state machine's state there.
Also add a function that loads the state to memory.
2023-03-21 16:06:43 +02:00
Gleb Natapov
a2b7d2c1a1 service: Introduce topology state machine data structures
The topology state machine will track all the nodes in a cluster,
their state, properties (topology, tokens, etc) and requested actions.

Node state can be one of the following:
 none             - the node is not yet in the cluster
 bootstrapping    - the node is currently bootstrapping
 decommissioning  - the node is being decommissioned
 removing         - the node is being removed
 replacing        - the node is replacing another node
 normal           - the node is working normally
 rebuild          - the node is being rebuilt
 left             - the node has left the cluster

Nodes in state left are never removed from the state.

Tokens can also be in one of the following states:

write_both_read_old - writes go to both new and old replicas, but reads are
                      still served by old replicas
write_both_read_new - writes still go to both old and new replicas, but reads
                      are served by new replicas
owner               - tokens are owned by the node, and reads and writes go
                      to the new replica set only

Tokens that need to be moved start in the 'write_both_read_old' state. After
the entire cluster learns about it, streaming starts. After streaming, tokens
move to the 'write_both_read_new' state; again, the whole cluster needs to
learn about it and make sure that no reads started before that point still
exist in the system. After that, tokens may move to the 'owner' state.

topology_request is the field through which a topology operation request
can be issued to a node. A request is one of the topology operations
currently supported: join, leave, replace or remove.
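The token-state progression above can be sketched as a small C++ state function. The enum mirrors the state names in this message, but the helper functions are hypothetical, and the cluster-wide synchronization and streaming that gate each transition are deliberately not modeled.

```cpp
#include <cassert>
#include <optional>

// Token states named after the ones described above.
enum class token_state { write_both_read_old, write_both_read_new, owner };

// Hypothetical helper: the next state of a token move, or nullopt once owned.
// In the described design each step also waits until the whole cluster has
// learned the current state (and, for read_old -> read_new, until streaming
// finishes); none of that coordination is represented here.
std::optional<token_state> next_state(token_state s) {
    switch (s) {
    case token_state::write_both_read_old: return token_state::write_both_read_new;
    case token_state::write_both_read_new: return token_state::owner;
    case token_state::owner:               return std::nullopt;
    }
    return std::nullopt;  // value outside the enumerator list
}

// Hypothetical helpers encoding the read/write routing of each state.
bool writes_go_to_old_replicas(token_state s) { return s != token_state::owner; }
bool reads_from_new_replicas(token_state s) { return s != token_state::write_both_read_old; }
```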
2023-03-21 16:06:43 +02:00
Gleb Natapov
dd1e27736e storage_proxy: not consult topology on local table write
Writes to tables with local replication strategies do not need to consult
the topology. This is not only an optimization but it allows writing
into the local tables before topology is known.
2023-03-21 16:06:43 +02:00
Anna Stuchlik
922f6ba3dd doc: fix the service name in upgrade guides
Fixes https://github.com/scylladb/scylladb/issues/13207

This commit fixes the service and package names in
the upgrade guides 5.0-to-2022.1 and 5.1-to-2022.2.
Service name: scylla-server
Package name: scylla-enterprise

Previous PRs to fix the same issue in other
upgrade guides:
https://github.com/scylladb/scylladb/pull/12679
https://github.com/scylladb/scylladb/pull/12698

This commit must be backported to branch-5.1 and branch-5.2.

Closes #13225
2023-03-21 15:56:28 +02:00
Kefu Chai
124410c059 api: reference httpd::* symbols like 'httpd::*'
this change is a leftover of 063b3be,
which failed to include the changes in the header files.

it turns out we have `using namespace httpd;` in seastar's
`request_parser.rl`, and we should not rely on this statement to
expose the symbols in `seastar::httpd` to the `seastar` namespace.
in this change,

* api/*.hh: all httpd symbols are referenced by `httpd::*`
  instead of being referenced as if they are in `seastar`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-21 15:49:10 +02:00
Avi Kivity
19810cfc5e transport: correctly format unknown opcode
gcc allows an enum to contain values outside its members. For extra
safety, as this can be user visible, format the unknown opcode and
return it.
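A sketch of the pattern, using a hypothetical subset of the CQL opcodes: an enum with a fixed underlying type can hold values that match no enumerator, so the formatter falls back to printing the raw number instead of assuming the switch is exhaustive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical subset of CQL opcodes, for illustration only.
enum class opcode : uint8_t { error = 0x00, startup = 0x01, query = 0x07 };

std::string format_opcode(opcode op) {
    switch (op) {
    case opcode::error:   return "ERROR";
    case opcode::startup: return "STARTUP";
    case opcode::query:   return "QUERY";
    }
    // A value with no matching enumerator (e.g. read off the wire) reaches
    // here; format the raw number rather than returning garbage.
    return "UNKNOWN(" + std::to_string(static_cast<unsigned>(op)) + ")";
}
```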
2023-03-21 15:43:00 +02:00
Avi Kivity
e75009cd49 treewide: catch by reference
gcc rightly warns about catching exceptions by value, so catch by
reference.
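A small illustration of why catching by reference matters (the exception type is hypothetical): catching a polymorphic exception by value copies it into the base type and slices off the derived part, while catching by reference keeps the dynamic type.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

struct my_error : std::runtime_error {  // hypothetical derived exception
    using std::runtime_error::runtime_error;
};

bool derived_type_seen_by_reference() {
    try {
        throw my_error("boom");
    } catch (const std::exception& e) {  // no copy: dynamic type preserved
        return dynamic_cast<const my_error*>(&e) != nullptr;
    }
    return false;
}

bool derived_type_seen_by_value() {
    try {
        throw my_error("boom");
    } catch (std::exception e) {  // copy into the base: the object is sliced
        return dynamic_cast<const my_error*>(&e) != nullptr;
    }
    return false;
}
```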
2023-03-21 15:43:00 +02:00
Avi Kivity
eaad38c682 test: raft: avoid confusing string compare
gcc doesn't like comparing a C string to an sstring -- apparently
it has different promotion rules than clang. Fix by doing an
explicit conversion.
2023-03-21 15:43:00 +02:00
Avi Kivity
bdfc0aa748 utils, types, test: extract lexicographical compare utilities
UUID_test uses lexicographical_compare from the types module. This
is a layering violation, since UUIDs are at a much lower level than
the database type system. In practical terms, this causes link failures
with gcc due to some thread-local-storage variables defined in types.hh
but not provided by any object, since we don't link with types.o in this
test.

Fix by extracting the relevant functions into a new header.
2023-03-21 15:42:53 +02:00
Avi Kivity
32a724fada test: raft: fsm_test: disambiguate raft::configuration construction
gcc thinks the constructor call is ambiguous since "{}" can match
the default constructor. Fix by making the parameter type explicit.

Use "{}" for the constructor call to avoid the most-vexing-parse
problem.
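A minimal, hypothetical illustration of the most-vexing-parse problem that braces avoid: with parentheses, the declaration below would declare a function instead of constructing an object.

```cpp
#include <cassert>

struct timer { int ticks = 0; };  // hypothetical types

struct widget {
    explicit widget(timer t) : ticks(t.ticks) {}
    int ticks;
};

int widget_ticks() {
    // `widget w(timer());` would declare a function named w that returns a
    // widget and takes a function returning timer (the most vexing parse).
    // Braces can only mean "construct an object".
    widget w{timer()};
    return w.ticks;
}
```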
2023-03-21 13:45:57 +02:00
Avi Kivity
83e149c341 test: reader_concurrency_semaphore_test: handle all enum values
gcc considers values outside the enum class enumeration lists to be
valid, so handle them. In this case, we don't think they can happen,
so abort.
2023-03-21 13:45:57 +02:00
Avi Kivity
bc0bba10b4 repair: fix signed/unsigned compare
Fix the loop induction variable to have the same type as
the termination value.
2023-03-21 13:45:49 +02:00
Avi Kivity
94a10ed6ab repair: fix incorrect signed/unsigned compare
A signed/unsigned compare can overflow. Fix by using the safer
std::cmp_greater().

The problem is minor as the user is unlikely to send a negative id.
2023-03-21 13:45:34 +02:00
Avi Kivity
a806024e1d treewide: avoid unused variables in if statements
gcc warns about unused variables declared in if statements. Just
drop them.
2023-03-21 13:42:49 +02:00
Avi Kivity
9ced89a41c keys: disambiguate construction from initializer_list<bytes>
Some tests initialize via an initializer_list, but gcc finds other
valid constructors via vector<managed_bytes>. Disambiguate by adding
a constructor that accepts the initializer_list, and forward to the
wanted constructor.
2023-03-21 13:42:49 +02:00
Avi Kivity
41a2856f78 cql3: expr: fix serialize_listlike() reference-to-temporary with gcc
serialize_listlike() is called with a range of either managed_bytes
or managed_bytes_opt. If the former, then iterating and assigning
to a loop induction variable of type managed_bytes_opt& will bind
the reference to a temporary managed_bytes_opt, which gcc dislikes.

Fix by performing the binding in a separate statement, which allows
for lifetime extension.
2023-03-21 13:42:49 +02:00
Avi Kivity
32cc975b2f compaction: error on invalid scrub type
gcc allows an enum to contain a value outside its enum set,
so we need to handle it. Since it shouldn't happen, signal
an internal error.
2023-03-21 13:42:49 +02:00
Avi Kivity
7bb717d2f9 treewide: prevent redefining names
gcc dislikes a member name that matches a type name, as it changes
the meaning of the name retroactively. Fix by fully qualifying the type
name, so it is not changed by the newly-introduced member.
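A minimal, hypothetical example of the pattern and the fix: gcc rejects `retry_policy retry_policy;` inside a class because the member changes what the name means there, while a qualified name is looked up in the global namespace and keeps its meaning.

```cpp
#include <cassert>

struct retry_policy { int max_attempts = 3; };  // hypothetical type

struct client {
    // `retry_policy retry_policy;` is rejected by gcc ("changes meaning of
    // 'retry_policy'"). The qualified name refers to the global type, so the
    // member no longer alters it.
    ::retry_policy retry_policy;
};
```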
2023-03-21 13:42:49 +02:00
Avi Kivity
7ab65379b9 api: task_manager: fix signed/unsigned compare
Trivial fix by changing the type of the induction variable.
2023-03-21 13:42:42 +02:00
Avi Kivity
429650e508 alternator: streams: fix signed/unsigned comparison
We compare a signed variable to an unsigned one, which can
yield surprising results. In this case, it is harmless since
we already validated the signed input is positive, but
use std::cmp_less() to quench any doubts (and warnings).
2023-03-21 13:41:53 +02:00
Nadav Har'El
77bf90bf7d Merge 'Sanitize {format_types|version_types} to/from string converters' from Pavel Emelyanov
There's a need to convert both -- version and format -- to string and back. Currently, there's a dispersed set of helpers in the sstables/ code doing that, and this PR brings some order to it:

- adds fmt::formatter<> specialization for both types
- leaves one set of {format|version}_from_string() helpers converting any string-ish object into a value

refs: #12523

Closes #13214

* github.com:scylladb/scylladb:
  sstables: Expell sstable_version_types from_string() helper
  sstables: Generalize ..._from_string helpers
  sstables: Implement fmt::formatter<sstable_format_types>
  sstables: Implement fmt::formatter<sstable_version_types>
  sstables: Move format maps to namespace scope
2023-03-21 13:39:24 +02:00
Avi Kivity
0770b328c7 test: fix some mismatched signed/unsigned comparisons
gcc likes to complain about signed/unsigned compares as they
can yield surprising results. The fixes are trivial, so apply
them.
2023-03-21 13:15:12 +02:00
Pavel Emelyanov
970fc80ea6 feature_service: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:59:37 +03:00
Pavel Emelyanov
8600cb2db0 feature_service: Move async context into enable()
Callers don't need to know that enabling features has this requirement.
Indentation is deliberately left broken (until next patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:59:34 +03:00
Pavel Emelyanov
ae6e29a919 system_keyspace: Refactor local features load/save helpers
Introduce load_local_enabled_features() and save_local_enabled_features()
that get and put std::set<sstring> with feature names (and perform set to
string and back conversions on their own). They look natural next to
existing sys.ks. methods to get/set local-supported features and peer
features.

Using the new API, the more generic functions to preserve individual
features and load them on startup can become much shorter and cleaner.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:54:02 +03:00
Wojciech Mitros
406ea34aba build: add wasm compilation target for rust
In the future, when testing WASM UDFs, we will only store their Rust
source code and compile it to WASM. To be able to
do that, we need rust standard library for the wasm32-wasi target,
which is available as an RPM called rust-std-static-wasm32-wasi.

Closes #12896

[avi: regenerate toolchain]

Closes #13258
2023-03-21 10:30:08 +02:00
Pavel Emelyanov
6a5ab87441 feature_service: Mark supported_feature_set() const
It is indeed const.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:12:29 +03:00
Pavel Emelyanov
985fbf703a feature_service: Remove single feature enabling method
No longer used

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:12:28 +03:00
Pavel Emelyanov
b27d2c9399 boot: Enable features in batch
On boot main calls enable_features_on_startup() which at the end scans
through the list of features and enables them. Same as in previous patch
-- it makes sense to use batch enabling here.

Note that although the loop that collects features is not as trivial as
in the previous patch (the gossiper case), it still operates on local copies
of the feature sets, so delaying a feature's enabling doesn't affect whether
other features need to be enabled too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:12:25 +03:00
Pavel Emelyanov
256dd9d7e3 gossiper: Enable features in batch
Gossiper code walks the list of feature names and enables them
one-by-one. However, in the feature_service code there's a method that
enables features in batch.

Using it now doesn't make any difference, but next patches will make
some use of it. Also, this will allow shortening feature_service's API and
make it simpler to remove the qctx thing from there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 11:12:16 +03:00
Pavel Emelyanov
fe7609865d Merge 'reader_concurrency_semaphore: improve diagnostics printout' from Botond Dénes
Remove redundant "Total: ..." line.
Include the entire `reader_concurrency_semaphore::stats` in the printout. This includes a lot of metrics not exported to monitoring. These metrics are very valuable when debugging timeouts but are otherwise uninteresting. To avoid bloating our monitoring with such niche metrics, we dump them when they are interesting: when timeouts happen. To be really helpful, we do need historic values too, but this shouldn't be a problem: timeouts come in bursts, we usually get at least a handful of diagnostics dumps at a time.
New stats are also added to record the reason why reads are queued on the semaphore.

Printout before:
```
INFO  2023-03-14 12:43:54,496 [shard 0] reader_concurrency_semaphore - Semaphore test_reader_concurrency_semaphore_memory_limit_no_leaks with 4/4 count and 7168/4096 memory resources: kill limit triggered, dumping permit diagnostics:
permits count   memory  table/description/state
4       4       7K      *.*/reader/active/unused
2       0       0B      *.*/reader/waiting_for_admission

6       4       7K      total

Total: 6 permits with 4 count and 7K memory resources
```

Printout after:
```
INFO  2023-03-16 04:23:41,791 [shard 0] reader_concurrency_semaphore - Semaphore test_reader_concurrency_semaphore_memory_limit_no_leaks with 3/4 count and 7168/4096 memory resources: kill limit triggered, dumping permit diagnostics:
permits count   memory  table/description/state
2       2       6K      *.*/reader/active/unused
1       1       1K      *.*/reader/waiting_for_memory
2       0       0B      *.*/reader/waiting_for_admission

5       3       7K      total

Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 0
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 1
reads_admitted: 4
reads_enqueued_for_admission: 4
reads_enqueued_for_memory: 5
reads_admitted_immediately: 2
reads_queued_because_ready_list: 0
reads_queued_because_used_permits: 0
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 4
reads_queued_with_eviction: 0
total_permits: 6
current_permits: 5
used_permits: 0
blocked_permits: 0
disk_reads: 0
sstables_read: 0
```

Closes #13173

* github.com:scylladb/scylladb:
  test/boost/reader_concurrency_semaphore_test: remove redundant stats printouts
  reader_concurrency_semaphore: do_dump_reader_permit_diagnostics(): print the stats
  reader_concurrency_semaphore: add stats to record reason for queueing permits
  reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
2023-03-21 10:41:11 +03:00
Pavel Emelyanov
eecb9244dd sstables: Expell sstable_version_types from_string() helper
Its name is too generic despite its narrow specialization. Also,
there's a version_from_string() method that does the same in a more
convenient way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 09:56:18 +03:00
Pavel Emelyanov
4e99637777 sstables: Generalize ..._from_string helpers
There are two string->{version|format} converters living on class
sstable. It's better to have both in namespace scope. Surprisingly,
there's only one caller of it.

Also this patch makes both accept std::string_view so as not to limit the
helpers to converting only sstring&-s. This change calls for updating the
reverse_map template with "heterogeneous lookup".
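A minimal sketch of heterogeneous lookup with the standard library, assuming a hypothetical string-to-version map (this is not the actual reverse_map template): std::less<> is a transparent comparator, so map::find() accepts a std::string_view directly without building a temporary std::string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical version enum and map, for illustration only.
enum class sst_version { ka, la, mc, md, me };

const std::map<std::string, sst_version, std::less<>> version_by_name = {
    {"ka", sst_version::ka}, {"la", sst_version::la}, {"mc", sst_version::mc},
    {"md", sst_version::md}, {"me", sst_version::me},
};

// Hypothetical helper: returns nullopt for unknown names.
std::optional<sst_version> version_from_string(std::string_view s) {
    auto it = version_by_name.find(s);  // heterogeneous lookup, no allocation
    if (it == version_by_name.end()) {
        return std::nullopt;
    }
    return it->second;
}
```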

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 09:56:18 +03:00
Pavel Emelyanov
bb59dc2ec1 sstables: Implement fmt::formatter<sstable_format_types>
Same as in previous patch for another enum-class type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 09:56:18 +03:00
Pavel Emelyanov
6b04eb74d6 sstables: Implement fmt::formatter<sstable_version_types>
This way the version type can be fed as-is into fmt:: code, and
the conversion to string is as simple as fmt::to_string(v). Also drop
the existing explicit to_string() helper, updating all callers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 09:56:18 +03:00
Pavel Emelyanov
ea1c6fbf98 sstables: Move format maps to namespace scope
They will be used by the fmt::formatter specializations for version and format
types in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-21 09:56:18 +03:00
Nadav Har'El
511308bccf test/cql-pytest: tests for single-element multi-column restrictions
It turns out that Cassandra handles a restriction like `(c2) = (1)` just
like `c2 = 1`, and is not limited like multi-column restrictions. In
particular, this query works despite missing "c1", and may also use an
index if c2 is indexed.

But currently in Scylla, `(c2) = (1)` is handled like a multi-column
restriction, so it complains if c2 is not the first clustering key column,
and cannot use an index.

This patch adds several tests demonstrating this difference between
Scylla and Cassandra (#13250). The xfailing tests pass on Cassandra
but fail on Scylla.

Refs #13250

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13252
2023-03-21 07:56:24 +02:00
Anna Stuchlik
26bb36cdf5 doc: related https://github.com/scylladb/scylladb/issues/12754; add the missing information about reporting latencies to the upgrade guide 5.1 to 5.2
Closes #12935
2023-03-21 07:17:07 +02:00
Kefu Chai
faa47e9624 mutation: drop operator<<(ostream, const range_tombstone{_change,} &)
as all of its callers have been removed, let's drop these two operators.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-21 11:37:07 +08:00
Kefu Chai
d146535ec6 mutation: use fmtlib to print range_tombstone{_change,}
prepare for removing `operator<<(std::ostream&, const range_tombstone&)` and
`operator<<(std::ostream& out, const range_tombstone_change&)`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-21 11:37:07 +08:00
Kefu Chai
755aea8e7f mutation: mutation_fragment_v2: specialize fmt::formatter<range_tombstone_change>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print range_tombstone_change without using ostream<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-21 11:37:07 +08:00
Kefu Chai
4af0a0ed19 mutation: range_tombstone: specialize fmt::formatter<range_tombstone>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print range_tombstone.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-21 11:37:07 +08:00
Daniel Reis
3d1c78bdcc fix: links to php driver 2023-03-20 15:28:00 -03:00
Daniel Reis
f83f844319 fix: adding php versions into driver's description 2023-03-20 15:25:52 -03:00
Kefu Chai
b11fd28a46 dist/redhat: split Requires section into multiple lines
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 22:25:24 +08:00
Kefu Chai
7165551fd7 dist/redhat: enforce dependency on %{release} also
s/%{version}/%{version}-%{release}/ in `Requires:` sections.

this makes the runtime dependencies between scylla packages
require exactly the same release, not just the same version.

Fixes #13222
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 22:25:24 +08:00
Avi Kivity
0f97d464d3 Merge 'cql: check if the function is builtin when granting permissions' from Wojciech Mitros
Currently, when granting a permission on a function resource, we only check if the function exists, regardless of whether it's a user or a builtin function. We should not support altering permissions on builtin functions, so this patch adds a check confirming that the found function is not builtin.

Additionally, adjust the exception type thrown when trying to alter a permission that does not apply to a given resource.

Closes #13184

* github.com:scylladb/scylladb:
  cql: change exception type when granting incorrect permissions
  cql: check if the function is builtin when granting permissions
2023-03-20 16:17:02 +02:00
Kefu Chai
476bd84dd0 config: add a space before parameter
for better consistency in the code formatting.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13248
2023-03-20 16:03:00 +02:00
Botond Dénes
bf8b746bca Merge 'utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>' from Kefu Chai
this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print UUID without using ostream<<. also, this change re-implements some formatting helpers using fmtlib for better performance and fewer dependencies on operator<<(), but we cannot drop it at this moment, as quite a few call sites are still using operator<<(ostream&, const UUID&) and operator<<(ostream&, tagged_uuid<T>&). we will address them separately.

* add `fmt::formatter<UUID>`
* add `fmt::formatter<tagged_uuid<T>>`
* implement `UUID::to_string()` using `fmt::to_string()`
* implement `operator<<(std::ostream&, const UUID&)` with `fmt::print()`, this should help to improve the performance when printing uuid, as `fmt::print()` does not materialize a string when printing the uuid.
* treewide: use fmtlib when printing UUID

Refs #13245

Closes #13246

* github.com:scylladb/scylladb:
  treewide: use fmtlib when printing UUID
  utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>
2023-03-20 14:26:11 +02:00
Gleb Natapov
34d41177fe storage_service: pass storage_proxy and system_distributed_keyspace objects to messaging initialization
Will be needed there later.

Message-Id: <20230316112801.1004602-14-gleb@scylladb.com>
2023-03-20 11:58:50 +01:00
Gleb Natapov
d8edd2055f service: raft: add several accessors to group0 class
They will be used by later patches.

Message-Id: <20230316112801.1004602-13-gleb@scylladb.com>
2023-03-20 11:57:18 +01:00
Gleb Natapov
7d535a84bb servers: raft: make remove_from_raft_config public
Will be used by later patches.

Message-Id: <20230316112801.1004602-11-gleb@scylladb.com>
2023-03-20 11:47:55 +01:00
Gleb Natapov
f017aa1ad3 service: raft: pass storage service to group0_state_machine
To apply topology_change commands, group0_state_machine needs access
to the storage service to support topology changes over raft.

Message-Id: <20230316112801.1004602-10-gleb@scylladb.com>
2023-03-20 11:45:57 +01:00
Gleb Natapov
a690070722 raft_sys_table_storage: give initial snapshot a non zero value
We create a snapshot (config only, but still), but do not assign it any
id. Because of that it is not loaded on start. We do want it to be
loaded though since the state of group0 will not be re-created from the
log on restart because the entries will have outdated id and will be
skipped. As a result in memory state machine state will not be restored.
This is not a problem now since schema state is restored outside of raft
code.

Message-Id: <20230316112801.1004602-5-gleb@scylladb.com>
2023-03-20 11:45:38 +01:00
Gleb Natapov
2fc8e13dd8 raft: add server::wait_for_state_change() function
Add a function that allows waiting for a state change of a raft server.
It is useful for a user that wants to know when a node becomes/stops
being a leader.

Message-Id: <20230316112801.1004602-4-gleb@scylladb.com>
2023-03-20 11:31:55 +01:00
Gleb Natapov
59f7aeb79b raft: move some functions out of ad-hoc section
Make tick() and is_leader() part of the API. The former is already used
externally and the latter will be used in following patches.

Message-Id: <20230316112801.1004602-3-gleb@scylladb.com>
2023-03-20 11:25:19 +01:00
Nadav Har'El
c550e681d7 test/rest_api: fix flaky test for toppartitions
The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping
was flaky. It tests the toppartition request, which unfortunately needs
to choose a sampling duration in advance, and we chose 1 second which we
considered more than enough - and indeed typically even 1ms is enough!
but very rarely (I know of only one occurrence, in issue #13223) one
second is not enough.

Instead of increasing this 1 second and making this test even slower,
this patch takes a retry approach: The test starts with a 0.01 second
duration, and is then retried with increasing durations until it succeeds
or a 5-second duration is reached. This retry approach has two benefits:
1. It de-flakes the test (allowing a very slow test to take 5 seconds
instead of 1 second, which wasn't enough), and 2. At the same time it
makes a successful test much faster (it used to always take a full
second, now it takes 0.07 seconds on a dev build on my laptop).
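The retry schedule described above can be sketched as follows. This is not the test's actual (Python) code: the doubling factor and the cap at exactly 5 seconds are assumptions, chosen because they reproduce the "about 10 seconds" worst case mentioned below (0.01 + 0.02 + ... + 2.56 + 5 = 10.11 s). `retry_with_growing_duration` is a hypothetical name.

```cpp
#include <algorithm>
#include <functional>

struct retry_result {
    bool succeeded;
    int attempts;
    double total_seconds;  // sum of all sampling durations that were tried
};

// Try `attempt(duration)` with a growing duration: start small, double on
// each failure (assumed growth factor), cap at max_duration, and give up
// after a failed attempt at the cap.
retry_result retry_with_growing_duration(const std::function<bool(double)>& attempt,
                                         double start = 0.01,
                                         double max_duration = 5.0) {
    retry_result r{false, 0, 0.0};
    double duration = start;
    while (true) {
        ++r.attempts;
        r.total_seconds += duration;
        if (attempt(duration)) {
            r.succeeded = true;
            return r;
        }
        if (duration >= max_duration) {
            return r;  // already tried the longest allowed duration
        }
        duration = std::min(duration * 2, max_duration);
    }
}
```

A fast machine succeeds after a few short attempts, while a consistently failing check runs the whole schedule once.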

A *failed* test may, in some cases, take 10 seconds after this patch
(although in some other cases, an error will be caught immediately),
but I consider this acceptable - this test should pass, after all,
and a failure indicates a regression and taking 10 seconds will be
the last of our worries in that case.

Fixes #13223.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13238
2023-03-20 11:32:53 +02:00
Kefu Chai
0ba6627d5c wasm: block all signals in alien thread
as in main(), we use `stop_signal` to handle SIGINT and SIGTERM,
so when scylla receives a SIGTERM, the corresponding signal handler
could get called on any thread created by this program. so there
is a chance that the alien_runner thread could be chosen to run the
signal handler set up by `main()`, but that signal handler assumes
the availability of a Seastar reactor. unfortunately, we don't have
a Seastar reactor in the alien thread. the same applies to Seastar's
`thread_pool`, which handles the slow and blocking POSIX calls typically
used for interacting with files.

so, in this change, we use the same approach as Seastar's
`thread_pool::work()` -- just block all signals, so the alien threads
used by wasm for compiling UDF won't handle the signals using the
handlers planted by `main()`.
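A minimal POSIX sketch of "block all signals in the worker thread" follows. It is not the alien_runner code itself; `block_all_signals` and `worker_blocks_termination_signals` are hypothetical names. The worker fills a signal set and blocks it with `pthread_sigmask()` before doing any work. A process-directed SIGTERM is then delivered to some other thread that does not block it.

```cpp
#include <pthread.h>
#include <signal.h>
#include <thread>

// Block every blockable signal in the calling thread only. Requests to
// block SIGKILL and SIGSTOP are silently ignored by the kernel.
void block_all_signals() {
    sigset_t all;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, nullptr);
}

// Start a worker that blocks all signals first (as an alien / thread-pool
// thread would), and report whether SIGTERM and SIGINT are blocked there.
bool worker_blocks_termination_signals() {
    bool blocked = false;
    std::thread worker([&blocked] {
        block_all_signals();
        sigset_t current;
        pthread_sigmask(SIG_BLOCK, nullptr, &current);  // query only, no change
        blocked = sigismember(&current, SIGTERM) == 1 &&
                  sigismember(&current, SIGINT) == 1;
        // ... long-running work (e.g. compiling a UDF) would go here ...
    });
    worker.join();
    return blocked;
}
```

Because the mask is per-thread, the calling thread (where the real handlers can rely on the reactor) is unaffected.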

Fixes #13228
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13233
2023-03-20 11:20:19 +02:00
Avi Kivity
bab29a2f27 Merge 'Unit tests cleanup for sstable generation changes' from Benny Halevy
This series cleans up unit tests in preparation for PR #12994.
Helpers are added (or reused) to not rely on specific sstable generation numbers where possible (other than loading reference sstables that are committed to the repo with given generation numbers), and to generate the sstables for tests easily, taking advantage of generation management in `sstable_test_env`, `table_for_tests`, or `replica::table` itself.

Closes #13242

* github.com:scylladb/scylladb:
  test: add verify_mutation helpers.
  test: add make_sstable_containing memtable
  test: table_for_tests: add make_sstable function
  test: sstable_test_env: add make_sst_factory methods
  test: sstable_compaction_test: do not rely on specific generations
  tests: use make_sstable defaults as much as possible
  test: sstable_test_env: add make_table_for_tests
  test: sstable_datafile_test: do not rely on sepecific sstable generations
  test: sstable_test_env: add reusable_sst(shared_sstable)
  sstable: expose get_storage function
  test: mutation_reader_test: create_sstable: do not rely on specific generations
  test: mutation_reader_test: do_test_clustering_order_merger_sstable_set: rely on test_envsstable generation
  test: mutation_reader_test: combined_mutation_reader_test: define a local sst_factory function
  test: mutation_reader_test: do not use tmpdir
  test: use big format by default
  test: sstable_compaction_test: use highest sstable version by default
  test: test_env: make_db_config: set cfg host_id
  test: sstable_datafile_test: fixup indentation
  test: sstable_datafile_test: various tests: do_with_async
  test: sstable_3_x_test: validate_read, sstable_assertions: get shared_sstable
  test: sstable_3_x_test: compare_sstables: get shared_sstable
  test: sstable_3_x_test: write_sstables: return shared_sstable
  test: sstable_3_x_test: write, compare, validate_sstables: use env.tempdir
  test: sstable_3_x_test: compacted_sstable_reader: do not reopen compacted_sst
  test: lib: test_services: delete now unused stop_and_keep_alive
  test: sstable_compaction_test: use deferred_stop to stop table_for_tests
  test: sstable_compaction_test: compound_sstable_set_incremental_selector_test: do_with_async
  test: sstable_compaction_test: sstable_needs_cleanup_test: do_with_async
  test: sstable_compaction_test: leveled_05: fixup indentation
  test: sstable_compaction_test: leveled_05: do_with_async
  test: sstable_compaction_test: compact_02: do_with_async
  test: sstable_compaction_test: compact_sstables: simplify variable allocation
  test: sstable_compaction_test: compact_sstables: reindent
  test: sstable_compaction_test: compact_sstables: use thread
  test: sstable_compaction_test: sstable_rewrite: simplify variable allocation
  test: sstable_compaction_test: sstable_rewrite: fixup indentation
  test: sstable_compaction_test: sstable_rewrite: do_with_async
  test: sstable_compaction_test: compact: fixup indentation
  test: sstable_compaction_test: compact: complete conversion to async thread
  test: sstable_compaction_test: compaction_manager_basic_test: rename generations to idx
2023-03-20 11:16:46 +02:00
Nadav Har'El
8b0822be77 test/cql-pytest: reproducer for bug crashing Scylla on mismatched tuple
This patch adds a reproducing test for issue #13241, where attempting
a SELECT restriction (b,c,d) IN ((1,2)) - where the tuple is shorter
than needed - crashes Scylla (with a segmentation fault) instead of
generating a clean error as it should (and as done on Cassandra).

The test also demonstrates that if the tuple is longer than needed
(instead of shorter), the behavior is correct, and it is also
correct if "=" is used instead of IN. Only the combination of IN
and a too-short tuple seems to be broken - but broken in a bad way
(it can be used to crash Scylla).

Because the test crashes Scylla when it fails, it is marked "skip".

Refs #13241

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13244
2023-03-20 11:13:02 +02:00
Anna Stuchlik
fc927b1774 doc: add the Enterprise vs. OSS Matrix
Fixes https://github.com/scylladb/scylladb/issues/12758

This commit adds a new page with a matrix that shows
which ScyllaDB Open Source version each ScyllaDB
Enterprise version is based on.

The new file is added to the newly created Reference
section.

Closes #13230
2023-03-20 10:18:10 +02:00
Kefu Chai
94c6df0a08 treewide: use fmtlib when printing UUID
this change tries to reduce the number of callers using operator<<()
for printing UUID. they were found by compiling the tree after commenting
out `operator<<(std::ostream& out, const UUID& uuid)`. but this change
alone is not enough to drop all callers, as some callers are using
`operator<<(ostream&, const unordered_map&)` and other overloads to
print ranges whose elements contain UUID. so in order to limit the
scope of the change, we are not changing them here.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 15:38:45 +08:00
Kefu Chai
c14c70b89d utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print UUID without using ostream<<. also, this change reimplements
some formatting helpers using fmtlib for better performance and fewer
dependencies on operator<<(), but we cannot drop it at this moment,
as quite a few call sites are still using operator<<(ostream&, const UUID&)
and operator<<(ostream&, tagged_uuid<T>&). we will address them separately.

* add fmt::formatter<UUID>
* add fmt::formatter<tagged_uuid<T>>
* implement UUID::to_string() using fmt::to_string()
* implement operator<<(std::ostream&, const UUID&) with fmt::print(),
  this should help to improve the performance when printing uuid, as
  fmt::print() does not materialize a string when printing the uuid.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 14:25:45 +08:00
Botond Dénes
583e49dd09 Merge 'cmake: sync with configure.py (14/n)' from Kefu Chai
this is the 14th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
  - to enable developers to use CMake for building scylla, so they can use tools (CLion for instance) with CMake integration for a better developer experience
  - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.

this changeset includes following changes:

- build: cmake: promote add_scylla_test() to test/
- build: cmake: add all tests

Closes #13220

* github.com:scylladb/scylladb:
  build: cmake: add all tests
  build: cmake: promote add_scylla_test() to test/
2023-03-20 08:13:07 +02:00
Pavel Emelyanov
c88e47a624 memory_data_sink: Add move ctor
To make it possible to move the class member away while resetting it to
be empty at the same time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13208
2023-03-20 07:55:20 +02:00
Pavel Emelyanov
b631081df8 test: Fixie for test sstable chdir
Some unit tests want to change the sstable::_dir on the fly. However,
the sstable::_dir is going away, so this needs yet another virtual call
on the storage driver.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13213
2023-03-20 07:28:22 +02:00
Benny Halevy
d62df5cac6 test: add verify_mutation helpers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:48:22 +02:00
Benny Halevy
cf4eaa1fbc test: add make_sstable_containing memtable
Helper for make_sstable + write_memtable_to_sstable_for_test
+ reusable_sst / load.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:48:22 +02:00
Benny Halevy
0ce6afb5f9 test: table_for_tests: add make_sstable function
table_for_tests uses an sstables manager to generate sstables
and gets the new generation from
table.calculate_generation_for_new_table().

The version to use is either the highest supported or
an ad-hoc version passed to make_sstable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:48:22 +02:00
Benny Halevy
88d085ea66 test: sstable_test_env: add make_sst_factory methods
The tests extensively use a `std::function<shared_sstable()>`
to generate new sstables.

Rather than handcrafting them all over the place,
let sstable_test_env return such factory given a schema
(and another entry point that also gets a version)
and that uses the embedded generation_factory in the test_env
to generate new sstables with unique generations.
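The factory idea can be sketched as below, assuming only that the environment owns a single generation counter shared by every factory it hands out. `fake_sstable` and `fake_test_env` are hypothetical stand-ins, not the real `sstable_test_env` API.

```cpp
#include <functional>
#include <memory>

// Hypothetical minimal stand-in for an sstable; only the generation matters.
struct fake_sstable {
    long generation;
};
using shared_fake_sstable = std::shared_ptr<fake_sstable>;

// Hypothetical test env owning the generation counter, so tests never
// hardcode generation numbers.
class fake_test_env {
    long _next_generation = 1;
public:
    // Every factory draws from the same counter, so generations stay unique
    // across factories. The env must outlive the factories (captures `this`).
    std::function<shared_fake_sstable()> make_sst_factory() {
        return [this] {
            return std::make_shared<fake_sstable>(fake_sstable{_next_generation++});
        };
    }
};
```

Tests then reference the returned sstable objects themselves instead of remembering which number they were assigned.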

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:48:22 +02:00
Benny Halevy
c308ba635b test: sstable_compaction_test: do not rely on specific generations
No need to maintain static generation numbers in the test.
Let the sstable_test_env dispatch sstable generations automatically,
and use the generated sstables themselves for reference rather
than their generation numbers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:47:46 +02:00
Benny Halevy
51b2c38472 tests: use make_sstable defaults as much as possible
Add a few goodies to sstable_test_env to extend
entry points with default params for make_sstable
and reusable_sst.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:47:14 +02:00
Benny Halevy
084f4e4fde test: sstable_test_env: add make_table_for_tests
Wrap table_for_tests ctor to pass the env sstables_manager
as well as the temporary directory path, as this is the
most common use case, and in preparation for adding
a make_sstable method in table_for_tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:47:14 +02:00
Benny Halevy
e9af4e4cd8 test: sstable_datafile_test: do not rely on sepecific sstable generations
There is no need to use specific generations in the test, just
rely on the ones sstable_test_env generates.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:46:47 +02:00
Benny Halevy
94192f0ded test: sstable_test_env: add reusable_sst(shared_sstable)
Allow generating a sstable object from an existing
sstable to get the directory, generation, and version
from it, rather than passing them to reusable_sst
from other sources - since the intention is
to get a new sstable object based on an existing
sstable that was generated by the test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:20:07 +02:00
Benny Halevy
b11e2c81ae sstable: expose get_storage function
To be used by sstable_test_env to reopen existing sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 17:19:12 +02:00
Benny Halevy
e9c3f0e478 test: mutation_reader_test: create_sstable: do not rely on specific generations
No need to maintain static generation numbers in the test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
648ab706df test: mutation_reader_test: do_test_clustering_order_merger_sstable_set: rely on test_envsstable generation
Rather than maintaining a running generation number,
use the default env.make_sstable(s) in sst_factory
and collect the expected generations from the resulting
shared sstable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
11595b3024 test: mutation_reader_test: combined_mutation_reader_test: define a local sst_factory function
For generating shared_sstables with increasing generations
(using the test_env make_sstable generations) and a given level.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
506dc1260f test: mutation_reader_test: do not use tmpdir
Rely on the test_env temporary directory instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
ceb5d4fb47 test: use big format by default
No need to pass the big format explicitly, as it's
set by default by make_sstable and it is never overridden.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
f24b69a6ae test: sstable_compaction_test: use highest sstable version by default
Tests should just generate the highest sstable version
available. There is no need to continue testing old versions,
in particular partially supported ones like "la".

Use also the default values for sstable::format_types, buffer_size,
etc. if there's no particular need to override them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
df5347fca8 test: test_env: make_db_config: set cfg host_id
So we can safely use `me` sstables in sstable_directory_test
that validates the sstable host owner.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
8b168869be test: sstable_datafile_test: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
1fce7c76a5 test: sstable_datafile_test: various tests: do_with_async
To simplify further cleanups.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
2954feb734 test: sstable_3_x_test: validate_read, sstable_assertions: get shared_sstable
Pass the test-generated shared_sstable to validate_read
and then to sstable_assertions so it can be used
for make_sstable version and generation params.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
969ec8611e test: sstable_3_x_test: compare_sstables: get shared_sstable
Use the sstable generated by the test to generate
the result_filename we want for compare.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
3ba0d1659c test: sstable_3_x_test: write_sstables: return shared_sstable
To be passed to compare_sstable in the next patch,
so it can generate the result filename from it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
4c842fb0e8 test: sstable_3_x_test: write, compare, validate_sstables: use env.tempdir
Do not create a tmpdir every time, just use
the one that the sstable test env provides.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
71c0c713ee test: sstable_3_x_test: compacted_sstable_reader: do not reopen compacted_sst
Just use the one we created during compaction
for verification so we won't have to rely on a particular
generation/version.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
e385575407 test: lib: test_services: delete now unused stop_and_keep_alive
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
0bf60d42aa test: sstable_compaction_test: use deferred_stop to stop table_for_tests
Rather than calling cf.stop_and_keep_alive() before the test exits,
since the table must also be stopped on failure.
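The point of a deferred stop is that the stop runs on scope exit even when the test body throws. A synchronous scope-guard sketch of that idea follows. In Seastar, stopping returns a future, which this sketch ignores. `fake_table`, `stop_on_exit` and `failing_test_body` are hypothetical names, not the actual deferred_stop implementation.

```cpp
#include <stdexcept>

// Hypothetical table whose stop() must always be called before destruction.
struct fake_table {
    bool stopped = false;
    void stop() { stopped = true; }
};

// Scope guard in the spirit of deferred_stop: calls stop() when the scope
// exits, whether normally or by an exception.
template <typename T>
class stop_on_exit {
    T& _obj;
public:
    explicit stop_on_exit(T& obj) : _obj(obj) {}
    stop_on_exit(const stop_on_exit&) = delete;
    stop_on_exit& operator=(const stop_on_exit&) = delete;
    ~stop_on_exit() { _obj.stop(); }
};

// A test body that fails midway; the guard still stops the table.
void failing_test_body(fake_table& t) {
    stop_on_exit<fake_table> guard(t);
    throw std::runtime_error("check failed");
}
```

An explicit stop call placed at the end of the test body would be skipped by the exception. The guard is not.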

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
208726d987 test: sstable_compaction_test: compound_sstable_set_incremental_selector_test: do_with_async
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
9d83a94c28 test: sstable_compaction_test: sstable_needs_cleanup_test: do_with_async
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
d8a354a35e test: sstable_compaction_test: leveled_05: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
8b8c1c5813 test: sstable_compaction_test: leveled_05: do_with_async
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
d1879a5932 test: sstable_compaction_test: compact_02: do_with_async
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
76799d08d6 test: sstable_compaction_test: compact_sstables: simplify variable allocation
No need to use lw_shared all over the place now that
the function uses a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
af106684ae test: sstable_compaction_test: compact_sstables: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
8de808ff15 test: sstable_compaction_test: compact_sstables: use thread
Prepare for using make_sstable_containing in a follow up patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
f4989f2ba5 test: sstable_compaction_test: sstable_rewrite: simplify variable allocation
No need to use lw_shared all over the place now that
the function uses a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
fb379709cf test: sstable_compaction_test: sstable_rewrite: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
b27910cff2 test: sstable_compaction_test: sstable_rewrite: do_with_async
Simplify the flow using a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
d1a112a156 test: sstable_compaction_test: compact: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
d503eb75f1 test: sstable_compaction_test: compact: complete conversion to async thread
We already use test_env::do_with_async in this function
but we didn't take full advantage of it to simplify the
implementation.

Do that before further changes are made.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:53:56 +02:00
Benny Halevy
237c844901 test: sstable_compaction_test: compaction_manager_basic_test: rename generations to idx

The function used `calculate_generation_for_new_table` for
the sstables generation.  The so-called `generations` are just used
to generate key indices.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-19 16:52:21 +02:00
Botond Dénes
9859bae54f Merge 'Ignore no such column family in repair' from Aleksandra Martyniuk
While repair requested by user is performed, some tables
may be dropped. When the repair proceeds to these tables,
it should skip them and continue with others.

When no_such_column_family is thrown during user requested
repair, it is logged and swallowed. Then the repair continues with
the remaining tables.
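The skip-and-continue behavior can be sketched like this. The exception type and both functions are hypothetical stand-ins (the real code deals with replica::no_such_column_family and repair tasks). Each table is repaired in its own try block, so a table dropped mid-repair only removes itself from the result.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical analogue of the "no such column family" exception.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical per-table repair step: throws if the table was dropped.
void repair_one_table(const std::string& table, const std::vector<std::string>& dropped) {
    for (const auto& d : dropped) {
        if (d == table) {
            throw no_such_column_family("no such column family: " + table);
        }
    }
}

// Repair every requested table, skipping any that was dropped meanwhile.
// Returns the tables that were actually repaired.
std::vector<std::string> repair_tables(const std::vector<std::string>& tables,
                                       const std::vector<std::string>& dropped) {
    std::vector<std::string> repaired;
    for (const auto& table : tables) {
        try {
            repair_one_table(table, dropped);
            repaired.push_back(table);
        } catch (const no_such_column_family&) {
            // Real code would log here; the table is gone, so move on.
        }
    }
    return repaired;
}
```

Any other exception type still propagates and fails the repair, as before.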

Fixes: #13045

Closes #13068

* github.com:scylladb/scylladb:
  repair: fix indentation
  repair: continue user requested repair if no_such_column_family is thrown
  repair: add find_column_family_if_exists function
2023-03-19 15:16:02 +02:00
Botond Dénes
b1c7538e92 Merge 'Give table a reference to storage_options' from Pavel Emelyanov
The `storage_options` describes where sstables should be located. Currently the object resides on keyspace_metadata, and is thus not available at the place where it's needed the most -- the `table::make_sstable()` call. This set converts keyspace_metadata::storage_opts to be an lw-shared-ptr and shares the ptr with class table.

refs: #12523 (detached small change from large PR)

Closes #13212

* github.com:scylladb/scylladb:
  table: Keep storage options lw-shared-ptr
  keyspace_metadata: Make storage options lw-shared-ptr
2023-03-19 15:16:02 +02:00
Avi Kivity
a7099132cc scripts/pull_github_pr.sh: optionally authenticate
This helps overcome rate limits for unauthenticated requests,
preventing maintainers from getting much-needed rest.

Closes #13210
2023-03-19 15:16:02 +02:00
Kefu Chai
c5b6c91412 db: data_listener: mark data_listener's dtor virtual
Clang-17 warns when we try to delete a pointer to a class with virtual
function(s) but without a virtual dtor. in this change, we
mark the dtor of the base class of `table_listener` virtual to address
the warning.

we have another solution though -- to mark `table_listener` `final`, as we
don't destruct `table_listener` with a pointer to its base classes. but
it'd be much simpler to just mark the dtor of its base class, which has
virtual method(s), virtual. it's more idiomatic this way, and less error-prone.

this change should silence the warning like:
```
In file included from /home/kefu/dev/scylladb/test/boost/data_listeners_test.cc:9:
In file included from /usr/include/boost/test/unit_test.hpp:18:
In file included from /usr/include/boost/test/test_tools.hpp:46:
In file included from /usr/include/boost/test/tools/old/impl.hpp:20:
In file included from /usr/include/boost/test/tools/assertion_result.hpp:21:
In file included from /usr/include/boost/shared_ptr.hpp:17:
In file included from /usr/include/boost/smart_ptr/shared_ptr.hpp:17:
In file included from /usr/include/boost/smart_ptr/detail/shared_count.hpp:27:
In file included from /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:35:
In file included from /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/memory:78:
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'table_listener' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<table_listener>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:15: note: in instantiation of member function 'std::unique_ptr<table_listener>::~unique_ptr' requested here
        __location->~_Tp();
                     ^
```
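The fix follows the usual rule: a polymorphic base class gets a virtual destructor. With that in place, deleting through any base pointer runs the most-derived destructor. The sketch below uses hypothetical stand-ins for data_listener and table_listener.

```cpp
#include <memory>

int listeners_destroyed = 0;

// Hypothetical stand-in for data_listener: it has virtual functions, so it
// also gets a virtual destructor.
class data_listener_like {
public:
    virtual ~data_listener_like() = default;
    virtual void on_write() = 0;
};

// Hypothetical stand-in for table_listener: its destructor is implicitly
// virtual because the base destructor is.
class table_listener_like : public data_listener_like {
public:
    void on_write() override {}
    ~table_listener_like() override { ++listeners_destroyed; }
};

void destroy_via_base_pointer() {
    std::unique_ptr<data_listener_like> p = std::make_unique<table_listener_like>();
    p->on_write();
}  // ~table_listener_like() runs here through the virtual base destructor
```

Without the virtual base destructor, deleting through `data_listener_like*` would be undefined behavior. That is the case the warning guards against.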

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13198
2023-03-19 15:16:02 +02:00
Kefu Chai
a01eb593ec test: sstables: do not compare a mutation with an optional<mutation>
this change should address the FTBFS with Clang-17.

turns out we are comparing a mutation with an
optimized_optional<mutation>. and Clang-17 does not want to convert the
LHS, which is a mutation, to optimized_optional<mutation> for performing
the comparison using operator==(const optimized_optional<mutation>&),
despite the fact that optimized_optional(const T& obj) is not marked explicit.
this is understandable.

so, in this change, instead of relying on the implicit conversion, we
just

* check if the optional actually holds a value
* and compare the value by dereferencing the optional.
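The two steps above amount to the pattern below. It uses std::optional as a stand-in for Scylla's optimized_optional, and `holds_value_equal_to` is a hypothetical helper. std::optional already offers a mixed optional/value operator==, so the sketch only illustrates the explicit form, with no implicit conversion of the plain value.

```cpp
#include <optional>

// Compare an optional with a plain value without converting the value to an
// optional: first check that the optional is engaged, then compare *opt.
template <typename T>
bool holds_value_equal_to(const std::optional<T>& opt, const T& value) {
    return opt.has_value() && *opt == value;
}
```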

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13196
2023-03-19 15:16:02 +02:00
Pavel Emelyanov
be548a4da3 install-dependencies: Add rapid XML dev package
It will be needed by S3 driver to parse multipart-upload messages from
server

refs: #12523

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13158

[avi: regenerate toolchain]

Closes #13192
2023-03-19 15:16:02 +02:00
Avi Kivity
c3a2ec9d3c Merge 'use fmt::join() for printing ranges' from Kefu Chai
this series intends to deprecate `::join()`, as it always materializes a range into a concrete string. but what we always want is to print the elements in the given range to a stream, or to a seastar logger, which is backed by fmtlib. also, because fmtlib offers exactly the same set of features implemented by to_string.hh, this change would allow us to use fmtlib to replace to_string.hh for better maintainability, and potentially better performance, as fmtlib is lazily evaluated and claims to be performant under most circumstances.

Closes #13163

* github.com:scylladb/scylladb:
  utils: to_string: move join to namespace utils
  treewide: use fmt::join() when appropriate
  row_cache: pass "const cache_entry" to operator<<
2023-03-19 15:16:02 +02:00
Wojciech Mitros
3cdaf72065 docs: fix minor issues found in the wasm documentation
Even after last fixups, the documentation still had some issues with
compilation instructions in particular. I also ran a spelling and
grammar check on the text, and fixed issues found by it.

Closes #13206
2023-03-19 15:16:02 +02:00
Botond Dénes
6a8fbbebf2 test/boost/reader_concurrency_semaphore_test: remove redundant stats printouts
The semaphore stats are now included in the standard semaphore
diagnostics printout, no need to dump separately.
2023-03-17 03:15:41 -04:00
Botond Dénes
d6583cad0a reader_concurrency_semaphore: do_dump_reader_permit_diagnostics(): print the stats
Print the semaphore stats below the permit listing and remove the
currently redundant "Total: " line.
Some of the stats printed here are already exported as metrics, but
instead of trying to cherry-pick and risk some metrics falling through
the cracks, just print everything, there aren't that many anyway.
2023-03-17 03:15:41 -04:00
Botond Dénes
7b701ac52e reader_concurrency_semaphore: add stats to record reason for queueing permits
When diagnosing problems, knowing why permits were queued is very
valuable. Record the reason in new stats, one for each reason a permit
can be queued.
2023-03-17 03:15:41 -04:00
Botond Dénes
bb00405818 reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
So the caller can bump the appropriate counters or log the reason why
the request cannot be admitted.
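The two commits above combine into one pattern: the admission check returns a reason alongside the yes/no answer, and the caller keeps one counter per reason. The sketch below is hypothetical throughout. The reasons, their order and all names are illustrative, not the real reader_concurrency_semaphore ones.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reasons a read may or may not be admitted.
enum class admit_reason : std::size_t {
    admitted,
    not_enough_memory,
    not_enough_count,
    waiters_ahead,
    count  // number of enumerators, used to size the stats array
};

struct admit_decision {
    bool admitted;
    admit_reason reason;
};

// Decide admission and say why, so the caller does not have to re-derive it.
admit_decision can_admit(std::int64_t available_memory, std::int64_t needed_memory,
                         int available_count, bool others_waiting) {
    if (others_waiting) return {false, admit_reason::waiters_ahead};
    if (available_count <= 0) return {false, admit_reason::not_enough_count};
    if (available_memory < needed_memory) return {false, admit_reason::not_enough_memory};
    return {true, admit_reason::admitted};
}

// One counter per reason a permit can be queued.
struct queue_stats {
    std::array<std::uint64_t, static_cast<std::size_t>(admit_reason::count)> queued_because{};

    void record(const admit_decision& d) {
        if (!d.admitted) {
            ++queued_because[static_cast<std::size_t>(d.reason)];
        }
    }
};
```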
2023-03-17 03:15:40 -04:00
Kefu Chai
f113dac5bf build: cmake: add all tests
* add a new test KIND "UNIT", which provides its own main()
* add all tests which were not included yet

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-17 12:56:09 +08:00
Kefu Chai
b440417527 build: cmake: promote add_scylla_test() to test/
as it will be used by test/manual/CMakeLists.txt also.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-17 12:56:09 +08:00
Daniel Reis
86a4b8a57d docs: scylladb better php driver
2023-03-16 17:00:22 -03:00
Wojciech Mitros
53af79442d cql: change exception type when granting incorrect permissions
For compatibility with Cassandra, this patch changes the exception
type thrown when trying to alter a permission that is not applicable
on the given resource from an Invalid query to a Syntax exception.
2023-03-16 16:43:37 +01:00
Wojciech Mitros
9c36c0313a cql: check if the function is builtin when granting permissisons
Currently, when granting a permission on a function resource, we only
check if the function exists, regardless of whether it's a user-defined
or a builtin function. We should not support altering permissions
on builtin functions, so this patch adds a check confirming
that the found function is not builtin.
2023-03-16 16:43:32 +01:00
Pavel Emelyanov
e882269d93 table: Keep storage options lw-shared-ptr
Tables need to know which storage their sstables need to be located at,
so class table needs to have its own reference to the storage options. The
thing can be inherited from the keyspace metadata.

Tests sometimes create table without keyspace at hand. For those use
default-initialized storage options (which is local storage).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-16 17:30:45 +03:00
Pavel Emelyanov
c619a53c61 keyspace_metadata: Make storage options lw-shared-ptr
Today the storage options are embedded into the metadata object. In the
future the storage options will need to be referenced by the
class table too. Using a plain reference doesn't look safe, so turn the
storage options into an lw-shared-ptr instead.
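Why a shared pointer instead of a plain reference: the table co-owns the options, so they stay valid even if the keyspace metadata object goes away first. The sketch below uses std::shared_ptr as a stand-in for seastar::lw_shared_ptr. All type names and the "LOCAL" default are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified storage options; the default means local storage.
struct storage_options {
    std::string type = "LOCAL";
};

// The keyspace metadata holds the options through a shared pointer...
struct keyspace_metadata_like {
    std::shared_ptr<const storage_options> storage_opts =
        std::make_shared<storage_options>();
};

// ...and each table keeps its own reference to the same object.
struct table_like {
    std::shared_ptr<const storage_options> storage_opts;

    explicit table_like(std::shared_ptr<const storage_options> opts)
        : storage_opts(std::move(opts)) {}

    // Tests that create a table without a keyspace get default (local) options.
    table_like() : storage_opts(std::make_shared<storage_options>()) {}
};
```

With a plain `const storage_options&` member, destroying the metadata first would leave the table holding a dangling reference.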

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-16 17:30:45 +03:00
Kefu Chai
93fa70069c utils: to_string: move join to namespace utils
`join` can easily be confused with boost::algorithm::join
so make it more visible that we're using scylla's
utils implementation.

Also, move `struct print_with_comma` to utils::internal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-16 20:34:18 +08:00
Kefu Chai
c37f4e5252 treewide: use fmt::join() when appropriate
now that fmtlib provides fmt::join() (see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view),
there is no need to reinvent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().

as fmt::join() returns a join_view, this could improve the
performance under certain circumstances where the fully materialized
string is not needed.

please note, the goal of this change is to use fmt::join(), and this
change does not intend to improve the performance of existing
implementation based on "operator<<" unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.

some noteworthy things related to this change:

* unlike the existing `join()`, `fmt::join()` returns a view. so we
  have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
  return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
  for instance, a `const std::string*`. but operator<<() always prints
  a typed pointer. so if we want to format a typed pointer, we either
  need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
  `operator<<(std::ostream& os, const column_definition* cd)`, so we
  have to use a wrapper class of `maybe_column_definition` for printing
  a pointer to `column_definition`. since the overload is only used
  by the two overloads of
  `statement_restrictions::add_single_column_parition_key_restriction()`,
  the operator<< for `const column_definition*` is dropped.
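To show the difference between a join that returns a view and one that materializes a string, here is a standard-library-only sketch of a lazy join. It illustrates the idea behind fmt::join()'s join_view and is not fmtlib's implementation. `lazy_join`, `join_lazily` and `join_to_string` are hypothetical names. The view only stores a reference to the range and the separator, and writes elements straight to the output when printed. Building a string is a separate, explicit step.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical lazy join view: nothing is formatted until it is printed.
template <typename Range>
struct lazy_join {
    const Range& range;
    std::string_view sep;
};

template <typename Range>
lazy_join<Range> join_lazily(const Range& r, std::string_view sep) {
    return {r, sep};
}

// Write elements directly to the stream, with no intermediate string.
template <typename Range>
std::ostream& operator<<(std::ostream& os, const lazy_join<Range>& j) {
    bool first = true;
    for (const auto& e : j.range) {
        if (!first) {
            os << j.sep;
        }
        os << e;
        first = false;
    }
    return os;
}

// Materializing is explicit (the old ::join() always did this).
template <typename Range>
std::string join_to_string(const Range& r, std::string_view sep) {
    std::ostringstream os;
    os << join_lazily(r, sep);
    return os.str();
}
```

Since the view holds a reference, it must be printed before the range it refers to goes out of scope.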

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 20:34:18 +08:00
Wojciech Mitros
aad2afd417 rust: update dependencies
Cranelift-codegen 0.92.0 and wasmtime 5.0.0 have security issues
potentially allowing malicious UDFs to read some memory outside
the wasm sandbox. This patch updates them to versions 0.92.1
and 5.0.1 respectively, where the issues are fixed.

Fixes #13157

Closes #13171
2023-03-16 13:45:53 +02:00
Takuya ASADA
a79604b0d6 create-relocatable-package.py: exclude tools/cqlsh
We should exclude tools/cqlsh for relocatable package.

fixes #13181

Closes #13183
2023-03-16 13:37:16 +02:00
Anna Stuchlik
d00926a517 doc: Add version 5.2 to the version selector
This commit adds branch-5.2 to the list of branches
for which we want to build the docs. As a result,
version 5.2 will be added to the version selector.

NOTE: Version 5.2 will be marked as unstable and
an appropriate message will be shown to the user.
After 5.2 is released, branch-5.2 needs to be
moved from UNSTABLE_VERSIONS to LATEST_VERSION
(where it should replace branch-5.1)

Closes #13200
2023-03-16 10:46:30 +02:00
Kamil Braun
b919373cce Merge 'api: gossiper: get alive nodes after reaching current shard 0 version' from Alecco
Add an API call to wait for all shards to reach the current shard 0
gossiper version. Throws when timeout is reached.
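The wait-with-timeout can be sketched with plain threads and atomics standing in for Seastar's sharded machinery (an assumption; the real API call works with futures across shards). `wait_for_shard0_version()` is a hypothetical name: it snapshots shard 0's version, polls until every shard's version is at least that value, and throws once the deadline passes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical: one atomic per shard holds that shard's gossiper version.
// Waits until every shard has caught up with shard 0's version as observed at call time.
inline void wait_for_shard0_version(const std::vector<std::atomic<std::int64_t>>& versions,
                                    std::chrono::milliseconds timeout) {
    const std::int64_t target = versions.at(0).load();
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        bool all_caught_up = true;
        for (const auto& v : versions) {
            if (v.load() < target) {
                all_caught_up = false;
                break;
            }
        }
        if (all_caught_up) {
            return;
        }
        // Give up once the deadline has passed, mirroring "throws when timeout is reached".
        if (std::chrono::steady_clock::now() >= deadline) {
            throw std::runtime_error("timed out waiting for shards to reach shard 0 gossiper version");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```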

Closes #12540

* github.com:scylladb/scylladb:
  api: gossiper: fix alive nodes
  gms, service: lock live endpoint copy
  gms, service: live endpoint copy method
2023-03-16 09:46:02 +01:00
Botond Dénes
b31a55af7e Merge 'cmake: sync with configure.py (13/n)' from Kefu Chai
this is the 13th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:

- to enable developers to use CMake for building scylla, so they can use tools (CLion for instance) with CMake integration for a better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross-module / subsystem dependency is a prerequisite for building this project with C++20 modules.

this changeset includes following changes:

- build: cmake: increase per link job mem to 4GiB
- build: cmake: add missing sources to test-lib
- build: cmake: add more tests
- build: cmake: remote quotes in "include()" commands
- build: cmake: drop unnecessary linkages

Closes #13199

* github.com:scylladb/scylladb:
  build: cmake: drop unnecessary linkages
  build: cmake: remote quotes in "include()" commands
  build: cmake: add more tests
  build: cmake: add missing sources to test-lib
  build: cmake: increase per link job mem to 4GiB
2023-03-16 10:40:18 +02:00
Nadav Har'El
c5195e0acd cql-pytest: add reproducers for GROUP BY bugs
The translated Cassandra unit tests in cassandra_tests/validation/operations/
reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION
LIMIT - issues #5361, #5362 and #5363. Unfortunately, those test functions
are very long, and each test fails on all of these issues and a few more,
making it difficult to use these tests to verify when those bugs have
been fixed. In other words, ideally a patch for issue 5361 should un-xfail
some reproducing test for this issue - but all the existing tests will
continue to fail after fixing 5361, because of other remaining bugs.

So in this patch, I created a new test file test_group_by.py with my own
tests for the GROUP BY feature. I tried to explore the different
capabilities of the GROUP BY feature, its different success and error
paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT.
As usual, I created many small test functions and not one huge test
function, and as a result we now have 5 xfailing tests, each of which
reproduces one bug and will start to pass when that bug is fixed.

All tests added here pass on Cassandra.

Refs #5361
Refs #5362
Refs #5363

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13136
2023-03-16 10:39:05 +02:00
Botond Dénes
f4b5679804 Merge 'doc: Updates the recommended OS to be Ubuntu 22.04' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/13138
Fixes https://github.com/scylladb/scylladb/issues/13153

This PR:

- Fixes outdated information about the recommended OS. Since version 5.2, the recommended OS should be Ubuntu 22.04 because that OS is used for building the ScyllaDB image.
- Adds the OS support information for version 5.2.

This PR (both commits) needs to be backported to branch-5.2.

Closes #13188

* github.com:scylladb/scylladb:
  doc: Add OS support for version 5.2
  doc: Updates the recommended OS to be Ubuntu 22.04
2023-03-16 08:05:19 +02:00
Kefu Chai
0069b43fd4 build: cmake: drop unnecessary linkages
most of the linked libraries should be pulled in by the targets
defined by subsystems.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 12:14:21 +08:00
Kefu Chai
681dfac496 build: cmake: remote quotes in "include()" commands
more consistent this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 12:14:21 +08:00
Kefu Chai
03f5f788a3 build: cmake: add more tests
all tests under test/boost are now buildable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 12:14:21 +08:00
Kefu Chai
649a31a722 build: cmake: add missing sources to test-lib
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 12:14:21 +08:00
Kefu Chai
8963fe4e41 build: cmake: increase per link job mem to 4GiB
lld is multi-threaded in some phases; based on observation, it can
spawn up to 16 threads for each link job, and each job can take more
than 3 GiB of memory in total. without this change, we can run into
OOM on a machine without abundant memory, so increase the
per-link-job mem accordingly.
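Rough arithmetic behind the budget, assuming the link job pool size is total memory divided by the per-job budget (the exact CMake computation is not shown here, and `max_parallel_link_jobs` is a hypothetical helper). On a 16 GiB machine a 4 GiB budget allows 4 concurrent links; at a bit over 3 GiB each, that is a bit over 12 GiB, still under the 16 GiB total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t GiB = 1024ull * 1024 * 1024;

// Hypothetical helper: number of link jobs that may run at once when each job
// is budgeted `mem_per_job_bytes` of memory (never less than one job).
constexpr std::uint64_t max_parallel_link_jobs(std::uint64_t total_mem_bytes,
                                               std::uint64_t mem_per_job_bytes) {
    return std::max<std::uint64_t>(1, total_mem_bytes / mem_per_job_bytes);
}

static_assert(max_parallel_link_jobs(16 * GiB, 4 * GiB) == 4, "16 GiB / 4 GiB per job");
```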

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 12:14:21 +08:00
Kefu Chai
9eb2626fec row_cache: pass "const cache_entry" to operator<<
operator<<(..) does not mutate the cache_entry parameter passed to it.
also, without this change fmtlib is not able to format the given
cache_entry, as the calling formatter only has a const reference to it.
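A minimal sketch of the const-correctness point, assuming a simplified stand-in type: `cache_entry_like` and `print_via_ostream()` are hypothetical names, and `print_via_ostream()` only models a formatter that sees the value through a const reference (it does not use fmtlib). If `operator<<` took a non-const reference, the `os << value` line would fail to compile.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for row_cache's cache_entry.
struct cache_entry_like {
    int key = 0;
    bool continuous = false;
};

// Takes the entry by const reference: printing never mutates it.
std::ostream& operator<<(std::ostream& os, const cache_entry_like& e) {
    return os << "{key=" << e.key << ", cont=" << e.continuous << "}";
}

// Models a formatter that only ever sees the value through a const reference.
template <typename T>
std::string print_via_ostream(const T& value) {
    std::ostringstream os;
    os << value;  // would not compile if operator<< required a non-const T&
    return os.str();
}
```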

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 07:46:11 +08:00
Avi Kivity
7a5e609d8d cql3: functions: add helpers for automating marshalling for scalar functions
Add a helper that, given a C++ function, deduces its argument types
and wraps the function in marshalling/unmarshalling code.

The native function expects non-null inputs, so an additional helper is
called to decide what to do if nulls are encountered. One such
helper is return_accumulator_on_null (since that's the default
behavior of aggregates), and the other is return_any_nonnull(),
useful for reductions.
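The idea of deducing a native function's parameter types and wrapping it in unmarshalling/marshalling code can be sketched as below. This is an illustration under simplifying assumptions, not the cql3 implementation: values are `std::optional<std::string>` (with `std::nullopt` as null), marshalling is reduced to int64 <-> decimal text, and `make_scalar`, `opt_value` and `scalar_fn` are hypothetical names. The two null policies borrow their names from the commit message, but their bodies here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical serialized value: std::nullopt models a CQL null.
using opt_value = std::optional<std::string>;
using scalar_fn = std::function<opt_value(const std::vector<opt_value>&)>;

// Marshalling reduced to int64 <-> decimal text to keep the sketch small.
template <typename T> T unmarshal(const std::string& s);
template <> inline std::int64_t unmarshal<std::int64_t>(const std::string& s) { return std::stoll(s); }
inline std::string marshal(std::int64_t v) { return std::to_string(v); }

// Aggregate-style policy: keep the accumulator (first argument) on a null input.
struct return_accumulator_on_null {
    static opt_value on_null(const std::vector<opt_value>& args) { return args.front(); }
};

// Reduction-style policy: return some non-null argument, or null if there is none.
struct return_any_nonnull {
    static opt_value on_null(const std::vector<opt_value>& args) {
        for (const auto& a : args) {
            if (a) { return a; }
        }
        return std::nullopt;
    }
};

// Unmarshal each (non-null) argument to the deduced parameter type, call, marshal the result.
template <typename R, typename... Args, std::size_t... I>
opt_value invoke_unmarshalled(R (*fn)(Args...), const std::vector<opt_value>& args,
                              std::index_sequence<I...>) {
    return marshal(fn(unmarshal<std::decay_t<Args>>(*args[I])...));
}

// Deduces the parameter types of `fn`; the native function only ever sees non-null inputs.
template <typename NullPolicy, typename R, typename... Args>
scalar_fn make_scalar(R (*fn)(Args...)) {
    return [fn](const std::vector<opt_value>& args) -> opt_value {
        if (args.size() != sizeof...(Args)) {
            throw std::invalid_argument("wrong number of arguments");
        }
        for (const auto& a : args) {
            if (!a) { return NullPolicy::on_null(args); }
        }
        return invoke_unmarshalled(fn, args, std::index_sequence_for<Args...>{});
    };
}

inline std::int64_t add(std::int64_t a, std::int64_t b) { return a + b; }
```

`make_scalar<return_accumulator_on_null>(&add)` behaves like an aggregation step that keeps the accumulator when a null input arrives, while `make_scalar<return_any_nonnull>(&add)` behaves like a reduction of two partial accumulators.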
2023-03-15 22:28:41 +02:00
Avi Kivity
35dd3edb9e types: fix big_decimal constructor from literal 0
Currently, big_decimal(0) will select the big_decimal(string_view)
constructor (via 0 -> const char* -> string_view conversions).
0 is important for initializing aggregates, so fix it ahead of
using it.
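The overload-resolution pitfall can be reproduced with hypothetical stand-in types (`text_view`, `decimal_before`, `decimal_after`; not ScyllaDB's `big_decimal`). The literal `0` is a null pointer constant, so `0 -> const char* -> text_view` is a valid user-defined conversion sequence. Adding a constructor from an integer type gives `0` a standard conversion, which overload resolution prefers. Whether the actual fix adds exactly such a constructor is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view type that, like std::string_view, is implicitly constructible from const char*.
struct text_view {
    const char* data;
    text_view(const char* s) : data(s) {}
};

// Before: the only one-argument constructor takes a view, so decimal_before(0)
// silently goes 0 -> (const char*)nullptr -> text_view.
struct decimal_before {
    bool built_from_text;
    explicit decimal_before(text_view) : built_from_text(true) {}
};

// After: an integer constructor is added; int -> int64_t is a standard conversion,
// which beats the user-defined conversion to text_view.
struct decimal_after {
    bool built_from_text;
    explicit decimal_after(text_view) : built_from_text(true) {}
    explicit decimal_after(std::int64_t) : built_from_text(false) {}
};
```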
2023-03-15 22:24:12 +02:00
Avi Kivity
6c8d942fa1 cql3: functions: add helper class for internal scalar functions
We'll need many scalar functions to implement aggregates in terms
of scalars, so we add an internal_scalar_function class to reduce
boilerplate. The new class proxies the scalar function into a
native noncopyable_function provided by the constructor.
2023-03-15 22:22:02 +02:00
Avi Kivity
26e8ec663b db: functions: add stateless aggregate functions
Currently, aggregate functions are implemented in a stateful manner.
The accumulator is stored internally in an aggregate_function::aggregate,
requiring each query to instantiate new instances (see
aggregate_function_selector's constructor, and note how it's called
from selector::new_instance()).

This makes aggregates hard to use in expressions, since expressions
are stateless (with state only provided to evaluate()). To facilitate
migration towards stateless expressions, we define a
stateless_aggregate_function (modelled after user-defined aggregates,
which are already stateless). This new struct defines the aggregate
in terms of three scalar functions: one to aggregate a new input into
an accumulator (provided in the first parameter), one to finalize an
accumulator into a result, and one to reduce two accumulators for
parallelized aggregation.

An adapter of the new struct to the aggregate_function interface is
also provided, to allow for incremental migration in the following
patches.
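A minimal sketch of the three-function shape described above, using `avg()` as the example. The struct and helper names (`stateless_aggregate_sketch`, `fold`, `make_avg`) and the `(sum, count)` accumulator are hypothetical. The point is that the aggregate object holds no per-query state: the caller threads the accumulator through `aggregate`, `reduce` and `finalize`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical accumulator for avg(): (sum, count).
using avg_state = std::pair<std::int64_t, std::int64_t>;

template <typename State, typename Input, typename Result>
struct stateless_aggregate_sketch {
    State initial;
    std::function<State(State, Input)> aggregate;  // fold one input into the accumulator (first param)
    std::function<Result(State)> finalize;         // turn an accumulator into the result
    std::function<State(State, State)> reduce;     // merge two partial accumulators
};

// The caller owns the accumulator; the aggregate object itself is never mutated.
template <typename State, typename Input, typename Result>
State fold(const stateless_aggregate_sketch<State, Input, Result>& agg, const std::vector<Input>& inputs) {
    State acc = agg.initial;
    for (const auto& x : inputs) {
        acc = agg.aggregate(acc, x);
    }
    return acc;
}

inline stateless_aggregate_sketch<avg_state, std::int64_t, double> make_avg() {
    return {
        avg_state{0, 0},
        [](avg_state s, std::int64_t x) { return avg_state{s.first + x, s.second + 1}; },
        [](avg_state s) { return s.second ? double(s.first) / double(s.second) : 0.0; },
        [](avg_state a, avg_state b) { return avg_state{a.first + b.first, a.second + b.second}; },
    };
}
```

Folding all inputs in one place and reducing two partial folds give the same result, which is what makes parallelized aggregation possible.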
2023-03-15 22:10:23 +02:00
Avi Kivity
82c4341e0e db, cql3: move scalar_function from cql3/functions to db/functions
Previously, we moved cql3::functions::function to the
db::functions namespace, since functions are a part of the
data dictionary, which is independent of cql3. We do the
same now for scalar_function, since we wish to make use
of it in a new db::functions::stateless_aggregate_function.

A stub remains in cql3/functions to avoid churn.
2023-03-15 20:37:25 +02:00
Avi Kivity
29a2788b2e Merge 'reader_concurrency_semaphore: handle read blocked on memory being registered as inactive' from Botond Dénes
A read that requested memory and has to wait for it can be registered as inactive. This can happen, for example, if the memory request originated from a background I/O operation (a read-ahead, maybe).

Handling this case is currently very difficult. What we want to do is evict such a read on the spot: a read waiting on memory means memory is in demand, so inactive reads should be evicted. To evict this reader, we would first have to remove it from the memory wait list, which is almost impossible currently, because `expiring_fifo<>`, the type used for the wait list, doesn't allow for that.

So in this PR we first set out to make this possible, by transforming all current queues into intrusive lists of permits. Permits are already linked into an intrusive list, to allow for enumerating all existing permits. We use these existing hooks to link the permits into the appropriate queue, and back into `_permit_list` when they are not in any special queue. To make this possible, we first have to make all lists store naked permits, moving all auxiliary data fields currently stored in wrappers like `entry` into the permit itself. With this, all queues and lists in the semaphore are intrusive lists storing permits directly, which has the following implications:
* queues no longer take extra memory, as all of them are intrusive
* permits are completely self-sufficient w.r.t. queuing: code can queue or dequeue permits with just a reference to a permit at hand; no other wrapper, iterator, pointer, etc. is necessary.
* queues don't keep permits alive anymore; destroying a permit automatically unlinks it from the respective queue, although this might lead to use-after-free. This is not a problem in practice: only one code path (`reader_concurrency_semaphore::with_permit()`) had to be adjusted.

After all these extensive preparations, we can now handle the case of evicting a reader which is queued on memory.
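The queueing model can be illustrated with a hand-rolled intrusive list. This is a hypothetical sketch, not the semaphore's code: each permit carries its own hook, a queue is only a sentinel node, moving a permit between queues needs nothing but a reference to it, and a destroyed permit unlinks itself. As noted above, the queue does not keep permits alive.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal intrusive doubly-linked hook.
struct queue_hook {
    queue_hook* prev = nullptr;
    queue_hook* next = nullptr;
    bool is_linked() const { return next != nullptr; }
    void unlink() {
        if (!is_linked()) {
            return;
        }
        prev->next = next;
        next->prev = prev;
        prev = next = nullptr;
    }
};

// The permit embeds its hook, so it sits in at most one queue at a time.
struct permit_sketch : queue_hook {
    int id;
    explicit permit_sketch(int i) : id(i) {}
    permit_sketch(const permit_sketch&) = delete;
    permit_sketch& operator=(const permit_sketch&) = delete;
    ~permit_sketch() { unlink(); }  // auto-unlink: a destroyed permit leaves its queue
};

// A queue is just a sentinel node; it neither allocates nor owns permits.
class permit_queue {
    queue_hook head_;
public:
    permit_queue() { head_.prev = head_.next = &head_; }
    permit_queue(const permit_queue&) = delete;
    permit_queue& operator=(const permit_queue&) = delete;
    ~permit_queue() {
        while (!empty()) {
            head_.next->unlink();
        }
    }
    bool empty() const { return head_.next == &head_; }
    std::size_t size() const {
        std::size_t n = 0;
        for (const queue_hook* h = head_.next; h != &head_; h = h->next) {
            ++n;
        }
        return n;
    }
    // Moving a permit between queues needs only a reference to it.
    void push_back(permit_sketch& p) {
        p.unlink();
        p.prev = head_.prev;
        p.next = &head_;
        head_.prev->next = &p;
        head_.prev = &p;
    }
    // Precondition: !empty().
    permit_sketch& front() { return static_cast<permit_sketch&>(*head_.next); }
};
```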

Fixes: #12700

Closes #12777

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: handle reader blocked on memory becoming inactive
  reader_concurrency_semaphore: move _permit_list next to the other lists
  reader_permit: evict inactive read on timeout
  reader_concurrency_semaphore: move inactive_read to .cc
  reader_concurrency_semaphore: store permits in _inactive_reads
  reader_concurrency_semaphore: inactive_read: de-inline more methods
  reader_concurrency_semaphore: make _ready_list intrusive
  reader_permit: add wait_for_execution state
  reader_concurrency_semaphore: make wait lists intrusive
  reader_concurrency_semaphore: move most wait_queue methods out-of-line
  reader_concurrency_semaphore: store permits directly in queues
  reader_permit: introduce (private) operator * and ->
  reader_concurrency_semaphore: remove redundant waiters() member
  reader_concurrency_semaphore: add waiters counter
  reader_permit: use check_abort() for timeout
  reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param
  reader_concurrency_semaphroe: make foreach_permit() const
  reader_permit: add get_schema() and get_op_name() accessors
  reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept
2023-03-15 20:10:19 +02:00
Wojciech Mitros
b776cb4b41 docs: fix typos in wasm documentation
This patch fixes 2 small issues with the Wasm UDF documentation that
recently got uploaded:
1. a link was unnecessarily wrapped in angle brackets
2. a link did not redirect to the correct page due to a missing ":doc:" tag

Closes #13193
2023-03-15 18:48:48 +02:00
Anna Stuchlik
3ad3259396 doc: Add OS support for version 5.2
Fixes https://github.com/scylladb/scylladb/issues/13153

This commit adds a row for version 5.2 to the table of
supported platforms.
2023-03-15 16:12:41 +01:00
Kamil Braun
5705df77a1 Merge 'Refactor schema, introduce schema_static_props and move several properties into it' from Gusev Petr
Our end goal (#12642) is to mark raft tables to use
schema commitlog. There are two similar
cases in code right now - `with_null_sharder`
and `set_wait_for_sync_to_commitlog` `schema_builder`
methods. The problem is that if we need to
mark some new schema with one of these methods
we need to do this twice - first in
a method describing the schema
(e.g. `system_keyspace::raft()`) and second in the
function `create_table_from_mutations`, which is not
obvious and easy to forget.

`create_table_from_mutations` is called when schema object
is reconstructed from mutations, `with_null_sharder`
and `set_wait_for_sync_to_commitlog` must be called from it
since the schema properties they describe are
not included in the mutation representation of the schema.

This series proposes to distinguish between the schema
properties that get into mutations and those that do not.
The former are described with `schema_builder`, while for
the latter we introduce `schema_static_props` struct and
the `schema_builder::register_static_configurator` method.
This way we can formulate a rule once in the code about
which schemas should have a null sharder/be synced, and it will
be enforced in all cases.
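A sketch of the "formulate the rule once" idea, assuming a process-wide registry of configurator callbacks keyed by keyspace and table name. All names here are hypothetical (the real `schema_builder::register_static_configurator` signature may differ), and the keyspace/table names in `register_example_rules()` are illustrative. The key property is that both the in-code schema definition and the reconstruction from mutations call the same `compute_static_props()`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical properties that are not stored in the schema mutations.
struct static_props_sketch {
    bool use_null_sharder = false;
    bool wait_for_sync_to_commitlog = false;
    bool use_schema_commitlog = false;
};

using static_configurator =
    std::function<void(const std::string& ks, const std::string& cf, static_props_sketch&)>;

// Process-wide registry; each rule is registered exactly once.
inline std::vector<static_configurator>& static_configurators() {
    static std::vector<static_configurator> v;
    return v;
}

inline void register_static_configurator_sketch(static_configurator c) {
    static_configurators().push_back(std::move(c));
}

// Used both when a schema is built in code and when it is rebuilt from mutations,
// so the two paths cannot disagree.
inline static_props_sketch compute_static_props(const std::string& ks, const std::string& cf) {
    static_props_sketch props;
    for (const auto& configure : static_configurators()) {
        configure(ks, cf, props);
    }
    return props;
}

// Example rules with illustrative names.
inline void register_example_rules() {
    register_static_configurator_sketch([](const std::string& ks, const std::string& cf, static_props_sketch& p) {
        if (ks == "system" && cf == "raft") {
            p.use_null_sharder = true;
        }
    });
    register_static_configurator_sketch([](const std::string& ks, const std::string&, static_props_sketch& p) {
        if (ks == "system_schema") {
            p.use_schema_commitlog = true;
        }
    });
}
```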

Closes #13170

* github.com:scylladb/scylladb:
  schema.hh: choose schema_commitlog based on schema_static_props flag
  schema.hh: use schema_static_props for wait_for_sync_to_commitlog
  schema.hh: introduce schema_static_props, use it for null_sharder
  database.cc: drop ensure_populated and mark_as_populated
2023-03-15 15:43:49 +01:00
Kefu Chai
e21926f602 flat_mutation_reader_v2: use maybe_yield() when appropriate
just came across this part of the code. since `maybe_yield()` is a wrapper
around "if should_yield(): yield()", it's better to use it here for more
concise code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13107
2023-03-15 15:58:55 +02:00
Anna Stuchlik
1bb11126d7 doc: Updates the recommended OS to be Ubuntu 22.04
Fixes https://github.com/scylladb/scylladb/issues/13138
This PR fixes the outdated information about the recommended
OS. Since version 5.2, the recommended OS should be Ubuntu 22.04
because that OS is used for building the ScyllaDB image.

This commit needs to be backported to branch-5.2.
2023-03-15 13:42:37 +01:00
Pavel Emelyanov
47cdd31f27 main: Forget the --max-io-requests option
On start, scylla checks whether the option is set. The check is useless
nowadays, as the option has been removed from seastar (see the 9e34779c update)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13148
2023-03-15 12:42:06 +02:00
Botond Dénes
e5f3f4b0d1 Merge 'cmake: sync with configure.py (12/n)' from Kefu Chai
this is the 12th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:

- to enable developers to use CMake for building scylla, so they can use tools (CLion for instance) with CMake integration for a better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross-module / subsystem dependency is a prerequisite for building this project with C++20 modules.

this changeset includes following changes:

- build: cmake: remove Seastar from the option name
- build: cmake: add missing sources in test-lib and utils
- build: cmake: do not include main.cc in scylla-main
- build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
- build: cmake: add more tests

Closes #13180

* github.com:scylladb/scylladb:
  build: cmake: add more tests
  build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
  build: cmake: do not include main.cc in scylla-main
  build: cmake: add missing sources in test-lib and utils
  build: cmake: remove Seastar from the option name
2023-03-15 12:40:51 +02:00
Nadav Har'El
543d4ed726 cql-pytest: translate Cassandra's tests for GROUP BY
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.

This test file contains only 8 separate test functions, but each of them
is very long, checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.

The tests reproduce six already-known Scylla issues and one new issue:

Already known issues:

Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
             in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column

A new issue discovered by these tests:

Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13126
2023-03-15 12:40:24 +02:00
Pavel Emelyanov
bfc0533a8d test: Update boost.suite.run_first list
In debug mode the timings are:

view_schema_test:        90 sec
cql_query_test:         170 sec
memtable_test:         2090 sec
cql_functions_test:    2591 sec

other tests that are in/out of this list are not that obvious, but the
former two apparently deserve to be replaced by the latter two.

Timings for dev/release modes are not that horrible, but the "first pair
is notably smaller than the latter" relation also exists.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13142
2023-03-15 12:10:50 +02:00
Botond Dénes
878ee27d74 Merge 'Load SSTable at the shard that actually own it' from Raphael "Raph" Carvalho
Today, the SSTable generation provides a hint on which shard owns a
particular SSTable. That hint determines which shard will load the
SSTable into memory.

With upcoming UUID generation, we will no longer have this hint
embedded into the SSTable generation, meaning that SSTables will be
loaded at random shards. This is not good because shards will have
to reference memory from other shards to access the SSTable
metadata that was allocated elsewhere.

This patch changes sstable_directory to:
1) Use generation value to only determine which shard will calculate
the owner shards for SSTables. Essentially works like a round-robin
distribution.
2) The shard assigned to compute the owners for a SSTable will do
so by reading the minimum from disk; usually only the Scylla file is
needed.
3) Once that shard has finished computing the owners, it will forward
the SSTable to the shard that owns it.
4) Shards will later load SSTables locally that were forwarded to
them.
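The four steps can be simulated on a single thread as below. `sstable_desc_sketch::owners` stands in for what the coordinating shard learns by reading the minimal components from disk; all names are hypothetical. The description above does not cover SSTables owned by more than one shard, so this sketch simply sets them aside in `shared`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor: `owners` is what reading the minimal components would reveal.
struct sstable_desc_sketch {
    std::int64_t generation;
    std::vector<unsigned> owners;
};

struct distribution_sketch {
    std::vector<std::vector<std::int64_t>> computed_by;  // per shard: SSTables whose owners it computed
    std::vector<std::vector<std::int64_t>> local;        // per owner shard: SSTables it will fully load
    std::vector<std::int64_t> shared;                    // more than one owner: out of scope here
};

inline distribution_sketch distribute(const std::vector<sstable_desc_sketch>& sstables, unsigned nr_shards) {
    distribution_sketch d;
    d.computed_by.resize(nr_shards);
    d.local.resize(nr_shards);
    for (const auto& sst : sstables) {
        // Step 1: the generation only picks which shard computes the owners (round-robin).
        auto coordinator = static_cast<unsigned>(sst.generation % nr_shards);
        d.computed_by[coordinator].push_back(sst.generation);
        // Steps 2-3: the coordinator forwards the SSTable to the shard that owns it.
        if (sst.owners.size() == 1) {
            d.local[sst.owners.front()].push_back(sst.generation);
        } else {
            d.shared.push_back(sst.generation);
        }
    }
    // Step 4: each shard would now fully load the entries in d.local[shard] itself.
    return d;
}
```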

Closes #13114

* github.com:scylladb/scylladb:
  sstables: sstable_directory: Load SSTable at the shard that actually own it
  sstables: sstable_directory: Give sstable_info_vector a more descriptive name
  sstables: Allow owner shards to be computed for a partially loaded SSTable
  sstables: Move SSTable loading to sstable_directory::sort_sstable()
  sstables: Move sstable_directory::sort_sstable() to private interface
  sstables: Restore indentation in sstable_directory::sort_sstable()
  sstables: Coroutinize sstable_directory::sort_sstable()
  sstables: sstable_directory: Extract sstable loading from process_descriptor()
  sstables: sstable_directory: Separate private fields from methods
  sstables: Coroutinize sstable_directory::process_descriptor
2023-03-15 10:43:22 +02:00
Kefu Chai
4505b0a9ca build: cmake: add more tests
* test/boost: add more tests: all tests listed in test/boost/CMakeLists.txt
  should build now.
* rust: add inc library, which is used for testing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-15 15:38:47 +08:00
Kefu Chai
cac6ba529d build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
we need the `main()` defined by
seastar/testing/seastar_test.hh for driving the tests.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-15 15:38:47 +08:00
Kefu Chai
d9e3ffebf2 build: cmake: do not include main.cc in scylla-main
main.cc should only be included by scylla.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-15 15:38:46 +08:00
Kefu Chai
1cd3764b08 build: cmake: add missing sources in test-lib and utils
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-15 15:38:46 +08:00
Kefu Chai
269cce4c2c build: cmake: remove Seastar from the option name
change the option name to "LINK_MEM_PER_JOB" as this is not
a Seastar option, but a top-level project option.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2023-03-15 15:38:46 +08:00
Michał Chojnowski
866672a9fa storage_proxy: rename metrics after service level rename
Under some circumstances, service_level_controller renames service
levels for internal purposes. However, the per-service-level metrics
registered by storage_proxy keep the name seen at first registration
time. This sometimes leads to mislabeled metrics.

Fix that by re-registering the metrics after scheduling groups
are renamed.

Fixes scylladb/scylla-enterprise#2755

Closes #13174
2023-03-15 09:15:54 +02:00
Botond Dénes
6373452b31 Merge 'Do not mask node operation errors' from Benny Halevy
This series handles errors when aborting node operations and prints them rather than letting them leak and be exposed to the user.

Also, clean up the node_ops logging formats when aborting different node ops
and add more error logging around errors in the "worker" nodes.

Closes #12799

* github.com:scylladb/scylladb:
  storage_service: node_ops_signal_abort: print a warning when signaling abort
  storage_service: s/node_ops_singal_abort/node_ops_signal_abort/
  storage_service: node_ops_abort: add log messages
  storage_service: wire node_ops_ctl for node operations
  storage_service: add node_ops_ctl class to formalize all node_ops flow
  repair: node_ops_cmd_request: add print function
  repair: do_decommission_removenode_with_repair: log ignore_nodes
  repair: replace_with_repair: get ignore_nodes as unordered_set
  gossiper: get_generation_for_nodes: get nodes as unordered_set
  storage_service: don't let node_ops abort failures mask the real error
2023-03-15 09:11:31 +02:00
Petr Gusev
afe1d39bdb schema.hh: choose schema_commitlog based on schema_static_props flag
This patch finishes the refactoring. We introduce the
use_schema_commitlog flag in schema_static_props
and use it to choose the commitlog in
database::add_column_family. The only
configurator added declares what was originally in
database::add_column_family - all
tables from schema_tables keyspace
should use schema_commitlog.
2023-03-14 19:43:51 +04:00
Petr Gusev
3ef201d67a schema.hh: use schema_static_props for wait_for_sync_to_commitlog
This patch continues the refactoring, now we move
wait_for_sync_to_commitlog property from schema_builder to
schema_static_props.

The patch replaces schema_builder::set_wait_for_sync_to_commitlog
and is_extra_durable with two register_static_configurator,
one in system_keyspace and another in system_distributed_keyspace.
They correspond to the two parts of the original disjunction
in schema_tables::is_extra_durable.
2023-03-14 19:26:05 +04:00
Calle Wilund
4681c4b572 configurables: Add optional service lookup to init callback
Simplified, more direct version of "dependency injection".
I.e. the caller/initiator (main/cql_test_env) provides a set of
services it will eventually start. A configurable can remember
these and use them, at least after the "start" notification.

Closes #13037
2023-03-14 17:13:52 +02:00
Petr Gusev
349bc1a9b6 schema.hh: introduce schema_static_props, use it for null_sharder
Our goal (#12642) is to mark raft tables to use
schema commitlog. There are two similar
cases in code right now - with_null_sharder
and set_wait_for_sync_to_commitlog schema_builder
methods. The problem is that if we need to
mark some new schema with one of these methods
we need to do this twice - first in
a method describing the schema
(e.g. system_keyspace::raft()) and second in the
function create_table_from_mutations, which is not
obvious and easy to forget.

create_table_from_mutations is called when schema object
is reconstructed from mutations, with_null_sharder
and set_wait_for_sync_to_commitlog must be called from it
since the schema properties they describe are
not included in the mutation representation of the schema.

This patch proposes to distinguish between the schema
properties that get into mutations and those that do not.
The former are described with schema_builder, while for
the latter we introduce schema_static_props struct and
the schema_builder::register_static_configurator method.
This way we can formulate a rule once in the code about
which schemas should have a null sharder, and it will
be enforced in all cases.
2023-03-14 18:29:34 +04:00
Wojciech Mitros
52eb70aef0 docs: make wasm documentation visible for users
Until now, the instructions on generating wasm files and using them
for Scylla UDFs were stored in docs/dev, so they were not visible
on the docs website. Now that the Rust helper library for UDFs
is ready, and we're inviting users to try it out, we should also
make the rest of the Wasm UDF documentation readily available
for the users.

Closes #13139
2023-03-14 16:21:23 +02:00
David Garcia
63ad5607ee docs: Update custom styles 2023-03-14 12:06:20 +00:00
David Garcia
bad914a34d docs: Update styles 2023-03-14 12:01:33 +00:00
David Garcia
8c4659a379 docs: Add card logos 2023-03-14 10:37:23 +00:00
Botond Dénes
1d9b7f3a92 Merge 'cmake: sync with configure.py (11/n)' from Kefu Chai
- build: cmake: remove test which does not exist yet
- build: cmake: document add_scylla_test()
- build: cmake: extract index, repair and data_dictionary out
- build: cmake: extract scylla-main out
- build: cmake: find Snappy before using it
- build: cmake: add missing linkages
- build: cmake: add missing sources to test-lib
- build: cmake: link sstables against libdeflate
- build: cmake: link Boost::regex against ICU::uc

Closes #13110

* github.com:scylladb/scylladb:
  build: cmake: link Boost::regex against ICU::uc
  build: cmake: link sstables against libdeflate
  build: cmake: add missing sources to test-lib
  build: cmake: add missing linkages
  build: cmake: find Snappy before using it
  build: cmake: extract scylla-main out
  build: cmake: extract index, repair and data_dictionary out
  build: cmake: document add_scylla_test()
  build: cmake: remove test which does not exist yet
2023-03-14 11:45:48 +02:00
Petr Gusev
00fc73d966 database.cc: drop ensure_populated and mark_as_populated
There was some logic to call mark_as_populated at
the appropriate places, but the _populated field
and the ensure_populated function were
not used by anyone.
2023-03-14 13:32:25 +04:00
Botond Dénes
e22b27a107 Merge 'Improve database shutdown verbosity' from Pavel Emelyanov
The `database::stop` method sometimes hangs and it's always hard to spot where exactly it sleeps. A few more logging messages would make this much simpler.

refs: #13100
refs: #10941

Closes #13141

* github.com:scylladb/scylladb:
  database: Increase verbosity of database::stop() method
  large_data_handler: Increase verbosity on shutdown
  large_data_handler: Coroutinize .stop() method
2023-03-14 10:55:31 +02:00
Kefu Chai
5842804591 install-dependencies: extract go_arch() out
for defining the mapping from the output of `arch` to the corresponding
GO_ARCH. see b94dc384ca/src/go/build/syslist.go (L55)

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13151
2023-03-14 10:05:09 +03:00
Raphael S. Carvalho
0c77f77659 sstables: sstable_directory: Load SSTable at the shard that actually own it
Today, the SSTable generation provides a hint on which shard owns a
particular SSTable. That hint determines which shard will load the
SSTable into memory.

With upcoming UUID generation, we will no longer have this hint
embedded into the SSTable generation, meaning that SSTables will be
loaded at random shards. This is not good because shards will have
to reference memory from other shards to access the SSTable
metadata that was allocated elsewhere.

This patch changes sstable_directory to:
1) Use generation value to only determine which shard will calculate
the owner shards for SSTables. Essentially works like a round-robin
distribution.
2) The shard assigned to compute the owners for a SSTable will do
so by reading the minimum from disk; usually only the Scylla file is
needed.
3) Once that shard has finished computing the owners, it will forward
the SSTable to the shard that owns it.
4) Shards will later load SSTables locally that were forwarded to
them.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
2c4e141314 sstables: sstable_directory: Give sstable_info_vector a more descriptive name
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
a83328c358 sstables: Allow owner shards to be computed for a partially loaded SSTable
Today, owner shards can only be computed for a fully loaded SSTable.

For upcoming changes in the SSTable loader, we want to load the minimum
from disk to be able to compute the set of shards owning the SSTable.

If sharding metadata is available, it means we only need to read
TOC and Scylla components.

Otherwise, Summary must be read to provide first and last keys for
compute_shards_for_this_sstable() to operate on them instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
b49ae56e70 sstables: Move SSTable loading to sstable_directory::sort_sstable()
The reason for this change is that we'll want to fully load the
SSTable only at the destination shard.

Later, sort_sstable() will calculate the set of owner shards for a
SSTable by loading only the scylla metadata file.

If it turns out that the SSTable belongs to current shard, then
we'll fully load the SSTable using the new and fresh
sstable_directory::load_sstable().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
229d89dbde sstables: Move sstable_directory::sort_sstable() to private interface
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
36602d1025 sstables: Restore indentation in sstable_directory::sort_sstable()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
825f23b7f9 sstables: Coroutinize sstable_directory::sort_sstable()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
a19a9f5d99 sstables: sstable_directory: Extract sstable loading from process_descriptor()
Will make it easier for process_descriptor to process the SSTable
without having to fully load the SSTable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
08e6df256e sstables: sstable_directory: Separate private fields from methods
Following the expected coding convention. It's also somewhat
disturbing to see them mixed up.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Raphael S. Carvalho
7d751991c1 sstables: Coroutinize sstable_directory::process_descriptor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-03-13 15:40:43 -03:00
Anna Stuchlik
8ceb8b0240 doc: add a Knowledge Base article about consitency, v2 of https://github.com/scylladb/scylladb/pull/12929
Closes #12957
2023-03-13 17:48:25 +02:00
Aleksandra Martyniuk
cb0e6d617a test: extend test_compaction_task.py to test cleanup compaction 2023-03-13 16:36:20 +01:00
Aleksandra Martyniuk
27b999808f compaction: create task manager's task for cleanup keyspace compaction on one shard
Implementation of task_manager's task that covers cleanup keyspace compaction
on one shard.
2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk
7dd27205f6 compaction: create task manager's task for cleanup keyspace compaction
Implementation of task_manager's task covering cleanup keyspace compaction
that can be started through storage_service api.
2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk
4a5752d0d0 api: add get_table_ids to get table ids from table infos 2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk
8801f326c6 compaction: create cleanup_compaction_task_impl 2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk
a976e2e05b repair: fix indentation 2023-03-13 15:25:53 +01:00
Aleksandra Martyniuk
41abc87d28 repair: continue user requested repair if no_such_column_family is thrown
When one of column families requested for repair does not exist, we should
repair all other requested column families.

no_such_column_family exception is caught and logged, and repair continues.
2023-03-13 15:25:52 +01:00
Aleksandra Martyniuk
2376a434b6 repair: add find_column_family_if_exists function 2023-03-13 15:25:15 +01:00
Botond Dénes
3f0b3489a2 reader_concurrency_semaphore: handle reader blocked on memory becoming inactive
Kill said read's memory requests with std::bad_alloc and dequeue it from
the memory wait list, then evict it on the spot.
Now that `_inactive_reads` just store permits, we can do this easily.
2023-03-13 08:07:53 -04:00
Botond Dénes
4f5657422d reader_concurrency_semaphore: move _permit_list next to the other lists
A mostly cosmetic change. Also add a comment mentioning that this is the
catch-all list.
2023-03-13 08:07:53 -04:00
Botond Dénes
d1bc5f9293 reader_permit: evict inactive read on timeout
If the read is inactive when the timeout clock fires, evict it.
Now that `_inactive_reads` just store permits, we can do this easily.
2023-03-13 08:07:53 -04:00
Botond Dénes
6181c08191 reader_concurrency_semaphore: move inactive_read to .cc
It is not used in the header anymore and moving it to the .cc allows us
to remove the dependency on flat_mutation_reader_v2.hh.
2023-03-13 08:07:53 -04:00
Botond Dénes
e56ec9373d reader_concurrency_semaphore: store permits in _inactive_reads
Add a member of type `inactive_read` to reader permit, and store permit
instances in `_inactive_reads`. This list is now just another intrusive
list the permit can be linked into, depending on its state.
Inactive read handles now just store a reader permit pointer.
2023-03-13 08:07:53 -04:00
Botond Dénes
d11f9efbfe reader_concurrency_semaphore: inactive_read: de-inline more methods
They will soon need to access reader_permit::impl internals, only
available in the .cc file.
2023-03-13 08:07:53 -04:00
Botond Dénes
8e296e8e05 reader_concurrency_semaphore: make _ready_list intrusive
Following the same scheme we used to make the wait lists intrusive.
Permits are added to the intrusive ready list while waiting to be
executed and moved back to the _permit_list when dequeued from this
list.
We now use a condition variable for signaling when there are permits
ready to be executed.
2023-03-13 08:07:53 -04:00
Nadav Har'El
c41b2d35ed test/alternator: test concurrent TagResource / UntagResource
This patch adds an Alternator test reproducing issue #6389 - that
concurrent TagResource and/or UntagResource operations were broken and
some of the concurrent modifications were lost.

The test has two threads: one loops adding and removing tag A, the
other adding and removing tag B. After we add tag A, we expect tag A
to be there - but due to issue #6389 this modification was sometimes
lost when it raced with an operation on B.

This test consistently failed before issue #6389 was fixed, and passes
now after the issue was fixed by the previous patches. The bug reproduces
by chance, so it requires a fairly long loop (a few seconds) to be sure
it reproduces, so it is marked as a "veryslow" test and will not run in CI,
but can be used to manually reproduce this issue with:

    test/alternator/run --runveryslow test_tag.py::test_concurrent_tag

Refs #6389.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 13:38:15 +02:00
Nadav Har'El
87f29d8fd2 db/tags: drop unsafe update_tags() utility function
The previous patches introduced the function modify_tags() as a
safe version of update_tags(), and switched all uses of update_tags()
to use modify_tags().

So now that the unsafe update_tags() is no longer used, we can drop it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 13:35:17 +02:00
Kamil Braun
228856f577 Merge 'Test changing IP address of 2 nodes in a cluster out of 3 & misc cleanups' from Konstantin Osipov
Closes #13135

* github.com:scylladb/scylladb:
  test: improve logging in ScyllaCluster
  raft: (test) test ip address change
2023-03-13 11:47:00 +01:00
Calle Wilund
dba45f3dc8 init: Add life cycle notifications to configurables
Allows a configurable to subscribe to life cycle notifications for the scylla app,
i.e. to do work on start/stop.
Also allows configurables in cql_test_env.

v2:
* Fix camel casing
* Make callbacks future<> (should have been. mismerge?)

Closes #13035
2023-03-13 12:45:20 +02:00
Nadav Har'El
c196bd78de alternator: isolate concurrent modification to tags
Alternator modifies tags in three operations - TagResource, UntagResource
and UpdateTimeToLive (the latter uses a tag to store the TTL configuration).

All three operations were implemented by three separate steps:

 1. Read the current tags.
 2. Modify the tags according to the desired operation.
 3. Write the modified tags back with update_tags().

This implementation was not safe for concurrent operations - some
modifications may be lost. We fix this in this patch by using the new
modify_tags() function introduced in the previous patch, which performs
all three steps under one lock so the tag operations are serialized and
correctly isolated.

Fixes #6389

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 12:25:03 +02:00
Nadav Har'El
fbdf52acf6 db/tags: add safe modify_tags() utility functions
The existing utility function update_tags() for modifying tags in a
schema (used mainly by Alternator) is not safe for concurrent operations:
The function first reads the old tags, then modifies them and writes
them back. If two such calls happen concurrently, both calls may read
the same old tags, make different modifications, and then both write
the new tags, with one's write overwriting the other's.

So in this patch, we introduce a new utility function, modify_tags(),
to provide a concurrency-safe read-modify-write operation on tags.
The new function takes a modification function and calls the read,
modify and write steps together under a single lock. The new function
also takes a table name instead of a schema object - because we need
to read the schema under the lock, because it might have already been
changed by some other concurrent operation.

This patch only introduces the new function, it doesn't change any
code to use it yet, and doesn't remove the unsafe update_tags() function.
We'll do those things in the next patches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 11:51:01 +02:00
Nadav Har'El
e5e9b59518 migration_manager: expose access to storage_proxy
A migration_manager holds a reference to a storage_proxy, and uses it
internally a lot - e.g., to gain access to the data_dictionary. Users
of migration_manager might also benefit from this storage_proxy - we
will see such a case in the next patches. So let's provide a getter
for the storage_proxy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-03-13 11:43:53 +02:00
Israel Fruchter
ef229a5d23 Repackaging cqlsh
cqlsh is moving into its own repository:
https://github.com/scylladb/scylla-cqlsh

* add cqlsh as submodule
* update scylla-java-tools to have cqlsh removed
* introduced new cqlsh artifact (rpm/deb/tar)

Depends: https://github.com/scylladb/scylla-tools-java/pull/316
Ref: scylladb/scylladb#11569

Closes #11937

[avi: restore tools/java submodule location, adjust commit]
2023-03-12 20:22:33 +02:00
Pavel Emelyanov
0cd3a6993b sstables: Don't rely on lexicographical prefix comparison
When creating a deletion log for a bunch of sstables the code checks
that all sstables share the same "storage" by lexicographically
comparing their prefixes. That's not correct, as filesystem paths may
refer to the same directory even if not being equal.

So far that's been mostly OK, because paths manipulations were done in
simple forms without producing unequal paths. Patch 8a061bd8 (sstables,
code: Introduce and use change_state() call) triggered a corner case.

    fs::path foo("/foo");
    sstring sub("");
    foo = foo / sub;

produces a correct path of "/foo/", but the trailing slash breaks the
aforementioned assumption about prefixes comparison. As a result, when
an sstable moves between, say, staging and normal locations it may gain
a trailing slash breaking the deletion log creation code.

The fix is to make the deletion log creation code not rely solely on path
string comparison, and to trim the trailing slash if one appears.

A test is included.

fixes: #13085

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13090
2023-03-12 20:06:47 +02:00
Avi Kivity
beaa5a9117 Merge 'wasm: move compilation to an alien thread' from Wojciech Mitros
The compilation of wasm UDFs is performed by a call to a foreign
function, which cannot be broken up with yield points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.

The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.

All seastar shards reference the thread through a std::shared_ptr
to an `alien_thread_runner`.

Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with focus on simplicity more
than on performance. The tasks are stored in an std::queue; access
to it is synchronized with an std::mutex, and an std::condition_variable
is used to wait until the queue has elements.

When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.

Fixes #12904

Closes #13051

* github.com:scylladb/scylladb:
  wasm: move compilation to an alien thread
  wasm: convert compilation to a future
2023-03-12 19:29:11 +02:00
Avi Kivity
24719ea639 Merge 'sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>' from Kefu Chai
- sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>
- sstables: sstable_directory: add type constraints

Closes #13144

* github.com:scylladb/scylladb:
  sstables: sstable_directory: add type constraints
  sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>
2023-03-12 19:10:02 +02:00
Pavel Emelyanov
24e943f79b install-dependencies: Add minio server and client
These two are static binaries, so there is no need to yum/apt-install them with dependencies.
Just download them with curl and put them into /usr/local/bin with the X-bit set.

This is needed for future object-storage work in order to run unit tests against minio.

refs: #12523

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

[avi: regenerate frozen toolchain]

Closes #13064

Closes #13099
2023-03-12 19:07:10 +02:00
Marcin Maliszkiewicz
74cc90a583 main: remove unused bpo::store 2023-03-12 16:59:27 +02:00
Nadav Har'El
e72b85e82c Merge 'cql-pytest/lwt_test: test LWT UPDATE when partition/clustering ranges are empty' from Jan Ciołek
Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g. when a column is supposed to be equal to two different values (`c = 0 AND c = 1`).

Empty partition ranges work properly, empty clustering range currently causes a crash (#13129).
I added tests for both of these cases.

Closes #13130

* github.com:scylladb/scylladb:
  cql-pytest/test_lwt: test LWT update with empty clustering range
  cql-pytest/test_lwt: test LWT update with empty partition range
2023-03-12 15:11:33 +02:00
Nadav Har'El
53c8c43d8a Merge 'cql3: improve support for C-style parenthesis casts' from Jan Ciołek
CQL supports type casting using C-style casts.
For example it's possible to do: `blob_column = (blob)funcReturningInt()`

This functionality is pretty limited, we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes will stay unchanged after the conversion.
This means that it's legal to cast an int to blob (int is just a 4 byte blob), but it's illegal to cast a bigint to int (8 bytes -> 4 bytes).
This simplifies things, to cast we can just reinterpret the value as the other type.

Another use of C-style casts are type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type.
For example: `overloadedFunction((int)?)`
Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int.

An interesting thing is that such casts don't have to be explicit. CQL allows putting an int value in a place where a blob value is expected and it will be automatically converted without any explicit casting.

---

I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast.

But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected.
For example it was possible to do:
```cql
blob_column = (blob)funcReturningInt()
```
But this didn't work at all:
```cql
blob_column = (blob)(int)12323
```
It tried to prepare `untyped_constant(12323)` with a `blob` receiver, which fails.

This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way.
There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting.

Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does.

I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues.
This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility.

---

This PR improves `prepare_expression` for `expr::cast` in the following ways:
1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible
2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression.
3) Add pytest tests for C-style casts

`2)` is needed for #12900; the other changes are just something I decided to do since I was already working on this piece of code.

Closes #13053

* github.com:scylladb/scylladb:
  expr_test: more tests for preparing bind variables with type hints
  prepare_expr: implement preparing expr::cast with no receiver
  prepare_expr: use :user formatting in cast_prepare_expression
  prepare_expr: remove std::get<> in cast_prepare_expression
  prepare_expr: improve cast_prepare_expression
  prepare_expr: improve readability in cast_prepare_expression
  cql-pytest: test expr::cast in test_cast.py
2023-03-12 15:07:54 +02:00
Nadav Har'El
843a5dfc15 Merge 'Allow setting permissions for user-defined functions' from Wojciech Mitros
This series aims to allow users to set permissions on user-defined functions.

The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions

Fixes: #5572
Fixes: #10633

Closes #12869

* github.com:scylladb/scylladb:
  cql3: allow UDTs in permissions on UDFs
  cql3: add type_parser::parse() method taking user_types_metadata
  schema_change_test: stop using non-existent keyspace
  cql3: fix parameter names in function resource constructors
  cql3: handle complex types as when decoding function permissions
  cql3: enforce permissions for ALTER FUNCTION
  cql-pytest: add a (failing) test case for UDT in UDF
  cql-pytest: add a test case for user-defined aggregate permissions
  cql-pytest: add tests for function permissions
  cql3: enforce permissions on function calls
  selection: add a getter for used functions
  abstract_function_selector: expose underlying function
  cql3: enforce permissions on DROP FUNCTION
  cql3: enforce permissions for CREATE FUNCTION
  client_state: add functions for checking function permissions
  cql-pytest: add a case for serializing function permissions
  cql3: allow specifying function permissions in CQL
  auth: add functions_resource to resources
2023-03-12 14:04:34 +02:00
Avi Kivity
7f9c822346 Merge 'Coroutinize distributed_loader's reshape() function' from Pavel Emelyanov
It was suggested as a candidate in one of the previous reviews, so here it is.

Closes #13140

* github.com:scylladb/scylladb:
  distributed_loader: Indentation fix after previous patch
  distributed_loader: Coroutinize reshape() helper
2023-03-12 12:21:33 +02:00
Nadav Har'El
1379d8330f Merge 'Teach sstables tests not to use tempdir explicitly' from Pavel Emelyanov
Many sstable test cases create tempdir on their own to create sstables with. Sometimes it's justified when the test needs to check files on disk by hand for some validation, but often all checks are fs-agnostic. The latter case(s) can be patched to work on top of any storage, in particular -- on top of object storage. To make it work tests should stop creating sstables explicitly in tempdir and this PR does exactly that.

All relevant occurrences of tempdir are removed from test cases, instead the sstable::test_env's tempdir is used. Next, the test_env::{create_sstable|reusable_sst} are patched not to accept the `fs::path dir` argument and pick the env's tempdir. Finally, the `make_sstable_easy` helper is patched to use path-less env methods too.

refs: #13015

Closes #13116

* github.com:scylladb/scylladb:
  test,sstables: Remove path from make_sstable_easy()
  test,lib: Remove wrapper over reusable_sst and move the comment
  test: Make "compact" test case use env dir
  test,compaction: Use env tempdir in some more cases
  test,compaction: Make check_compacted_sstables() use env's dir
  test: Relax making sstable with sequential generation
  test/sstable::test_env: Keep track of auto-incrementing generation
  test/lib: Add sstable maker helper without factory
  test: Remove last occurrence of test_env::do_with(rval, ...)
  test,sstables: Dont mess with tempdir where possible
  test/sstable::test_env: Add dir-less sstables making helpers
  test,sstables: Use sstables::test_env's tempdir with sweeper
  test,sstables: Use sstables::test_env's tempdir
  test/lib: Add tempdir sweeper
  test/lib: Open-code make_sstabl_easy into make_sstable
  test: Remove vector of mutation interposer from test_key_count_estimation
2023-03-12 10:14:26 +02:00
Kefu Chai
97e411bc96 sstables: sstable_directory: add type constraints
add type constraints for
`sstable_directory::parallel_for_each_restricted()`, to enforce the
constraints on the function so it should be invocable with the argument
of the specified type. this helps to prevent the problem of passing a
function which accepts `pair<key, value>` or `tuple<key, value>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-11 17:47:19 +08:00
Kefu Chai
0a29d62f4f sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>
`parallel_for_each_restricted()` maps the elements in the given
container with the specified function. in this case, the elements are
of type `unordered_map::value_type`, which is a `pair<const Key, Value>`.
to convert it to a `tuple<Key, Value>`, the constructor of the tuple
is called. but all we intend to do here is to access the second
element of the `pair<>`.

in this change, the function's signature is changed to match
`scan_descriptors_map::value_type` to avoid the unnecessary overhead of
the `tuple<>` constructor. also, the underlying
`max_concurrent_for_each()` does not pass an xvalue to the given func;
instead, it just passes `*s.begin` to the function, where `s.begin` is
an `Iterator` returned by `std::begin(container)`. so let's just use
a plain reference as the parameter type for the function.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-11 17:47:19 +08:00
Konstantin Osipov
7309a1bd6b test: improve logging in ScyllaCluster
Print IP addresses and cluster identifiers in more log messages,
it helps debugging.
2023-03-10 19:53:19 +03:00
Konstantin Osipov
4ace19928d raft: (test) test ip address change 2023-03-10 19:52:40 +03:00
Pavel Emelyanov
f84f0a9414 database: Increase verbosity of database::stop() method
Add logging messages when stopping (one way or another) various
sub-services and helper objects

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 19:45:23 +03:00
Pavel Emelyanov
2f316880ae large_data_handler: Increase verbosity on shutdown
It may hang waiting for background handlers, so it's good to know if
they exist at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 19:45:18 +03:00
Alejo Sanchez
e35762241a api: gossiper: fix alive nodes
Fix API call to wait for all shards to reach the current shard 0
gossiper version. Throws when timeout is reached.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-03-10 17:29:11 +01:00
Alejo Sanchez
6c04476561 gms, service: lock live endpoint copy
To allow concurrent execution, protect copy of live endpoints with a
semaphore.
2023-03-10 17:16:21 +01:00
Pavel Emelyanov
2000494881 large_data_handler: Coroutinize .stop() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 19:06:14 +03:00
Pavel Emelyanov
e7250e5a3f Merge 'sstables: add more constness' from Kefu Chai
- sstables: mark param of sstable::*_from_sstring() const
- sstables: mark param of reverse_map() const
- sstables: mark static lookup table const

Closes #13115

* github.com:scylladb/scylladb:
  sstables: mark static lookup table const
  sstables: mark param of reverse_map() const
  sstables: mark param of sstable::*_from_sstring() const
2023-03-10 17:14:56 +03:00
Kamil Braun
51a76e6359 Revert "Merge 'sstables: remove unused function add more constness' from Kefu Chai"
This reverts commit 49e0d0402d, reversing
changes made to 25cf325674.

An old version of PR #13115 was accidentally merged into `master` (it
was dequeued concurrently while a running next promotion job included
it).

Revert the merge. We'll merge the new version as a follow-up.
2023-03-10 15:02:28 +01:00
Aleksandra Martyniuk
4808220729 test: extend test_compaction_task.py
test/rest_api/test_compaction_task.py is extended so that it checks
validity of major compaction run from column family api.
2023-03-10 15:01:22 +01:00
Aleksandra Martyniuk
0918529fdf api: unify major compaction
Major compaction can be started from both storage_service and column_family
api. The former allows compacting a subset of tables in a given keyspace,
while the latter compacts a given table in a given keyspace.

As major compaction started from storage_service has a wider scope,
we use its mechanisms for column_family's one. That makes it more consistent
and reduces number of classes that would be needed to cover the major
compaction with task manager's tasks.
2023-03-10 15:01:22 +01:00
Pavel Emelyanov
537510f7d2 scylla-gdb: Parse and eval _all_threads without quotes
I've no idea why the quotes are there at all, it works even without
them. However, with quotes gdb-13 fails to find the _all_threads static
thread-local variable _unless_ it's printed with gdb "p" command
beforehand.

fixes: #13125

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13132
2023-03-10 15:01:22 +01:00
Pavel Emelyanov
b07570406e distributed_loader: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 16:01:09 +03:00
Pavel Emelyanov
f90ea6efc2 distributed_loader: Coroutinize reshape() helper
Drop do_with(), keep the needed variable on stack.
Replace repeat() with plain loop + yield.
Keep track of run_custom_job()'s exception.

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-10 15:37:57 +03:00
Wojciech Mitros
6b8c1823a3 cql3: allow UDTs in permissions on UDFs
Currently, when preparing an authorization statement on a specific
function, we're trying to "prepare" all cql types that appear in
the function signature while parsing the statement. We cannot
do that for UDTs, because we don't know the UDTs that are present
in the database at parsing time. As a result, such authorization
statements fail.
To work around this problem, we postpone the "preparation" of cql
types until the actual statement validation and execution time.
Until then, we store all type strings in the resource object.
The "preparation" happens in the `maybe_correct_resource` method,
which is called before every `execute` during a `check_access` call.
At that point, we have access to the `query_processor`, and as a
result, to `user_types_metadata` which allows us to prepare the
argument types even for UDTs.
2023-03-10 11:02:33 +01:00
Wojciech Mitros
4f0b3539c5 cql3: add type_parser::parse() method taking user_types_metadata
In a future patch, we don't have access to a `user_types_storage`
while we want to parse a type, but we do have access to a
`user_types_metadata`, which is enough to parse the type.
We add a variant of the `type_parser::parse()` that takes
a `user_types_metadata` instead of a `user_types_storage` to be
able to parse a type also in the described context.
2023-03-10 11:02:33 +01:00
Wojciech Mitros
4182a221d6 schema_change_test: stop using non-existent keyspace
The current implementation of CQL type parsing worked even
when given a string representing a non-existent keyspace, as
long as the parsed type was one of the "native" types. This
implementation is going to change, so that we won't parse
types given an incorrect keyspace name.
When using `do_with_cql_env`, a "ks" keyspace is created by
default, and "tests" keyspace is not. The tests for reverse
schemas in `schema_change_test` were using the "tests"
keyspace, so in order to make the tests work after the future
changes, they now use the existing "ks" keyspace.
2023-03-10 11:02:32 +01:00
Wojciech Mitros
b93c7b94eb cql3: fix parameter names in function resource constructors
In some places, the parameter name used when constructing
a resource object was 'function_name', while the actual
argument was the signature of a function, which is particularly
confusing, because function names also appear frequently in these
contexts. This patch changes the identifiers to more accurately
reflect what they represent.
2023-03-10 11:02:32 +01:00
Wojciech Mitros
9a303fd99c cql3: handle complex types as when decoding function permissions
Currently, we're parsing types that appear in a function resource
using abstract_type::parse_type, which only works with simple types.
This patch changes it to db::marshal::type_parser::parse, which
can also handle collections.

We also adjust the test_grant_revoke_udf_permissions test so that
it uses both simple and complex types as parameters of the function
that we're granting/revoking permissions on.
2023-03-10 11:02:32 +01:00
Wojciech Mitros
438c7fdfa7 cql3: enforce permissions for ALTER FUNCTION
Currently, the ALTER permission is only enforced on ALL FUNCTIONS
or on ALL FUNCTIONS IN KEYSPACE.
This patch enforces the permission also on a specific function.
2023-03-10 11:02:32 +01:00
Piotr Sarna
c4e6925bb6 cql-pytest: add a (failing) test case for UDT in UDF
Our permissions system is currently incapable of figuring out
user-defined type definitions when preparing functions permissions.
This test case creates such a function, and it passes on Cassandra.
2023-03-10 11:02:32 +01:00
Piotr Sarna
63e67c9749 cql-pytest: add a test case for user-defined aggregate permissions
This test case is similar to the one for user-defined functions,
but checks if aggregate permissions are enforced.
2023-03-10 11:02:32 +01:00
Piotr Sarna
6deebab786 cql-pytest: add tests for function permissions
The test case checks that function permissions are enforced
for non-superuser users.
2023-03-10 11:01:48 +01:00
Kefu Chai
77643717db sstables: mark static lookup table const
these tables are mappings from symbolic names to their string
representation. we don't mutate them. so mark them const.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-10 16:18:29 +08:00
Kefu Chai
0889643243 sstables: mark param of reverse_map() const
it does not mutate the map in which the value is looked up, so let's
mark map const. also, take this opportunity to use structured binding
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-10 16:18:29 +08:00
Kefu Chai
9eae97c525 sstables: mark param of sstable::*_from_sstring() const
neither of the changed functions mutates its parameter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-10 16:18:28 +08:00
Pavel Emelyanov
e3dc60286c sstable: Remove unused friendship
The components_writer class from this list doesn't even exist.
Also drop the forward declaration of mx::partition_reversing_data_source_impl

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13097
2023-03-10 07:13:18 +02:00
Jan Ciolek
c11f7a9e35 expr_test: more tests for preparing bind variables with type hints
Add tests for preparing expr::cast which contains a bind variable,
with a known receiver.
expr::cast serves as a type hint for the bind variable.
It specifies what the type of the bind variable should be;
we must check that this type is compatible with the receiver
and fail in case it isn't.

The following cases are tested:
Valid:
`text_col = (text)?`
`int_col = (int)?`

Invalid:
`text_col = (int)?`
`int_col = (text)?`

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 18:31:45 +01:00
Jan Ciolek
a08eb5cb76 prepare_expr: implement preparing expr::cast with no receiver
Type inference in cast_prepare_expression was very limited.
Without a receiver it just gave up and said that it can't
infer the type.

It's possible to infer the type - an expression that
casts something to type bigint also has type bigint.

This can be implemented by creating a fake receiver
when the caller didn't specify one.
Type of this fake receiver will be c.type
and c.arg will be prepared using this receiver.

Note that the previous change (changing receiver
to cast_type_receiver in prepare_expression) is required
to keep the behaviour consistent.
Without it we would sometimes prepare c.arg using the
original receiver, and sometimes using a receiver
with type c.type.

Currently it's impossible to test this change
on live code. Every place that uses expr::cast
specifies a receiver.
A unit test is all that can be done at the moment
to ensure correctness.

In the future this functionality will be used in UDFs.
In https://github.com/scylladb/scylladb/pull/12900
it was requested to be able to use a type hint
to specify whether WASM code of the function
will be sent in binary or text form.

The user can convey this by typing
either `(blob)?` or `(text)?`.
In this case there will be no receiver
and type inference would fail.

After this change it will work - it's now possible
to prepare either of those and get an expression
with a known type.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 18:31:45 +01:00
Jan Ciolek
9f8340d211 prepare_expr: use :user formatting in cast_prepare_expression
By default expressions are printed using the {:debug} formatting,
which is intended for internal use. Error messages should use the
{:user} formatting instead.

cast_prepare_expression uses the default formatting in a few places
that are user facing, so let's change it to use {:user} formatting.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 18:31:45 +01:00
Jan Ciolek
12560b5745 prepare_expr: remove std::get<> in cast_prepare_expression
A few times throughout cast_prepare_expression there's
a line which uses std::get<> to get the raw type of the cast.
`std::get<shared_ptr<cql3_type::raw>>(c.type)`

This is a dangerous thing to do. It might turn out that the variant
holds a different alternative and then it'll start throwing bad_variant_access.

In this case this would happen if someone called cast_prepare_expression
on an expression that is already prepared.

It's possible to restructure the code so that it avoids std::get
altogether.
This makes the code more resilient and gives me peace of mind.
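
One way to avoid std::get, as a minimal sketch with hypothetical stand-in
types (`raw_type`, `prepared_type` and `cast_type` are not Scylla's real
`cql3_type::raw` or `data_type`): std::visit handles every alternative, so
there is no alternative left that could trigger std::bad_variant_access.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-ins for an unprepared (raw) and a prepared type.
struct raw_type { std::string name; };
struct prepared_type { std::string name; };

using cast_type = std::variant<std::shared_ptr<raw_type>, std::shared_ptr<prepared_type>>;

// std::get<std::shared_ptr<raw_type>>(t) would throw std::bad_variant_access
// if `t` already held a prepared type. std::visit covers both alternatives.
std::string type_name(const cast_type& t) {
    return std::visit([](const auto& p) { return p->name; }, t);
}
```

The same effect can be had with std::get_if and an explicit branch when the
alternatives need different handling.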

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 18:31:45 +01:00
Jan Ciolek
7c384de476 prepare_expr: improve cast_prepare_expression
Preparing expr::cast had some artificial limitations.
Things like this worked:
`blob_col = (blob)funcReturnsInt()`
But this didn't:
`blob_col = (blob)(int)1234`

This is caused by the line:
`prepare_expression(c.arg, db, keyspace, schema_opt, receiver)`

Here the code prepares the expression to be cast using the original
receiver which was passed to cast_prepare_expression.

In the example above this meant that it tried to prepare
untyped_constant(1234) using a receiver with type blob.
This failed because an integer literal is invalid for a blob column.

To me it looks like a mistake. What it should do instead
is prepare the int literal using the type (int) and then
see if int can be cast to blob, by checking if these types
have compatible binary representation.

This can be achieved by using `cast_type_receiver` instead of `receiver`.

Making this small change makes it possible to use the cast
in many situations where it was previously impossible.
The tests have to be updated to reflect the change,
some of them now deviate from Cassandra, so they have
to be marked scylla_only.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 18:31:41 +01:00
Piotr Sarna
62458b8e4f cql3: enforce permissions on function calls
Only users with EXECUTE permission are able to use the function
in SELECT statements.
2023-03-09 17:51:17 +01:00
Piotr Sarna
4624934032 selection: add a getter for used functions
The function allows extracting used function definitions
from given selection. Thanks to that, it will be possible
to verify if the callee has proper permissions to execute
given functions.
2023-03-09 17:51:17 +01:00
Piotr Sarna
d95912c369 abstract_function_selector: expose underlying function
It will be needed later in order to check this function's permissions.
2023-03-09 17:51:17 +01:00
Piotr Sarna
488934e528 cql3: enforce permissions on DROP FUNCTION
Only users with DROP permission are allowed to drop
user-defined functions.
2023-03-09 17:51:15 +01:00
Piotr Sarna
e8afcf7796 cql3: enforce permissions for CREATE FUNCTION
Only users with CREATE permissions are allowed to create
user-defined functions.
2023-03-09 17:50:56 +01:00
Piotr Sarna
d10799a834 client_state: add functions for checking function permissions
The helper functions will be later used to enforce permissions
for user-defined functions.
2023-03-09 17:50:56 +01:00
Piotr Sarna
8de1017691 cql-pytest: add a case for serializing function permissions
This test case checks that granting function permissions
results in correct serialization of the permissions - so that
reading system_auth.role_permissions and listing the permissions
via CQL with `LIST permission OF role` works in a compatible way
with both Scylla and Cassandra.
2023-03-09 17:50:56 +01:00
Piotr Sarna
aa4c15a44a cql3: allow specifying function permissions in CQL
This commit allows users to specify the following resources:
 - ALL FUNCTIONS
 - ALL FUNCTIONS IN KEYSPACE ks
 - FUNCTION f(int, double)

The permissions set for these resources are not enforced yet.
2023-03-09 17:50:56 +01:00
Piotr Sarna
5b662dd447 auth: add functions_resource to resources
This commit adds "functions" resource to our authorization
resources. The implementation strives to be compatible
with Cassandra both from CQL level and serialization,
i.e. so that entries in system_auth.role_permissions table
will be identical if CassandraAuthorizer is used.

This commit adds a way of representing these resources
in-memory, but they are not enforced as permissions yet.

The following permissions are supported:
```
CREATE ALL FUNCTIONS
CREATE ALL FUNCTIONS IN KEYSPACE <ks>

ALTER ALL FUNCTIONS
ALTER ALL FUNCTIONS IN KEYSPACE <ks>
ALTER FUNCTION <f>

DROP ALL FUNCTIONS
DROP ALL FUNCTIONS IN KEYSPACE <ks>
DROP FUNCTION <f>

AUTHORIZE ALL FUNCTIONS
AUTHORIZE ALL FUNCTIONS IN KEYSPACE <ks>
AUTHORIZE FUNCTION <f>

EXECUTE ALL FUNCTIONS
EXECUTE ALL FUNCTIONS IN KEYSPACE <ks>
EXECUTE FUNCTION <f>
```
as per
https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions
2023-03-09 17:50:19 +01:00
Jan Ciolek
e4a3e2ac14 cql-pytest/test_lwt: test LWT update with empty clustering range
Add a test case which performs an LWT UPDATE, but the clustering key
has 0 possible values, because it's supposed to be equal to two
different values.

This currently causes a crash, see https://github.com/scylladb/scylladb/issues/13129

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 15:44:10 +01:00
Jan Ciolek
5e5e4c5323 cql-pytest/test_lwt: test LWT update with empty partition range
Add a test case which performs an LWT UPDATE, but the partition key
has 0 possible values, because it's supposed to be equal to two
different values.
Such queries used to cause problems in the past.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-09 15:43:24 +01:00
Anna Stuchlik
6aff78ded2 doc: Remove Enterprise content from OSS docs
Related: https://github.com/scylladb/scylladb/issues/13119

This commit removes the pages that describe Enterprise-only features
from the Open Source documentation:
- Encryption at Rest
- Workload Prioritization
- LDAP Authorization
- LDAP Authentication
- Audit

In addition, it removes most of the information about Incremental
Compaction Strategy (ICS), which is replaced with links to the
Enterprise documentation.

The changes above required additional updates introduced with this
commit:
- The links to Enterprise-only features are replaced with the
  corresponding links in the Enterprise documentation.
- The redirections are added for the removed pages to be redirected to
  the corresponding pages in the Enterprise documentation.

This commit must be reverted in the scylla-enterprise repository to
avoid deleting the Enterprise-only content from the Enterprise docs.

Closes #13123
2023-03-09 15:40:43 +02:00
Botond Dénes
11dde4b80b reader_permit: add wait_for_execution state
Used while the permit is in the _ready_list, waiting for the execution
loop to pick it up. This just acknowledges the existence of this
wait-state. This state will now show up in permit diagnostics printouts
and we can now determine whether a permit is waiting for execution,
without checking which queue it is in.
2023-03-09 07:11:51 -05:00
Botond Dénes
6229f8b1a6 reader_concurrency_semaphore: make wait lists intrusive
Instead of using expiring_fifo to store queued permits, use the same
intrusive list mechanism we use to keep track of all permits.
Permits are now moved between the _permit_list and the wait queues,
depending on which state they are in. This means _permit_list is now not
the definitive list containing all permits, instead it is the list
containing all permits that are not in a more specialized queue at the
moment.
Code wishing to iterate over all permits should now use
foreach_permit(). For outside code this was already the only way, and
internal users are already patched.
Making the wait lists intrusive allows us to dequeue a permit from any
position, with nothing but a permit reference at hand. It also means
the wait queues don't have any additional memory requirements, other
than the memory for the permit itself.
Timeout while being queued is now handled by the permit's on_timeout()
callback.
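
To illustrate why an intrusive list allows dequeuing from any position with
only a reference, here is a minimal hand-rolled sketch; `list_hook`,
`intrusive_list`, `permit` and `move_to` are hypothetical names, not Scylla's
actual types (which may well use boost::intrusive instead).

```cpp
#include <cassert>
#include <cstddef>

// The links live inside the element, so it can unlink itself in O(1).
struct list_hook {
    list_hook* prev = this;
    list_hook* next = this;

    list_hook() = default;
    list_hook(const list_hook&) = delete;            // copying links would corrupt lists
    list_hook& operator=(const list_hook&) = delete;

    bool is_linked() const { return next != this; }

    void unlink() {
        prev->next = next;
        next->prev = prev;
        prev = next = this;
    }
};

struct intrusive_list {
    list_hook head; // sentinel: head.next is the first element

    void push_back(list_hook& h) {
        assert(!h.is_linked());
        h.prev = head.prev;
        h.next = &head;
        head.prev->next = &h;
        head.prev = &h;
    }

    // No stored size: counting walks the list, so it is O(n).
    std::size_t size() const {
        std::size_t n = 0;
        for (const list_hook* p = head.next; p != &head; p = p->next) {
            ++n;
        }
        return n;
    }
};

struct permit : list_hook {
    int id;
    explicit permit(int i) : id(i) {}
};

// Move a permit between queues: no iterator or back pointer needed.
void move_to(permit& p, intrusive_list& dst) {
    if (p.is_linked()) {
        p.unlink();
    }
    dst.push_back(p);
}
```

The queues need no extra allocation per entry: the permit itself carries
the links.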
2023-03-09 07:11:49 -05:00
Benny Halevy
0f07a24889 storage_service: node_ops_signal_abort: print a warning when signaling abort
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 14:10:10 +02:00
Benny Halevy
2a1015dced storage_service: s/node_ops_singal_abort/node_ops_signal_abort/
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 14:09:09 +02:00
Benny Halevy
6394e9acf7 storage_service: node_ops_abort: add log messages
So we can correlate the respective messages
on the node_ops coordinator side.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 14:04:56 +02:00
Benny Halevy
3652025062 storage_service: wire node_ops_ctl for node operations
Use the node_ops_ctl methods for the basic
flow of: start, start_heartbeat_updater, prepare,
send_to_all, done|abort

It is also used for querying pending ops for decommission.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 14:02:31 +02:00
Botond Dénes
9ea9a48dbc reader_concurrency_semaphore: move most wait_queue methods out-of-line
They will soon depend on the definition of the reader_permit::impl,
which is only available in the .cc file.
2023-03-09 06:53:11 -05:00
Botond Dénes
1d27dd8f0e reader_concurrency_semaphore: store permits directly in queues
Instead of the `entry` wrapper. In _wait_list and _ready_list, that is.
Data stored in the `entry` wrapper is moved to a new
`reader_permit::auxiliary_data` type. This makes the reader permit
self-sufficient. This in turn prepares the ground for the ability to
de-queue a permit from any queue, with nothing but a permit reference at
hand: no need for back pointers to wrappers and/or iterators.
2023-03-09 06:53:11 -05:00
Botond Dénes
bcfb8715f9 reader_permit: introduce (private) operator * and ->
Currently the reader_permit has some private methods that only the
semaphore's internals call. But this method of communication is not
consistent: at other times the semaphore accesses the permit impl directly,
calling methods on that.
This commit introduces operator * and -> for reader_permit. With this,
the semaphore internals always call the reader_permit::impl methods
directly, either via a direct reference, or via the above operators.
This makes the permit interface a little narrower and reduces
boilerplate code.
2023-03-09 06:53:11 -05:00
Botond Dénes
f5b80fdfd8 reader_concurrency_semaphore: remove redundant waiters() member
There is now a field in stats with the same information, use that.
2023-03-09 06:53:11 -05:00
Botond Dénes
74a5981dbe reader_concurrency_semaphore: add waiters counter
Use it to keep track of all permits that are currently waiting on
something: admission, memory or execution.
Currently we keep track of size, by adding up the result of size() of
the various queues. In future patches we are going to change the queues
such that they will no longer have a constant-time size, so move to an
explicit counter in preparation for that.
Another change this commit makes is to also include ready list entries
in this counter. Permits in the ready list are also waiters, they wait
to be executed. Soon we will have a separate wait state for this too.
2023-03-09 06:53:11 -05:00
Botond Dénes
2694aa1078 reader_permit: use check_abort() for timeout
Instead of having callers use get_timeout(), then compare it against the
current time, set up a timeout timer in the permit, which assigns an
exception of the appropriate type to a new `_ex` member (a
`std::exception_ptr`) when it fires.
Callers can now just poll check_abort() which will throw when `_ex`
is not null. This is more natural and allows for more general reasons
for aborting reads in the future.
This prepares the ground for timeouts being managed inside the permit,
instead of by the semaphore. Including timing out while in a wait queue.
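
A minimal sketch of the polling pattern, assuming a plain class instead of
the real permit: the seastar timer is replaced by a hypothetical
on_timeout() call, and std::runtime_error is a placeholder for the actual
timeout exception type.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

class permit_sketch {
    std::exception_ptr _ex; // null while the read may proceed

public:
    // Stand-in for the timer callback firing.
    void on_timeout() {
        if (!_ex) {
            _ex = std::make_exception_ptr(std::runtime_error("read timed out"));
        }
    }

    // Other abort reasons can reuse the same slot; the first one wins.
    void abort(std::exception_ptr ex) {
        if (!_ex) {
            _ex = std::move(ex);
        }
    }

    // Callers poll this at convenient points; it throws once aborted.
    void check_abort() const {
        if (_ex) {
            std::rethrow_exception(_ex);
        }
    }
};
```

Callers no longer need to know why the read was aborted, only where to
poll.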
2023-03-09 06:53:09 -05:00
Benny Halevy
d322bbf6ff storage_service: add node_ops_ctl class to formalize all node_ops flow
All node operations we currently support go through
similar basic flow and may add some op-specific logic
around it.

1. Select the nodes to sync with (this is op specific).
2. start the heartbeat updater
3. send prepare req
4. perform the body of the node operation
5. send done
--
on any error: send abort

node_ops_ctl formalizes all those steps and makes
sure errors are handled in all steps, and
the error causing abort is not masked by errors
in the abort processing, and is propagated upstream.
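
The error-handling shape described above, as a minimal synchronous sketch:
`node_op_steps` and `run_node_op` are hypothetical names, the heartbeat
updater is left out, and the real node_ops_ctl works with seastar futures
rather than blocking calls.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

struct node_op_steps {
    std::function<void()> prepare; // send prepare request to the synced nodes
    std::function<void()> body;    // the op-specific work
    std::function<void()> done;    // send done
    std::function<void()> abort;   // send abort
};

void run_node_op(const node_op_steps& s) {
    try {
        s.prepare();
        s.body();
        s.done();
    } catch (...) {
        auto original = std::current_exception();
        try {
            s.abort();
        } catch (const std::exception& e) {
            // Log and swallow: an abort failure must not mask `original`.
            std::cerr << "abort failed: " << e.what() << '\n';
        } catch (...) {
            std::cerr << "abort failed: unknown error\n";
        }
        std::rethrow_exception(original); // propagate the real cause upstream
    }
}
```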

Some of the printouts repeat the node operation description
to remain backward compatible, so as not to break dtests
that wait for them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 13:48:34 +02:00
Wojciech Mitros
2fd6d495fa wasm: move compilation to an alien thread
The compilation of wasm UDFs is performed by a call to a foreign
function that cannot be interrupted by yield points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.

The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.

All seastar shards reference the thread through a std::shared_ptr
to an `alien_thread_runner`.

Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with a focus on simplicity rather
than performance. The tasks are stored in a std::queue; access to it
is synchronized with a std::mutex, and a std::condition_variable is
used to wait until the queue has elements.

When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.
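
A standard-library-only sketch of such a runner, for illustration. It is
an assumption-laden approximation: std::future stands in for returning the
result through seastar::alien, and submit() is a hypothetical interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

class alien_thread_runner {
    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::optional<std::function<void()>>> _tasks; // nullopt = stop
    std::thread _thread; // declared last: starts after the members it uses

    void loop() {
        for (;;) {
            std::optional<std::function<void()>> task;
            {
                std::unique_lock lock(_mutex);
                _cv.wait(lock, [this] { return !_tasks.empty(); });
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            if (!task) {
                return; // sentinel: every earlier task has already run
            }
            (*task)(); // long, non-yielding work runs outside the lock
        }
    }

    void push(std::optional<std::function<void()>> t) {
        {
            std::lock_guard lock(_mutex);
            _tasks.push(std::move(t));
        }
        _cv.notify_one();
    }

public:
    alien_thread_runner() : _thread([this] { loop(); }) {}

    ~alien_thread_runner() {
        push(std::nullopt);
        _thread.join();
    }

    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only; std::function needs a copyable callable.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        auto fut = task->get_future();
        push(std::function<void()>([task] { (*task)(); }));
        return fut;
    }
};
```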
2023-03-09 11:54:38 +01:00
Botond Dénes
23f4e250c2 reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param
This param is from a time when _permit_list was not accessible from the
outside, so it was passed along the semaphore instance to avoid making
the diagnostics methods friends.
To allow the semaphore freedom in how permits are stored, the
diagnostics code is instead made to use foreach_permit(), instead of
accessing the underlying list directly.
As the diagnostics code wants reader_permit::impl& directly, a new
variant of foreach_permit() passing impl references is introduced.
2023-03-09 05:19:59 -05:00
Botond Dénes
59dc15682b reader_concurrency_semaphore: make foreach_permit() const
It already is conceptually, as it passes const references to the permits
it iterates over. The only reason it wasn't const before is a technical
issue which is solved here with a const_cast.
2023-03-09 05:19:59 -05:00
Botond Dénes
c86136c853 reader_permit: add get_schema() and get_op_name() accessors
2023-03-09 05:19:59 -05:00
Botond Dénes
9dd2cd07ef reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept
It is in fact noexcept, and it is expected to be, so document this.
2023-03-09 05:19:59 -05:00
Benny Halevy
f3d6868738 repair: node_ops_cmd_request: add print function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 11:42:03 +02:00
Benny Halevy
130d6faa06 repair: do_decommission_removenode_with_repair: log ignore_nodes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 11:42:03 +02:00
Benny Halevy
ac13e1f432 repair: replace_with_repair: get ignore_nodes as unordered_set
Prepare for following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 11:42:03 +02:00
Benny Halevy
78b0222842 gossiper: get_generation_for_nodes: get nodes as unordered_set
Prepare for following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 11:42:03 +02:00
Benny Halevy
28eb11553b storage_service: don't let node_ops abort failures mask the real error
Currently failing to abort a node operation will
throw and mask the original failure handled in the catch block.

See #12333 for example.

Fixes #12798

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-03-09 11:42:03 +02:00
Botond Dénes
49e0d0402d Merge 'sstables: remove unused function add more constness' from Kefu Chai
- sstables: remove unused function
- sstables: mark param of sstable::*_from_sstring() const
- sstables: mark param of reverse_map() const
- sstables: mark static lookup table const

Closes #13115

* github.com:scylladb/scylladb:
  sstables: mark static lookup table const
  sstables: mark param of reverse_map() const
  sstables: mark param of sstable::*_from_sstring() const
  sstables: remove unused function
2023-03-09 11:29:28 +02:00
Pavel Emelyanov
47df084363 test,sstables: Remove path from make_sstable_easy()
The method in question is only called with env's tempdir, so there's no
point in explicitly passing it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
8297ac0082 test,lib: Remove wrapper over reusable_sst and move the comment
There's a wonderful comment describing what the reusable_sst is for near
one of its wrappers. It's better to drop the wrapper and move the
comment to where it belongs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
27d45df35f test: Make "compact" test case use env dir
Same as most of the previous work -- remove the explicit capturing of
env's tempdir over the test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
fdff97a294 test,compaction: Use env tempdir in some more cases
Both already do so, but get the tempdir explicitly. It's possible to
make them much shorter by not carrying this variable over the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
19ef07b059 test,compaction: Make check_compacted_sstables() use env's dir
It's in fact using it already via argument. Next patch will do the same
with another call, but having this change separately makes the next
patch shorter and easier to review.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
ef8928f2cc test: Relax making sstable with sequential generation
Many test cases populate sstable with a factory that at the same time
serves as a stable maintainer of a monotonic generation. Those can be
greatly relaxed by re-using the recently introduced generation from the
test_env.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
be7f4ff53a test/sstable::test_env: Keep track of auto-incrementing generation
Lots of test cases make sstables with monotonically incrementing
generation values. In Scylla code this counter is maintained in class
table, but sstable tests do not always have it. To mimic this behavior, the
test_env can keep track of the generation, so that callers just don't
mess with it (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
bc20879971 test/lib: Add sstable maker helper without factory
There's a make_sstable_containing() helper that creates sstable and
populates it with mutations (and makes some post validation). The helper
accepts a factory function that should make sstable for it.

This patch shuffles this helper a bit by introducing an overload that
populates (and validates) the already existing sstable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
2bbc59dd58 test: Remove last occurrence of test_env::do_with(rval, ...)
There's a lone test case that uses the mentioned template to carry
its own instance of tempdir over its lifetime. Patch the case to re-use
the already existing env's tempdir and drop the template.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
4bd79dc900 test,sstables: Dont mess with tempdir where possible
Beneficiary of the previous patch -- those cases that make sstables in
env's tempdir can now enjoy not mentioning this explicitly and letting
the env specify the sstable making path itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
dfcfe0a355 test/sstable::test_env: Add dir-less sstables making helpers
Lots of (most of) test cases out there generate sstables inside env's
temporary directory. This patch adds some sugar to env that will allow
test cases to omit the explicit env.tempdir() call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
d28589a2f7 test,sstables: Use sstables::test_env's tempdir with sweeper
Continuation of the previous patch. Some test cases are sensitive to
having the temp directory clean, so patch them similarly, but equip with
the sweeper on entry instead of their own tempdir instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:48 +03:00
Pavel Emelyanov
904853cd7b test,sstables: Use sstables::test_env's tempdir
The one is maintained by the env throughout its lifetime. For many test
cases there's no point in generating tempdir on their own, so just
switch to using env's one.

The code gets longer lines, but this is going to change really soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:47 +03:00
Pavel Emelyanov
21e70e7edd test/lib: Add tempdir sweeper
This is an RAII-style helper that cleans the temp directory on destruction. To
be used in cases when a test needs to do several checks over clean
temporary directory (future patches).
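
A minimal sketch of such a sweeper using std::filesystem; the class name
and behavior (keep the directory, remove its contents) are assumptions for
illustration, not the actual test/lib code.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

class tempdir_sweeper {
    fs::path _dir;

public:
    explicit tempdir_sweeper(fs::path dir) : _dir(std::move(dir)) {}
    tempdir_sweeper(const tempdir_sweeper&) = delete;
    tempdir_sweeper& operator=(const tempdir_sweeper&) = delete;

    // Leaves the directory itself in place but empties it.
    ~tempdir_sweeper() {
        std::error_code ec; // destructors must not throw, so use error codes
        std::vector<fs::path> entries;
        // Collect first: removing while iterating may or may not be observed.
        for (fs::directory_iterator it(_dir, ec), end; !ec && it != end; it.increment(ec)) {
            entries.push_back(it->path());
        }
        for (const auto& p : entries) {
            fs::remove_all(p, ec);
        }
    }
};
```

A test can open a new scope with a sweeper before each check that needs an
empty directory.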

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:47 +03:00
Pavel Emelyanov
090e007e30 test/lib: Open-code make_sstable_easy into make_sstable
The former helper is going to get rid of the fs::path& dir argument,
but the latter cannot yet live without it. The simplest solution is to
open-code the helper until better times.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:47 +03:00
Pavel Emelyanov
8d727701a4 test: Remove vector of mutation interposer from test_key_count_estimation
The test generates a vector of mutations to be later passed into
make_sstable() helper which just applies them to memtable. The test case
can generate memtable directly. This makes it possible to stop using the
local tempdir in this test case by future patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-09 08:21:47 +03:00
Kefu Chai
87a6cb5925 sstables: mark static lookup table const
these tables are mappings from symbolic names to their string
representation. we don't mutate them. so mark them const.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-09 12:40:37 +08:00
Kefu Chai
c18709d4a1 sstables: mark param of reverse_map() const
it does not mutate the map in which the value is looked up, so let's
mark map const. also, take this opportunity to use structured binding
for better readability.
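
A minimal sketch of a const-correct reverse lookup with a structured
binding. The real helper's signature and not-found behavior may differ;
this hypothetical version returns std::optional instead.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Find the key that maps to `value`. The map is only read, so it is taken
// by const reference, and [k, v] replaces it->first / it->second.
template <typename K, typename V>
std::optional<K> reverse_map(const V& value, const std::map<K, V>& map) {
    for (const auto& [k, v] : map) {
        if (v == value) {
            return k;
        }
    }
    return std::nullopt;
}
```

Because the parameter is const, a `static const` lookup table can be passed
directly.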

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-09 12:40:37 +08:00
Kefu Chai
4128ab2029 sstables: mark param of sstable::*_from_sstring() const
neither of the changed function mutates the parameter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-09 12:40:37 +08:00
Kefu Chai
c211b272f7 sstables: remove unused function
`sstable::version_from_sstring()` is used nowhere, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-09 12:40:37 +08:00
Avi Kivity
25cf325674 Merge 'api: s/request/http::request/' from Kefu Chai
- api: reference httpd::* symbols like 'httpd::*'
- alternator: using chrono_literals before using it
- api: s/request/http::request/

the last two commits were inspired by Pavel's comment:

> It looks like api/ code was caught by some using namespace seastar::httpd shortcut.

they should be landed before we merge and include https://github.com/scylladb/seastar/pull/1536 in Scylla.

Closes #13095

* github.com:scylladb/scylladb:
  api: reference httpd::* symbols like 'httpd::*'
  alternator: using chrono_literals before using it
  api: s/request/http::request/
2023-03-08 18:08:21 +02:00
Avi Kivity
a96fcdaac6 Merge 'distributed_loader: print log without using fmt::format() and fix of typo' from Kefu Chai
- distributed_loader: print log without using fmt::format()
- distributed_loader: correct a typo in comment

Closes #13108

* github.com:scylladb/scylladb:
  distributed_loader: correct a typo in comment
  distributed_loader: print log without using fmt::format()
2023-03-08 17:55:25 +02:00
Kefu Chai
3488b68413 build: cmake: link Boost::regex against ICU::uc
Boost::regex references icu_67::Locale::Locale, so let's fix this.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
51ff2907b8 build: cmake: link sstables against libdeflate
sstables is the only place where libdeflate is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
2a18d470cc build: cmake: add missing sources to test-lib
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
0b3d25ab1b build: cmake: add missing linkages
these dependencies were found when trying to compile
`user_function_test`. whenever a library libfoo references another one,
say, libbar, the corresponding linkage from libfoo to libbar is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
21a7c439bb build: cmake: find Snappy before using it
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
c8f762b6d0 build: cmake: extract scylla-main out
so tests and other libraries can link against it. also, drop the unused
abseil library linkages.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
d07adcbe74 build: cmake: extract index, repair and data_dictionary out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:53:42 +08:00
Kefu Chai
b1484a2a5f build: cmake: document add_scylla_test()
this change reuses part of Botond Dénes's work to add a full-blown
CMakeLists.txt to build scylla.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:26:30 +08:00
Kefu Chai
b0433bf82b build: cmake: remove test which does not exist yet
it was an oversight in 11124ee972,
which added a test not yet included in master HEAD, so let's
drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 22:26:30 +08:00
Nadav Har'El
a4a318f394 cql: USING TTL 0 means unlimited, not default TTL
Our documentation states that writing an item with "USING TTL 0" means it
should never expire. This should be true even if the table has a default
TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no
USING TTL at all (i.e., it took the default TTL, instead of unlimited).
We had two xfailing tests demonstrating that Scylla's behavior here
differs from Cassandra's. Scylla's behavior in this case was also
undocumented.

By the way, Cassandra used to have the same bug (CASSANDRA-11207) but
it was fixed already in 2016 (Cassandra 3.6).

So in this patch we fix Scylla's "USING TTL 0" behavior to match the
documentation and Cassandra's behavior since 2016. One xfailing test
starts to pass, and the second test gets past this bug and fails on a
different one. This patch also adds a third test for "USING TTL ?"
with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a
missing "USING TTL".

The origin of this bug was that after parsing the statement, we saved
the USING TTL in an integer, and used 0 for the case of no USING TTL
given. This meant that we couldn't tell if we have USING TTL 0 or
no USING TTL at all. This patch uses an std::optional so we can tell
the case of a missing USING TTL from the case of USING TTL 0.
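
The idea as a minimal sketch: `effective_ttl` is a hypothetical helper, not
Scylla's code, and it assumes that an unset USING TTL is represented the
same way as a missing one (std::nullopt). A returned TTL of 0 means "never
expires".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

std::int32_t effective_ttl(std::optional<std::int32_t> using_ttl,
                           std::int32_t table_default_ttl) {
    if (!using_ttl) {
        return table_default_ttl; // no USING TTL: fall back to the table default
    }
    return *using_ttl; // explicit value; USING TTL 0 stays 0 (unlimited)
}
```

With a plain integer where 0 meant "not given", the first and second cases
below could not be told apart.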

Fixes #6447

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13079
2023-03-08 16:18:23 +02:00
Kefu Chai
43b6f7d8d3 distributed_loader: correct a typo in comment
s/to many/too many/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 18:17:43 +08:00
Kefu Chai
b6991f5056 distributed_loader: print log without using fmt::format()
logger.info() is able to format the given arguments with the format
string, so let's just let it do its job.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-08 18:17:43 +08:00
Alejo Sanchez
f55e91d797 gms, service: live endpoint copy method
Move replication logic for live endpoints across shards to a separate
method

This will be used by API get alive nodes.

As this is now in a method and outside gossiper::run(), assert it's
called from shard 0.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-03-08 10:45:35 +01:00
Nadav Har'El
beb9a8a9fd docs/alternator: recommend to disable auto_snapshot
In issue #5283 we noted that the auto_snapshot option is not useful
in Alternator (as we don't offer any API to restore the snapshot...),
and suggested that we should automatically disable this option for
Alternator tables. However, this issue has been open for more than three
years, and we never changed this default.

So until we solve that issue - if we ever do - let's add a paragraph
in docs/alternator/alternator.md recommending to the user to disable
this option in the configuration themselves. The text explains why,
and also provides a link to the issue.

Refs #5283

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13103
2023-03-08 10:50:59 +02:00
Jan Ciolek
0417c48bdc cql-pytest: test unset value in UPDATE and LWT UPDATE
Add a test which performs an UPDATE and
tries to pass an UNSET_VALUE as a value
for the primary key.

There is also an LWT variant of this test
that tries to set an UNSET_VALUE
in the IF condition.

These two tests are analogous to
test_insert_update_where and
test_insert_update_where_lwt,
but use an UPDATE instead of INSERT.

It's useful to test UPDATE as well as INSERT.
When I was developing a fix for #13001
I initially added the condition for unset value
inside insert_statement, but this didn't handle
update statements. These two tests allowed me
to see that UPDATE still causes a crash.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13058
2023-03-08 10:39:26 +02:00
Raphael S. Carvalho
3fae46203d replica: Fix undefined behavior in table::generate_and_propagate_view_updates()
Undefined behavior, because the order in which function arguments are evaluated is unspecified.

With GCC, where evaluation is right-to-left, schema will be moved
once it's forwarded to make_flat_mutation_reader_from_mutations_v2().

The consequence is that memory tracking of mutation_fragment_v2
(for tracking only permit used by view update), which uses the schema,
can be incorrect. However, it's more likely that Scylla will crash
when estimating memory usage for a row, which accesses schema column
information using schema::column_at(), which in turn asserts that
the requested column does really exist.
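
The pattern behind this class of bug, as a minimal sketch. `schema`,
`describe` and `make_reader` are hypothetical stand-ins, not the functions
touched by this commit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct schema { std::string name; };
using schema_ptr = std::shared_ptr<const schema>;

std::string describe(const schema_ptr& s) {
    return s ? s->name : "<moved-from>"; // a moved-from shared_ptr is empty
}

std::string make_reader(std::string derived, schema_ptr s) {
    return derived + "@" + s->name;
}

// Buggy pattern (kept as a comment): argument evaluation order is
// unspecified, so with right-to-left evaluation `s` is moved into the
// parameter before describe(s) reads it.
//   return make_reader(describe(s), std::move(s));

// Fixed pattern: finish every read of `s` before it is moved from.
std::string make_reader_safely(schema_ptr s) {
    auto derived = describe(s);
    return make_reader(std::move(derived), std::move(s));
}
```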

Fixes #13093.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13092
2023-03-08 07:38:55 +02:00
Nadav Har'El
ef50e4022c test: drop our "pytest" wrapper script
When Fedora 37 came out, we discovered that its "pytest" script started
to run Python with the "-s" option, which caused problems for packages
installed per-user via pip. We fixed this by adding our own wrapper
script test/pytest.

But this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2152171) has
since been fixed in Fedora 37, and the fixed version has already reached
our dbuild. So we no longer need this wrapper script. Let's remove it.

Fixes #12412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13083
2023-03-08 07:31:37 +02:00
Jan Ciolek
63a7235017 prepare_expr: improve readability in cast_prepare_expression
cast_prepare_expression takes care of preparing expr::cast,
which is responsible for CQL C-style casts.

At first glance it can be hard to figure out what exactly
it does, so I added some comments to make things clearer.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-08 03:24:17 +01:00
Jan Ciolek
03d37bdc14 cql-pytest: test expr::cast in test_cast.py
CQL supports C-style casts with the destination type specified
inside parentheses, e.g. `blob_column = (blob)funcThatReturnsInt()`.

These casts can be used to convert values of types
that have compatible binary representation, or as a type hint
to specify the type where the situation is ambiguous.

I didn't find any cql-pytest tests for this feature,
so I added some.

It looks like the feature works, but only partially.
Doing things like this works:
`blob_column = (blob)funcThatReturnsInt()`
But trying to do something a bit more complex fails:
`blob_column = (blob)(int)1234`

This is the case in both Cassandra and Scylla,
the tests introduced in this commit pass on both of them.

In future commits I will extend this feature
to support the more complex cases as well,
then some tests will have to be marked scylla_only.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-03-08 03:24:13 +01:00
Nadav Har'El
cdedc79050 cql: add configurable restriction of minimum RF
We have seen users unintentionally use RF=1 or RF=2 for a keyspace.
We would like to have an option for a minimal RF that is allowed.

Cassandra recently added, in Cassandra 4.1 (see apache/cassandra@5fdadb2
and https://issues.apache.org/jira/browse/CASSANDRA-14557), exactly such
an option, called "minimum_keyspace_rf" - so we chose to use the same option
name in Scylla too. This means that unlike the previous "safe mode"
options, the name of this option doesn't start with "restrict_".

The value of the minimum_keyspace_rf option is a number, and lower
replication factors are rejected with an error like:

  cqlsh> CREATE KEYSPACE x WITH REPLICATION = { 'class' : 'SimpleStrategy',
         'replication_factor': 2 };

  ConfigurationException: Replication factor replication_factor=2 is
  forbidden by the current configuration setting of minimum_keyspace_rf=3.
  Please increase replication factor, or lower minimum_keyspace_rf set in
  the configuration.

This restriction applies to both CREATE KEYSPACE and ALTER KEYSPACE
operations. It applies to both SimpleStrategy and NetworkTopologyStrategy,
for all DCs or a specific DC. However, a replication factor of zero (0)
is *not* forbidden - this is the way to explicitly request not to
replicate (at all, or in a specific DC).
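The rule described above can be sketched as a small validation function. The function names, the use of `std::invalid_argument` instead of a CQL `ConfigurationException`, and the exact message wording are assumptions modeled on the error quoted above, not Scylla's actual code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical check mirroring the minimum_keyspace_rf rule:
// RF=0 ("do not replicate") is always allowed; any other RF below the
// configured minimum is rejected. A minimum of 0 disables the check.
void validate_rf(const std::string& option, int rf, int minimum_keyspace_rf) {
    if (rf != 0 && rf < minimum_keyspace_rf) {
        throw std::invalid_argument(
            "Replication factor " + option + "=" + std::to_string(rf) +
            " is forbidden by the current configuration setting of "
            "minimum_keyspace_rf=" + std::to_string(minimum_keyspace_rf));
    }
}

// Returns true if validate_rf() accepts the value.
bool rf_allowed(int rf, int minimum_keyspace_rf) {
    try {
        validate_rf("replication_factor", rf, minimum_keyspace_rf);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

The same check would apply per DC for NetworkTopologyStrategy, once for each DC's replication factor.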

For the time being, minimum_keyspace_rf=0 is still the default, which
means that any replication factor is allowed, as before. We can easily
change this default in a followup patch.

Note that in the current implementation, trying to use RF below
minimum_keyspace_rf is always an error; there is no syntax to
downgrade it to just a warning. In any case the error message explains
exactly which configuration option is responsible for this restriction.

Fixes #8891.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #9830
2023-03-07 19:04:06 +02:00
Kamil Braun
2b44631ded Merge 'storage_service: Make node operations safer by detecting asymmetric abort' from Tomasz Grabiec
This patch fixes a problem affecting decommission and removenode
which may lead to data consistency problems under conditions that
lead one of the nodes to unilaterally decide to abort the node
operation without the coordinator noticing.

If this happens during streaming, the node operation coordinator would
proceed to make a change in the gossiper, and only later detect that
one of the nodes aborted during sending of decommission_done or
removenode_done command. That's too late, because the operation will
be finalized by all the nodes once gossip propagates.

It's unsafe to finalize the operation while another node aborted. The
other node reverted to the old topology, with which it kept running
for some time, without considering the pending replica when handling
requests. As a result, we may end up with consistency issues. Writes
coordinated by such nodes may not be replicated to CL replicas in the
new topology. Streaming may have failed to replicate those writes,
depending on timing.

It's possible that some node aborts but streaming succeeds if the
abort is not due to network problems, or if the network problems are
transient and/or localized and affect only heartbeats.

There is no way to revert after we commit the node operation to the
gossiper, so it's ok to close node_ops sessions before making the
change to the gossiper, and thus detect aborts and prevent later aborts
after the change in the gossiper is made. This is already done during
bootstrap (RBNO enabled) and replacenode. This patch changes removenode
to also take this approach by moving the sending of remove_done earlier.

We cannot take this approach with decommission easily, because
decommission_done command includes a wait for the node to leave the
ring, which won't happen before the change to the gossiper is
made. Separating this from decommission_done would require protocol
changes. This patch adds a second-best solution, which is to check if
sessions are still there right before making a change to the gossiper,
leaving decommission_done where it was.

The race can still happen, but the time window is now much smaller.

The PR also lays down infrastructure which enables testing the scenarios. It makes node ops
watchdog periods configurable, and adds error injections.

Fixes #12989
Refs #12969

Closes #13028

* github.com:scylladb/scylladb:
  storage_service: node ops: Extract node_ops_insert() to reduce code duplication
  storage_service: Make node operations safer by detecting asymmetric abort
  storage_service: node ops: Add error injections
  service: node_ops: Make watchdog and heartbeat intervals configurable
2023-03-07 17:36:51 +01:00
Nadav Har'El
e69c9069d6 Merge 'build: enable more warnings' from Kefu Chai
when comparing the disabled warnings specified by `configure.py` and the ones specified by `cmake/mode.common.cmake`, it turns out we are now able to enable more warning options. so let's enable them. the change was tested using Clang-17 and GCC-13.

there are many errors from GCC-13, like:

```
/home/kefu/dev/scylladb/db/view/view.hh:114:17: error: declaration of ‘column_kind db::view::clustering_or_static_row::column_kind() const’ changes meaning of ‘column_kind’ [-fpermissive]
  114 |     column_kind column_kind() const {
      |                 ^~~~~~~~~~~
```
so the build with GCC failed.

and with this change, Clang-17 is able to build the tree without warnings.

Closes #13096

* github.com:scylladb/scylladb:
  build: enable more warnings
  test: do not initialize plain number with {}
  test: do not initialize a time_t with braces
2023-03-07 17:37:54 +02:00
Wojciech Mitros
4609a45ce3 wasm: convert compilation to a future
After we move the compilation to an alien thread, the completion
of the compilation will be signaled by fulfilling a seastar promise.
As a result, the `precompile` function will return a future, and
because of that, other functions that use the `precompile` function
will also have to return futures.
We can do all the necessary adjustments beforehand, so that the actual
patch that moves the compilation will contain fewer irrelevant changes.
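The pattern can be sketched with standard-library primitives standing in for Seastar's: the compilation runs on a separate thread and signals completion by fulfilling a promise, so `precompile()` hands back a future. `std::promise`, `std::thread` and the `compile_module()` helper are illustrative assumptions; the real code uses Seastar futures and an alien thread.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the expensive WASM compilation step.
std::string compile_module(const std::string& source) {
    return "compiled:" + source;
}

// Runs the compilation on a separate thread; the caller gets a future
// that becomes ready once the worker fulfills the promise.
std::future<std::string> precompile(std::string source) {
    std::promise<std::string> done;
    std::future<std::string> result = done.get_future();
    std::thread worker([p = std::move(done), src = std::move(source)]() mutable {
        try {
            p.set_value(compile_module(src));
        } catch (...) {
            // Propagate compilation failures through the future.
            p.set_exception(std::current_exception());
        }
    });
    worker.detach();
    return result;
}
```

Any caller of `precompile()` now has to wait on, or return, a future itself, which is the ripple effect this commit prepares for.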
2023-03-07 14:27:38 +01:00
Avi Kivity
6aa91c13c5 Merge 'Optimize topology::compare_endpoints' from Benny Halevy
The code for compare_endpoints originates at the dawn of time (bc034aeaec)
and is called on the fast path from storage_proxy via `sort_by_proximity`.

This series considerably reduces the function's footprint by:
1. carefully coding the many comparisons in the function so as to reduce the number of conditional branches (apparently the compiler isn't doing a good enough job at optimizing it in this case)
2. avoiding sstring copies in topology::get_{datacenter,rack}

Closes #12761

* github.com:scylladb/scylladb:
  topology: optimize compare_endpoints
  to_string: add print operators for std::{weak,partial}_ordering
  utils: to_sstring: deinline std::strong_ordering print operator
  move to_string.hh to utils/
  test: network_topology: add test_topology_compare_endpoints
2023-03-07 15:17:19 +02:00
Kamil Braun
fe14d14ce9 Merge 'Eliminate extraneous copies of dht::token_range_vector' from Benny Halevy
In several places we copy token range vectors where we could move them and eliminate unnecessary memory copies.

Ref #11005

Closes #12344

* github.com:scylladb/scylladb:
  dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
  streaming: stream_session: maybe_yield
  streaming: stream_session: prepare: move token ranges to add_transfer_ranges
  streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges
  dht/range_streamer: stream_async: do_streaming: move ranges downstream
  dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
  dht/range_streamer: get_range_fetch_map: reduce copies
  dht/range_streamer: add_ranges: move ranges down-stream
  dht/boot_strapper: move ranges to add_ranges
  dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
  dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
2023-03-07 13:46:33 +01:00
Nadav Har'El
f05ea80fb5 test/cql-pytest: remove unused async marker
One test in test/cql-pytest/test_batch.py accidentally had the asyncio
marker, despite not using any async features. Remove it. The test still
runs fine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13002
2023-03-07 14:33:34 +02:00
Botond Dénes
3f0ace0114 Merge 'cmake: sync with configure.py (10/n)' from Kefu Chai
- build: cmake: use different names for output of check_cxx_compiler_flag
- build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS
- build: cmake: limit the number of link job

Closes #13098

* github.com:scylladb/scylladb:
  build: cmake: limit the number of link job
  build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS
  build: cmake: use different names for output of check_cxx_compiler_flag
2023-03-07 14:24:26 +02:00
Kefu Chai
063b3be8a7 api: reference httpd::* symbols like 'httpd::*'
it turns out we have `using namespace httpd;` in seastar's
`request_parser.rl`, and we should not rely on this statement to
expose the symbols in `seastar::httpd` to `seastar` namespace.
in this change,

* api/*.hh: all httpd symbols are referenced by `httpd::*`
  instead of being referenced as if they are in `seastar`.
* api/*.cc: add `using namespace seastar::httpd`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 18:21:03 +08:00
Kefu Chai
a37610f66a alternator: using chrono_literals before using it
we should not assume that some included header does this for us.

we'd have following compiling failure if seastar's
src/http/request_parser.rl does not `using namespace httpd;` anymore.

```
/home/kefu/dev/scylladb/alternator/streams.cc:433:55: error: no matching literal operator for call to 'operator""h' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template
static constexpr auto dynamodb_streams_max_window = 24h;
                                                      ^
```
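The fix amounts to bringing the literal operators into scope explicitly, for example at file scope, rather than relying on a transitive `using` directive. A minimal sketch (the constant is the one from the error above; where exactly the directive goes in `streams.cc` is an assumption):

```cpp
#include <cassert>
#include <chrono>

// Bring the duration literals (24h, 5min, ...) into scope explicitly
// instead of relying on a `using` directive in some included header.
using namespace std::chrono_literals;

static constexpr auto dynamodb_streams_max_window = 24h;
```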

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 18:20:36 +08:00
Vlad Zolotarov
ae6724f155 transport: refactor CQL metrics
This patch reorganizes and extends CQL related metrics.
Before this patch we only had counters for specific CQL requests.

However, we often need to reason about the size of CQL queries: the sizes
of the corresponding requests and responses.

This patch adds corresponding metrics:
  - Arranges all 3 per-opcode statistics counters in a single struct.
  - Defines a vector of such structs for each CQL opcode.
  - Adjusts statistics updates accordingly - the code is much simpler
    now.
  - Removes the old metrics that counted only some CQL opcodes.
  - Adds new per-opcode metrics for requests number, request and response sizes:
     - New metrics are of a derived kind - rate() should be applied to them.
     - There are 3 new metrics names:
       - 'cql_requests_count'
       - 'cql_request_bytes'
       - 'cql_response_bytes'
     - New metrics have a per-opcode label - 'kind'.

 For example:

 The number of response bytes for the EXECUTE opcode on shard 0 is exposed as:

 scylla_transport_cql_response_bytes{kind="EXECUTE",shard="0"}
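The layout described above can be sketched as one struct of three counters per opcode, stored in a vector indexed by opcode. The names `cql_opcode_stats`, `cql_stats` and `record()`, and the shortened opcode enum (whose values are not the wire-protocol codes), are hypothetical; the real code registers these counters as Seastar metrics labelled by `kind`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of CQL opcodes; `count` is a sentinel for sizing.
enum class cql_opcode : std::uint8_t { QUERY, PREPARE, EXECUTE, BATCH, count };

// The three per-opcode counters, grouped together.
struct cql_opcode_stats {
    std::uint64_t requests_count = 0;
    std::uint64_t request_bytes = 0;
    std::uint64_t response_bytes = 0;
};

struct cql_stats {
    std::vector<cql_opcode_stats> per_opcode =
        std::vector<cql_opcode_stats>(static_cast<std::size_t>(cql_opcode::count));

    // A single update point for every opcode.
    void record(cql_opcode op, std::uint64_t req_size, std::uint64_t resp_size) {
        auto& s = per_opcode[static_cast<std::size_t>(op)];
        ++s.requests_count;
        s.request_bytes += req_size;
        s.response_bytes += resp_size;
    }
};
```

Because the counters only grow, a dashboard applies rate() to them, as noted above, to get requests or bytes per second.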

Ref #13061

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20230302154816.299721-1-vladz@scylladb.com>
2023-03-07 12:02:34 +02:00
Kefu Chai
577b1c679c build: enable more warnings
when comparing the disabled warnings specified by `configure.py`
and the ones specified by `cmake/mode.common.cmake`, it turns out
we are now able to enable more warning options. so let's enable them.
the change was tested using Clang-17 and GCC-13.

there are many errors from GCC-13, like:

```
/home/kefu/dev/scylladb/db/view/view.hh:114:17: error: declaration of ‘column_kind db::view::clustering_or_static_row::column_kind() const’ changes meaning of ‘column_kind’ [-fpermissive]
  114 |     column_kind column_kind() const {
      |                 ^~~~~~~~~~~
```
so the build with GCC failed.

and with this change, Clang-17 is able to build the tree without
warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 17:54:53 +08:00
Kefu Chai
f0659cb1bb test: do not initialize plain number with {}
this silences warnings like:

```
test/boost/secondary_index_test.cc:1578:5: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
    { -7509452495886106294 },
    ^~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 17:54:53 +08:00
Kefu Chai
7331edbc7a test: do not initialize a time_t with braces
time_t is defined as an "Arithmetic type capable of representing times".
so we can just initialize it with 0 without braces. this change should
silence warning like:

```
test/boost/aggregate_fcts_test.cc:238:45: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
            auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(1);
                                            ^~~~~
```
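Both warnings above are fixed the same way: write the scalar directly instead of wrapping it in its own braces. A sketch using `std::chrono::system_clock`, whose `from_time_t()` takes a `std::time_t` (assumed here to behave like `db_clock::from_time_t()` in the test above); the `epoch_plus_1ms()` and `tokens` names are illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Before: std::chrono::system_clock::from_time_t({ 0 })
// Clang flags the braces around the scalar argument (-Wbraced-scalar-init).
// After: pass the arithmetic value directly.
std::chrono::system_clock::time_point epoch_plus_1ms() {
    return std::chrono::system_clock::from_time_t(0) + std::chrono::milliseconds(1);
}

// Likewise inside a larger initializer list: write the element as a plain
// number, not as `{ { -7509452495886106294 }, 42 }`.
const std::int64_t tokens[] = { -7509452495886106294, 42 };
```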

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 17:54:53 +08:00
Pavel Emelyanov
a0718d2097 test: Don't populate / with sstables
The sstable_compaction_test::simple_backlog_controller_test makes
sstables with empty dir argument. Eventually this means that sstables
end up in the / directory [1], which is not nice.

As a side effect this also makes sstable::storage::prefix() return an
empty string, which, in turn, confuses the code that tries to analyze the
prefix contents (refs: #13090)

[1] See, e.g. logs from https://jenkins.scylladb.com/job/releng/job/Scylla-CI/4757/consoleText

```
INFO  2023-03-06 21:23:04,536 [shard 0] compaction - [Compact ks.cf 51489760-bc54-11ed-a08c-7d3f1d77e2e4] Compacting [/la-1-big-Data.db:level=0:origin=]
```

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13094
2023-03-07 11:44:33 +02:00
Kefu Chai
4da82b4117 data_dictionary: mark dtor of user_types_storage virtual
we have another solution: to mark db_user_types_storage `final`, as we
don't destruct `db_user_types_storage` through a pointer to any of its base
classes. but it'd be much simpler to just make the dtor of the first base
class which has virtual method(s) virtual. it's more idiomatic this
way, and less error-prone.

this change should silence following warning:

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:2: error: destructor called on non-final 'replica::db_user_types_storage' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        __location->~_Tp();
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:149:12: note: in instantiation of function template specialization 'std::destroy_at<replica::db_user_types_storage>' requested here
      std::destroy_at(__pointer);
           ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/alloc_traits.h:674:9: note: in instantiation of function template specialization 'std::_Destroy<replica::db_user_types_storage>' requested here
        { std::_Destroy(__p); }
               ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:613:28: note: in instantiation of function template specialization 'std::allocator_traits<std::allocator<void>>::destroy<replica::db_user_types_storage>' requested here
        allocator_traits<_Alloc>::destroy(_M_impl._M_alloc(), _M_ptr());
                                  ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:599:2: note: in instantiation of member function 'std::_Sp_counted_ptr_inplace<replica::db_user_types_storage, std::allocator<void>, __gnu_cxx::_S_atomic>::_M_dispose' requested here
        _Sp_counted_ptr_inplace(_Alloc __a, _Args&&... __args)
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:972:6: note: in instantiation of function template specialization 'std::_Sp_counted_ptr_inplace<replica::db_user_types_storage, std::allocator<void>, __gnu_cxx::_S_atomic>::_Sp_counted_ptr_inplace<replica::database &>' requested here
            _Sp_cp_type(__a._M_a, std::forward<_Args>(__args)...);
            ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:1712:14: note: in instantiation of function template specialization 'std::__shared_count<>::__shared_count<replica::db_user_types_storage, std::allocator<void>, replica::database &>' requested here
        : _M_ptr(), _M_refcount(_M_ptr, __tag, std::forward<_Args>(__args)...)
                    ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr.h:464:4: note: in instantiation of function template specialization 'std::__shared_ptr<replica::db_user_types_storage>::__shared_ptr<std::allocator<void>, replica::database &>' requested here
        : __shared_ptr<_Tp>(__tag, std::forward<_Args>(__args)...)
          ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr.h:1009:14: note: in instantiation of function template specialization 'std::shared_ptr<replica::db_user_types_storage>::shared_ptr<std::allocator<void>, replica::database &>' requested here
      return shared_ptr<_Tp>(_Sp_alloc_shared_tag<_Alloc>{__a},
             ^
/home/kefu/dev/scylladb/replica/database.cc:313:24: note: in instantiation of function template specialization 'std::make_shared<replica::db_user_types_storage, replica::database &>' requested here
    , _user_types(std::make_shared<db_user_types_storage>(*this))
                       ^
```
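A reduced version of the situation, with hypothetical class bodies (only the class names come from this commit): the base has virtual methods, so giving it a virtual destructor makes the derived destructor run even when the object is destroyed through a base pointer, and silences `-Wdelete-non-abstract-non-virtual-dtor`. Marking the derived class `final` would also silence the warning, but only for that one class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base: the first class in the hierarchy with virtual methods.
class user_types_storage {
public:
    virtual ~user_types_storage() = default;   // the fix: a virtual destructor
    virtual int type_count() const = 0;
};

// Hypothetical derived class; counts its own destruction for the test.
class db_user_types_storage : public user_types_storage {
public:
    explicit db_user_types_storage(int* destroyed) : _destroyed(destroyed) {}
    ~db_user_types_storage() override { ++*_destroyed; }
    int type_count() const override { return 0; }
private:
    int* _destroyed;
};
```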

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13062
2023-03-07 10:36:03 +02:00
Wojciech Mitros
d4851ccae7 treewide: rename the "xwasm" UDF language to "wasm"
When the WASM UDFs were first introduced, the LANGUAGE required in
the CQL statements to use them was "xwasm", because the ABI for the
UDFs was still not specified and changes to it could be backwards
incompatible.
Now, the ABI is stabilized, but if backwards incompatible changes
are made in the future, we will add a new ABI version for them, so
the name "xwasm" is no longer needed and we can finally
change it to "wasm".

Closes #13089
2023-03-07 10:21:11 +02:00
Botond Dénes
d1619eb38a Merge 'Remove qctx from helpers that retrieve truncation record' from Pavel Emelyanov
There are two places that do it -- commitlog and batchlog replayers. Both can have a local system-keyspace reference and use the system keyspace's local query-processor for it. The counterpart save_truncation_record() is not that simple and is not patched by this PR.

Closes #13087

* github.com:scylladb/scylladb:
  system_keyspace: Unstatic get_truncation_record()
  system_keyspace: Unstatic get_truncated_at()
  batchlog_manager: Add system_keyspace dependency
  main: Swap batchlog manager and system keyspace starts
  system_keyspace: Unstatic get_truncated_position()
  system_keyspace: Remove unused method
  commitlog: Create commitlog_replayer with system keyspace
  test: Make cql_test_env::get_system_keyspace() return sharded
  commiltlog: Line-up field definitions
2023-03-07 10:19:55 +02:00
Nadav Har'El
e7f9e57d64 docs/alternator: link to issue about too many stream shards
docs/alternator/compatibility.md mentions a known problem that
Alternator Streams are divided into too many "shards". This patch
add a link to a github issue to track our work on this issue - like
we did for most other differences mentioned in compatibility.md.

Refs #13080

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13081
2023-03-07 10:04:13 +02:00
Kefu Chai
b25a6d5a9c build: cmake: limit the number of link job
this mirrors the settings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 15:34:12 +08:00
Kefu Chai
5e38845057 build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 15:24:02 +08:00
Kefu Chai
2b23de31ca build: cmake: use different names for output of check_cxx_compiler_flag
* use the value of disabled_warnings, not the variable name, for warning
  options, otherwise we'd be checking options like `-Wno-disabled_warnings`.
* use different names for the output of check_cxx_compiler_flag() calls.
  as the output variable of a check_cxx_compiler_flag(..) call is cached,
  we cannot reuse it for checking different warning options.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 15:24:02 +08:00
Kefu Chai
5522080f80 api: s/request/http::request/
seastar::httpd::request was deprecated in favor of `seastar::http::request`
since bdd5d929891d2cb821eca25896e25ed4ff658b7a.
so let's use the latter. this change also silences the warning of:

```
/home/kefu/dev/scylladb/api/authorization_cache.cc: In function ‘void api::set_authorization_cache(http_context&, seastar::httpd::routes&, seastar::sharded<auth::service>&)’:
/home/kefu/dev/scylladb/api/authorization_cache.cc:19:104: error: ‘using seastar::httpd::request = struct seastar::http::request’ is deprecated: Use http::request instead [-Werror=deprecated-declarations]
   19 |     httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> {
      |                                                                                                        ^~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-07 14:03:42 +08:00
Botond Dénes
2f4a793457 reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()
Instead of open-coding the same, in an incomplete way.
clear_inactive_reads() does incomplete eviction in several ways:
* it doesn't decrement _stats.inactive_reads
* it doesn't set the permit to evicted state
* it doesn't cancel the ttl timer (if any)
* it doesn't call the eviction notifier on the permit (if there is one)

The list goes on. We already have an evict() method that does all this
correctly; use that instead of the current badly open-coded alternative.

This patch also enhances the existing test for clear_inactive_reads()
and adds a new one specifically for `stop()` being called while having
inactive reads.
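The shape of the fix, sketched with a hypothetical, heavily simplified semaphore: keeping all eviction bookkeeping in one `evict()` and having `clear_inactive_reads()` call it guarantees that every step listed above (stats, permit state, TTL timer, notifier) happens on both paths. None of these types or members match the real `reader_concurrency_semaphore` API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <utility>

enum class permit_state { inactive, evicted };

// Hypothetical registered inactive read.
struct inactive_read {
    permit_state state = permit_state::inactive;
    bool ttl_timer_armed = false;
    std::function<void()> notify_evicted;   // optional eviction notifier
};

class semaphore {
public:
    struct stats { std::uint64_t inactive_reads = 0; };

    void register_inactive_read(inactive_read r) {
        _inactive.push_back(std::move(r));
        ++_stats.inactive_reads;
    }

    // The single place that knows how to evict correctly.
    void evict(inactive_read& r) {
        --_stats.inactive_reads;
        r.state = permit_state::evicted;
        r.ttl_timer_armed = false;   // cancel the TTL timer
        if (r.notify_evicted) {
            r.notify_evicted();
        }
    }

    // Defers to evict() instead of open-coding a partial copy of it.
    void clear_inactive_reads() {
        for (auto& r : _inactive) {
            evict(r);
        }
        _inactive.clear();
    }

    const stats& get_stats() const { return _stats; }

private:
    std::list<inactive_read> _inactive;
    stats _stats;
};
```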

Fixes: #13048

Closes #13049
2023-03-07 08:45:04 +03:00
Kefu Chai
cee597560a build: enable -Wdefaulted-function-deleted warning
in general, the more static analysis the merrier. with the updated
Seastar, which includes the commit of "core/sstring: define <=> operator
for sstring", defaulted `operator<=>`s which depend on sstring's
operator<=> are no longer implicitly deleted, so we can
enable `-Wdefaulted-function-deleted` now.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12861
2023-03-06 18:41:44 +02:00
Kefu Chai
020483aa59 Update seastar submodule and main
this change also includes a change to main, to make this commit compile.
see below:

* seastar 9b6e181e42...9cbc1fe889 (46):
  > Merge 'Make io-tester jobs share sched classes' from Pavel Emelyanov
  > io_tester.md: Update the `rps` configuration option description
  > io_tester: Add option to limit total number of requests sent
  > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov
  > io_tester: Add option to share classes between jobs
  > rpc: Abort connection if send_entry() fails
  > Merge 'build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS' from Kefu Chai
  > build: cooking.sh: use the same BUILD_SHARED_LIBS when building ingredients
  > build: cooking.sh: use the same generator when building ingredients
  > core/memory: handle `strerror_r` returning static string
  > Merge 'build, rpc: lz4 related cleanups' from Kefu Chai
  > build, rpc: do not support lz4 < 1.7.3
  > build: set the correct version when finding lz4
  > build: include CheckSymbolExists
  > rpc: do not include lz4.h in header
  > build: set CMP0135 for Cooking.cmake
  > docs: drop building-*.md
  > Merge 'seastar-addr2line: cleanups' from Kefu Chai
  > seastar-addr2line: refactor tests using unittest
  > seastar-addr2line: extract do_test() and main()
  > seastar-addr2line: do not import unused modules
  > scheduling: add a `rename` callback to scheduling_group_key_config
  > reactor: syscall thread: wakeup up reactor with finer granularity
  > build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS
  > build: extract dpdk_extra_cflags out
  > core/sstring: remove a temporary variable
  > Merge 'treewide: include what we use, and add a checkheaders target' from Kefu Chai
  > perftune.py: auto-select the same number of IRQ cores on each NUMA
  > prometheus: remove unused headers
  > core/sstring: define <=> operator for sstring
  > Merge 'core: s/reserve_additional_memory/reserve_additional_memory_per_shard/' from Kefu Chai
  > include: do not include <concepts> directly
  > coding_style: note on self-contained header requirement
  > circileci: build checkheaders in addition to default target
  > build: add checkheaders target
  > net/toeplitz: s/u_int/unsigned/
  > net/tcp-stack: add forward declaration for seastar::socket
  > core, net, util: include used headers

* main: set reserved memory for wasm on per-shard basis

  this change is a follow-up of
  f05d612da8 and
  4a0134a097.

  this change depends on the related change in Seastar to reserve
  additional memory on a per-shard basis.

  per Wojciech Mitros's comment:

  > it should have probably been 50MB per shard

  in other words, as we always execute the same set of udf on all
  shards. and since one cannot predict the number of shards, but she
  could have a rough estimation on the size of memory a regular (set
  of) udf could use. so a per-shard setting makes more sense.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-06 18:41:34 +02:00
Jan Ciolek
aa604bd935 cql3: preserve binary_operator.order in search_and_replace
There was a bug in `expr::search_and_replace`.
It didn't preserve the `order` field of binary_operator.

`order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.

Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues:
the database could end up selecting the wrong clustering ranges.

Fixes: #13055

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13056
2023-03-06 16:28:06 +02:00
Kefu Chai
6b249dd301 utils: UUID: throw marshal_exception when fail to parse uuid
* throw marshal_exception if not the whole string is parsed: we
  should error out if the parsed string contains garbage at the end.
  before this change, we silently accepted a uuid like
  "ce84997b-6ea2-4468-9f02-8a65abf4wxyz", and parsed it as
  "ce84997b-6ea2-4468-9f02-8a65abf4". this is not correct.
* throw marshal_exception if stoull() throws:
  `stoull()` throws if it fails to parse a string to an unsigned long
  long; we should translate the exception to `marshal_exception`, so
  we can handle these exceptions in a consistent manner.

test is updated accordingly.
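A minimal sketch of both rules, assuming a hypothetical `parse_uuid_halves()` that splits the canonical 8-4-4-4-12 form into two 64-bit halves, with a local `marshal_exception` standing in for Scylla's: reject input that is not fully consumed, and translate exceptions from `std::stoull()` into the single exception type. The explicit hex-digit check is an extra assumption, added because `std::stoull()` would otherwise accept signs, whitespace and a `0x` prefix.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for Scylla's marshal_exception (assumed for this sketch).
struct marshal_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Parses one 16-hex-digit half of the UUID.
static std::uint64_t parse_half(const std::string& hex, const std::string& input) {
    // Reject signs, whitespace, "0x" and garbage such as "wxyz".
    for (unsigned char c : hex) {
        if (!std::isxdigit(c)) {
            throw marshal_exception("invalid UUID: " + input);
        }
    }
    std::size_t consumed = 0;
    std::uint64_t value = 0;
    try {
        value = std::stoull(hex, &consumed, 16);
    } catch (const std::logic_error&) {
        // stoull() throws invalid_argument/out_of_range; translate them.
        throw marshal_exception("invalid UUID: " + input);
    }
    if (consumed != hex.size()) {
        // The whole half must be parsed, not just a prefix of it.
        throw marshal_exception("invalid UUID: " + input);
    }
    return value;
}

// Hypothetical parser for the canonical 8-4-4-4-12 form.
std::pair<std::uint64_t, std::uint64_t> parse_uuid_halves(const std::string& s) {
    if (s.size() != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-') {
        throw marshal_exception("invalid UUID: " + s);
    }
    const std::string msb = s.substr(0, 8) + s.substr(9, 4) + s.substr(14, 4);
    const std::string lsb = s.substr(19, 4) + s.substr(24, 12);
    return {parse_half(msb, s), parse_half(lsb, s)};
}
```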

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13069
2023-03-06 12:59:41 +02:00
Pavel Emelyanov
1be9b0df50 system_keyspace: Unstatic get_truncation_record()
Now when both callers of this method are non-static, it can be made
non-static too. While at it make two more changes:

1. move the thing to private
2. remove explicit cql3::query_processor::cache_internal::yes argument,
   as system_keyspace::execute_cql() applies it on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
109e032f61 system_keyspace: Unstatic get_truncated_at()
It's called from batchlog replayer which now has local system keyspace
reference and can use it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
1907518034 batchlog_manager: Add system_keyspace dependency
The manager will need the system ks to get truncation records from, so add it
explicitly. The start-stop sequence now allows that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
40b762b841 main: Swap batchlog manager and system keyspace starts
The former needs the latter to get truncation records from and will thus
need it as an explicit dependency. For that, batchlog needs to
start after system ks. This works, as starting batchlog manager doesn't
do anything that's required by system keyspace. This is indirectly
proven by cql-test-env, in which batchlog manager starts later than it
does in main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
dcbe3e467b system_keyspace: Unstatic get_truncated_position()
It's called from commitlog replayer which has system keyspace instance
on board and can use it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
2501ba3887 system_keyspace: Remove unused method
The get_truncated_position() overload that filters records by shard is
nowadays unused. Drop it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:40 +03:00
Pavel Emelyanov
47b61389b5 commitlog: Create commitlog_replayer with system keyspace
The replayer code needs system keyspace to fetch truncation records
from, thus it needs this explicit dependency. By the time it runs, the system
keyspace is already fully initialized.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:28:36 +03:00
Kefu Chai
ac575d0b0e auth: use zero initialization
instead of passing '0' in the initializer list to do aggregate
initialization, just use zero initialization. simpler this way.

also, this helps to silence a `-Wmissing-braces` warning, like

```
/home/kefu/dev/scylladb/auth/passwords.cc:21:43: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
static thread_local crypt_data tlcrypt = {0, };
                                          ^
                                          {}
```
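For illustration only, a minimal sketch of the two spellings. It assumes a hypothetical `fake_crypt_data` stand-in (the real `crypt_data` comes from libxcrypt's `<crypt.h>` and is not reproduced here) whose first member is an array, which is what makes `{0, }` rely on brace elision:

```cpp
#include <cassert>

// Hypothetical stand-in for crypt_data: the first member is itself an
// aggregate, so `{0, }` initializes output[0] via brace elision.
struct fake_crypt_data {
    char output[32];
    int initialized;
};

// Before: `= {0, };` triggers -Wmissing-braces, and Clang suggests wrapping
// the 0 in its own braces.
// After: `= {}` zero-initializes every member (including the whole array)
// without naming any of them, so there is nothing to warn about.
static thread_local fake_crypt_data tlcrypt = {};

// Helper used to check that every byte and field is zero.
bool all_zero(const fake_crypt_data& d) {
    for (char c : d.output) {
        if (c != 0) {
            return false;
        }
    }
    return d.initialized == 0;
}
```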

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13060
2023-03-06 12:28:10 +02:00
Kefu Chai
36da27f2e0 sstables: generation_type: do not specialize to_sstring
`seastar::to_sstring()` defaults to `fmt::format_to()`, so any type
which is supported by a `fmt::formatter` specialization is also
supported by `seastar::to_sstring()`. and the behavior of the existing
specialization is exactly the same as the default one.

so let's drop the specialization and let
`fmt::formatter<sstables::generation_type>` do its job.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13070
2023-03-06 12:18:00 +02:00
Pavel Emelyanov
6f9924ff44 test: Make cql_test_env::get_system_keyspace() return sharded
It now returns sys_ks.local(), but the next patch needs the whole
sharded reference

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:17:21 +03:00
Pavel Emelyanov
73ab1bd74b commitlog: Line-up field definitions
Just a cosmetic change, so that next patch adding a new member to the
class looks nice

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-03-06 13:15:27 +03:00
Alejo Sanchez
eaed778f4a test/cql-pytest: print driver version
Print driver version for cql-pytest tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12840
2023-03-06 11:31:26 +02:00
Botond Dénes
4919b2f956 Merge 'cmake: sync with configure.py (9/n)' from Kefu Chai
- build: cmake: find ANTLR3 before using it
- build: cmake: define FMT_DEPRECATED_OSTREAM
- build: cmake: add include directory for lua
- build: cmake: link redis against db

Closes #13071

* github.com:scylladb/scylladb:
  build: cmake: add more tests
  build: cmake: find and link against RapidJSON
  build: cmake: link couple libraries as whole archive
  build: cmake: find ANTLR3 before using it
  build: cmake: define FMT_DEPRECATED_OSTREAM
  build: cmake: add include directory for lua
  build: cmake: link redis against db
2023-03-06 08:52:13 +02:00
Avi Kivity
97f315cc29 Merge 'build: reenable disabled warnings' from Kefu Chai
in general, the more static analysis the merrier. these warnings were previously disabled to silence warnings from Clang and/or GCC, but since we've addressed all of them, let's reenable them to detect potential issues early.

Closes #13063

* github.com:scylladb/scylladb:
  build: reenable disabled warnings
  test: lib: do not return a local reference
  dht: incremental_owned_ranges_checker: use lower_bound()
  types: reimplement in terms of a variable template
  query_id: extract into new header
  test/cql-pytest: test for CLUSTERING ORDER BY verification in MV
  test/cql-pytest: allow "run-cassandra" without building Scylla
  build: reenable unused-{variable,lambda-capture} warnings
  test: reader_concurrency_semaphore_test: define target_memory in debug mode
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  api::failure_detector: mark set_phi_convict_threshold unimplemented
  test: memtable_test: mark dummy variable for loop [[maybe_unused]]
  idl-compiler: mark captured this used
  raft: reference this explicitly
  util/result_try: reference this explicitly
  sstables/sstables: mark dummy variable for loop [[maybe_unused]]
  treewide: do not define/capture unused variables
  service: storage_service: clear _node_ops in batch
  cql-pytest: add tests for sum() aggregate
  build: cmake: extract mutation,db,replica,streaming out
  build: cmake: link the whole auth
  build: cmake: extract thrift out
  build: cmake: expose scylla_gen_build_dir from "interface"
  build: cmake: find libxcrypt before using it
  build: cmake: find Thrift before using it
  build: cmake: support thrift < 0.11.0
  test/cql-pytest: move aggregation tests to one file
  Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops""
  storage_service: Wait for normal state handler to finish in replace
  storage_service: Wait for normal state handler to finish in bootstrap
  row_cache: pass partition_start though nonforwardable reader
  doc: fix the version in the comment on removing the note
  doc: specify the versions where Alternator TTL is no longer experimental
2023-03-05 17:37:33 +02:00
Kefu Chai
6742493a94 build: reenable disabled warnings
in general, the more static analysis the merrier. these warnings
were previously disabled to silence warnings from Clang and/or GCC,
but since we've addressed all of them, let's reenable them to
detect potential issues early.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-05 17:37:33 +02:00
Kefu Chai
fe80b5e0d0 test: lib: do not return a local reference
the type of return value of `get_table_views()` is a reference, so we
cannot return a reference to a temporary value.

in this change, a member variable is added to hold the _table_schema,
so it can outlive the function call.

this should silence following warning from Clang:
```
  test/lib/expr_test_utils.cc:543:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
          return {view_ptr(_table_schema)};
                 ^~~~~~~~~~~~~~~~~~~~~~~~~
```
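A minimal sketch of the pattern, using a hypothetical `expr_test_env` class in which `std::string` stands in for `view_ptr` (the real test utility types are not reproduced here). The point is only that the returned reference must refer to an object that outlives the call:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified shape of the fix; not the real test/lib code.
class expr_test_env {
    std::string _table_schema;
    std::vector<std::string> _table_views;  // lives as long as *this
public:
    explicit expr_test_env(std::string schema)
        : _table_schema(std::move(schema)) {}

    // Writing `return {_table_schema};` here would build a temporary vector
    // and bind the returned reference to it (-Wreturn-stack-address).
    // Storing the vector in a member and returning that avoids the
    // dangling reference.
    const std::vector<std::string>& get_table_views() {
        _table_views = {_table_schema};
        return _table_views;
    }
};
```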

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-05 17:37:33 +02:00
Kefu Chai
11124ee972 build: cmake: add more tests
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
eeb8553305 build: cmake: find and link against RapidJSON
although RapidJSON is a header-only library, we still need to
find it and "link" against it to add its include directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
c5d1a69859 build: cmake: link couple libraries as whole archive
it turns out we are using static variables to register entries in
global registries, and these variables are not directly referenced,
so the linker just drops them when linking the executables or shared
libraries. to address this problem, we link the whole archive.
another option would be to create a linker script or pass
--undefined=<symbol> to the linker. neither of them is straightforward.

a helper function is introduced to do this, as we cannot use CMake
3.24 (which provides `$<LINK_LIBRARY:WHOLE_ARCHIVE,...>`) yet.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
58f13dfa0a build: cmake: find ANTLR3 before using it
if ANTLR3's header files are not installed into /usr/include or
other directories searched by the compiler by default, we cannot
build the tree. so we have to find it first. as /opt/scylladb
is the directory where `scylla-antlr35-c++-dev` is installed on
debian derivatives, this directory is added so the find package module
can find the header files.

```
In file included from /home/kefu/dev/scylla/db/legacy_schema_migrator.cc:38:
In file included from /home/kefu/dev/scylla/cql3/util.hh:21:
/home/kefu/dev/scylla/build/cmake/cql3/CqlParser.hpp:55:10: fatal error: 'antlr3.hpp' file not found
         ^~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
914ba1329d build: cmake: define FMT_DEPRECATED_OSTREAM
otherwise the tree would fail to compile with fmt v9.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
b6a927ce3f build: cmake: add include directory for lua
otherwise the compiler may not be able to find the
lua header(s).

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
e72321f873 build: cmake: link redis against db
otherwise, we'd have
```
In file included from /home/kefu/dev/scylla/redis/keyspace_utils.cc:19:
In file included from /home/kefu/dev/scylla/db/query_context.hh:14:
In file included from /home/kefu/dev/scylla/cql3/query_processor.hh:24:
In file included from /home/kefu/dev/scylla/lang/wasm_instance_cache.hh:19:
/home/kefu/dev/scylla/lang/wasm.hh:14:10: fatal error: 'rust/wasmtime_bindings.hh' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Anna Stuchlik
4b71f87594 doc: Update the documentation landing page
This commit makes the following changes to the docs landing page:

- Adds the ScyllaDB enterprise docs as one of three tiles.

- Modifies the three tiles to reflect the three flavors of ScyllaDB.

- Moves the "New to ScyllaDB? Start here!" under the page title.

- Renames "Our Products" to "Other Products" to list the products other
  than ScyllaDB itself. In addition, the boxes are enlarged to
  large-4 to look better.

The major purpose of this commit is to expose the ScyllaDB
documentation.

docs: fix the link

Closes #13065
2023-03-03 15:48:30 +02:00
Botond Dénes
fb898d214c Merge 'Shard major compaction task' from Aleksandra Martyniuk
Implementation of task_manager's task that covers major keyspace compaction
on one shard.

Closes #12662

* github.com:scylladb/scylladb:
  test: extend major keyspace compaction tasks test
  compaction: create task manager's task for major keyspace compaction on one shard
2023-03-02 15:06:31 +02:00
Botond Dénes
91d64372db Merge 'cmake: sync with configure.py (8/n)' from Kefu Chai
- build: cmake: extract more subsystem out into its own CMakeLists.txt
- build: cmake: remove swagger_gen_files
- build: cmake: remove stale TODO comments
- build: cmake: expose scylla_gen_build_dir
- build: cmake: link against cryptopp
- build: cmake: add missing source to utils
- build: cmake: move lib sources into test-lib
- build: cmake: add test/perf

Closes #13059

* github.com:scylladb/scylladb:
  build: cmake: add expr_test test
  build: cmake: allow test to specify the sources
  build: cmake: add test/perf
  build: cmake: move lib sources into test-lib
  build: cmake: add missing source to utils
  build: cmake: link against cryptopp
  build: cmake: expose scylla_gen_build_dir
  build: cmake: remove stale TODO comments
  build: cmake: remove swagger_gen_files
  build: cmake: extract more subsystem out into its own CMakeLists.txt
2023-03-02 14:22:35 +02:00
Botond Dénes
e70be47276 Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund
Fixes #12810

We did not update total_size_on_disk in commitlog totals when o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.

Closes #12950

* github.com:scylladb/scylladb:
  commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
  commitlog: change type of stored size
2023-03-02 12:39:11 +02:00
Botond Dénes
1b5f8916d6 Merge 'Generalize sstable::move_to_new_dir() method' from Pavel Emelyanov
This method requires callers to remember that the sstable is the collection of files on a filesystem and to know what exact directory they are all in. That's not going to work for object storage, instead, sstable should be moved between more abstract states.

This PR replaces the move_to_new_dir() call with change_state(), which accepts a target sub-directory string and moves files around. Currently supported state changes:

* staging -> normal
* upload -> normal | staging
* any -> quarantine

All are pretty straightforward and move files between table basedir subdirectories, with the exception that upload -> quarantine should move into the upload/quarantine subdirectory. Another thing to keep in mind is that the normal state doesn't have its own subdirectory but maps to the table's base directory.
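The directory mapping described above can be sketched as follows. `target_dir()` is a hypothetical helper written only to illustrate the rules (normal maps to the base directory, upload -> quarantine goes to upload/quarantine, everything else to its own subdirectory); it is not the actual `change_state()` implementation or signature:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: states are "normal", "staging", "upload", "quarantine".
fs::path target_dir(const fs::path& table_base, const std::string& current_state,
                    const std::string& new_state) {
    if (new_state == "normal") {
        return table_base;  // normal has no subdirectory of its own
    }
    if (new_state == "quarantine" && current_state == "upload") {
        return table_base / "upload" / "quarantine";  // the one special case
    }
    return table_base / new_state;  // e.g. staging, quarantine
}
```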

Closes #12648

* github.com:scylladb/scylladb:
  sstable: Remove explicit quarantization call
  test: Move move_to_new_dir() method from sstable class
  sstable, dist.-loader: Introduce and use pick_up_from_upload() method
  sstables, code: Introduce and use change_state() call
  distributed_loader: Let make_sstables_available choose target directory
2023-03-02 09:22:14 +02:00
Kefu Chai
1fe180ffbe build: cmake: add expr_test test
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 14:26:55 +08:00
Kefu Chai
29dc4b0da5 build: cmake: allow test to specify the sources
some tests are compiled from multiple source files, so add an extra
parameter to let them customize the sources.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 14:26:55 +08:00
Kefu Chai
78773c2ebd build: cmake: add test/perf
due to a circular dependency: the .cc files under the root of the
project reference the symbols defined by the source files under
subdirectories, but the source files under subdirectories also
reference the symbols defined by the .cc files under the root
of the project. so the targets in test/perf do not compile, but
the general structure is in place.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
a51c928e69 build: cmake: move lib sources into test-lib
less convoluted this way, so each target only includes the sources
in its own directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
40fb6ff728 build: cmake: add missing source to utils
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
074281c450 build: cmake: link against cryptopp
since we include cryptopp/ headers, we need to find it and link against
it explicitly, instead of relying on seastar to do this.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
167d018ca7 build: cmake: expose scylla_gen_build_dir
should have exposed the base directory of generated headers, not
the one with "rust" component.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
47a06e76a2 build: cmake: remove stale TODO comments
they have been addressed already.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
1e040e0e12 build: cmake: remove swagger_gen_files
which has been moved into api/CMakeLists.txt

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
563fbb2d11 build: cmake: extract more subsystem out into its own CMakeLists.txt
namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica,
service, tools, tracing and transport.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Aleksandra Martyniuk
24edcd27d4 test: extend major keyspace compaction tasks test 2023-03-01 18:56:31 +01:00
Aleksandra Martyniuk
b188060535 compaction: create task manager's task for major keyspace compaction on one shard
Implementation of task_manager's task that covers major keyspace compaction
on one shard.
2023-03-01 18:56:26 +01:00
Tomasz Grabiec
2d935e255a storage_service: node ops: Extract node_ops_insert() to reduce code duplication 2023-03-01 18:43:13 +01:00
Tomasz Grabiec
d5021d5a1b storage_service: Make node operations safer by detecting asymmetric abort
This patch fixes a problem which affects decommission and removenode
which may lead to data consistency problems under conditions which
lead one of the nodes to unilaterally decide to abort the node
operation without the coordinator noticing.

If this happens during streaming, the node operation coordinator would
proceed to make a change in the gossiper, and only later detect that
one of the nodes aborted during sending of decommission_done or
removenode_done command. That's too late, because the operation will
be finalized by all the nodes once gossip propagates.

It's unsafe to finalize the operation while another node aborted. The
other node reverted to the old topology, with which it was running
for some time, without considering the pending replica when handling
requests. As a result, we may end up with consistency issues. Writes
made by those coordinators may not be replicated to CL replicas in the
new topology. Streaming may have failed to replicate those writes
depending on timing.

It's possible that some node aborts but streaming succeeds if the
abort is not due to network problems, or if the network problems are
transient and/or localized and affect only heartbeats.

There is no way to revert after we commit the node operation to the
gossiper, so it's ok to close node_ops sessions before making the
change to the gossiper, and thus detect aborts and prevent later aborts
after the change in the gossiper is made. This is already done during
bootstrap (RBNO enabled) and replacenode. This patch changes removenode
to also take this approach by moving sending of remove_done earlier.

We cannot take this approach with decommission easily, because
decommission_done command includes a wait for the node to leave the
ring, which won't happen before the change to the gossiper is
made. Separating this from decommission_done would require protocol
changes. This patch adds a second-best solution, which is to check if
sessions are still there right before making a change to the gossiper,
leaving decommission_done where it was.

The race can still happen, but the time window is now much smaller.

Fixes #12989
Refs #12969
2023-03-01 18:43:13 +01:00
Kefu Chai
d85af3dca4 dht: incremental_owned_ranges_checker: use lower_bound()
instead of using a while loop for finding the lower bound,
just use std::lower_bound() to find out if the current node owns the
given token. this has three advantages:

* better readability: lower_bound is exactly what this loop
  calculates.
* lower_bound uses binary search for searching the element,
  which should be faster than a linear scan under most
  circumstances.
* lower_bound uses std::advance() and the prefix increment operator,
  which should be more performant than the postfix increment operator,
  as it does not create a temporary instance of the iterator.
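A minimal sketch of the lookup, assuming a hypothetical model in which owned ranges are non-overlapping half-open `[start, end)` spans of `int64_t` tokens sorted by `end` (this is not `dht::token_range` or the real checker):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified range type.
struct token_span {
    int64_t start;
    int64_t end;  // exclusive
};

bool owns_token(const std::vector<token_span>& sorted_spans, int64_t token) {
    // First span whose end lies past `token`, found by binary search.
    auto it = std::lower_bound(sorted_spans.begin(), sorted_spans.end(), token,
                               [](const token_span& s, int64_t t) { return s.end <= t; });
    // The token is owned only if that span also starts at or before it.
    return it != sorted_spans.end() && it->start <= token;
}
```

The comparator is true for a prefix of the sorted, non-overlapping spans, which is the partitioning `std::lower_bound` requires.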

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13008
2023-03-01 11:29:46 +02:00
Avi Kivity
3042deb930 types: reimplement in terms of a variable template
data_type_for() is a function template that converts a C++
type to a database dynamic type (data_type object).

Instead of implementing a function per type, implement a variable
template instance. This is shorter and nicer.

Since the original type variables (e.g. long_type) are defined separately,
use a reference instead of copying to avoid initialization order problems.

To catch misuses of data_type_for, the general data_type_for_v variable
template maps to some unused tag type, which will cause a build error
when instantiated.

The original motivation for this was to allow for partial
specialization of data_type_for() for tuple types, but this isn't
really workable since the native type for tuples is std::vector<data_value>,
not std::tuple, and I only checked this after getting the work done,
so this isn't helping anything; it's just a little nicer.
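A heavily simplified sketch of the idea. `abstract_type`, `data_type`, `long_type`, `utf8_type` and `unsupported_type_tag` below are hypothetical stand-ins, not Scylla's definitions. The primary variable template has an unusable tag type, and each explicit specialization is a reference to a separately defined object:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real type system.
struct abstract_type { std::string name; };
using data_type = std::shared_ptr<const abstract_type>;

// Defined separately, like long_type etc. in the real code.
const data_type long_type = std::make_shared<abstract_type>(abstract_type{"bigint"});
const data_type utf8_type = std::make_shared<abstract_type>(abstract_type{"text"});

// Primary template: a tag type that does not convert to data_type, so
// data_type_for<T>() for an unsupported T fails to compile.
struct unsupported_type_tag {};
template <typename T>
inline const unsupported_type_tag data_type_for_v{};

// Explicit specializations bind references instead of copying, so there is
// no dependency on the initialization order of long_type/utf8_type.
template <>
inline const data_type& data_type_for_v<int64_t> = long_type;
template <>
inline const data_type& data_type_for_v<std::string> = utf8_type;

template <typename T>
data_type data_type_for() {
    return data_type_for_v<T>;
}
```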

Closes #13043
2023-03-01 11:25:39 +02:00
Botond Dénes
d5dee43be7 Merge 'doc: specify the versions where Alternator TTL is no longer experimental' from Anna Stuchlik
This PR adds a note to the Alternator TTL section to specify in which Open Source and Enterprise versions the feature was promoted from experimental to non-experimental.

The challenge here is that OSS and Enterprise are (still) **documented together**, but they're **not in sync** in promoting the TTL feature: it's still experimental in 5.1 (released) but no longer experimental in 2022.2 (to be released soon).

We can take one of the following approaches:
a) Merge this PR with master and ask the 2022.2 users to refer to master.
b) Merge this PR with master and then backport to branch-5.1. If we choose this approach, it is necessary to backport https://github.com/scylladb/scylladb/pull/11997 beforehand to avoid conflicts.

I'd opt for a) because it makes more sense from the OSS perspective and helps us avoid mess and backporting.

Closes #12295

* github.com:scylladb/scylladb:
  doc: fix the version in the comment on removing the note
  doc: specify the versions where Alternator TTL is no longer experimental
2023-03-01 11:24:52 +02:00
Botond Dénes
92fde47261 Merge 'test/cql-pytest - aggregation tests' from Nadav Har'El
This small series reorganizes the existing functional tests for aggregation (min, max, count) and adds additional tests for sum reproducing the strange (but Cassandra-compatible) behavior described in issue #13027.

Closes #13038

* github.com:scylladb/scylladb:
  cql-pytest: add tests for sum() aggregate
  test/cql-pytest: move aggregation tests to one file
2023-03-01 11:02:08 +02:00
Avi Kivity
6822e3b88a query_id: extract into new header
query_id currently lives in query-request.hh, a busy place
with lots of dependencies. In turn it gets pulled in by
uuid.idl.hh, which is also very central. This makes
test/raft/randomized_nemesis_test.cc, which nominally
depends only on Raft, rebuild on random header file changes.

Fix by extracting into a new header.

Closes #13042
2023-03-01 10:25:25 +02:00
Botond Dénes
46efdfa1a1 Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr
The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()`

Fixes: #12249

Closes #12978

* github.com:scylladb/scylladb:
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  row_cache: pass partition_start though nonforwardable reader
2023-03-01 09:58:14 +02:00
Botond Dénes
1c0b47ee9b Merge 'treewide: remove unused variable and reference used one explicitly' from Kefu Chai
- treewide: do not define/capture unused variables
- sstables/sstables: mark dummy variable for loop [[maybe_unused]]
- util/result_try: reference this explicitly
- raft: reference this explicitly
- idl-compiler: mark captured this used
- build: reenable unused-{variable,lambda-capture} warnings

Closes #12915

* github.com:scylladb/scylladb:
  build: reenable unused-{variable,lambda-capture} warnings
  test: reader_concurrency_semaphore_test: define target_memory in debug mode
  api::failure_detector: mark set_phi_convict_threshold unimplemented
  test: memtable_test: mark dummy variable for loop [[maybe_unused]]
  idl-compiler: mark captured this used
  raft: reference this explicitly
  util/result_try: reference this explicitly
  sstables/sstables: mark dummy variable for loop [[maybe_unused]]
  treewide: do not define/capture unused variables
  service: storage_service: clear _node_ops in batch
2023-03-01 09:44:37 +02:00
Nadav Har'El
363f326d49 test/cql-pytest: test for CLUSTERING ORDER BY verification in MV
Since commit 73e258fc34, Scylla has partial
verification for the CLUSTERING ORDER BY clause in CREATE MATERIALIZED
VIEW. Specifically, invalid column names are rejected. But for reasons
explained in issue #12936 and in the test in this patch, Cassandra
demands that if CLUSTERING ORDER BY appears it must list all the
clustering columns, with no duplicates, and do so in the right order.

This patch replaces an existing test which suggested it is fine
(an extension over Cassandra) to accept a partial list of clustering
columns, by a test that verifies that such a partial list, or an
incorrectly-ordered list, or list with duplicates, should be rejected.
The new test fails on Scylla, and passes on Cassandra, so marked as xfail.

Refs #12936.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12938
2023-03-01 08:02:39 +02:00
Botond Dénes
84e26ed9c3 Merge 'Enable RBNO by default' from Asias He
This PR fixes the seastar::rpc::closed_error error in the test_topology suite and enables RBNO by default.

Closes #12970

* github.com:scylladb/scylladb:
  Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops""
  storage_service: Wait for normal state handler to finish in replace
  storage_service: Wait for normal state handler to finish in bootstrap
2023-03-01 07:55:46 +02:00
Nadav Har'El
7dc54771e1 test/cql-pytest: allow "run-cassandra" without building Scylla
Before this patch, all scripts which use test/cql-pytest/run.py
looked for the Scylla executable as their first step. This is usually
the right thing to do, except in two cases where Scylla is *not* needed:

1. The script test/cql-pytest/run-cassandra.
2. The script test/alternator/run with the "--aws" option.

So in this patch we change run.py to only look for Scylla when actually
needed (the find_scylla() function is called). In both cases mentioned
above, find_scylla() will never get called and the script can work even
if Scylla was never built.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13010
2023-03-01 07:54:19 +02:00
Botond Dénes
eb10623dd2 Merge 'build: cmake: sync with configure.py (7/n)' from Kefu Chai
- build: cmake: support thrift < 0.11.0
- build: cmake: find Thrift before using it
- build: cmake: find libxcrypt before using it
- build: cmake: expose scylla_gen_build_dir from "interface"
- build: cmake: extract thrift out
- build: cmake: link the whole auth
- build: cmake: extract mutation,db,replica,streaming out

Closes #12990

* github.com:scylladb/scylladb:
  build: cmake: extract mutation,db,replica,streaming out
  build: cmake: link the whole auth
  build: cmake: extract thrift out
  build: cmake: expose scylla_gen_build_dir from "interface"
  build: cmake: find libxcrypt before using it
  build: cmake: find Thrift before using it
  build: cmake: support thrift < 0.11.0
2023-03-01 07:35:21 +02:00
Kefu Chai
f59542a01a build: reenable unused-{variable,lambda-capture} warnings
now that all -Wunused-{variable,lambda-capture} warnings are taken
care of, let's reenable these warnings so they can help us identify
potential issues.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-01 10:45:18 +08:00
Kefu Chai
efe96e7fc6 test: reader_concurrency_semaphore_test: define target_memory in debug mode
otherwise we'd have the following warning

```
test/boost/reader_concurrency_semaphore_test.cc:1380:20: error: unused variable 'target_memory' [-Werror,-Wunused-const-variable]
constexpr uint64_t target_memory = uint64_t(1) << 28; // 256MB
                     ^
1 error generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-01 10:45:18 +08:00
Kefu Chai
ffffcdb48a cql3: mark cf_name final
as no class derives from `cf_name`, it's viable
to mark it `final`.

this change silences the warning from Clang,
like:

```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_LOCALE -DFMT_SHARED -DHAVE_LZ4_COMPRESS_DEFAULT -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=6 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/build/cmake -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wno-mismatched-tags -Wno-missing-braces -Wno-c++11-narrowing  -O0 -g -gz -std=gnu++20 -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o -MF CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o.d -o CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o -c /home/kefu/dev/scylladb/data_dictionary/data_dictionary.cc
In file included from /home/kefu/dev/scylladb/data_dictionary/data_dictionary.cc:9:
In file included from /home/kefu/dev/scylladb/data_dictionary/data_dictionary.hh:11:
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:287:2: error: destructor called on non-final 'cql3::cf_name' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        _M_payload._M_value.~_Stored_type();
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:318:4: note: in instantiation of member function 'std::_Optional_payload_base<cql3::cf_name>::_M_destroy' requested here
          _M_destroy();
          ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:439:57: note: in instantiation of member function 'std::_Optional_payload_base<cql3::cf_name>::_M_reset' requested here
      _GLIBCXX20_CONSTEXPR ~_Optional_payload() { this->_M_reset(); }
                                                        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:514:17: note: in instantiation of member function 'std::_Optional_payload<cql3::cf_name>::~_Optional_payload' requested here
      constexpr _Optional_base() = default;
                ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:739:17: note: in defaulted default constructor for 'std::_Optional_base<cql3::cf_name>' first required here
      constexpr optional(nullopt_t) noexcept { }
                ^
/home/kefu/dev/scylladb/cql3/statements/raw/batch_statement.hh:37:28: note: in instantiation of member function 'std::optional<cql3::cf_name>::optional' requested here
            : cf_statement(std::nullopt)
                           ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:287:23: note: qualify call to silence this warning
        _M_payload._M_value.~_Stored_type();
                             ^
```
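A minimal sketch of why `final` helps, using a hypothetical `fake_cf_name` class (not the real `cql3::cf_name`). It has a virtual function but a non-virtual destructor; because it is `final`, no derived class can exist, so `std::optional` destroying it through its static type cannot skip a derived destructor, and Clang has nothing to warn about:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in: virtual member function, implicit non-virtual dtor.
class fake_cf_name final {
    std::string _name;
public:
    explicit fake_cf_name(std::string name) : _name(std::move(name)) {}
    virtual std::string to_string() const { return _name; }
};

// Mirrors `cf_statement(std::nullopt)`: optional<T> instantiates T's
// destructor, which is where -Wdelete-non-abstract-non-virtual-dtor fired.
struct statement {
    std::optional<fake_cf_name> name;
    statement() : name(std::nullopt) {}
};
```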

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13039
2023-02-28 22:26:43 +02:00
Petr Gusev
1709a17c38 flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE 2023-02-28 23:42:44 +04:00
Petr Gusev
992ccb6255 make_nonforwardable: test through run_mutation_source_tests 2023-02-28 23:42:43 +04:00
Petr Gusev
989ef9d358 make_nonforwardable: next_partition and fast_forward_to when single_partition is true
This flag designates that we should consume only one
partition from the underlying reader. This means that
attempts to move to another partition should cause an EOS.
2023-02-28 23:42:34 +04:00
Petr Gusev
a67776b750 make_forwardable: fix next_partition
When next_partition is called, the buffer could
contain partition_start and possibly static_row.
In this case clear_buffer_to_next_partition will
not remove anything from the buffer and the
reader position should not change. Before this patch,
however, we used to set _end_of_stream=false,
which violated the forwardable-reader
contract - the data of the next partition
was emitted after the data of the first partition
without intermediate EOS.

This bug was found when debugging
test_make_nonforwardable_from_mutations_as_mutation_source flakiness.
A corresponding focused test_make_forwardable_next_partition
has been added to exercise this problem.
2023-02-28 23:11:45 +04:00
Petr Gusev
64427b9164 flat_mutation_reader_v2: drop forward_buffer_to
This is just a strange method I came across.
It effectively does nothing but clear_buffer().
2023-02-28 23:00:02 +04:00
Petr Gusev
a517e1d6ad nonforwardable reader: fix indentation 2023-02-28 23:00:02 +04:00
Petr Gusev
beeffb899f nonforwardable reader: refactor, extract reset_partition
No observable behaviour changes, just refactor
the code.
2023-02-28 23:00:02 +04:00
Petr Gusev
023ed0ad00 nonforwardable reader: add more tests
Add more test cases for completeness.
2023-02-28 23:00:02 +04:00
Petr Gusev
88cd1c3700 nonforwardable reader: no partition_end after fast_forward_to()
This patch fixes a problem with the fast_forward_to method
that is similar to the one with next_partition: no
partition_end should be injected for the partition if
fast_forward_to was called inside it.
2023-02-28 23:00:02 +04:00
Petr Gusev
8ff96e1bce nonforwardable reader: no partition_end after next_partition()
Before the patch, the nonforwardable reader injected
partition_end unconditionally. This caused problems
when next_partition() was called: the downstream
reader might have already injected its own
partition_end marker, and the one from the nonforwardable
reader was a duplicate.

Fixes: #12249
2023-02-28 23:00:02 +04:00
Petr Gusev
9c5c380b0b nonforwardable reader: no partition_end for empty reader
The patch introduces the _partition_is_open flag and
injects partition_end only if there was some data
in the input reader.

A simple unit test has been added for
the nonforwardable reader which checks this
new behaviour.
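The idea behind the `_partition_is_open` flag can be illustrated with a toy reader. Everything in this sketch (`fragment`, `closing_reader`, `drain`) is hypothetical and far simpler than Scylla's flat_mutation_reader_v2; it only shows that a closing partition_end is injected when the input actually opened a partition, and never for an empty input.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified fragment kinds (not mutation_fragment_v2).
enum class fragment { partition_start, row, partition_end };

// Hypothetical wrapper: forwards the input and, once it is exhausted,
// closes the current partition only if one is actually open.
class closing_reader {
    std::vector<fragment> _input;
    std::size_t _pos = 0;
    bool _partition_is_open = false;
    bool _eos = false;
public:
    explicit closing_reader(std::vector<fragment> input) : _input(std::move(input)) {}

    std::optional<fragment> next() {
        if (_eos) {
            return std::nullopt;
        }
        if (_pos < _input.size()) {
            fragment f = _input[_pos++];
            if (f == fragment::partition_start) {
                _partition_is_open = true;
            } else if (f == fragment::partition_end) {
                _partition_is_open = false;
            }
            return f;
        }
        _eos = true;
        if (_partition_is_open) {
            _partition_is_open = false;
            return fragment::partition_end; // close the partition the input left open
        }
        return std::nullopt; // empty or already-closed input: inject nothing
    }
};

// Drains a reader into a vector, for inspection.
std::vector<fragment> drain(closing_reader r) {
    std::vector<fragment> out;
    while (auto f = r.next()) {
        out.push_back(*f);
    }
    return out;
}
```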
2023-02-28 22:59:56 +04:00
Wojciech Mitros
6d2e785b5c docs: update wasm.md
The WASM UDF implementation has changed since the last time the docs
were written. In particular, the Rust helper library has been
released, and using it should be the recommended method.
Some decisions that were only experimental at the start have also been
"set in stone", so we should refer to them as such.

The docs also contain some code examples. This patch adds tests for
these examples to make sure that they are not wrong and misleading.

Closes #12941
2023-02-28 20:59:25 +02:00
Kefu Chai
2434a4d345 utils: small_vector: define operator<=>
small_vector should be feature-wise compatible with std::vector<>,
so let's add operator<=> for it.

also, there is no need to define operator!=() explicitly, as C++20
defines it for us if operator==() is defined, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13032
2023-02-28 20:04:22 +02:00
Benny Halevy
06a0902708 dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
Currently the ranges_to_stream variable lives
on the caller state, and do_streaming() moves its
contents down to request_ranges/transfer_ranges
and then calls clear() to make it ready for reuse.

This works in principle but it makes it harder
for an occasional reader of this code to figure out
what is going on.

This change transfers ownership of the ranges_to_stream vector
to do_streaming by calling it with std::exchange(ranges_to_stream, {}).
With that, the moved vector doesn't need to be cleared by
do_streaming, and the caller is responsible for readying
the variable for reuse in its for loop.
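A minimal sketch of the std::exchange hand-off, assuming a callee that takes the vector by value. `consume_batch` and `stream_in_batches` are hypothetical names, and plain `int`s stand in for token ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for do_streaming(): taking the vector by value gives
// the callee ownership, so it can move the contents further downstream and
// never has to clear() anything on behalf of the caller.
std::size_t consume_batch(std::vector<int> ranges) {
    std::vector<int> downstream = std::move(ranges);
    return downstream.size();
}

// Hypothetical caller: fills a batch and hands it over when it is full.
std::vector<std::size_t> stream_in_batches(const std::vector<int>& all, std::size_t batch_size) {
    std::vector<std::size_t> sent;
    std::vector<int> ranges_to_stream;
    for (int r : all) {
        ranges_to_stream.push_back(r);
        if (ranges_to_stream.size() == batch_size) {
            // Moves the contents out and leaves ranges_to_stream empty,
            // ready to be refilled by the next iterations.
            sent.push_back(consume_batch(std::exchange(ranges_to_stream, {})));
        }
    }
    if (!ranges_to_stream.empty()) {
        sent.push_back(consume_batch(std::exchange(ranges_to_stream, {})));
    }
    return sent;
}
```

The `{}` argument works because std::exchange defaults its second template parameter to the type of the first, so the empty braces construct a fresh, empty vector.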

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:38:34 +02:00
Benny Halevy
1392c7e1cf streaming: stream_session: maybe_yield
To prevent reactor stalls when freeing many/long
token range vectors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:32:44 +02:00
Avi Kivity
20e1908c55 Merge 'treewide: use (defaulted) operator<=> when appropriate' from Kefu Chai
- db/view: use operator<=> to define comparison operators
- utils: UUID: use defaulted operator<=>
- db: schema_tables: use defaulted operator<=>
- cdc: generation: schema_tables: use defaulted operator<=>
- db::commitlog::replay_position: use defaulted operator<=>

Closes #13033

* github.com:scylladb/scylladb:
  db::commitlog::replay_position: use defaulted operator<=>
  cdc: generation: schema_tables: use defaulted operator<=>
  db: schema_tables: use defaulted operator<=>
  utils: UUID: use defaulted operator<=>
  db/view: use operator<=> to define comparison operators
2023-02-28 17:05:45 +02:00
Benny Halevy
c4836ab9e9 streaming: stream_session: prepare: move token ranges to add_transfer_ranges
Reduce copies on the path to calling add_transfer_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:04:47 +02:00
Benny Halevy
12eb3d210f streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges
Rather than copying the ranges vector.

Note that add_transfer_ranges itself cannot simply move the ranges
since it copies them for multiple tables.

While at it, move also the keyspace and column_family strings.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:03:51 +02:00
Benny Halevy
775c6b9697 dht/range_streamer: stream_async: do_streaming: move ranges downstream
The ranges can be moved rather than copied to both
`request_ranges` and `transfer_ranges` as they are only cleared
after this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:56:55 +02:00
Benny Halevy
3cd8838a09 dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
After calling get_range_fetch_map, ranges_for_keyspace
is not used anymore.
Synchronously destroying it may potentially stall in large clusters
so use utils::clear_gently to gently clear the map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
a80c2d16dd dht/range_streamer: get_range_fetch_map: reduce copies
Use const& to refer to the input ranges and endpoints
rather than copying them individually along the way
more often than needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
9d6e5d50d1 dht/range_streamer: add_ranges: move ranges down-stream
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:27 +02:00
Benny Halevy
c61f058aa5 dht/boot_strapper: move ranges to add_ranges
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
27b382dcce dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
Rather than calling nr_ranges_to_stream() inside `do_streaming`,
since nr_ranges_to_stream depends on `_to_stream`, which will only be
updated later on, after the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
c3c7efffb1 dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
range_vec is used for calculating nr_ranges_to_stream.
Currently, the ranges_to_stream that were
moved out of range_vec are pushed back on exception,
but this isn't safe, since they may have already been moved
to request_ranges or transfer_ranges.

Instead, erase the ranges we pass to do_streaming
only after it succeeds so on exception, range_vec
will not need adjusting.
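The exception-safety argument can be illustrated with a hypothetical helper: the batch handed to the (possibly throwing) streaming step is built from the front of the source vector, and the source is trimmed only after that step returns normally. `try_stream` and `stream_front` are illustrative names, not Scylla functions, and the sketch copies the batch for simplicity where the real code moves ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical streaming step that can fail.
void try_stream(std::vector<int> batch, bool fail) {
    if (fail) {
        throw std::runtime_error("streaming failed");
    }
    (void)batch; // a real implementation would send the batch somewhere
}

// Streams the first n entries of range_vec and erases them only after
// try_stream() returned normally. If it throws, range_vec is untouched,
// so nothing has to be pushed back.
void stream_front(std::vector<int>& range_vec, std::size_t n, bool fail) {
    n = std::min(n, range_vec.size());
    auto last = range_vec.begin() + static_cast<std::ptrdiff_t>(n);
    try_stream(std::vector<int>(range_vec.begin(), last), fail); // may throw
    range_vec.erase(range_vec.begin(), last);                    // success only
}
```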

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Kefu Chai
7de2d1c714 api::failure_detector: mark set_phi_convict_threshold unimplemented
let it throw if "set_phi_convict_threshold" is called, as we never
populate the specified \Phi.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
60eac12db6 test: memtable_test: mark dummy variable for loop [[maybe_unused]]
without C++23 `std::ranges::repeat_view`, it'd be cumbersome to
implement a loop without a dummy variable. this change helps to
silence the following warning:

```
test/boost/memtable_test.cc:1135:26: error: unused variable 'value' [-Werror,-Wunused-variable]
                for (int value : boost::irange<int>(0, num_flushes)) {
                         ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
2caf9b4e1c idl-compiler: mark captured this used
sometimes the captured `this` is used in the generated C++ code,
while other times it is not. to re-enable the `-Wunused-lambda-capture`
warning, let's mark this `this` as used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
b926105eae raft: reference this explicitly
Clang complains that the captured `this` is not used, like
```
/home/kefu/dev/scylladb/raft/fsm.hh:644:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    auto visitor = [this, from, msg = std::move(msg)](const auto& state) mutable {
                    ^
/home/kefu/dev/scylladb/raft/server.cc:738:11: note: in instantiation of function template specialization 'raft::fsm::step<raft::append_request>' requested here
    _fsm->step(from, std::move(append_request));
          ^
```
but `step(..)` is a non-static member function of `fsm`, so `this`
is actually used. to silence Clang's warning, let's just reference it
explicitly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
5e7c8cc4b7 util/result_try: reference this explicitly
quote from Avi's comment

> It's supposed to be illegal to call handle(...) without this->,
> because handle() is a dependent name (but many compilers don't
> insist, gcc is stricter here). So two error messages competed,
> and "unused this capture" won.

without this change, Clang complains that `this` is not used with
`-Wunused-lambda-capture`.

but `this` is in fact used, so in this change it is explicitly
referenced to silence Clang's warning.
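The dependent-name rule quoted above can be shown with a toy example. `handler_base` and `dispatcher` are hypothetical: inside a class template, a member of a dependent base class is only found through `this->` (or a qualified name), and writing `this->` also makes the `[this]` capture visibly used.

```cpp
#include <cassert>
#include <string>

// Hypothetical dependent base class.
template <typename Derived>
struct handler_base {
    std::string handle(int v) const { return "handled " + std::to_string(v); }
};

// Hypothetical class template deriving from a dependent base.
template <typename T>
struct dispatcher : handler_base<T> {
    std::string run(int v) const {
        // handle() lives in a dependent base, so unqualified lookup at
        // template definition time does not find it; `this->handle`
        // defers the lookup to instantiation and uses the capture.
        auto f = [this](int x) { return this->handle(x); };
        return f(v);
    }
};
```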

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
1171c326a9 sstables/sstables: mark dummy variable for loop [[maybe_unused]]
without C++23 `std::ranges::repeat_view`, it'd be cumbersome to
implement a loop without a dummy variable:

```
/home/kefu/dev/scylladb/sstables/sstables.cc:484:15: error: unused variable '_' [-Werror,-Wunused-variable]
    for (auto _ : boost::irange<key_type>(0, nr_elements)) {
              ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:55 +08:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Kefu Chai
be47874a42 service: storage_service: clear _node_ops in batch
before this change, _node_ops are cleared one after another in
`storage_service::node_ops_abort()` when `ops_uuid` is not specified.
but this

* is not efficient
* is not quite readable
* introduces an unused variable

so, in this change, we just clear it in batch. this should silence
a `-Wunused-variable` warning from Clang.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:52:25 +08:00
Nadav Har'El
130c090251 cql-pytest: add tests for sum() aggregate
This patch adds regression tests for the strange (but Cassandra-compatible)
behavior described in issue #13027 - that sum of no results returns 0
(not null or nothing), and if also asking for p, we get a null there too.

Refs #13027.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-02-28 15:35:21 +02:00
Botond Dénes
6b72f4a6fa Merge 'main: display descriptions of all tools' from Kefu Chai
- main: expose tools as a vector<>
- main: use a struct for representing tool
- main: track tools descriptin in tool struct
- main: add missing descriptions for tools
- main: move get_tools() into main()

Fixes #13026

Closes #13030

* github.com:scylladb/scylladb:
  main: move get_tools() into main()
  main: add missing descriptions for tools
  main: track tools descriptin in tool struct
  main: use a struct for representing tool
  main: expose tools as a vector<>
2023-02-28 15:32:11 +02:00
Kefu Chai
af3968bf6e build: cmake: extract mutation,db,replica,streaming out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
6f3a44cde9 build: cmake: link the whole auth
without this change, the linker would drop any .o that is not
referenced by other translation units. but we do use static variables
to, for instance, register classes in a global registry.

so, let's force the linker to include the whole archive.
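The static-registration pattern that makes whole-archive linking necessary looks roughly like this. All names are hypothetical. The point is that `reg` is never referenced by name, so if its object file sits in a static archive that nothing else pulls in, the linker may leave it out and the registration silently never happens (GNU ld's `--whole-archive` is one way to force such members in).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical plugin interface and global registry.
struct auth_plugin {
    virtual ~auth_plugin() = default;
    virtual std::string name() const = 0;
};

using factory = std::function<std::unique_ptr<auth_plugin>()>;

std::map<std::string, factory>& registry() {
    static std::map<std::string, factory> r; // initialized on first use
    return r;
}

// Registers T under `key` as a side effect of construction.
template <typename T>
struct registrar {
    explicit registrar(const std::string& key) {
        registry().emplace(key, [] { return std::make_unique<T>(); });
    }
};

struct password_plugin_sketch : auth_plugin {
    std::string name() const override { return "password"; }
};

// Nothing refers to `reg`; its constructor runs only if this object file
// ends up in the final binary.
static registrar<password_plugin_sketch> reg("password");
```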

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
3e75df6917 build: cmake: extract thrift out
also, move "interface" linkage from scylla to "thrift", because
it is "thrift" who is using "interface".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
4bb0134f1d build: cmake: expose scylla_gen_build_dir from "interface"
as it builds headers like "gen/Cassandra.h", and the target
uses "interface" via these headers, so "interface" is obliged
to expose this include directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
1aafeac023 build: cmake: find libxcrypt before using it
we should find libxcrypt library before using it. in this change,
Findlibxcrypt.cmake is added to find libxcrypt library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
607858db51 build: cmake: find Thrift before using it
we should find Thrift library before using it. in this change,
FindThrift.cmake is added to find Thrift library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Kefu Chai
f30e7f9da1 build: cmake: support thrift < 0.11.0
define THRIFT_USES_BOOST if thrift < 0.11.0, see also #4538

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:28:46 +08:00
Nadav Har'El
e1f97715eb test/cql-pytest: move aggregation tests to one file
We had separate test files test_minmax.py and test_count.py, but the
separation was artificial (and test_count.py even had one test using
min()). Now that I want to add another test for sum(), I don't know
where to put it. So in this patch I combine test_minmax.py and
test_count.py into one test file - test_aggregate.py, and we can
later add sum() tests in the same file.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-02-28 14:39:04 +02:00
Kefu Chai
67b334385c dist/redhat: specify version in Obsoletes:
to silence the warning from rpmbuild, like
```
RPM build warnings:
      line 202: It's not recommended to have unversioned Obsoletes: Obsoletes:	tuned
```

a versioned Obsoletes: is more specific. for the version number, quoting
the commit message of 303865d979:

> tuned 2.11.0-9 and later writes to kerned.sched_wakeup_granularity_ns
> and other sysctl tunables that we so laboriously tuned, dropping
> performance by a factor of 5 (due to increased latency). Fix by
> obsoleting tuned during install (in effect, we are a better tuned,
> at least for us).

with this change, it'd be easier to identify potential issues when
building / packaging.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12721
2023-02-28 13:55:04 +02:00
Marcin Maliszkiewicz
bd7caefccf docs: link general repairs page to RBNO page
Information was duplicated before and the version on this page was outdated - RBNO is enabled for replace operation already.

Closes #12984
2023-02-28 13:04:32 +02:00
Tomasz Grabiec
fddd93da4e storage_service: node ops: Add error injections 2023-02-28 11:32:18 +01:00
Tomasz Grabiec
5c8ad2db3c service: node_ops: Make watchdog and heartbeat intervals configurable
This will be useful for writing tests that trigger failures, and for
workarounds in production.
2023-02-28 11:31:55 +01:00
Kefu Chai
5bf6e9ba97 db::commitlog::replay_position: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
aed681fa3c cdc: generation: schema_tables: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
56c9c9d29e db: schema_tables: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
9ec8b4844b utils: UUID: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
ab5d772d63 db/view: use operator<=> to define comparison operators
also, there is no need to define operator!=() if
operator==() is defined, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Kefu Chai
7550be1fc6 main: move get_tools() into main()
there is no need to have a dedicated function which is only consumed
by `main()`. so let's move the body of `get_tools()` into `main`. and
with this change, a plain C array would suffice. so just use a plain
array for tools.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:09:46 +08:00
Kefu Chai
128dbebb76 main: add missing descriptions for tools
Fixes #13026
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:09:46 +08:00
Kefu Chai
ef0dfeb2fa main: track tools descriptin in tool struct
so we can manage the tools in a more structured way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:09:46 +08:00
Kefu Chai
ffbbd59486 main: use a struct for representing tool
so we can encapsulate the description of a certain tool in this
struct with a more readable field name in comparison with a tuple<>,
if we want to track all tools in this vector.
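A rough sketch of a tool table built from a named struct rather than a tuple<>. The field names, tool names and entry points below are placeholders, not the ones in Scylla's main.cc.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical entry-point signature.
using main_func_type = int (*)(int, char**);

// Named fields read better than std::get<1>(...) on a tuple<>.
struct tool {
    std::string_view name;
    std::string_view desc;
    main_func_type func;
};

// Placeholder entry points for the illustration.
int tool_a_main(int, char**) { return 0; }
int tool_b_main(int, char**) { return 1; }

constexpr tool tools[] = {
    {"tool-a", "placeholder description for tool-a", tool_a_main},
    {"tool-b", "placeholder description for tool-b", tool_b_main},
};

// Returns the matching entry, or nullptr if there is no such tool.
const tool* find_tool(std::string_view name) {
    for (const auto& t : tools) {
        if (t.name == name) {
            return &t;
        }
    }
    return nullptr;
}
```

The same array serves both purposes: lookup by name as above, and listing every tool with a plain range-for that prints `name` and `desc`.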

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:09:46 +08:00
Kefu Chai
73cf62469b main: expose tools as a vector<>
so, in addition to looking up a tool by the name in it, we will be
able to list all tools in this vector. this change paves the road to
a more general solution to handle `--list-tools`.

in this change

* `lookup_main_func()` is replaced by `get_tools()`.
* instead of checking `main_func` outside of the `if` block,
  check it inside the `if` block, as we already know there whether we
  have a matching tool, and we can return early right there.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:09:46 +08:00
Kefu Chai
991379bdb3 raft: broadcast_tables: remove unused asyncio mark
test_broadcast_kv_store does not use await or yield at all, so
there is no need to mark it with "asyncio" mark.

tested using
```
SCYLLA_HOME=$HOME/scylla build/cmake/scylla --overprovisioned --developer-mode=yes --consistent-cluster-management=true --experimental-features=broadcast-tables
...
pytest broadcast_tables/test_broadcast_tables.py
```

the test still passes.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13006
2023-02-28 11:05:15 +02:00
Asias He
8fb786997a Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops""
This reverts commit fd4ee4878a.
2023-02-28 09:00:13 +08:00
Asias He
5856e69462 storage_service: Wait for normal state handler to finish in replace
Similar to "storage_service: Wait for normal state handler to finish in
bootstrap", this patch enables the check on the replace procedure.
2023-02-28 09:00:13 +08:00
Asias He
53636167ca storage_service: Wait for normal state handler to finish in bootstrap
In storage_service::handle_state_normal, storage_service::notify_joined
is called, which drops the rpc connections to the node that becomes
normal. This causes rpc calls to that node to fail with a
seastar::rpc::closed_error error.

Consider this:

- n1 in the cluster
- n2 is added to join the cluster
- n2 sees n1 is in normal status
- n2 starts bootstrap process
- notify_joined on n2 closes rpc connection to n1 in the middle of
  bootstrap
- n2 fails to bootstrap

For example, during bootstrap with RBNO, we saw repair failed in a
test that sets ring_delay to zero and does not wait for gossip to
settle.

repair - repair[9cd0dbf8-4bca-48fc-9b1c-d9e80d0313a2]: sync data for
keyspace=system_distributed_everywhere, status=failed:
std::runtime_error ({shard 0: seastar::rpc::closed_error (connection is
closed)})

This patch fixes the race by waiting for the handle_state_normal handler
to finish before the bootstrap process.

Fixes #12764
Fixes #12956
2023-02-28 09:00:13 +08:00
Kefu Chai
b6e4275511 configure.py: build and use libseastar.so in debug and dev modes
now that Seastar can be built as shared libraries, we can use it for
faster development iteration with less disk usage.

in this change

* configure.py:
  - 'build_seastar_shared_libs' is added as yet another mode value,
    so different modes have their own setting. 'debug' and 'dev' have
    this enabled, while other modes disable it.
  - link scylla with rpath specified, so it can find `libseastar.so`
    in build directory.
* install.sh: remove the rpath, as the rpath in the elf image will
  not be valid after the relocatable package is installed. also,
  rpmbuild would error out when it uses check-rpaths to verify
  the elf images (executables and shared libraries), as the rpaths
  encoded in them are not known ones. patchelf() takes care of
  the shared libraries linked by the executables, so we don't need
  to worry about libseastar.so or libseastar_testing.so.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12801
2023-02-27 21:08:34 +02:00
Kefu Chai
4f3bc915a6 cql-pytest: remove duplicated words in README.md
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13005
2023-02-27 17:28:32 +02:00
Nadav Har'El
3b32440993 test/cql-pytest: add regression test for UNSET key in insert
Recently, we overhauled the error handling of UNSET_VALUE in various
places where it is not allowed. This patch adds two more regression
tests for this error handling. Both tests pass on Scylla today, pass
on Cassandra, but fail on earlier Scylla (e.g., I tested 5.1.5):

The first test does INSERT into clustering key UNSET_VALUE.
An UNSET_VALUE is designed to skip part of the write - not an entire
write - so this attempt should fail - not silently be skipped.
The write indeed fails with an error on Cassandra, and on recent
Scylla, but silently did nothing in older Scylla which leads this
test to fail there.

The second test does the same thing with LWT (with an "IF NOT EXISTS"
added to the insert). Scylla's failure here was even more spectacular -
it crashed (as reported in issue #13001) instead of silently skipping
the write. The test passes on Scylla today and on Cassandra, which
both report the failure cleanly.

Refs #13001.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13007
2023-02-27 17:20:22 +02:00
Petr Gusev
a46df5af63 row_cache: pass partition_start though nonforwardable reader
Now the nonforwardable reader unconditionally produces
a partition_end, even if the input reader was empty.
This is strange in itself, but it also makes it harder to
properly fix its next_partition() method, which is
our ultimate goal. So we are going to change this
and produce partition_end only if there was some
data in the stream. However, this creates a problem:
currently we pop partition_start from the underlying reader
in autoupdating_underlying_reader::move_to_next_partition
and manually push it back to downstream readers
bypassing nonforwardable reader. This means if we
change the logic in nonforwardable reader as described
we will end up with partition_start without partition_end
in the downstream readers.

This patch rectifies this by making sure that
the nonforwardable reader will see the initial partition_start.
We inject this partition_start just before the
nonforwardable reader, into delegating_reader.
This also makes the result type of
range_populating_reader::operator() a bit simpler,
we don't need to pass partition_start anymore.
2023-02-27 18:46:31 +04:00
Nadav Har'El
73e258fc34 materialized views: verify CLUSTERING ORDER BY clause
Cassandra is very strict in the CLUSTERING ORDER BY clause which it
allows when creating a materialized view - if it appears, it must
list all the clustering columns of the view. Scylla is less strict -
a subset of the clustering columns may be specified. But Scylla was
*too* lenient - a user could specify non-clustering columns and even
non-existent columns and Scylla would not fail the MV creation.
This patch fixes that - with it MV creation fails if anything besides
clustering columns is listed in CLUSTERING ORDER BY.
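The new check amounts to a membership test. This sketch uses hypothetical names and an invented error message; like Scylla, it accepts any subset of the view's clustering columns and rejects every other column name.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical validation: every column named in CLUSTERING ORDER BY must
// be one of the view's clustering columns. Throws on the first violation.
void validate_clustering_order(const std::vector<std::string>& clustering_columns,
                               const std::vector<std::string>& order_by_columns) {
    for (const auto& col : order_by_columns) {
        auto it = std::find(clustering_columns.begin(), clustering_columns.end(), col);
        if (it == clustering_columns.end()) {
            throw std::invalid_argument("column " + col + " is not a clustering column");
        }
    }
}
```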

An xfailing test we had for this case no longer fails after this
patch so its xfail mark is removed. We also add a few more corner
cases to the tests.

This patch also fixes one C++ test which had exactly the error that this
patch detects - the test author tried to use the partition key, instead
of the clustering key, in CLUSTERING ORDER BY (this error had no effect
because the specified order, "asc", was the default anyway).

Fixes #10767

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12885
2023-02-27 15:09:42 +02:00
Kefu Chai
7fd303044e tools/schema_loader: drop unused functions
`load_one_schema()` and `load_schemas_from_file()` are dropped,
as they are neither used by `scylla-sstable` nor tested by
`schema_loader_test.cc` . the latter tests `load_schemas()`, which
is quite the same as `load_one_schema_from_file()`, but is more
permissive in the sense that it allows zero schema or more than
one schema in the specified path.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13003
2023-02-27 13:03:05 +02:00
Avi Kivity
6f88dc8009 Merge 'Fix memory leaks caused by throwing reader_concurrency_semaphore::consume()' from Botond Dénes
Said method can now throw `std::bad_alloc` since aab5954. All call-sites should have been adapted in the series introducing the throw, but some managed to slip through because the oom unit test didn't run in debug mode. This series fixes the remaining unpatched call-sites and makes sure the test runs in debug mode too, so leaks like this are detected.

Fixes: #12767

Closes #12756

* github.com:scylladb/scylladb:
  test/boost/reader_concurreny_semaphore_test: run oom protection tests in debug mode
  treewide: adapt to throwing reader_concurrency_semaphore::consume()
2023-02-27 12:27:30 +02:00
Anna Stuchlik
91b611209f doc: fixes https://github.com/scylladb/scylladb/issues/12954, adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image
Closes #12974
2023-02-27 12:15:49 +02:00
David Garcia
20bff2bd10 docs: Update ScyllaDB Enterprise link
Closes #12985
2023-02-27 08:39:50 +02:00
Anna Stuchlik
95ce2e8980 doc: fix the option name LWT_OPTIMIZATION_META_BIT_MASK
Fixes #12940.

Closes #12982

[avi: move fixes tag out of subject]
2023-02-26 19:51:20 +02:00
Avi Kivity
c863186dc5 Merge 'Fixes for docs/dev/building.md' from Kamil Braun
Closes #12071

* github.com:scylladb/scylladb:
  docs/dev: building.md: mention node-exporter packages
  docs/dev: building.md: replace `dev` with `<mode>` in list of debs
2023-02-26 19:27:33 +02:00
Kefu Chai
410035f03d abstract_replication_strategy: remove unnecessary virtual specifier
`effective_replication_map` is not a base class of any other class. so
there is no need to mark any of its member functions as `virtual`. this
change should address the following warning from Clang:

```
/home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:205:9: error: delete called on non-final 'locator::effective_replication_map' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete value_ptr;
        ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:202:9: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here
        dispose(static_cast<T*>(counter));
        ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:317:27: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here
            accessors<T>::dispose(_p);
                          ^
/home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:263:12: note: in instantiation of member function 'seastar::lw_shared_ptr<locator::effective_replication_map>::~lw_shared_ptr' requested here
    return make_lw_shared<effective_replication_map>(std::move(rs), std::move(tmptr), std::move(replication_map), replication_factor);
           ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12992
2023-02-26 19:16:28 +02:00
Kefu Chai
79d2eb1607 cql3: functions: validate arguments for 'token()' also
since "token()" computes the token for a given partition key,
if we pass it a key of the wrong type, it should reject it.

in this change,

* we validate the keys before returning the "token()" function.
* drop the "xfail" decorator from two of the tests. they pass
  now after this fix.
* change the tests which previously passed the wrong number of
  arguments containing null to "token()" and expect it to return
  null, so they verify that "token()" should reject these
  arguments with the expected error message.

Fixes #10448
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12991
2023-02-26 19:01:58 +02:00
Gleb Natapov
1ce7ad1ee6 lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more then once
If the capture is destroyed on the first call, the second call may crash.
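This class of bug can be reproduced with a toy lambda. Both factory functions are hypothetical, not the code in `upgrade_if_needed`. A moved-from `std::unique_ptr` is guaranteed to be null, so the sketch shows deterministically what a second call to a capture-consuming lambda observes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Buggy shape: each call moves the capture out, so only the first call gets
// the object; every later call gets a null pointer (guaranteed for a
// moved-from unique_ptr), and dereferencing it would crash.
auto make_consuming_lambda() {
    return [p = std::make_unique<int>(42)]() mutable { return std::move(p); };
}

// Fixed shape: the lambda only reads its capture, so it can be invoked
// any number of times.
auto make_reusable_lambda() {
    return [p = std::make_unique<int>(42)]() { return *p; };
}
```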

Fixes: #12958

Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>
2023-02-26 16:13:16 +02:00
Kefu Chai
f3e6c9168c sstables: generation_type: define fmt::formatter for generation_type
turns out what we need is a fmt::formatter<sstables::generation_type>
not operator<<(ostream&, sstables::generation_type), as its only use
case is the formatter used by seastar::format().

to specialize fmt::formatter<sstables::generation_type>

* allows us to be one step closer to drop `FMT_DEPRECATED_OSTREAM`
* allows us to customize the way generation_type is printed by
  customizing the format specifier.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12983
2023-02-26 15:38:10 +02:00
Avi Kivity
8a0a784131 Merge 'utils: UUID: use default generated comparison operators' from Kefu Chai
- utils: UUID: define operator<=> for UUID
- utils: UUID: define operator==() only

Closes #12981

* github.com:scylladb/scylladb:
  utils: UUID: define operator==() only
  utils: UUID: define operator<=> for UUID
2023-02-26 15:31:46 +02:00
Piotr Smaroń
c1760af26c cql3: adding missing privileged on cache size eviction metric
Fixes #10463

Closes #12865
2023-02-26 14:33:46 +02:00
Kefu Chai
1c71151eda utils: UUID: define operator==() only
as, in C++20, compiler is able to generate the operator==() for us,
and the default generated one is identical to what we have now.

also, in C++20, operator!=() is generated by compiler if operator==()
is defined, so we can dispense with the former.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-25 09:36:11 +08:00
Kefu Chai
300e0b1d1c utils: UUID: define operator<=> for UUID
instead of the family of comparison operators, just define <=>. in
C++20, the compiler rewrites <, <=, > and >= in terms of <=> (and !=
in terms of ==), so the full set of comparisons stays available.

in this change, operator<=> is defined, so we get more compact
code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-25 09:36:11 +08:00
Asias He
ba919aa88a storage_service: Send heartbeat earlier for node ops
Node ops has the following procedure:

1   for node in sync_nodes
      send prepare cmd to node

2   for node in sync_nodes
      send heartbeat cmd to node

If any of the prepare cmds in step 1 takes longer than the heartbeat
watchdog timeout, the heartbeat in step 2 arrives too late to update the
watchdog; as a result, the watchdog aborts the operation.

To prevent a slow prepare cmd from killing the node operation, start the
heartbeat earlier in the procedure.
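A toy timing model of the two orderings, under assumptions that are not taken from the code: prepare cmds are sent one after another, a node's watchdog is armed when its prepare completes, and an early-started heartbeat loop reaches every node within one interval. `max_heartbeat_gap` is a hypothetical helper, not Scylla code.

```cpp
#include <algorithm>
#include <initializer_list>

// Toy model (hypothetical). Returns the longest time any node waits
// for its first heartbeat after its watchdog is armed.
constexpr double max_heartbeat_gap(std::initializer_list<double> prepare_secs,
                                   bool heartbeat_started_early,
                                   double heartbeat_interval) {
    double total = 0;  // time at which step 1 (all prepare cmds) finishes
    for (double d : prepare_secs) {
        total += d;
    }
    double worst = 0;
    double done = 0;
    for (double d : prepare_secs) {
        done += d;  // this node's watchdog is armed now
        // Old order: the first heartbeat only goes out after step 1 ends.
        // New order: the running loop fires within one interval (worst case).
        double gap = heartbeat_started_early ? heartbeat_interval : total - done;
        worst = std::max(worst, gap);
    }
    return worst;
}
```

With prepare times of 1s, 30s and 1s, the first node waits 31s for a heartbeat in the old order, while the early loop bounds every gap by the interval, however slow a prepare is.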

Fixes #11011
Fixes #12969

Closes #12980
2023-02-24 22:31:40 +01:00
Botond Dénes
61e67b865a Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak
Aggregation queries on counter columns fail because forward_service looks for a function taking a counter argument, and no such function exists. The long type should be used instead.

Fixes: #12939

Closes #12963

* github.com:scylladb/scylladb:
  test:boost: counter column parallelized aggregation test
  service:forward_service: use long type when column is counter
2023-02-24 15:25:10 +02:00
Raphael S. Carvalho
d73ffe7220 sstables: Temporarily disable loading of first and last position metadata
It's known that reading large cells in reverse causes large allocations.
Source: https://github.com/scylladb/scylladb/issues/11642

The loading is preliminary work for splitting large partitions into
fragments composing a run, so that such a run can later be read
efficiently using the position metadata.

Splitting is not turned on anywhere yet. Therefore, we can
temporarily disable the loading as a way to avoid regressions in
stable versions. Large allocations can cause stalls due to foreground
memory eviction kicking in.

The default values for position metadata say that the first and last
positions include all clustering rows. They aren't used anywhere
other than by sstable_run to determine whether a run is disjoint at the
clustering level, and given that no splitting is done yet, it
does not really matter.

Unit tests relying on position metadata were adjusted to enable
the loading, such that they can still pass.

Fixes #11642.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12979
2023-02-24 12:14:18 +02:00
Michał Jadwiszczak
4c6675bf1a test:boost: counter column parallelized aggregation test 2023-02-24 10:24:23 +01:00
Michał Jadwiszczak
68d2e1fff8 service:forward_service: use long type when column is counter
Previously, aggregations on counter columns failed because
function mocking was looking for a function with a counter argument,
which doesn't exist.
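A minimal sketch of the fix's idea, with hypothetical names (`cql_type`, `aggregate_lookup_type`); Scylla's real code works with data_type objects rather than an enum.

```cpp
// Hypothetical type tags standing in for CQL column types.
enum class cql_type { counter, long_type, int_type, text };

// Aggregate functions are looked up by argument type, and none exists for
// counter, so a counter column is looked up as the 64-bit long type.
constexpr cql_type aggregate_lookup_type(cql_type column_type) {
    return column_type == cql_type::counter ? cql_type::long_type : column_type;
}
```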
2023-02-24 10:24:16 +01:00
Botond Dénes
be232ff024 Merge 'Shard of shard repair task impl' from Aleksandra Martyniuk
The shard id is logged twice in repair (once explicitly, once added by the logger).
The redundant occurrence is deleted.

shard_repair_task_impl::id (which contains the global repair id)
is renamed to avoid further confusion.

Fixes: #12955

Closes #12959

* github.com:scylladb/scylladb:
  repair: rename shard_repair_task_impl::id
  repair: delete redundant shard id from logs
2023-02-24 08:43:54 +02:00
Botond Dénes
80f653d65e Merge 'Major keyspace compaction task' from Aleksandra Martyniuk
Task manager task implementation that covers the major keyspace
compaction, which can be started through the /storage_service/keyspace_compaction/
API.

Closes #12661

* github.com:scylladb/scylladb:
  test: add test for major keyspace compaction tasks
  compaction: create task manager's task for major keyspace compaction
  compaction: copy run_on_existing_tables to task_manager_module.cc
  compaction: add major_compaction_task_impl
  compaction: add pure virtual compaction_task_impl
  compaction: add compaction module getter to compaction manager
2023-02-24 07:08:06 +02:00
Guy Shtub
c47b7c4cb2 Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details
Closes #12748
2023-02-23 19:05:26 +02:00
Aleksandra Martyniuk
e9f01c7cce test: add test for major keyspace compaction tasks 2023-02-23 15:48:25 +01:00
Aleksandra Martyniuk
159e603ac4 compaction: create task manager's task for major keyspace compaction
Implementation of task_manager's task covering major keyspace compaction
that can be started through storage_service api.
2023-02-23 15:48:05 +01:00
Aleksandra Martyniuk
6b1d7f5979 compaction: copy run_on_existing_tables to task_manager_module.cc
Copy run_on_existing_tables from api/storage_service.cc to
compaction/task_manager_module.cc
2023-02-23 15:31:59 +01:00
Anna Stuchlik
4dd1659d0b doc: fixes https://github.com/scylladb/scylladb/issues/12964, removes the information that the CDC options are experimental
Closes #12973
2023-02-23 15:06:53 +02:00
Kefu Chai
412953fdd5 compress, transport: do not detect LZ4_compress_default()
`LZ4_compress_default()` was introduced in liblz4 v1.7.3, even though
the release notes (https://github.com/lz4/lz4/releases/tag/v1.7.3)
of v1.7.3 didn't mention it. Listing the tags that contain the commit
which added this API shows all releases including it:
```
$ git tag --contains 1b17bf2ab8cf66dd2b740eca376e2d46f7ad7041
lz4-r130
r129
r130
r131
rc129v0
v1.7.3
v1.7.4
v1.7.4.2
v1.7.5
v1.8.0
v1.8.1
v1.8.1.2
v1.8.2
v1.8.3
v1.9.0
v1.9.1
v1.9.2
v1.9.3
v1.9.4
```

v1.7.3 was released on Nov 17, 2016, and popular distro
releases package a new enough liblz4:

- Fedora 35 ships lz4-devel 1.9.3
- CentOS 7 ships lz4-devel 1.8.3
- Debian 10 ships liblz4-dev 1.8.3
- Ubuntu 18.04 ships liblz4-dev r131

So, in this change, we drop support for liblz4 < 1.7.3 to improve
code readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12971
2023-02-23 14:39:20 +02:00
Pavel Emelyanov
0959739216 sstables: Remove always-false sstable_writer_config::leave_unsealed
It was used in sstables streaming code up until e5be3352 (database,
streaming, messaging: drop streaming memtables) or nearby, then the
whole feature was reworked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12967
2023-02-23 12:50:06 +01:00
Botond Dénes
624d176b3b Merge 'Refine usage of sstable_test_env::reusable_sst() method' from Pavel Emelyanov
Some test cases can be made a bit more compact by using the syntactic sugar provided by the aforementioned method

Closes #12965

* github.com:scylladb/scylladb:
  test: Make use of reusable_sst default format
  tests: Use reusable_sst() where applicable
2023-02-23 12:50:06 +01:00
Botond Dénes
a5979c0662 Merge 'treewide: remove invalid defaulted move ctor' from Kefu Chai
- test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor
- tools/scylla-sstable: remove defaulted move ctor
- sstables/mx/partition_reversing_data_source: remove defaulted move ctor
- cql3/statements/truncate_statement: remove defaulted move ctor

Closes #12914

* github.com:scylladb/scylladb:
  test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor
  tools/scylla-sstable: remove defaulted move ctor
  sstables/mx/partition_reversing_data_source: remove defaulted move ctor
  cql3/statements/truncate_statement: remove defaulted move ctor
2023-02-23 12:50:05 +01:00
Avi Kivity
665429d85b cql3: remove assignment_testable::test_all
Was replaced with cql3::expr::test_assignment_all().

Closes #12951
2023-02-23 12:50:05 +01:00
Botond Dénes
0c756af137 Merge 'build: cmake: sync with configure.py (6/n)' from Kefu Chai
- build: cmake: correct linker flags
- build: cmake: enable boost tests only if BUILD_TESTING
- build: cmake: reuse test-lib library
- build: cmake: extract redis out

Closes #12961

* github.com:scylladb/scylladb:
  build: cmake: extract interface out
  build: cmake: extract redis out
  build: cmake: reuse test-lib library
  build: cmake: enable boost tests only if BUILD_TESTING
  build: cmake: correct linker flags
2023-02-23 12:50:05 +01:00
Aleksandra Martyniuk
d889a599e8 repair: rename shard_repair_task_impl::id
shard_repair_task_impl::id stores global repair id. To avoid confusion
with the task id, the field is renamed to global_repair_id.
2023-02-23 11:29:00 +01:00
Aleksandra Martyniuk
f7c88edec5 repair: delete redundant shard id from logs
In repair, the shard id is logged twice. Delete the repeated occurrence.
2023-02-23 11:25:18 +01:00
Pavel Emelyanov
5b311bb724 test: Make use of reusable_sst default format
sstable_test_env::reusable_sst() has a default value for the format
argument. While at it, patch the test cases that don't rely on it yet.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-22 17:04:10 +03:00
Pavel Emelyanov
7aabffff19 tests: Use reusable_sst() where applicable
reusable_sst() is intended to load pre-existing sstables
from the test/resources directory and .load() them. Some test
cases, however, still do it "by hand".

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-22 17:03:15 +03:00
Kefu Chai
5b3fd57c25 build: cmake: extract interface out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-22 18:35:11 +08:00
Kefu Chai
64879fb6f7 build: cmake: extract redis out
and move the rules related to `redis/protocol_parser.rl` into `redis`, as
that file is part of the redis implementation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-22 18:35:11 +08:00
Kefu Chai
43d9055b89 build: cmake: reuse test-lib library
it already includes the necessary bits used by test-perf, so let's
just link the latter to the former.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-22 18:35:11 +08:00
Kefu Chai
d07b649791 build: cmake: enable boost tests only if BUILD_TESTING
BUILD_TESTING is an option exposed by the CTest module, so let's
include the CTest module and check whether BUILD_TESTING is enabled
before including the Boost-based tests.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-22 18:35:11 +08:00
Kefu Chai
59698cc495 build: cmake: correct linker flags
s/sha/sha1/. turns out 867b58c62c
failed to include the latest change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-22 18:35:11 +08:00
Aleksandra Martyniuk
b908369e85 compaction: add major_compaction_task_impl
All major compaction tasks will share some methods like
type or abort. The common part of the tasks should be
inherited from major_compaction_task_impl.
2023-02-22 09:52:04 +01:00
Aleksandra Martyniuk
be101078a0 compaction: add pure virtual compaction_task_impl
Add compaction_task_impl that is a pure virtual class from which
all compaction tasks implementations will inherit.
2023-02-22 09:51:57 +01:00
Pavel Emelyanov
f51762c72a headers: Refine view_update_generator.hh and around
The initial intent was to reduce the fanout of shared_sstable.hh through
v.u.g.hh -> cql_test_env.hh chain, but it also resulted in some shots
around v.u.g.hh -> database.hh inclusion.

By and large:
- v.u.g.hh doesn't need database.hh
- cql_test_env.hh doesn't need v.u.g.hh (and thus -- the
  shared_sstable.hh) but needs database.hh instead
- a few other .cc files need v.u.g.hh directly, as they pulled it via
  cql_test_env.hh before
- add forward declarations in a few other places

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12952
2023-02-22 09:32:30 +02:00
Botond Dénes
e183dc4345 Merge 'Wrap sstable directory scan state in components_lister' from Pavel Emelyanov
The sstable_directory now combines two activities:
* scans the list of files in /var/lib/data and generates sstable objects from it
* maintains the found sstables throughout the necessary processing (populate/reshard/reshape)

The former part is in fact storage-specific. If sstables are on a filesystem, then it should be scanned with listdir, and there can be dangling files, like temp-TOC, the pending deletion log and components not belonging to any TOC. If sstables are on some other storage, then this part should work some other way.

That said, the sstable_directory is to be split into two pieces -- the lister and the "processing state". The latter would (may?) require renaming the sstable_directory into something more relevant, but that's a huge and intrusive change. For now, just collect the lister stuff in one place.

Closes #12843

* github.com:scylladb/scylladb:
  sstable_directory: Keep lister internals private
  sstable_directory: Move most of .commit_directory_changes() on lister
  sstable_directory: Remove temporary aliases
  sstable_directory: Move most of .process_sstable_dir() on lister
  sstable_directory: Move .handle_component() to components_lister
  sstable_directory: Keep files_for_removal on scan_state
  sstable_directory: Keep components_lister aboard
  sstable_directory: Keep scan_state on components_lister
2023-02-22 08:10:04 +02:00
Calle Wilund
97881091d3 commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
Fixes #12810

We did not update total_size_on_disk in commitlog totals when o_dsync was off.
This means we essentially ran with no registered footprint, which also broke
comparisons in delete_segments.
2023-02-21 16:35:23 +00:00
Calle Wilund
64102780fe commitlog: Use static (reused) regex for (left over) descriptor parse
Refs #11710

Allows reusing the regex for segment matching (when opening left-over segments after a crash).
This should remove stalls caused by commitlog replay preparation.
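A sketch of the "compile once, reuse" pattern with a function-local static std::regex. The `segment_descriptor` type, the `parse_segment_name` helper and the "CommitLog-<version>-<id>.log" file naming are assumptions for illustration, not Scylla's actual descriptor parser.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// Hypothetical descriptor; the "CommitLog-<version>-<id>.log" naming is an assumption.
struct segment_descriptor {
    std::uint32_t version;
    std::uint64_t id;
};

std::optional<segment_descriptor> parse_segment_name(const std::string& filename) {
    // Function-local static: compiled once on first use and reused for every
    // later file, instead of rebuilding the regex on each call.
    static const std::regex pattern(R"(CommitLog-(\d+)-(\d+)\.log)");
    std::smatch m;
    if (!std::regex_match(filename, m, pattern)) {
        return std::nullopt;
    }
    return segment_descriptor{
        static_cast<std::uint32_t>(std::stoul(m[1].str())),
        static_cast<std::uint64_t>(std::stoull(m[2].str())),
    };
}
```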

v2: Add unit test for descriptor parsing

Closes #12112
2023-02-21 18:34:04 +02:00
Botond Dénes
ef548e654d types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory
Check the first fragment before dereferencing it: the fragment might be
empty, in which case move on to the next one.
Found by running range scan tests with random schema and random data.
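A minimal sketch of the check, assuming a fragmented value is modeled as a vector of byte chunks (the `fragments` alias and `first_byte` helper are hypothetical). For a bool, the first byte is the whole value, so it must be taken from the first non-empty chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a fragmented value: any chunk may be empty.
using fragments = std::vector<std::vector<std::byte>>;

// Returns the first byte of the value, never dereferencing an empty fragment:
// empty fragments are skipped, and nullopt means the value has no bytes at all.
std::optional<std::byte> first_byte(const fragments& frags) {
    for (const auto& f : frags) {
        if (!f.empty()) {
            return f.front();
        }
    }
    return std::nullopt;
}
```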

Fixes: #12821
Fixes: #12823
Fixes: #12708

Closes #12824
2023-02-21 17:39:18 +02:00
Tomasz Grabiec
c8e2bf1596 db: schema_tables: Optimize schema merge
Currently, applying a schema change on a replica works like this:

  Collect all affected keyspaces from incoming mutations
  Read current state of schema
  Apply the mutations
  Read new state of schema

The "Read ... state of schema" step reads all kinds of schema
objects. In particular, to read the "table" objects, it does the
following:

  for every affected keyspace k:
       read all mutations from system_schema.tables for k
       extract all existing table names from those mutations
       for every existing table:
             read mutations from {tables, columns, indexes, view_virtual_columns, ...} for that table

As you can see, the number of reads performed is O(nr tables in a
keyspace), not O(nr tables in a change). This means that making a
sequence of schema changes, like adding a table, is quadratic.

Another aspect which magnifies this is that we don't read those tables
using a single scan, but issue individual queries for each table
separately.

This patch optimizes this by considering only affected tables when
reading schema for the purpose of diff calculation.

When mutations contain multi-table deletions, we still read the
set of tables, like before. This could be optimized by looking
at the database to get the list, but it's not part of the patch.
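A toy count of table reads per schema diff, contrasting the two strategies. The `schema_mutation` struct, the helper names and the uniform number of tables per keyspace are simplifying assumptions; multi-table deletions are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each incoming schema mutation names one table.
struct schema_mutation {
    std::string keyspace;
    std::string table;
};

// Before: every affected keyspace has all of its tables read back
// (assumes, for simplicity, the same table count in each keyspace).
std::size_t tables_read_before(const std::vector<schema_mutation>& muts,
                               std::size_t tables_per_keyspace) {
    std::set<std::string> keyspaces;
    for (const auto& m : muts) {
        keyspaces.insert(m.keyspace);
    }
    return keyspaces.size() * tables_per_keyspace;
}

// After: only the tables named by the mutations are read back.
std::size_t tables_read_after(const std::vector<schema_mutation>& muts) {
    std::set<std::pair<std::string, std::string>> tables;
    for (const auto& m : muts) {
        tables.insert({m.keyspace, m.table});
    }
    return tables.size();
}
```

Creating n tables one at a time in one keyspace, where the k-th change sees k tables, costs 1 + 2 + ... + n = n(n+1)/2 reads with the first function and n with the second.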

I tested this using a test case provided by Kamil (kbr-scylla@53fe154)

  ./test.py --mode debug test_many_schema_changes -s

The test bootstraps a cluster and then creates about 40 schema
changes. Then a new node is bootstrapped and replays those changes via
group0.

In debug mode, each change takes roughly 2s to process before the
patch, and 0.5s after the patch.

The whole replay takes about 57% of the time it took before:

Before (1m19s) :

INFO  2023-01-20 19:44:35,848 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 19:45:54,844 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

After (45s):

INFO  2023-01-20 22:02:51,869 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 22:03:36,834 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

Closes #12592
2023-02-21 17:26:57 +02:00
Calle Wilund
6f972ee68b commitlog: change type of stored size
known_size() is technically not a size_t.
2023-02-21 15:26:02 +00:00
Pavel Emelyanov
abab4d446d sstable: Remove explicit quarantization call
Now that all callers are patched to use the new change_state() call, it can be
removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:44:55 +03:00
Pavel Emelyanov
bbf192e775 test: Move move_to_new_dir() method from sstable class
There's a bunch of test cases that check how moving sstable files
around the filesystem works. These need the generic move_to_new_dir()
method from sstable, so move it into the test code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:42:18 +03:00
Pavel Emelyanov
bb0140531e sstable, dist.-loader: Introduce and use pick_up_from_upload() method
When "uploading" an sstable, scylla uses a shortcut: the caller puts the
sstable's files into the upload/ subdir, and then scylla just pulls them in,
in the cheapest way possible -- by relinking the files.

When this happens, the sstable also changes its generation, which is the only
place where that happens at all. For object storage, uploading is not
going to be _that_ simple, so for now add an fs-specific method to pick
up an sstable from the upload dir, with the intent to generalize it (if
possible) when object-storage uploading appears.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:40:00 +03:00
Pavel Emelyanov
8a061bd862 sstables, code: Introduce and use change_state() call
The call moves the sstable to the specified state.

The state change is translated into the storage driver's state change,
which for today's filesystem storage means moving between directories.
The "normal" state maps to the base dir of the table; there's no
dedicated subdir for this state, and this brings some trouble into
play.

The thing is that, to check whether an sstable is already in the "normal"
state, it's impossible to compare the filename of its path against any
pre-defined value, as tables' base dirs are dynamic. To overcome this,
the change-state call checks whether the sstable is in one of the "known"
sub-states, and assumes that it's in the normal state otherwise.
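A sketch of the "known sub-state, else normal" check using std::filesystem. The enum, the `state_from_dir` helper and the sub-state directory names "upload", "staging" and "quarantine" are assumptions for illustration, not the actual filesystem_storage code.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical states; the sub-state directory names are assumptions.
enum class sstable_state { normal, upload, staging, quarantine };

sstable_state state_from_dir(std::filesystem::path dir) {
    if (!dir.has_filename()) {
        dir = dir.parent_path();  // tolerate a trailing '/'
    }
    const std::string name = dir.filename().string();
    if (name == "upload") {
        return sstable_state::upload;
    }
    if (name == "staging") {
        return sstable_state::staging;
    }
    if (name == "quarantine") {
        return sstable_state::quarantine;
    }
    // The table's base dir has a dynamic name (e.g. "cf-<uuid>"), so it cannot be
    // compared against a constant: anything unknown is treated as "normal".
    return sstable_state::normal;
}
```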

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:39:34 +03:00
Pavel Emelyanov
e67751ee92 distributed_loader: Let make_sstables_available choose target directory
When sstables are loaded from upload/ subdir, the final step is to move
them from this directory into base or staging one. The uploading code
evaluates the target directory, then pushes it down the stack towards
make_sstables_available() method.

This patch replaces the path argument with a bool to_staging one. The
goal is to remove the knowledge of the exact sstable location (nowadays --
its files' path) from the distributed loader and keep it in the sstable
object itself. Next patches will make full use of this change.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:23:59 +03:00
Botond Dénes
763fe54637 Merge 'build: cmake: sync with configure.py (5/n) ' from Kefu Chai
- build: cmake: build release.cc as a library
- build: cmake: link alternator against cql3
- build: cmake: link scylla against xxHash::xxhash
- build: cmake: use lld or gold as linker if available

Closes #12942

* github.com:scylladb/scylladb:
  build: cmake: use lld or gold as linker if available
  build: cmake: link scylla against xxHash::xxhash
  build: cmake: link alternator against cql3
  build: cmake: build release.cc as a library
2023-02-21 16:19:24 +02:00
Pavel Emelyanov
41d65daa29 sstables: Remove dangling ready future from .close_files()
Was left unnoticed in 7c7eb81a ('Encapsulate filesystem access by
sstable into filesystem_storage subsclass')

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12946
2023-02-21 15:47:55 +02:00
Pavel Emelyanov
398f7704dc sstable_directory: Keep lister internals private
Now the lister provides a two-call API to the user -- process and commit.
The rest can and should be marked as private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:44:50 +03:00
Pavel Emelyanov
e6941d0baa sstable_directory: Move most of .commit_directory_changes() on lister
Committing any changes made while scanning the storage is
storage-specific. Just like .process() was moved on lister, the
.commit() now does the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:44:49 +03:00
Pavel Emelyanov
70d6bfc109 sstable_directory: Remove temporary aliases
Previous patches created a bunch of local aliases-references in
components_lister::process(). This patch just removes those aliases, no
functional changes are made here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:42:24 +03:00
Pavel Emelyanov
c4037270a3 sstable_directory: Move most of .process_sstable_dir() on lister
Processing storage with sstable files/objects is storage-specific. The
components_lister is the right component to handle it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:42:24 +03:00
Pavel Emelyanov
4c4aeba9b6 sstable_directory: Move .handle_component() to components_lister
This method is in charge of collecting a found file on scan_state; it
logically belongs to the components_lister and its internals.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:42:24 +03:00
Pavel Emelyanov
58f4076117 sstable_directory: Keep files_for_removal on scan_state
This list is the list of on-disk files, which is the property of
filesystem scan state. When committing directory changes (read: removing
those files) the list can be moved-from the state.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:42:23 +03:00
Pavel Emelyanov
df5384cb1e sstable_directory: Keep components_lister aboard
The lister is supposed to be alive throughout .process_sstable_dir() and
can die after .commit_directory_changes().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:32:06 +03:00
Pavel Emelyanov
5d98e34c16 sstable_directory: Keep scan_state on components_lister
The scan_state keeps the state of listing directory with sstables. It
now lives on the .process_sstable_dir() stack, but it can as well live
on the lister itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 16:32:06 +03:00
Kamil Braun
318f1f64c2 docs: update pygments dependency version
Closes #12949
2023-02-21 13:06:39 +02:00
Botond Dénes
372ac57c96 Merge 'doc: remove the incorrect information about IPs from the Restore page' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/12945

This PR removes the incorrect information and updates the link to the relevant page in the Manager docs.

Closes #12947

* github.com:scylladb/scylladb:
  doc: update the link to the Restore page in the ScyllaDB Manager documentation
  doc: remove the wrong info about IPs from the note on the Restore page
2023-02-21 12:30:31 +02:00
Kamil Braun
d56c060b4e Merge 'various raft fixes' from Gleb Natapov
The series fixes a race in case of a leader change while
add_entry_on_leader is sleeping and an abort during raft shutdown.

* '12863-fix-v1' of github.com:scylladb/scylla-dev:
  raft: abort applier fiber when a state machine aborts
  raft: fix race in add_entry_on_leader that may cause incorrect log length accounting
2023-02-21 10:57:04 +01:00
Anna Stuchlik
d743146313 doc: update the link to the Restore page in the ScyllaDB Manager documentation 2023-02-21 10:30:02 +01:00
Anna Stuchlik
1e85df776f doc: remove the wrong info about IPs from the note on the Restore page 2023-02-21 10:24:06 +01:00
Pavel Emelyanov
3f88d3af62 Merge 'test_shed_too_large_request fix: disable compression' from Gusev Petr
The test relies on an exact request size, which doesn't work if compression is applied. The driver enables compression only if both the server and the client agree on the codec to use. If a compression package (e.g. lz4) is not installed, compression is not used.

The trick with locally_supported_compressions is needed since I couldn't find any standard means to disable compression other than the compression flag on the cluster object, which seemed too broad.

fixes: #12836

Closes #12854

* github.com:scylladb/scylladb:
  test_shed_too_large_request: clarify the comments
  test_shed_too_large_request: use smaller test string
  test_shed_too_large_request fix: disable compression
2023-02-21 10:35:59 +03:00
Kefu Chai
867b58c62c build: cmake: use lld or gold as linker if available
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-21 14:24:18 +08:00
Kefu Chai
69b1e7651e build: cmake: link scylla against xxHash::xxhash
instead of adding `XXH_PRIVATE_API` to compile definitions, link
scylla against xxHash::xxhash, which provides this definition for us.

also move the comment on `XXH_PRIVATE_API` into `FindxxHash.cmake`,
where this definition is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-21 14:24:18 +08:00
Kefu Chai
0fffd34be8 build: cmake: link alternator against cql3
otherwise we'd have
```
In file included from /home/kefu/dev/scylladb/alternator/executor.cc:37:
/home/kefu/dev/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found
         ^~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-21 14:24:18 +08:00
Kefu Chai
957403663f build: cmake: build release.cc as a library
so we can attach compile definitions in a simpler way.

this change is based on Botond Dénes's change which gives an overhaul
to the existing CMake building system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-21 14:23:04 +08:00
Botond Dénes
d7b6cf045f Merge 'build: cmake: sync with configure.py (4/n)' from Kefu Chai
- build: cmake: link cql3 against wasmtime_bindings
- build: cmake: output rust binding headers in expected dir
- build: cmake: link auth against cql3

Closes #12927

* github.com:scylladb/scylladb:
  build: cmake: link auth against cql3
  build: cmake: output rust binding headers in expected dir
  build: cmake: link cql3 against wasmtime_bindings
2023-02-20 12:46:15 +01:00
Botond Dénes
3c30531202 Merge 'test: mutation_test: Fix sporadic failure due to continuity mismatch' from Tomasz Grabiec
In test_v2_apply_monotonically_is_monotonic_on_alloc_failures we
generate mutations with non-full continuity, so we should pass
is_evictable::yes to apply_monotonically(). Otherwise, it will assume
fully-continuous versions and not try to maintain continuity by
inserting sentinels.

This manifested in sporadic failures on continuity check.

Fixes #12882

Closes #12921

* github.com:scylladb/scylladb:
  test: mutation_test: Fix sporadic failure due to continuity mismatch
  test: mutation_test: Fix copy-paste mistake in trace-level logging
2023-02-20 12:46:15 +01:00
Pavel Emelyanov
273999b9fa sstable: Mark version and format members const
These two are indeed immutable throughout the object lifetime

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12918
2023-02-20 12:46:15 +01:00
Kefu Chai
adbcc3db8f dist/debian: drop unused Makefile variable
This change was previously reverted by
cbc005c6f5. It turns out this change
was not the offending change, so let's resurrect it.

`job` was introduced back in 782ebcece4,
so we could consume the option specified in the DEB_BUILD_OPTIONS
environment variable. But now we always repackage
the artifacts prebuilt in the relocatable package, so we don't build
them anymore when packaging debian packages; see
9388f3d626. And `job` is not
passed to `ninja` anymore.

so, in this change, `job` is removed from debian/rules as well, as
it is not used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12924
2023-02-20 12:46:15 +01:00
Nadav Har'El
328cdb2124 cql-pytest: translate Cassandra's tests for compact tables
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816
2023-02-20 12:46:15 +01:00
Raphael S. Carvalho
fbeee8b65d Optimize load-and-stream
load-and-stream implements no policy when deciding which SSTables will go in
each streaming round (batch of 16 SSTables), meaning the choice is random.

It can take advantage of the fact that the LSM-tree layout, with ICS and LCS,
is a set of SSTable runs, where each run is composed of SSTables that are
disjoint in their key range.

By sorting SSTables to be streamed by their first key, the effect is that
SSTable runs will be incrementally streamed (in token order).

SSTable runs in the same replica group (or in the same node) will have their
content deduplicated, significantly reducing the amount of data we need to
put on the wire. The improvement is proportional to the space amplification
in the table, which, again, depends on the compaction strategy used.

Another important benefit is that the destination nodes will receive SSTables
in token order, allowing off-strategy compaction to be more efficient.
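A sketch of the batching policy described above, with hypothetical names (`sst`, `make_rounds`) and tokens modeled as plain integers; only the round size of 16 comes from the text. Because the SSTables of a run are disjoint, sorting by first token places neighbouring fragments of every run into the same round.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: an SSTable reduced to the token range it covers.
struct sst {
    std::int64_t first_token;
    std::int64_t last_token;
};

// Sort by first token, then cut into fixed-size streaming rounds, so each
// round carries SSTables that are close together in token order.
std::vector<std::vector<sst>> make_rounds(std::vector<sst> ssts, std::size_t round_size = 16) {
    assert(round_size > 0);
    std::sort(ssts.begin(), ssts.end(),
              [](const sst& a, const sst& b) { return a.first_token < b.first_token; });
    std::vector<std::vector<sst>> rounds;
    for (std::size_t i = 0; i < ssts.size(); i += round_size) {
        const std::size_t end = std::min(i + round_size, ssts.size());
        rounds.emplace_back(ssts.begin() + static_cast<std::ptrdiff_t>(i),
                            ssts.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return rounds;
}
```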

This is how I tested it:

1) Generated a 5GB dataset in an ICS table.
2) Started a fresh 2-node cluster. RF=2.
3) Ran load-and-stream against one of the replicas.

BEFORE:

$ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true"

real	4m40.613s
user	0m0.005s
sys	0m0.007s

AFTER:

$ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true"

real	2m39.271s
user	0m0.005s
sys	0m0.004s

That's ~1.76x faster.

That's explained by deduplication:

BEFORE:

INFO  2023-02-17 22:59:01,100 [shard 0] stream_session - [Stream #79d3ce7a-ea47-4b6e-9214-930610a18ccd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3445376, received_partitions=2755835
INFO  2023-02-17 22:59:41,491 [shard 0] stream_session - [Stream #bc6bad99-4438-4e1e-92db-b2cb394039c8] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3308288, received_partitions=2836491
INFO  2023-02-17 23:00:20,585 [shard 0] stream_session - [Stream #e95c4f49-0a2f-47ea-b41f-d900dd87ead5] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3129088, received_partitions=2734029
INFO  2023-02-17 23:00:49,297 [shard 0] stream_session - [Stream #255cba95-a099-4fec-a72c-f87d5cac2b1d] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2544128, received_partitions=1959370
INFO  2023-02-17 23:01:33,110 [shard 0] stream_session - [Stream #96b5737e-30c7-4af8-a8b8-96fecbcbcbd0] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3624576, received_partitions=3085681
INFO  2023-02-17 23:02:20,909 [shard 0] stream_session - [Stream #3185a48b-fb9e-4190-88f4-5c7a386bc9bd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3505024, received_partitions=3079345
INFO  2023-02-17 23:03:02,039 [shard 0] stream_session - [Stream #0d2964dc-d5e3-4775-825c-97f736d14713] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2808192, received_partitions=2655811

AFTER:

INFO  2023-02-17 23:12:49,155 [shard 0] stream_session - [Stream #bf00963c-3334-4035-b1a9-4b3ceb7a188a] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2965376, received_partitions=1006535
INFO  2023-02-17 23:13:13,365 [shard 0] stream_session - [Stream #1cd2e3ac-a68b-4cb5-8a06-707e91cf59db] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3543936, received_partitions=1406157
INFO  2023-02-17 23:13:37,474 [shard 0] stream_session - [Stream #5a278230-6b4b-461f-8396-c15df7092d03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3639936, received_partitions=1371298
INFO  2023-02-17 23:14:02,132 [shard 0] stream_session - [Stream #19f40dc3-e02a-4321-a917-a6590d99dd03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3638912, received_partitions=1435386
INFO  2023-02-17 23:14:26,673 [shard 0] stream_session - [Stream #d47507eb-2067-4e8f-a4f7-c82d5fbd4228] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3561600, received_partitions=1423024
INFO  2023-02-17 23:14:49,307 [shard 0] stream_session - [Stream #d42ee911-253a-4de6-ac89-6a3c05b88d66] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2382592, received_partitions=1452656
INFO  2023-02-17 23:15:10,067 [shard 0] stream_session - [Stream #1f78c1bf-8e20-41bd-95de-16de3fc5f86c] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2632320, received_partitions=1252298
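As a sanity check on the numbers above, the sketch below recomputes the speedup from the two `real` times and totals the `received_partitions` values copied from the log excerpts. The helper names (`to_ms`, `total`, `speedup`) are illustrative only and are not part of Scylla.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// received_partitions values copied verbatim from the BEFORE and AFTER logs.
constexpr std::array<std::int64_t, 7> before_received = {
    2755835, 2836491, 2734029, 1959370, 3085681, 3079345, 2655811};
constexpr std::array<std::int64_t, 7> after_received = {
    1006535, 1406157, 1371298, 1435386, 1423024, 1452656, 1252298};

// Converts a wall-clock time given as minutes, seconds and milliseconds to milliseconds.
constexpr std::int64_t to_ms(std::int64_t minutes, std::int64_t seconds, std::int64_t millis) {
    return (minutes * 60 + seconds) * 1000 + millis;
}

// Sums the partitions received across all logged stream sessions.
inline std::int64_t total(const std::array<std::int64_t, 7>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}

// real 4m40.613s (BEFORE) divided by real 2m39.271s (AFTER).
inline double speedup() {
    return static_cast<double>(to_ms(4, 40, 613)) / static_cast<double>(to_ms(2, 39, 271));
}
```

With these inputs, `total(before_received)` is 19106562 and `total(after_received)` is 9347354, so the logged stream sessions received a bit under half as many partitions after the change, and `speedup()` comes out at about 1.76, matching the figure quoted above.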

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20230219191924.37070-1-raphaelsc@scylladb.com>
2023-02-20 12:46:14 +01:00
guy9
917e085919 Update manager-monitoring-integration.rst
Changing the default manager port from 56090 to 5090
@amnonh please review
@annastuchlik please change if other locations in Docs require this change

Closes #12682
2023-02-20 12:46:14 +01:00
Avi Kivity
6d5c242651 Update tools/java submodule (hdrhistogram failure with Java 11)
* tools/java f0bab7af66...ab0a613fdc (1):
  > Fix cassandra-stress -log hdrfile=... with java 11
2023-02-20 12:46:14 +01:00
Aleksandra Martyniuk
4f67c0c36a compaction: add compaction module getter to compaction manager 2023-02-20 11:19:29 +01:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into the "types" directory.

the build systems are updated accordingly.

the source files referencing `types.hh` were updated using the following
command:

```
find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's|"types\.hh"|"types/types.hh"|' {} +
```

the source files under sstables include "types.hh", which is
actually the one located under "sstables", so include "sstables/types.hh"
instead to make this explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Tzach Livyatan
f97a23a9e3 Add a warning: altering a service level timeout doesn't affect existing connections
Closes #12928
Refs #12923
2023-02-19 14:49:23 +02:00
Kefu Chai
ee97c332d9 test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor
because it has a member variable of reference type, and a reference
cannot be reseated, the defaulted move assignment operator is implicitly
deleted anyway. this change silences the following warning from Clang:

```
/home/kefu/dev/scylladb/test/boost/chunked_vector_test.cc:152:27: error: explicitly defaulted move assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
    exception_safe_class& operator=(exception_safe_class&&) = default;
                          ^
/home/kefu/dev/scylladb/test/boost/chunked_vector_test.cc:132:31: note: move assignment operator of 'exception_safe_class' is implicitly deleted because field '_esc' is of reference type 'exception_safety_checker &'
    exception_safety_checker& _esc;
                              ^
```
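A minimal sketch of why the defaulted move assignment ends up deleted, using hypothetical `checker`/`holds_ref` types in place of the test's `exception_safety_checker`/`exception_safe_class`: a reference member can be bound once at construction, but it cannot be reseated by assignment.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct checker {};

struct holds_ref {
    checker& _esc;  // reference member: bound at construction, never reseated
    explicit holds_ref(checker& c) : _esc(c) {}
    // Move construction is fine: the new object binds its own reference.
    holds_ref(holds_ref&&) = default;
    // holds_ref& operator=(holds_ref&&) = default;
    // ^ would be implicitly deleted; Clang reports -Wdefaulted-function-deleted
};

static_assert(std::is_move_constructible_v<holds_ref>);
static_assert(!std::is_move_assignable_v<holds_ref>);
```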

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:58:22 +08:00
Kefu Chai
2bb61b8c18 tools/scylla-sstable: remove defaulted move ctor
```
/home/kefu/dev/scylladb/tools/scylla-sstable.cc:2301:9: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
        impl(impl&&) = default;
        ^
/home/kefu/dev/scylladb/tools/scylla-sstable.cc:2291:16: note: move constructor of 'impl' is implicitly deleted because field '_reader' has an inaccessible move constructor
        reader _reader;
               ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:57:40 +08:00
Kefu Chai
cca9b7c4cd sstables/mx/partition_reversing_data_source: remove defaulted move ctor
as partition_reversing_data_source_impl has a member variable of
reference type, its defaulted move assignment operator is implicitly
deleted. this should address the following warning from

```
/home/kefu/dev/scylladb/sstables/mx/partition_reversing_data_source.cc:476:43: error: explicitly defaulted move assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
    partition_reversing_data_source_impl& operator=(partition_reversing_data_source_impl&&) noexcept = default;
                                          ^
/home/kefu/dev/scylladb/sstables/mx/partition_reversing_data_source.cc:365:19: note: move assignment operator of 'partition_reversing_data_source_impl' is implicitly deleted because field '_schema' is of reference type 'const schema &'
    const schema& _schema;
                  ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:57:40 +08:00
Kefu Chai
958f8bf79f cql3/statements/truncate_statement: remove defaulted move ctor
```
/home/kefu/dev/scylladb/cql3/statements/truncate_statement.hh:29:5: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
    truncate_statement(truncate_statement&&) = default;
    ^
/home/kefu/dev/scylladb/cql3/statements/truncate_statement.hh:25:39: note: move constructor of 'truncate_statement' is implicitly deleted because field '_attrs' has a deleted move constructor
    const std::unique_ptr<attributes> _attrs;
                                      ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:523:7: note: 'unique_ptr' has been explicitly marked deleted here
      unique_ptr(const unique_ptr&) = delete;
      ^
```
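A minimal sketch of the same rule with hypothetical types: a `const std::unique_ptr` member cannot be moved from, and `std::unique_ptr`'s copy constructor is deleted, so the enclosing class is not move-constructible at all. The non-const variant is shown only for contrast; the commit itself simply removes the defaulted declaration.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct attributes {};

// Moving a const member would modify it, so overload resolution falls back
// to unique_ptr's deleted copy constructor: no move constructor is usable.
struct with_const_ptr {
    const std::unique_ptr<attributes> _attrs;
};

// Without const, the implicit move constructor works as expected.
struct with_ptr {
    std::unique_ptr<attributes> _attrs;
};

static_assert(!std::is_move_constructible_v<with_const_ptr>);
static_assert(std::is_move_constructible_v<with_ptr>);
```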

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:57:40 +08:00
Kefu Chai
6803f38a7a build: cmake: link auth against cql3
as auth headers reference cql3:

```
In file included from /home/kefu/dev/scylladb/auth/authenticator.cc:16:
In file included from /home/kefu/dev/scylladb/cql3/query_processor.hh:24:
/home/kefu/dev/scylladb/lang/wasm_instance_cache.hh:20:10: fatal error: 'rust/cxx.h' file not found
         ^~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:46:51 +08:00
Kefu Chai
a2668f8ba8 build: cmake: output rust binding headers in expected dir
we include rust binding headers like `rust/wasmtime_bindings.hh`,
so they should be located in a directory named "rust".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:46:51 +08:00
Kefu Chai
494ed41a54 build: cmake: link cql3 against wasmtime_bindings
as it references headers provided by wasmtime_bindings:

```
In file included from /home/kefu/dev/scylladb/cql3/functions/user_function.cc:9:
In file included from /home/kefu/dev/scylladb/cql3/functions/user_function.hh:16:
/home/kefu/dev/scylladb/lang/wasm.hh:14:10: fatal error: 'rust/wasmtime_bindings.hh' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-19 12:46:51 +08:00
Gleb Natapov
941407b905 database: fix do_apply_many() to handle empty array of mutations
Currently the code asserts when given an empty array of mutations: the
`cl` pointer is null, because there are no mutations to initialize it from.
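A hypothetical sketch of the kind of guard this fix implies; the types and `apply_many_sketch()` below are illustrative stand-ins, not Scylla's actual `database::do_apply_many()`. The point is that a pointer derived from the batch's elements must not be used when the batch is empty.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct commitlog {};
struct mutation {
    commitlog* cl = nullptr;
};

// Returns the number of mutations applied. The commitlog pointer is taken
// from the batch itself, so an empty batch has nothing to derive it from:
// return early instead of asserting on a null pointer.
inline int apply_many_sketch(const std::vector<mutation>& muts) {
    if (muts.empty()) {
        return 0;
    }
    commitlog* cl = muts.front().cl;
    assert(cl != nullptr);
    (void)cl;  // ... apply each mutation using cl ...
    return static_cast<int>(muts.size());
}
```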
Message-Id: <20230212144837.2276080-3-gleb@scylladb.com>
2023-02-17 22:58:22 +01:00
Yaron Kaikov
a4e08ee48a Revert "dist/debian: bump up debhelper compatibility level to 10"
This reverts commit 75eaee040b.

This is because it causes a regression that prevents the Scylla service from starting on deb-based OSes.

Fixes: #12738

Closes #12897
2023-02-17 17:34:12 +02:00
Michał Chojnowski
e88f590eda sstables: partition_index_cache: clean up an unused type alias
`list_ptr` is a type alias that isn't used in any meaningful way. Remove it.

Closes #10978
2023-02-17 17:58:26 +03:00
Tomasz Grabiec
2ae8f74cec test: mutation_test: Fix sporadic failure due to continuity mismatch
In test_v2_apply_monotonically_is_monotonic_on_alloc_failures we
generate mutations with non-full continuity, so we should pass
is_evictable::yes to apply_monotonically(). Otherwise, it will assume
fully-continuous versions and not try to maintain continuity by
inserting sentinels.

This manifested in sporadic failures on continuity check.

Fixes #12882
2023-02-17 14:43:32 +01:00
Tomasz Grabiec
22063713d7 test: mutation_test: Fix copy-paste mistake in trace-level logging 2023-02-17 14:42:47 +01:00
Botond Dénes
f62e62f151 Merge 'build: cmake: sync with configure.py (3/n)' from Kefu Chai
* build: cmake: add test
* build: cmake: expose the bridged rust library
* build: cmake: correct library path
* build: cmake: add missing source files
* build: cmake: put generated sources into ${scylla_gen_build_dir}
* build: cmake: silence -Wuninitialized warning
* build: cmake: extract idl library out
* build: cmake: ignore -Wparentheses-equality

Closes #12893

* github.com:scylladb/scylladb:
  build: cmake: add unit tests
  build: cmake: extract sstables out
  build: cmake: extract auth and schema
  build: utils: extract utils out
  build: cmake: link Boost::regex with ICU::i18n
  build: cmake: add test
  build: cmake: expose the bridged rust library
  build: cmake: correct library path
  build: cmake: add missing source files
  build: cmake: put generated sources into ${scylla_gen_build_dir}
  build: cmake: silence -Wuninitialized warning
  build: cmake: extract idl library out
  build: cmake: ignore -Wparentheses-equality
2023-02-17 13:13:01 +02:00
Kefu Chai
05ecc3f1c9 build: cmake: add unit tests
this change is based on Botond Dénes's change, which overhauled
the original CMake build system. this change alone is not enough
to build tests with CMake, as we still need to sort out the
dependencies.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:41:40 +08:00
Kefu Chai
f76a169025 build: cmake: extract sstables out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:41:40 +08:00
Kefu Chai
f3714f1706 build: cmake: extract auth and schema
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:41:40 +08:00
Kefu Chai
3e481c9d15 build: utils: extract utils out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:41:39 +08:00
Kefu Chai
4d7ae07e9e build: cmake: link Boost::regex with ICU::i18n
it turns out Boost::regex references ICU::i18n, but it fails to
expose this dependency in its public link interface. so let's do this
on its behalf.

```
: && /home/kefu/.local/bin/clang++ -Wall -Werror -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-missing-braces -Wno-overloaded-virtual -Wno-parentheses-equality -Wno-unsupported-friend -march=westmere  -O0 -g -gz  CMakeFiles/scylla.dir/absl-flat_hash_map.cc.o CMakeFiles/$
ld.lld: error: undefined symbol: icu_67::Collator::createInstance(icu_67::Locale const&, UErrorCode&)
>>> referenced by icu.hpp:56 (/usr/include/boost/regex/icu.hpp:56)
>>>               CMakeFiles/scylla.dir/utils/like_matcher.cc.o:(boost::re_detail_107500::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_67::Locale const&))
>>> referenced by icu.hpp:61 (/usr/include/boost/regex/icu.hpp:61)
>>>               CMakeFiles/scylla.dir/utils/like_matcher.cc.o:(boost::re_detail_107500::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_67::Locale const&))
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:39:44 +08:00
Kefu Chai
02de9f1833 build: cmake: add test
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:39:44 +08:00
Kefu Chai
f5750859f7 build: cmake: expose the bridged rust library
so that scylla can be linked against it when it is linked with
wasmtime_bindings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:39:44 +08:00
Kefu Chai
7569424d86 build: cmake: correct library path
the library path encodes the build profile, so in this change the
profile in use is added to the path.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:39:44 +08:00
Kefu Chai
affebc35be build: cmake: add missing source files
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:39:43 +08:00
Kefu Chai
c0824c6c25 build: cmake: put generated sources into ${scylla_gen_build_dir}
to be aligned with the convention of configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:38:44 +08:00
Kefu Chai
db8a2c15fa build: cmake: silence -Wuninitialized warning
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:38:44 +08:00
Kefu Chai
7b431748a8 build: cmake: extract idl library out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:38:44 +08:00
Kefu Chai
d89602c6a2 build: cmake: ignore -Wparentheses-equality
antlr3 generates code like `((foo == bar))`, which Clang warns
about. let's disable this warning and explore other options later.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-17 18:38:44 +08:00
Avi Kivity
7fc7cbd3bf build: nix: switch to non-static zstd
When we added zstd (f14e6e73bb), we used the static library
as we used some experimental APIs. However, now the dynamic
library works, so apparently the experimental API is now standard.

Switch to the dynamic library. It doesn't improve anything, but it
aligns with how we do things.

Closes #12902
2023-02-17 10:29:34 +02:00
Avi Kivity
ae3489382e build: nix: update clang
Clang 15 is now packaged by Nix, so use it.

Closes #12901
2023-02-17 10:26:44 +02:00
Kefu Chai
50f68fe475 test/perf: do not brace integer with {}
`int_range::make_singular()` accepts a single `int` as its parameter,
so there is no need to brace the parameter with `{}`. this helps to silence
the warning from Clang, like:

```
/home/kefu/dev/scylladb/test/perf/perf_fast_forward.cc:1396:63: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
            check_no_disk_reads(test(int_range::make_singular({100}))),
                                                              ^~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12903
2023-02-17 10:24:24 +02:00
Botond Dénes
2b1f10a41c Merge 'doc: add a KB about the new tombstones compaction process in ICS' from Anna Stuchlik
Fixes https://github.com/scylladb/scylla-docs/issues/4140

This PR adds a new Knowledge Base article about improved garbage collection in ICS. It's based on the document created by @raphaelsc https://docs.google.com/document/d/1fA7uBcN9tgxeHwCbWftPJz071dlhucoOYO1-KJeOG8I/edit?usp=sharing.

@raphaelsc Could you review it? I've made some improvements to the language and text organization, but I didn't add or remove any content, so it should be a quick review.

@tzach requested a diagram, but we can add it later. It would be great to have this content published asap.

Closes #12792

* github.com:scylladb/scylladb:
  doc: add the new KB to the list of topics
  doc: add a new KB article about tombstone garbage collection in ICS
2023-02-17 10:20:01 +02:00
Aleksandra Martyniuk
5d826f13e7 api: move get_and_update_ttl to task manager api
Task ttl can be set with task manager test api, which is disabled
in release mode.

Move get_and_update_ttl from task manager test api to task manager
api, so that it can be used in release mode.

Closes #12894
2023-02-17 10:19:06 +02:00
Piotr Smaroń
d2bfe124ad doc: fix command invoking tests
The developer documentation in `building.md` suggested running unit tests with the `./tools/toolchain/dbuild test` command; however, this command only invokes the `test` shell utility, which immediately returns with status `1`:
```
[piotrs@new-host scylladb]$ ./tools/toolchain/dbuild test
[piotrs@new-host scylladb]$ echo $?
1
```
This was probably an unintended mistake; what the author really meant was invoking `dbuild ninja test`.

Closes #12890
2023-02-17 10:16:33 +02:00
Botond Dénes
0961a3f79b test/boost/reader_concurreny_semaphore_test: run oom protection tests in debug mode
These tests need to run with a limited amount of memory to be
really useful. When the amount of memory is unexpected, they silently exit,
which is exactly what they did in debug mode, where the amount of
available memory cannot be controlled.
Disable the check in debug mode.
2023-02-17 00:46:56 -05:00
Botond Dénes
1a9fdebb49 treewide: adapt to throwing reader_concurrency_semaphore::consume()
Said method can now throw `std::bad_alloc` since aab5954. All call-sites
should have been adapted in the series introducing the throw, but some
managed to slip through because the oom unit test didn't run in debug
mode. In this commit the remaining unpatched call-sites are fixed.
2023-02-17 00:46:56 -05:00
Avi Kivity
e2f6e0b848 utils: move hashing related files to utils/ module
Closes #12884
2023-02-17 07:19:52 +02:00
Kefu Chai
2f0cb9e68f db/virtual_table: mark the dtor of base class virtual
as `my_result_collector` has virtual functions but its dtor is not
virtual, Clang complains. let's mark the dtor of its base class virtual
to be on the safe side.

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'my_result_collector' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<my_result_collector>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/db/virtual_table.cc:134:25: note: in instantiation of member function 'std::unique_ptr<my_result_collector>::~unique_ptr' requested here
        auto consumer = std::make_unique<my_result_collector>(s, permit, &pr, std::move(reader_and_handle.second));
                        ^
```
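A sketch, with hypothetical `result_collector_base`/`my_collector` types, of the shape this fix gives the class hierarchy. Once the base declares a virtual destructor, every derived class's destructor is virtual too, which `std::has_virtual_destructor_v` can check at compile time to guard against regressions.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical polymorphic base, owned through std::unique_ptr.
struct result_collector_base {
    virtual ~result_collector_base() = default;  // virtual dtor on the base
    virtual void consume(int row) = 0;
};

// Not marked final, so it may itself be derived from; its dtor is
// implicitly virtual because the base's is.
struct my_collector : result_collector_base {
    int rows = 0;
    void consume(int) override { ++rows; }
};

// Compile-time guards against accidentally dropping the virtual dtor.
static_assert(std::has_virtual_destructor_v<result_collector_base>);
static_assert(std::has_virtual_destructor_v<my_collector>);
```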

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12879
2023-02-17 07:11:18 +02:00
Botond Dénes
79bf347e04 Merge 'Remove sstables::test_setup in favor of sstables::test_env' from Pavel Emelyanov
The former is a convenience wrapper over the latter. There's no real benefit in using it, but having two test_env-s is worse than just one.

Closes #12794

* github.com:scylladb/scylladb:
  sstable_utils: Move the test_setup to perf/
  sstable_utils: Remove unused wrappers over test_env
  sstable_test: Open-code do_with_cloned_tmp_directory
  sstable_test: Asynchronize statistics_rewrite case
  tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)?
2023-02-17 07:09:34 +02:00
Anna Stuchlik
bcca706ff5 doc: fixes https://github.com/scylladb/scylladb/issues/12754, document the metric update in 5.2
Closes #12891
2023-02-16 19:05:48 +02:00
Nadav Har'El
02682aa40d test/cql-pytest: add reproducer for ALLOW FILTERING bug
This patch adds a reproducer for the bug described in issue #7964.
The restriction `where k=1 and c=2` (when k,c are the key columns)
returns at most a single row, so it doesn't need ALLOW FILTERING.
If we add a third restriction, say `v=2`, the query still processes
at most a single row, so it still shouldn't need ALLOW FILTERING.
Both Scylla and Cassandra get this wrong, so the test is marked with
both xfail and cassandra_bug.

The patch also adds another test for longer partition slices,
e.g., `where k=1 and c>2`. Although the slice itself doesn't need
filtering, adding `v=2` here does require ALLOW FILTERING, because
the slice may contain a large number of rows, and `v=2` may restrict
it to just a few results. This test passes on both Scylla and Cassandra.

Issue #7964 mentioned these scenarios and even had some example code,
but we never added it to the test suite, so we finally do it now.

Refs #7964

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12850
2023-02-16 19:05:48 +02:00
Botond Dénes
dc3d47b1e4 Merge 'Get compaction history without using qctx' from Pavel Emelyanov
There are two methods to mess with compaction history -- update and get. The former was patched to use the local system-keyspace instance by 907fd2d3 (system_keyspace: De-static compaction history update); now it's time for the latter (spoiler: it's only used by the API handler).

Closes #12889

* github.com:scylladb/scylladb:
  system_keyspace; Make get_compaction_history non static and drop qctx
  api, compaction_manager: Get compaction history via manager
  system_keyspace: Move compaction_history_entry to namespace scope
2023-02-16 19:05:48 +02:00
Anna Stuchlik
826f67a298 doc: related https://github.com/scylladb/scylladb/issues/12658, fix the service name in the upgrade guide from 2022.1 to 2022.2
Closes #12698
2023-02-16 19:05:48 +02:00
Botond Dénes
87f7ac920e Merge 'Add task manager utils for tests' from Aleksandra Martyniuk
Tests of each module that is integrated with task manager use
calls to the task manager api. The boilerplate needed to make a call,
check its status, and get the result can be reduced with helper functions.

task_manager_utils.py contains wrappers for task manager api
calls and helpers that may be reused by different tests.

Closes #12844

* github.com:scylladb/scylladb:
  test: use functions from task_manager_utils.py in test_task_manager.py
  test: add task_manager_utils.py
2023-02-16 19:05:48 +02:00
Kefu Chai
fcdea9f950 test/perf: mark output_writer::~output_writer() as virtual
as an abstract base class `output_writer` is inherited by both
`json_output_writer` and `text_output_writer`. and `output_manager`
manages the lifecycles of used writers using
`std::unique_ptr<output_writer>`.

before this change, the dtor of `output_writer` was not virtual, so
when a writer was destroyed through a `std::unique_ptr<output_writer>`,
only the base class's dtor was called (strictly speaking, deleting a
derived object through a base pointer without a virtual dtor is
undefined behavior). but the dtor of `json_output_writer` is non-trivial,
as this class aggregates a bunch of member variables. if its dtor is not
invoked when the object is destroyed, those members are leaked.

so, in this change, the dtor of `output_writer` is marked virtual. this
makes the dtors of all its derived classes virtual as well, so the right
dtor is always called.

test/perf is designed only for testing and is not used in production.
also, this feature was only recently integrated into the scylla executable
in 228ccdc1c7.

so there is no need to backport this change.

this change should also silence the following warning from Clang 17:

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'output_writer' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<output_writer>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:15: note: in instantiation of member function 'std::unique_ptr<output_writer>::~unique_ptr' requested here
        __location->~_Tp();
                     ^
```
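A minimal sketch of the fixed pattern, using hypothetical `writer_base`/`json_writer` types rather than the actual perf classes: with a virtual destructor on the abstract base, destroying the object through `std::unique_ptr<writer_base>` runs the derived destructor, which in turn destroys its members.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Counts how many times the derived destructor ran.
inline int derived_dtor_calls = 0;

// Abstract base managed through std::unique_ptr<writer_base>.
struct writer_base {
    virtual ~writer_base() = default;  // the fix: a virtual dtor
    virtual void write(const std::string& line) = 0;
};

struct json_writer final : writer_base {
    std::vector<std::string> _lines;  // non-trivial member that must be destroyed
    void write(const std::string& line) override { _lines.push_back(line); }
    ~json_writer() override { ++derived_dtor_calls; }
};

inline void use_writer() {
    std::unique_ptr<writer_base> w = std::make_unique<json_writer>();
    w->write("{}");
}  // ~unique_ptr deletes through writer_base*, which dispatches to ~json_writer
```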

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12888
2023-02-16 19:05:48 +02:00
Nadav Har'El
27ea908c69 test/cql-pytest: regression test for old secondary-index bug
This patch adds a cql-pytest test for an old secondary-index bug
that was described three years ago in issue #5823. cql-pytest makes
it easy to run the same test against different versions of Scylla,
and it was used to check that the bug existed in Scylla 2.3.0 but
was gone by 2.3.5, and also not present in master or in 2021.1.

A bit about the bug itself:

A secondary index is useful for equality restrictions (a=2) but can't be
used for inequality restrictions (a>=2). Scylla 3.2.0 had a bug where,
because the restriction a>=2 couldn't be served through the index,
it was ignored completely. This is of course a mistake.

Refs #5823

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12856
2023-02-16 19:05:48 +02:00
Alejo Sanchez
16d92b7042 test/topology: pytest driver version use print...
instead of log

Use print instead of logging.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12846
2023-02-16 19:05:48 +02:00
Kefu Chai
9520acb1a1 logalloc: mark segment_store_backend's virtual
before this change, `seastar_memory_segment_store_backend`
is a class with virtual methods, but it does not have a virtual
dtor. but we do use a unique_ptr<segment_store_backend> to
manage the lifecycle of an instance of its derived class.
to enable the compiler to call the right dtor, we should
mark the base class's dtor as virtual. this should address
the following warnings from Clang-17:

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'logalloc::seastar_memory_segment_store_backend' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::seastar_memory_segment_store_backend>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/utils/logalloc.cc:812:20: note: in instantiation of member function 'std::unique_ptr<logalloc::seastar_memory_segment_store_backend>::~unique_ptr' requested here
        : _backend(std::make_unique<seastar_memory_segment_store_backend>())
                   ^
```
and
```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'logalloc::segment_store_backend' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::segment_store_backend>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/utils/logalloc.cc:811:5: note: in instantiation of member function 'std::unique_ptr<logalloc::segment_store_backend>::~unique_ptr' requested here
    contiguous_memory_segment_store()
    ^
```
Fixes #12872
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12873
2023-02-16 19:05:48 +02:00
Avi Kivity
abe157a873 Drop intrusive_set_external_comparator
Since 5c0f9a8180 ("mutation_partition: Switch cache of
rows onto B-tree") it's no longer in use, except in some
performance test, so remove it.

Although scylla-gdb.py is sometimes used with older releases,
it's so outdated we can remove it from there too.

Closes #12868
2023-02-16 19:05:48 +02:00
Kefu Chai
6eab8720c4 tools/schema_loader: do not return ref to a local variable
we should never return a reference to local variable.
so in this change, a reference to a static variable is returned
instead. this should address following warning from Clang 17:

```
/home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address]
        return {};
               ^~
```
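A minimal sketch of the pattern, assuming a hypothetical `find_or_empty()` lookup that returns a const reference: `return {};` would bind the returned reference to a temporary that is destroyed at the end of the return statement, whereas a function-local static lives until program exit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lookup: returns `value` if found, otherwise an empty vector.
inline const std::vector<std::string>& find_or_empty(bool found,
                                                     const std::vector<std::string>& value) {
    // Initialized once, on first call; safe to hand out references to.
    static const std::vector<std::string> empty;
    if (found) {
        return value;
    }
    return empty;  // `return {};` here would return a dangling reference
}
```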

Fixes #12875
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12876
2023-02-16 12:15:14 +02:00
Pavel Emelyanov
e234726123 system_keyspace; Make get_compaction_history non static and drop qctx
Now the call is done via the system_keyspace instance, so it can be
unmarked static and can use the local query processor instead of global
qctx.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-16 11:28:04 +03:00
Pavel Emelyanov
52f69643b6 api, compaction_manager: Get compaction history via manager
Right now the API handler directly calls static method from system
keyspace. Patching it to call compaction manager instead will let the
latter use on-board plugged system keyspace for that. If the system
keyspace is not plugged, it means early boot or late shutdown, not a
good time to get compaction history anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-16 11:27:38 +03:00
Pavel Emelyanov
d0e47ace16 system_keyspace: Move compaction_history_entry to namespace scope
It's currently a nested class, which makes forward-declaring it in
another unit impossible.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-16 11:24:23 +03:00
Takuya ASADA
bf27fdeaa2 scylla_coredump_setup: fix coredump timeout settings
We currently configure only TimeoutStartSec, but that is probably not
enough to prevent the coredump timeout: TimeoutStartSec is the maximum
waiting time for service startup, while a separate directive
(RuntimeMaxSec) specifies the maximum service running time.

To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (which
configures both TimeoutStartSec and TimeoutStopSec).

Fixes #5430

Closes #12757
2023-02-16 10:23:20 +02:00
Botond Dénes
e9258018d9 Merge 'date: cleanups to silence warnings from clang' from Kefu Chai
- date: drop implicitly generated ctor
- date: use std::in_range() to check for invalid year

Closes #12878

* github.com:scylladb/scylladb:
  date: use std::in_range() to check for invalid year
  date: drop implicitly generated ctor
2023-02-16 10:15:36 +02:00
Botond Dénes
ef50170120 Merge 'build: cmake: sync with configure (2/n)' from Kefu Chai
* build: cmake: extract idl out
* build: cmake: link cql3 against xxHash
* build: cmake: correct the check in Findlibdeflate.cmake
* build: cmake find_package(libdeflate) earlier
* build: cmake: set more properties to alternator library
* build: cmake: include generate_cql_grammar
* build: cmake: find xxHash package
* build: cmake: add build mode support

Closes #12866

* github.com:scylladb/scylladb:
  build: cmake: correct generate_cql_grammar
  build: cmake: extract idl out
  build: cmake: link cql3 against xxHash
  build: cmake: correct the check in Findlibdeflate.cmake
  build: cmake: find_package(libdeflate) earlier
  build: cmake: set more properties to alternator library
  build: cmake: include generate_cql_grammar
  build: cmake: find xxHash package
  build: cmake: add build mode support
2023-02-16 07:11:26 +02:00
Pavel Emelyanov
737f4acc10 features: Enable persisted features on all shards
Commit 1365e2f13e (gms: feature_service: re-enable features on node
startup) re-enabled features in the feature service very early, so that on
boot a node sees its "correct" features state before it starts loading
system tables and replaying commitlog.

However, features are checked on each shard independently, so
re-enabling should also happen on all shards.

One problem this caused is in extract_scylla_specific_keyspace_info(). This
helper is used when loading a non-system keyspace to read scylla-specific
keyspace options. The helper is called on all shards, and on every shard
but shard 0 it evaluates the SCYLLA_KEYSPACES feature check to false,
leaving the scylla-specific data empty. As a result, different shards have
different views of the keyspaces' configuration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12881
2023-02-16 00:52:05 +01:00
Kefu Chai
45f0449ccf sstables: mx/writer: remove defaulted move ctor
because its base class `writer_impl` has a member variable
`_validator` whose copy ctor is deleted. let's just
drop the defaulted move ctor, as the compiler is not able to
generate one for us.

```
/home/kefu/dev/scylladb/sstables/mx/writer.cc:805:5: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
    writer(writer&& o) = default;
    ^
/home/kefu/dev/scylladb/sstables/mx/writer.cc:528:16: note: move constructor of 'writer' is implicitly deleted because base class 'sstable_writer::writer_impl' has a deleted move constructor
class writer : public sstable_writer::writer_impl {
               ^
/home/kefu/dev/scylladb/sstables/writer_impl.hh:29:48: note: copy constructor of 'writer_impl' is implicitly deleted because field '_validator' has a deleted copy constructor
    mutation_fragment_stream_validating_filter _validator;
                                               ^
/home/kefu/dev/scylladb/mutation/mutation_fragment_stream_validator.hh:188:5: note: 'mutation_fragment_stream_validating_filter' has been explicitly marked deleted here
    mutation_fragment_stream_validating_filter(const mutation_fragment_stream_validating_filter&) = delete;
    ^
```
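
A minimal standalone sketch of the same rule, using hypothetical stand-in types rather than the real Scylla classes: a member whose copy ctor is explicitly deleted gets no implicit move ctor, so neither the enclosing class nor a class deriving from it is move constructible, and `= default` on the derived move ctor could only produce a deleted function.

```cpp
#include <type_traits>

// Stand-in for mutation_fragment_stream_validating_filter: copying is
// explicitly deleted, which also suppresses the implicit move ctor.
struct validating_filter {
    validating_filter() = default;
    validating_filter(const validating_filter&) = delete;
};

// Stand-in for sstable_writer::writer_impl: its implicit copy ctor is
// deleted and its implicit move ctor is defined as deleted, because the
// member can be neither copied nor moved.
struct writer_impl_like {
    writer_impl_like() = default;
    validating_filter _validator;
};

// Stand-in for the mx writer after the fix: no defaulted move ctor is
// declared, since it could only have been defined as deleted.
struct writer_like : writer_impl_like {
    writer_like() = default;
};

static_assert(!std::is_move_constructible_v<validating_filter>);
static_assert(!std::is_move_constructible_v<writer_impl_like>);
static_assert(!std::is_move_constructible_v<writer_like>);
static_assert(!std::is_copy_constructible_v<writer_like>);
```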

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12877
2023-02-15 23:06:10 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings were found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Avi Kivity
ac2a69aab4 Merge 'Move population code into table_population_metadata' from Pavel Emelyanov
There's the distributed_loader::populate_column_family() helper that manages sstables on their way towards the table on boot. The method naturally belongs to the table_population_metadata -- a helper class that in fact prepares the ground for the method in question.

This PR moves the method into the metadata class and removes a whole lot of extra alias-references and private-field exporting methods from it. It also keeps the start_subdir() and populate_column_family() logic close to each other and relaxes several excessive checks in them.

Closes #12847

* github.com:scylladb/scylladb:
  distributed_loader: Rename table_population_metadata
  distributed_loader: Dont check for directory presense twice
  distributed_loader: Move populate calls into metadata.start()
  distributed_loader: Remove local aliases and exporters
  distributed_loader: Move populate_column_family() into population meta
2023-02-15 22:55:48 +02:00
Yaron Kaikov
cbc005c6f5 Revert "dist/debian: drop unused Makefile variable"
This reverts commit d2e3a60428.

Since it causes a regression that prevents the Scylla service from starting on deb-based OSes.

Fixes: #12738

Closes #12857
2023-02-15 22:29:24 +02:00
Pavel Emelyanov
0c7efe38e1 distributed_loader: Rename table_population_metadata
It used to be just metadata, providing the meta for population; now it
does the population by itself, so rename it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 20:15:04 +03:00
Pavel Emelyanov
15926b22f4 distributed_loader: Dont check for directory presense twice
Both start_subdir() and populate_subdir() check that the directory
exists with an explicit file_exists() call. That's excessive: if the
directory wasn't there during the former call, the latter can learn this
by checking the _sstable_directories map.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 20:15:04 +03:00
Pavel Emelyanov
eb477a13ad distributed_loader: Move populate calls into metadata.start()
This makes the metadata class export an even shorter API, keeps the three
sub-directories scanned in one place and allows removing the zero-shard
assertion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 20:15:04 +03:00
Nadav Har'El
ba18c318b9 Merge 'cql3: eliminate column_condition, streamline condition representation' from Avi Kivity
column_condition is an LWT-specific boolean expression construct, but
recent work allowed it to be re-expressed in terms of generic expressions.

This series completes the work and eliminates the column_condition classes
and source file. Furthermore, a statement's IF clause is represented as a
single expression, rather than a vector of per-column conditions.

Closes #12597

* github.com:scylladb/scylladb:
  cql3: modification_statement: unwrap unnecessary boolean_factors() call
  cql3: modification_statement: use single expression for conditions
  cql3: modification_statment: fix lwt null equality rules mangling
  cql3: broadcast tables: tighten checks on conditions
  cql3: grammar: communicate LWT IF conditions to AST as a simple expression
  cql3: column_condition: fold into modification_statement
  cql3: column_condition: inline column_condition_applies_to into its only caller
  cql3: column_condition: inline column_condition_collect_marker_specification into its only caller
  cql3: column_condition: eliminate column_condition class
  cql3: column_condition: move expression massaging to prepare()
  cql3: grammar: make columnCondition production return an expression
  cql3: grammar: eliminate duplication in LWT IF clause "IN (...)" vs "IN ?"
  cql3: grammar: remove duplication around columnCondition scalar/collection variants
  cql3: grammar: extract column references into a new production
  cql3: column_condition: eliminate column_condition::raw
2023-02-15 19:02:56 +02:00
Pavel Emelyanov
123a82adb2 distributed_loader: Remove local aliases and exporters
After the previous patch, all local alias references in
populate_column_family() are no longer required. Neither are the
exporting calls from the table_population_metadata class.

One non-obvious change is capturing 'this' instead of 'global_table' in
cross-shard calls. That's OK: table_population_metadata is not
sharded<> and is designed for cross-shard usage too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 19:57:41 +03:00
Pavel Emelyanov
16fca3fa8a distributed_loader: Move populate_column_family() into population meta
This ownership change also requires the auto& = *this alias and an extra
specification of where to call reshard() and reshape() from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 19:57:41 +03:00
Kefu Chai
76355c056f build: cmake: correct generate_cql_grammar
should have escaped `&` with `\`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:40 +08:00
Kefu Chai
2718963a2a build: cmake: extract idl out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:40 +08:00
Kefu Chai
9416af8b80 build: cmake: link cql3 against xxHash
turns out cql3 also indirectly uses the header file(s) which
in turn include the xxhash header.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:40 +08:00
Kefu Chai
d6746fc49c build: cmake: correct the check in Findlibdeflate.cmake
otherwise libdeflate is never found.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:40 +08:00
Kefu Chai
1ac5932440 build: cmake: find_package(libdeflate) earlier
so it can be linked by scylla

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:37 +08:00
Kefu Chai
bd1ea104fe build: cmake: set more properties to alternator library
alternator headers are exposed to the targets which link against it,
so let's expose them using `target_include_directories()`.
also, `alternator` uses the Seastar library and uses xxHash indirectly.
we should fix the latter by exposing the included header instead,
but for now, let's just link alternator directly to xxHash.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:37 +08:00
Kefu Chai
a0f3c9ebf9 build: cmake: include generate_cql_grammar
we should include "generate_cql_grammar.cmake" for using
`generate_cql_grammar()` function.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:37 +08:00
Kefu Chai
b6a8341eef build: cmake: find xxHash package
we use a private API in xxHash; it'd be handy to expose it in the form
of a library target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:37 +08:00
Kefu Chai
b234c839e4 build: cmake: add build mode support
Scylla uses different build modes to customize the build for different
purposes. in this change, instead of keeping them in a python dictionary,
the customized settings are located in their own files, and loaded
on demand. we don't support multi-config generators yet.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-16 00:07:37 +08:00
Kefu Chai
55b46ab1a3 date: use std::in_range() to check for invalid year
for better readability, and to silence following warning
from Clang 17:

```
/home/kefu/dev/scylladb/utils/date.h:5965:25: error: result of comparison of constant 9223372036854775807 with expression of type 'int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
                      Y <= static_cast<int64_t>(year::max())))
                      ~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/utils/date.h:5964:57: error: result of comparison of constant -9223372036854775808 with expression of type 'int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
                if (!(static_cast<int64_t>(year::min()) <= Y &&
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:56:49 +08:00
Kefu Chai
90981ebb50 date: drop implicitly generated ctor
as one of its member variables does not have a default constructor.
this silences following warning from Clang-17:

```
/home/kefu/dev/scylladb/utils/date.h:708:5: error: explicitly defaulted default constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
    year_month_weekday() = default;
    ^
/home/kefu/dev/scylladb/utils/date.h:705:27: note: default constructor of 'year_month_weekday' is implicitly deleted because field 'wdi_' has no default constructor
    date::weekday_indexed wdi_;
                          ^
```
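
The rule behind this warning, reduced to a standalone sketch with hypothetical stand-in types: when a member has no default constructor, a defaulted default constructor of the enclosing class is defined as deleted, so removing the `= default` line changes nothing except silencing the warning.

```cpp
#include <type_traits>

// Stand-in for date::weekday_indexed: only a user-provided two-argument
// constructor, hence no default constructor.
struct weekday_indexed_like {
    constexpr weekday_indexed_like(unsigned wd, unsigned index) : wd_(wd), index_(index) {}
    unsigned wd_;
    unsigned index_;
};

// Stand-in for year_month_weekday. Declaring
//     year_month_weekday_like() = default;
// here would compile but define a deleted constructor, which is what
// Clang's -Wdefaulted-function-deleted reports; leaving it out is equivalent.
struct year_month_weekday_like {
    int y_;
    unsigned m_;
    weekday_indexed_like wdi_;

    constexpr year_month_weekday_like(int y, unsigned m, weekday_indexed_like wdi)
        : y_(y), m_(m), wdi_(wdi) {}
};

static_assert(!std::is_default_constructible_v<weekday_indexed_like>);
static_assert(!std::is_default_constructible_v<year_month_weekday_like>);
```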

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:56:49 +08:00
Gleb Natapov
9bdef9158e raft: abort applier fiber when a state machine aborts
After 5badf20c7a, the applier fiber does not
stop after it gets an abort error from a state machine, which may trigger an
assertion because the previous batch is not applied. Fix it.

Fixes #12863
2023-02-15 15:54:19 +02:00
Gleb Natapov
dfcd56736b raft: fix race in add_entry_on_leader that may cause incorrect log length accounting
In add_entry_on_leader, after wait_for_memory_permit() resolves but before
the fiber continues to run, the node may stop being the leader and then
become a leader again, which makes the currently held units outdated.
Detect this case by checking the term after the preemption.
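
A minimal sketch of the detection pattern, using hypothetical simplified types rather than Scylla's raft classes or seastar futures: remember the term before the (possibly preempting) wait, and after resuming treat the units as stale if the term changed or the node is no longer the leader.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical, simplified leader state; not the real raft server types.
struct leader_state {
    std::uint64_t term = 0;
    bool is_leader = false;
};

// Hypothetical memory permit tagged with the term it was acquired in.
struct memory_units {
    std::uint64_t term;
    std::size_t bytes;
};

// `wait_for_permit` models wait_for_memory_permit(): while it "runs", other
// code may step the node down and re-elect it in a newer term.
template <typename Wait>
constexpr std::optional<memory_units> admit_entry(leader_state& s, std::size_t bytes, Wait wait_for_permit) {
    const std::uint64_t term_before = s.term;
    wait_for_permit();
    if (!s.is_leader || s.term != term_before) {
        return std::nullopt;  // units were accounted under an older leadership
    }
    return memory_units{term_before, bytes};
}
```

Because a re-election always bumps the term, comparing terms catches the "stepped down, then elected again" case even though the node is a leader again when the wait completes.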
2023-02-15 15:51:59 +02:00
Petr Gusev
b37eee26e1 test_shed_too_large_request: clarify the comments
2023-02-15 17:18:17 +04:00
Petr Gusev
4328f52242 test_shed_too_large_request: use smaller test string
There was a vague comment about CI using
larger limits for shedding. This turned out
to be false; the real reason for the different
limits is that Scylla handles the -m
command line option differently in
debug and release builds.
Debug builds use the default memory allocator,
and the value of the -m Scylla option
is given to each shard. In release builds,
memory is evenly distributed between shards.

To account for this, we read the current
memory limit from Scylla metrics.
The helper class ScyllaMetrics was introduced to
handle the metrics parsing logic. It can
potentially be reused for dealing with
metrics in other tests.
2023-02-15 17:18:10 +04:00
Avi Kivity
9454844751 cql3: modification_statement: unwrap unnecessary boolean_factors() call
for_each_expression() will recurse anyway.
2023-02-15 14:21:26 +02:00
Avi Kivity
1d0854c0bc cql3: modification_statement: use single expression for conditions
Currently, we use two vectors for static and regular column conditions,
each element referring to a single column. There's a comment that keeping
them separate makes things simpler, but in fact we always treat both
equally (except in one case where we look at just the regular columns
and check that no static column conditions exist).

Simplify by storing just a single expression, which can be a conjunction
of multiple column conditions.

add_condition() is renamed to analyze_condition(), since it no longer
adds to the vectors.
2023-02-15 14:21:26 +02:00
Avi Kivity
5cb7655a9f cql3: modification_statment: fix lwt null equality rules mangling
search_and_replace() needs to return std::nullopt when it doesn't match,
or it doesn't recurse properly. Currently it doesn't break anything
because we only call the function on a binary_operator, but soon it will.
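
Why returning `std::nullopt` matters, in a self-contained sketch over a hypothetical two-node expression tree (not Scylla's cql3::expr types or its actual search_and_replace signature): `std::nullopt` from the replacer means "no match, keep descending". A replacer that instead returned the unchanged node would make the traversal stop at the first node it sees, so matches nested inside a conjunction would be missed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <variant>

struct expr;
using expr_ptr = std::shared_ptr<const expr>;

struct column { std::string name; };
struct binop { std::string op; expr_ptr lhs, rhs; };
struct expr { std::variant<column, binop> v; };

// Returns a replacement for the nodes it handles and std::nullopt otherwise.
using replacer = std::function<std::optional<expr_ptr>(const expr&)>;

expr_ptr search_and_replace(const expr_ptr& e, const replacer& r) {
    assert(e != nullptr);
    if (auto replaced = r(*e)) {
        return *replaced;  // matched: use the replacement, do not recurse
    }
    if (auto* b = std::get_if<binop>(&e->v)) {
        // No match: rebuild the node from its recursively processed children.
        return std::make_shared<const expr>(expr{binop{b->op,
            search_and_replace(b->lhs, r), search_and_replace(b->rhs, r)}});
    }
    return e;  // leaf without a match: keep as is
}
```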
2023-02-15 14:21:26 +02:00
Avi Kivity
c50c9c86b3 cql3: broadcast tables: tighten checks on conditions
We don't support checks on static columns in broadcast tables,
so explicitly reject them.
2023-02-15 14:21:26 +02:00
Avi Kivity
4d125bffdf cql3: grammar: communicate LWT IF conditions to AST as a simple expression
Instead of passing a vector of boolean factors, pass a single expression
(a conjunction).  This prepares the way for more complex expressions, but
no grammar changes are made here.

The expression is stored as optional, since we'll need a way to indicate
whether an IF clause was supplied or not. We could play games with
boolean_factors(), but it becomes too tricky.

The expressions are broken back down into boolean factors during prepare.
We'll later consolidate them too.
2023-02-15 14:21:26 +02:00
Avi Kivity
23bd7d24df cql3: column_condition: fold into modification_statement
Move column_condition_prepare() and its helper function into
modification_statement, its only caller. The column_condition.{cc,hh}
now become empty, so remove them.

This eliminates the column_condition concept, which was just a
custom expression, in favor of generic expressions. It still
has custom properties due to LWT specialness, but those custom
properties are isolated in column_condition_prepare().
2023-02-15 14:21:24 +02:00
Avi Kivity
12be5d4208 cql3: column_condition: inline column_condition_applies_to into its only caller
This two-liner can be trivially inlined with no loss of meaning. Indeed
it's less confusing, because "applies_to" became less meaningful once
we integrated the column_value component into the expression.
2023-02-15 14:19:55 +02:00
Avi Kivity
82fb838a70 cql3: column_condition: inline column_condition_collect_marker_specification into its only caller
This one-liner can be trivially inlined with no loss of meaning.
2023-02-15 14:19:55 +02:00
Avi Kivity
e7b9d9dab9 cql3: column_condition: eliminate column_condition class
It's become a wrapper around expression, so peel it off. The
methods are converted to free functions, with the intent to later
inline them into their callers, as they are also mostly just
wrappers.
2023-02-15 14:19:55 +02:00
Avi Kivity
4e93cf9ae9 cql3: column_condition: move expression massaging to prepare()
Move logic out of the column_condition constructor so it becomes
a trivial wrapper, ripe for elimination.
2023-02-15 14:19:55 +02:00
Avi Kivity
31e37ff559 cql3: grammar: make columnCondition production return an expression
Instead of appending to a vector, just return an expression. This
makes the production self-sufficient and more natural to use.
2023-02-15 14:19:55 +02:00
Avi Kivity
d8d4d0bd72 cql3: grammar: eliminate duplication in LWT IF clause "IN (...)" vs "IN ?"
The IN operator recognition is duplicated; de-duplicate it by
introducing the (somewhat artificial) singleColumnInValuesOrMarkerExpr
production.
2023-02-15 14:19:55 +02:00
Avi Kivity
c47cf9858b cql3: grammar: remove duplication around columnCondition scalar/collection variants
columnCondition duplicates the grammar for scalar relations and subscripted
collection relations. Eliminate the duplication by introducing a
subscriptExpr production, which encapsulates the differences.
2023-02-15 14:19:55 +02:00
Avi Kivity
74da77f442 cql3: grammar: extract column references into a new production
Eliminate repetition by creating a new columnRefExpr and
referring to it. Only LWT IF is updated so far. No grammar changes.
2023-02-15 14:19:55 +02:00
Avi Kivity
4d7d3c78a2 cql3: column_condition: eliminate column_condition::raw
It's now a thin wrapper around an expression, so peel the wrapper
and keep just the expression. A boolean expression is, after all,
a condition, and we'll make the condition statement-wide soon
rather than apply just to a column.
2023-02-15 14:19:55 +02:00
guy9
4dd14af7d5 Adding ScyllaDB University LIVE Q1 2023 to Docs top banner
Closes #12860
2023-02-15 13:15:30 +02:00
Nadav Har'El
2d6c53c047 test/cql-pytest: reproduce bug in static-column index lookup
This patch adds a reproducer to a static-column index lookup bug
described in issue #12829: The restriction `where pk=0 and s=1 and c=3`
where pk,c are the primary key and s is an indexed static column,
results in an internal error: "clustering column id 2 >= 2".

Unfortunately, because on_internal_error() crashes Scylla in debug
mode, we need to mark this failing test with skip instead of xfail.

Refs #12829

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12852
2023-02-15 12:23:36 +02:00
Benny Halevy
bb36237cf4 topology: optimize compare_endpoints
This function is called on the fast data path
from storage_proxy when sorting multiple endpoints
by proximity.

This change calculates numeric node diff metrics
based on each address proximity to a given node
(by <dc, rack, same node>) to eliminate logic
branches in the function and reduce its footprint.

based on objdump -d output, the compare_endpoints
footprint was reduced by 58.5% (from 8752 to 3632 bytes)
with clang version 15.0.7 (Fedora 15.0.7-1.fc37)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:48:24 +02:00
Benny Halevy
3ac2df9480 to_string: add print operators for std::{weak,partial}_ordering
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:04 +02:00
Benny Halevy
bd6f88c193 utils: to_sstring: deinline std::strong_ordering print operator
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:04 +02:00
Benny Halevy
25ebc63b82 move to_string.hh to utils/
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:04 +02:00
Benny Halevy
e7af35a64d test: network_topology: add test_topology_compare_endpoints
Add a regression unit test for topology::compare_endpoint
before it's optimized in the following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-15 11:09:02 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Petr Gusev
1f850374fa test_shed_too_large_request fix: disable compression
The test relies on the exact request size, which doesn't
work if compression is applied. The driver enables
compression only if both the server and the client
agree on the codec to use. If compression package
(e.g. lz4) is not installed, the compression
is not used.

The trick with locally_supported_compressions is needed
since I couldn't find any standard means to disable
compression other than the compression flag
on the cluster object, which seemed too broad.

Fixes #12836
2023-02-15 11:55:49 +04:00
Nadav Har'El
c0114d8b02 test/cql-pytest: test another case of ALLOW FILTERING
In issue #12828 it was noted that Scylla requires ALLOW FILTERING
for `where b=1 and c=1` where b is an indexed static column and
c is a clustering key, and it was suggested that this is a bug.

This patch adds a test that confirms that both Scylla and Cassandra
require ALLOW FILTERING in this case. We explain in a comment that
this requirement is expected (i.e., it's not a bug), as the `b=1`
may match a huge number of rows, and the `c=1` may further match just
a few of those - i.e., it is filtering.

This test is virtually identical to the test we already had for
`where a=1 and c=1` - when `a` is an indexed regular column.
There too, the ALLOW FILTERING is required.

Closes #12828 as "not a bug".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12848
2023-02-15 08:43:19 +02:00
Raphael S. Carvalho
ba022f7218 replica: compaction_group: Use sstable_set::size()
More efficient than retrieving size from sstable_set::all() which
may involve copy of elements.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12835
2023-02-15 06:53:04 +02:00
Avi Kivity
19edaa9b78 Merge 'build: cmake: sync with configure.py' from Kefu Chai
this is the first step to re-enable cmake to build scylla, so we can experiment with C++20 modules and other changes before porting them to `configure.py`. please note, this changeset alone does not address all issues yet. as this is a low priority project, i want to do this in smaller (or tiny!) steps.

* build: cmake: s/Abseil/absl/
* build: cmake: sync with source files compiled in configure.py
* build: cmake: do not generate crc_combine_table at build time
* build: cmake: use packaged libdeflate

Closes #12838

* github.com:scylladb/scylladb:
  build: cmake: add rust binding
  build: cmake: extract cql3 and alternator out
  build: cmake: use packaged libdeflate
  build: cmake: do not generate crc_combine_table at build time
  build: cmake: sync with source files compiled in configure.py
  build: cmake: s/Abseil/absl/
2023-02-14 22:37:10 +02:00
Avi Kivity
df497a5a94 Merge 'treewide: remove implicitly deleted copy ctor and assignment operator' from Kefu Chai
clang 17 trunk helped to identify these issues. so let's fix them.

Closes #12842

* github.com:scylladb/scylladb:
  row_cache: drop defaulted move assignment operator
  utils/histogram: drop defaulted copy ctor and assignment operator
  range_tombstone_list: remove defaulted move assignment operator
  query-result: remove implicitly deleted copy ctor
2023-02-14 20:24:26 +02:00
Kefu Chai
95f8b4eab1 build: cmake: add rust binding
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 23:54:20 +08:00
Kefu Chai
f8671188c7 build: cmake: extract cql3 and alternator out
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 23:54:20 +08:00
Aleksandra Martyniuk
7b5e653fc9 test: use functions from task_manager_utils.py in test_task_manager.py
2023-02-14 13:34:11 +01:00
Aleksandra Martyniuk
02931163ef test: add task_manager_utils.py
The task manager API will be used in many tests. Thus, to make it easier,
API calls to the task manager are wrapped into functions in task_manager_utils.py.
Some helpers that may be reused in other tests are moved there too.
2023-02-14 13:34:04 +01:00
Kefu Chai
9ea8a46dd6 build: cmake: use packaged libdeflate
this mirrors the change in b8b78959fb

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 19:25:02 +08:00
Kefu Chai
89542232c9 row_cache: drop defaulted move assignment operator
as it has a member variable of reference type, and Clang 17 warns
about this:
```
/home/kefu/dev/scylladb/row_cache.hh:359:16: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted]
    row_cache& operator=(row_cache&&) = default;
               ^
/home/kefu/dev/scylladb/row_cache.hh:214:20: note: move assignment operator of 'row_cache' is implicitly deleted because field '_tracker' is of reference type 'cache_tracker &'
    cache_tracker& _tracker;
                   ^
```
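
The underlying rule in a standalone sketch (`cache_like` and `tracker` are hypothetical stand-ins for row_cache and cache_tracker): a reference member can only be bound at construction, so the class stays move constructible, but its copy and move assignment operators are defined as deleted, and an explicit `= default` cannot change that.

```cpp
#include <type_traits>

struct tracker {};

struct cache_like {
    tracker& _tracker;  // a reference cannot be reseated by assignment

    explicit cache_like(tracker& t) : _tracker(t) {}
    cache_like(cache_like&&) = default;  // OK: the new object binds the same tracker
    // cache_like& operator=(cache_like&&) = default;  // would be defined as deleted
};

static_assert(std::is_move_constructible_v<cache_like>);
static_assert(!std::is_move_assignable_v<cache_like>);
static_assert(!std::is_copy_assignable_v<cache_like>);
```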

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 19:22:19 +08:00
Kefu Chai
68327123ac utils/histogram: drop defaulted copy ctor and assignment operator
as one of the (indirect) member variables has a user-declared move
ctor, this prevents the compiler from generating the default copy ctor
or assignment operator for the classes containing `timer`.

```
/home/kefu/dev/scylladb/utils/histogram.hh:440:5: warning: explicitly defaulted copy constructor is implicitly deleted [-Wdefaulted-function-deleted]
    timed_rate_moving_average_and_histogram(const timed_rate_moving_average_and_histogram&) = default;
    ^
/home/kefu/dev/scylladb/utils/histogram.hh:437:31: note: copy constructor of 'timed_rate_moving_average_and_histogram' is implicitly deleted because field 'met' has a deleted copy constructor
    timed_rate_moving_average met;
                              ^
/home/kefu/dev/scylladb/utils/histogram.hh:298:17: note: copy constructor of 'timed_rate_moving_average' is implicitly deleted because field '_timer' has a deleted copy constructor
    meter_timer _timer;
                ^
/home/kefu/dev/scylladb/utils/histogram.hh:212:13: note: copy constructor of 'meter_timer' is implicitly deleted because field '_timer' has a deleted copy constructor
    timer<> _timer;
            ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/timer.hh:111:5: note: copy constructor is implicitly deleted because 'timer<>' has a user-declared move constructor
    timer(timer&& t) noexcept : _sg(t._sg), _callback(std::move(t._callback)), _expiry(std::move(t._expiry)), _period(std::move(t._period)),
```
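
The chain of implicit deletions above, reduced to hypothetical stand-ins for seastar::timer<> and the histogram classes: declaring a move constructor causes the implicit copy constructor and copy assignment to be defined as deleted, and this propagates to every class that contains such a member, while moving keeps working.

```cpp
#include <type_traits>

// Stand-in for seastar::timer<>: the user-declared move ctor makes the
// implicitly declared copy ctor (and copy assignment) deleted.
struct timer_like {
    timer_like() = default;
    timer_like(timer_like&&) noexcept {}
};

// Stand-ins for meter_timer and timed_rate_moving_average.
struct meter_timer_like {
    timer_like _timer;
};
struct moving_average_like {
    meter_timer_like _timer;
};

static_assert(!std::is_copy_constructible_v<timer_like>);
static_assert(!std::is_copy_constructible_v<moving_average_like>);
static_assert(!std::is_copy_assignable_v<moving_average_like>);
static_assert(std::is_move_constructible_v<moving_average_like>);
```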

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 19:22:19 +08:00
Kefu Chai
b13caeedda range_tombstone_list: remove defaulted move assignment operator
as `range_tombstone_list::reverter` has a member variable
`const schema& _s`, which cannot be rebound, the class cannot
have a usable assignment operator.

this change should address the warning from Clang 17:

```
/home/kefu/dev/scylladb/range_tombstone_list.hh:122:19: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted]
        reverter& operator=(reverter&&) = default;
                  ^
/home/kefu/dev/scylladb/range_tombstone_list.hh:111:23: note: move assignment operator of 'reverter' is implicitly deleted because field '_s' is of reference type 'const schema &'
        const schema& _s;
                      ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 19:22:19 +08:00
Kefu Chai
f36fdff622 query-result: remove implicitly deleted copy ctor
as one of the (indirect) member variables of `query::result` is not
copyable, compiler refuses to create a copy ctor or an assignment
operator for us, and Clang 17 warns about this.

so let's just drop them for better readability and more importantly
to preserve the correctness.

```
/home/kefu/dev/scylladb/query-result.hh:385:5: warning: explicitly defaulted copy constructor is implicitly deleted [-Wdefaulted-function-deleted]
    result(const result&) = default;
    ^
/home/kefu/dev/scylladb/query-result.hh:321:34: note: copy constructor of 'result' is implicitly deleted because field '_memory_tracker' has a deleted copy constructor
    query::result_memory_tracker _memory_tracker;
                                 ^
/home/kefu/dev/scylladb/query-result.hh:97:23: note: copy constructor of 'result_memory_tracker' is implicitly deleted because field '_units' has a deleted copy constructor
    semaphore_units<> _units;
                      ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/semaphore.hh:500:5: note: 'semaphore_units' has been explicitly marked deleted here
    semaphore_units(const semaphore_units&) = delete;
    ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 19:22:19 +08:00
Avi Kivity
4f5a460db9 Update seastar submodule
* seastar 943c09f869...9b6e181e42 (34):
  > semaphore: disallow move after used
  > Revert "semaphore: assert no outstanding units when moved"
  > reactor, tests: drop unused include
  > spawn_test: prolong termination time to be more tolerant.
  > net: s/offload_info()/get_offload_info()/
  > Merge 'Extend http client with keep-alive connections' from Pavel Emelyanov
  > util/gcc6-concepts.hh: drop gcc6-concepts.hh
  > treewide: do not inline tls variables in shared library
  > reactor: Remove --num-io-queues option
  > build: correct the comment
  > smp: do not inline function when BUILD_SHARED_LIBS
  > iostream: always flush _fd in do_flush
  > thread_pool: prevent missed wakeup when the reactor goes to sleep in parallel with a syscall completion
  > Merge 'build: do not always build seastar as a static library' from Kefu Chai
  > Revert "Merge 'Keep outgoing queue all cancellable while negotiating' from Pavel Emelyanov"
  > Merge 'Keep outgoing queue all cancellable while negotiating' from Pavel Emelyanov
  > memcached: prolong expiration time to be more tolerant
  > treewide: add non-seastar "#include"s
  > Merge 'Allow multiple abort requests' from Aleksandra Martyniuk
  > app-template: remove duplicated includes
  > include/seastar: s/SEASTAR_NODISCARD/[[nodiscard]]/
  > prometheus: Don't report labels that starts with __
  > memory: do not define variable only for assert
  > reactor: set_shard_field_width() after resource::allocate()
  > Merge 'reactor, core/resource: clean ups' from Kefu Chai
  > util/concepts: include <concepts>
  > build: use target_link_options() to pass options to linker
  > iostream: add doxygen comment for eof()
  > Merge 'util/print_safe, reactor: use concept for type constraints and refactory ' from Kefu Chai
  > Right align the memory diagnostics
  > Merge 'Add an API for the metrics layer to manipulate metrics dynamically.' from Amnon Heiman
  > semaphore: assert no outstanding units when moved
  > build: do not populate package registry by default
  > build: stop detecting concepts support

Closes #12827
2023-02-14 13:04:17 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Kefu Chai
e2a20a108f tools: toolchain: dbuild: reindent a "case" block
to replace tabs with spaces, for better readability if the editor
fails to render tabs with the right tabstop setting.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12839
2023-02-14 10:37:25 +02:00
Raphael S. Carvalho
d6fe99abc4 replica: table: Update stats for newly added SSTables
Patch 55a8421e3d fixed an inefficiency when rebuilding
statistics with many compaction groups, but it incorrectly removed
the update for newly added SSTables. This patch restores it.
When a new SSTable is added to any of the groups, the stats are
incrementally updated (as before). On compaction completion,
statistics are still rebuilt by simply iterating through each
group, which keeps track of its own stats.
Unit tests are added to guarantee the stats are correct both after
compaction completion and memtable flush.
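
A rough sketch of the two update paths, using hypothetical simplified types (not replica::table or compaction_group; a fixed number of groups keeps it short): adding an SSTable bumps the owning group's and the table's counters incrementally, while compaction completion rebuilds the table totals by summing the per-group stats.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct sstable_stats {
    std::uint64_t live_sstables = 0;
    std::uint64_t live_disk_space = 0;

    constexpr void add(std::uint64_t bytes) {
        live_sstables += 1;
        live_disk_space += bytes;
    }
};

struct group_like {
    sstable_stats stats;  // each group keeps track of its own stats
};

template <std::size_t Groups>
struct table_like {
    std::array<group_like, Groups> groups{};
    sstable_stats stats;  // table-wide totals

    // New SSTable (e.g. from a memtable flush): incremental update.
    constexpr void add_sstable(std::size_t group, std::uint64_t bytes) {
        groups[group].stats.add(bytes);
        stats.add(bytes);
    }

    // Compaction completed in `group`: record that group's new stats, then
    // rebuild the table totals from the per-group stats.
    constexpr void on_compaction_completed(std::size_t group, sstable_stats new_group_stats) {
        groups[group].stats = new_group_stats;
        stats = {};
        for (const auto& g : groups) {
            stats.live_sstables += g.stats.live_sstables;
            stats.live_disk_space += g.stats.live_disk_space;
        }
    }
};
```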

Fixes #12808.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12834
2023-02-14 10:28:53 +02:00
Wojciech Mitros
cab5b08948 git: remove Cargo.lock from .gitignore
When rust wasmtime bindings were added, we committed Cargo.lock to
make sure a given version of Scylla always builds using the same
versions of rust dependencies. Therefore, it should not be present
in .gitignore.

Closes #12831
2023-02-14 08:51:53 +02:00
Wojciech Mitros
8b756cb73f rust: update dependencies
Wasmtime added some improvements in recent releases - particularly,
two security issues were patched in version 2.0.2. There were no
breaking changes for our use other than the strategy of returning
Traps - all of them are now anyhow::Errors instead, but we can
still downcast to them, and read the corresponding error message.

The cxx, anyhow and futures dependency versions now match the
versions saved in the Cargo.lock.

Closes #12830
2023-02-14 08:51:20 +02:00
Nadav Har'El
14cdd034ee test/alternator: fix flaky test for partition-tombstone scan
The test test_scan.py::test_scan_long_partition_tombstone_string
checks that a full-table Scan operation ends a page in the middle of
a very long string of partition tombstones, and does NOT scan the
entire table in one page (if we did that, getting a single page could
take an unbounded amount of time).

The test is currently flaky, having failed in CI runs three times in
the past two months.

The reason for the flakiness is that we don't know exactly how long
we need to make the sequence of partition tombstones in the test before
we can be absolutely sure a single page will not read this entire sequence.
For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to
make the sequence of row tombstones. But for a full-table scan of
partition tombstones, the situation is more complicated - because the
scan is done on several vnodes in parallel, and each of them needs to
read query_tombstone_page_limit tombstones before it stops.

In my experiments, using query_tombstone_page_limit * 4 consecutive tombstones
was always enough - I ran this test hundreds of times and it didn't fail
once. But since it did fail on Jenkins very rarely (3 times in the last
two months), maybe the multiplier 4 isn't enough. So this patch doubles
it to 8. Hopefully this would be enough for anyone (TM).

This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (not use LWT). This leaves the test's
total run time to be similar to what it was before this patch - around
0.5 seconds in dev build mode on my laptop.

Fixes #12817

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12819
2023-02-14 08:09:44 +02:00
Kefu Chai
cec2e2f993 build: cmake: do not generate crc_combine_table at build time
mirrors the change in 70217b5109

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 11:42:08 +08:00
Kefu Chai
a8fca52398 build: cmake: sync with source files compiled in configure.py
these source files are out of sync with the source files listed
in `configure.py`. some of them were removed, some of them were
added. let's try to keep them in sync. this paves the road to a
working CMakeLists.txt.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 11:42:04 +08:00
Kefu Chai
50ff27514c build: cmake: s/Abseil/absl/
find abseil library with the name of absl, instead of "Abseil".

absl's cmake config file is provided with the name of
`abslConfig.cmake`, not `AbseilConfig.cmake`.

see also
cde2f0eaae/CMakeLists.txt (L198).

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-14 11:41:59 +08:00
Nadav Har'El
310638e84d Merge 'wasm: deserialize counters as integers' from Wojciech Mitros
Currently, because serialize_visitor::operator() is not implemented for counters, we cannot convert a counter returned by a WASM UDF to bytes when returning from wasm::run_script().

We could disallow using counters as WASM UDF return types, but an easier solution which we're already using in Lua UDFs is treating the returned counters as 64-bit integers when deserializing. This patch implements the latter approach and adds a test for it.

Closes #12806

* github.com:scylladb/scylladb:
  wasm udf: deserialize counters as integers
  test_wasm.py: add utility function for reading WASM UDF saved in files
2023-02-13 19:24:11 +02:00
Nadav Har'El
6a45881d22 Merge 'functions: handle replacing UDFs used in UDAs' from Wojciech Mitros
This patch is based on #12681; only the last 3 commits are relevant.

As described in #12709, currently, when a UDF used in a UDA is replaced, the UDA is not updated until the whole node is restarted.

This patch fixes the issue by updating all affected UDAs when a UDF is replaced.
Additionally, it includes a few convenience changes.

Closes #12710

* github.com:scylladb/scylladb:
  uda: change the UDF used in a UDA if it's replaced
  functions: add helper same_signature method
  uda: return aggregate functions as shared pointers
2023-02-13 16:30:24 +02:00
Benny Halevy
b2d3c1fcc2 abstract_replication_strategy: add for_each_natural_endpoint_until
Currently, effective_replication_map::do_get_ranges accepts
a functor that traverses the natural endpoints of each token
to decide whether a token range should be returned or not.

This is done by copying the natural endpoints vector for
each token.  However, other than special strategies like
everywhere and local, the functor can be called on the
precalculated inet_address_vector_replica_set in the
replication_map and there's no need to copy it for each call.

for_each_natural_endpoint_until passes a reference to the function
down to the abstract replication strategy to let it work either
on the precalculated inet_address_vector_replica_set or
on an ad-hoc vector prepared by the replication strategy.
The function returns stop_iteration::yes when a match or mismatch
is found, or stop_iteration::no while it has no definite result.
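A minimal Python model of the callback protocol described above may help. It is not Scylla's C++ code. The ReplicationStrategy class, its per-token dict of replica lists, and the is_replica() caller are hypothetical. Only the for_each_natural_endpoint_until name and the stop_iteration::yes/no convention come from this commit message.

```python
from enum import Enum
from typing import Callable, Dict, List


class stop_iteration(Enum):
    # Stand-in for seastar's stop_iteration::no / stop_iteration::yes.
    no = 0
    yes = 1


class ReplicationStrategy:
    """Hypothetical model holding a precalculated replica set per token."""

    def __init__(self, replica_sets: Dict[int, List[str]]) -> None:
        self._replica_sets = replica_sets

    def for_each_natural_endpoint_until(
        self, token: int, func: Callable[[str], stop_iteration]
    ) -> None:
        # Walk the stored list by reference; no per-token copy is made.
        for endpoint in self._replica_sets[token]:
            if func(endpoint) is stop_iteration.yes:
                return


def is_replica(strategy: ReplicationStrategy, token: int, endpoint: str) -> bool:
    # Hypothetical caller: stop iterating as soon as the endpoint is found.
    found = False

    def check(candidate: str) -> stop_iteration:
        nonlocal found
        if candidate == endpoint:
            found = True
            return stop_iteration.yes
        return stop_iteration.no

    strategy.for_each_natural_endpoint_until(token, check)
    return found
```

The callback's return value lets the caller end the walk early without the strategy knowing what the caller is looking for.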

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12737
2023-02-13 16:30:24 +02:00
Nadav Har'El
efed973dd3 Merge 'cql3: convert LWT IF clause to expressions' from Avi Kivity
LWT `IF` (column_condition) duplicates the expression prepare and evaluation code. Annoyingly,
LWT IF semantics are a little different than the rest of CQL: a NULL equals NULL, whereas usually
NULL = NULL evaluates to NULL.

This series converts `IF` prepare and evaluate to use the standard expression code. We employ
expression rewriting to adjust for the slightly different semantics.

In a few places, we adjust LWT semantics to harmonize them with the rest of CQL. These are pointed
out in their own separate patches so the changes don't get lost in the flood.

Closes #12356

* github.com:scylladb/scylladb:
  cql3: lwt: move IF clause expression construction to grammar
  cql3: column_condition: evaluate column_condition as a single expression
  cql3: lwt: allow negative list indexes in IF clause
  cql3: lwt: do not short-circuit col[NULL] in IF clause
  cql3: column_condition: convert _column to an expression
  cql3: expr: generalize evaluation of subscript expressions
  cql3: expr: introduce adjust_for_collection_as_maps()
  cql3: update_parameters: use evaluation_inputs compatible row prefetch
  cql3: expr: protect extract_column_value() from partial clustering keys
  cql3: expr: extract extract_column_value() from evaluation machinery
  cql3: selection: introduce selection_from_partition_slice
  cql3: expr: move check for ordering on duration types from restrictions to prepare
  cql3: expr: remove restrictions oper_is_slice() in favor of expr::is_slice()
  cql3: column_condition: optimize LIKE with constant pattern after preparing
  cql3: expr: add optimizer for LIKE with constant pattern
  test: lib: add helper to evaluate an expression with bind variables but no table
  cql3: column_condition: make the left-hand-side part of column_condition::raw
  cql3: lwt: relax constraints on map subscripts and LIKE patterns
  cql3: expr: fix search_and_replace() for subscripts
  cql3: expr: fix function evaluation with NULL inputs
  cql3: expr: add LWT IF clause variants of binary operators
  cql3: expr: change evaluate_binop_sides to return more NULL information
2023-02-13 16:30:24 +02:00
Nadav Har'El
621c49b621 test/alternator: more tests for listing streams
In issue #12601, a dtest involving paging of ListStreams showed
incorrect results - the paged results had one duplicate stream and one
missing stream. We believe that the cause of this bug was that the
unsorted map of tables can change order between pages. In this patch
we add a test test_list_streams_paged_with_new_table which can
demonstrate this bug - by adding a lot of tables in mid-paging, we
cause the unsorted map to be reshuffled and the paging to break.
This is not the same situation as in #12601 (which did not involve
new tables) but we believe it demonstrates the same bug - and check
its fix. Indeed this passes with the fix in pull request #12614 and
fails without it.
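The failure mode described above can be illustrated with a small Python model of paging. This sketch assumes sorted iteration with an exclusive-start cursor as one way to keep pages stable. This commit does not describe the actual fix in #12614, and list_streams_page() is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple


def list_streams_page(
    tables: Set[str], limit: int, exclusive_start: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    # Iterate in a stable, sorted order: a cursor taken from one page stays
    # meaningful even if the set of tables changes before the next page.
    names = sorted(tables)
    if exclusive_start is not None:
        names = [n for n in names if n > exclusive_start]
    page = names[:limit]
    # Return a cursor only if more names remain after this page.
    cursor = page[-1] if len(names) > limit else None
    return page, cursor
```

With an unordered container, by contrast, "everything after the last returned element" can point at a different position once the container rehashes. That produces exactly the duplicate-plus-missing symptom seen in #12601.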

This patch also adds a second test, test_stream_arn_unchanging:
That test eliminates a guess we had for the cause of #12601. We
thought that maybe the stream ARN changes on a table if its schema
version changes, but the new test confirms that it actually behaves
as expected (the stream ARN doesn't change).

Refs #12601
Refs #12614

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12616
2023-02-13 16:30:24 +02:00
Nadav Har'El
25610c81fb test/cql-pytest: another reproducer for index+limit+filtering bug
This patch adds yet another reproducer for issue #10649, where
the combination of filtering and LIMIT returns fewer results when
a secondary index is added to the table.

Whereas the previous tests we had for this issue involved a regular
(global) index, the new test uses a local index (a Scylla-only feature).
It shows that the same bug exists also for local indexes, as noticed
by a user in #12766.

Refs #10649
Refs #12766

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12783
2023-02-13 16:30:24 +02:00
Botond Dénes
e29e836aca docs/operating-scylla: add a document on diagnostic tools
ScyllaDB has a wide variety of tools and sources of information useful for
diagnosing problems. These are scattered all over the place and although
most of these are documented, there is currently no document listing all
the relevant tools and information sources when it comes to diagnosing a
problem.
This patch adds just that: a document listing the different tools and
information sources, with a brief description of how they can help in
diagnosing problems, and a link to the relevant dedicated documentation
pages.

Closes #12503
2023-02-13 16:30:24 +02:00
Botond Dénes
e55f475db1 Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun
Recently we enabled RBNO by default in all topology operations. This
made the operations a bit slower (repair-based topology ops are a bit
slower than classic streaming - they do more work), and in debug mode
with a large number of concurrent tests running, they might time out.

The timeout for bootstrap was already increased before, do the same for
decommission/removenode. The previously used timeout was 300 seconds
(this is the default used by aiohttp library when it makes HTTP
requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which
is 1000 seconds.

Closes #12765

* github.com:scylladb/scylladb:
  test/pylib: use larger timeout for decommission/removenode
  test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT
2023-02-13 16:30:24 +02:00
Kefu Chai
08b7e8b807 configure.py: use seastar_dep and seastar_testing_dep
now that these variables are set, let's reuse them when appropriate.
less repetition this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12802
2023-02-13 16:30:24 +02:00
Nadav Har'El
ecfcb93ef5 test/cql-pytest: regression test for old bug of misused index
Issue #7659, which we solved long ago, was about a query which included
a non-EQ restriction and wrongly picked up one of the indexes. It had
a short C++ regression test, but here we add a more elaborate Python
test for the same bug. The advantages of the Python test are:

1. The Python test can be run against any version of Scylla (e.g., to
   check whether a certain version contains a backport of the fix).

2. The Python test reproduces not only a "benign" query error, but also
   an assertion-failed crash which happened when the non-EQ restriction
   was an "IN".

3. The Python test reproduces the same bug not just for a regular
   index, but also a local index.

I checked that, as expected, these tests pass on master, but fail
(and crash Scylla) in old branches before the fix for #7659.

Refs #7659.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12797
2023-02-13 16:30:24 +02:00
Takuya ASADA
7e690bac62 install-dependencies.sh: update node_exporter to 1.5.0
Update node_exporter to 1.5.0.

Closes scylladb/scylla-pkg#3190

Closes #12793

[avi: regenerate frozen toolchain]

Closes #12813
2023-02-13 16:30:24 +02:00
Pavel Emelyanov
fa5f5a3299 sstable_test_env: Remove working_sst helper
It's only used by a single test and apparently dates back to when
seastar was missing the future::discard_result() sugar.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12803
2023-02-13 16:30:24 +02:00
Wojciech Mitros
b25ee62f75 wasm udf: deserialize counters as integers
Currently, because serialize_visitor::operator() is not implemented
for counters, we cannot convert a counter returned by a WASM UDF
to bytes when returning from wasm::run_script().

We could disallow using counters as WASM UDF return types, but an
easier solution which we're already using in Lua UDFs is treating
the returned counters as 64-bit integers when deserializing. This
patch implements the latter approach and adds a test for it.
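Here is a minimal Python sketch of the deserialization idea. It assumes the counter arrives in the same 8-byte big-endian two's-complement encoding that CQL uses for bigint. deserialize_counter() is a hypothetical name, not a function in wasm::run_script().

```python
import struct


def deserialize_counter(raw: bytes) -> int:
    # Treat the counter like a CQL bigint: 8 bytes, big-endian, two's complement.
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    (value,) = struct.unpack(">q", raw)
    return value
```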
2023-02-13 14:24:20 +01:00
Wojciech Mitros
3b8bf1ae3a test_wasm.py: add utility function for reading WASM UDF saved in files
Currently, we're repeating the same os.path, open, read, replace
each time we read a WASM UDF from a file.

To reduce code bloat, this patch adds a utility function
"read_function_from_file" that finds the file and reads it given
a function name and an optional new name, for cases when we want
to use a different name in cql (mostly for unique_names).
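A sketch of what such a helper could look like follows. Some details are assumptions for illustration: the directory argument and the layout of one "<name>.wat" file per function. The real read_function_from_file in test_wasm.py may locate files differently. The os.path / open / read / replace steps follow the description above.

```python
import os
from typing import Optional


def read_function_from_file(
    name: str, rename: Optional[str] = None, directory: str = "."
) -> str:
    # Hypothetical layout: one "<name>.wat" text file per function in `directory`.
    path = os.path.join(directory, f"{name}.wat")
    with open(path, encoding="utf-8") as f:
        source = f.read()
    # Optionally rename the function, e.g. to a unique name for the test.
    if rename is not None:
        source = source.replace(name, rename)
    return source
```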
2023-02-13 14:24:20 +01:00
Nadav Har'El
a24600a662 Merge 'test/pylib: split and refactor topology tests' from Alecco
Move long-running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel.

While there, merge simple schema tests.

Closes #12804

* github.com:scylladb/scylladb:
  test/topology: rename topology test file
  test/topology: lint and type for topology tests
  test/topology: move topology ip tests to own file
  test/topology: move topology test remove garbaje...
  test/topology: move topology rejoin test to own file
  test/topology: merge topology schema tests and...
  test/topology: isolate topology smp params test
  test/topology: move topology helpers to common file
2023-02-12 17:53:48 +02:00
Avi Kivity
87c0d09d03 cql3: lwt: move IF clause expression construction to grammar
Instead of the grammar passing expression bits to column_condition,
have the grammar construct an unprepared expression and pass it as
a whole. column_condition::raw then uses prepare_expression() to
prepare it.

The call to validate_operation_on_durations() is eliminated, since it's
already done by prepare_expression().

Some tests adjusted for slightly different wording.
2023-02-12 17:28:36 +02:00
Avi Kivity
37c9c46101 cql3: column_condition: evaluate column_condition as a single expression
Instead of laboriously hand-evaluating each expression component,
construct one expression for the entire column_condition during
prepare time, and evaluate it using the generic machinery.

LWT IF evaluates equality against NULL considering two NULLs
as equal. We handle that by rewriting such expressions to use
null_handling_style::lwt_nulls.

Note we use expr::evaluate() rather than is_satisfied_by(), since
the latter doesn't like functions on the top-level, which we have
due to LIKE with constant pattern optimization. evaluate() is more
generic anyway.
2023-02-12 17:28:05 +02:00
Avi Kivity
8e972b52c5 cql3: lwt: allow negative list indexes in IF clause
LWT IF clause errors out on a negative list index. This deviates
from non-LWT subscript evaluation, from PostgreSQL, and from its own
handling of a too-large index, all of which evaluate the subscript
operation to NULL.

Make things more consistent by also evaluating list[-1] to NULL.

A test is adjusted.
2023-02-12 17:28:05 +02:00
Avi Kivity
433b778a4d cql3: lwt: do not short-circuit col[NULL] in IF clause
Currently if an LWT IF clause contains a subscript with NULL
as the key, then the entire IF clause is evaluated as FALSE.
This is incorrect, because col[NULL] = NULL would simplify
to NULL = NULL, which is interpreted as TRUE using the LWT
comparisons. Even with SQL NULL handling, "col[NULL] IS NULL"
should evaluate to true, but since we short-circuit as soon
as we encounter the NULL key, we cannot complete the evaluation.

Fix by setting cell_value to null instead of returning immediately.

Tests that check for this were adjusted. Since the test changed
behavior from not applying the statement to applying it, a new
statement is added that undoes the previous one, so downstream
statements are not affected.
2023-02-12 17:28:05 +02:00
Avi Kivity
b888e3d26a cql3: column_condition: convert _column to an expression
After this change, all components of column_condition are expressions.
One LWT-specific hack was removed from the evaluation path:

 - lists being represented as maps is made transparent by
   converting during evaluation with adjust_for_collection_as_maps()

column_condition::applies_to() previously handled a missing row
by materializing a NULL for the column being evaluated; now it
materializes a NULL row instead, since evaluation of the column is
moved to common code.

A few more cases in lwt_test became legal, though I'm not sure
exactly why in this patch.
2023-02-12 17:28:01 +02:00
Avi Kivity
568c1a5a36 cql3: expr: generalize evaluation of subscript expressions
Currently, evaluation of a subscript expression x[y] requires that
x be a column_value, but that's completely artificial. Generalize
it to allow any expression.

This is needed after we transform a LWT IF condition from
"a[x] = y" to "func(a)[x] = y", where func casts a from a
map representation of a list back to a list; but it's also generally
useful.
2023-02-12 17:25:46 +02:00
Avi Kivity
6de4032baf cql3: expr: introduce adjust_for_collection_as_maps()
LWT and some list operations represent lists using a form like
their mutations, so that the mutation list keys can be recovered
and used to update the list. But the evaluation machinery knows
nothing about that, and will return the map-form even though the type
system thinks it is a list.

To handle that, add a utility to rewrite the expression so
that the value is re-serialized into the expected list form. The
rewrite is implemented as a scalar function taking the map form and
returning the list form.
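The rewrite can be modeled in Python as a function from the map form to the list form. Integer keys stand in for the per-element keys, which this sketch assumes sort in list order. list_from_map_form() is a hypothetical name.

```python
from typing import Dict, List, Optional


def list_from_map_form(map_form: Optional[Dict[int, str]]) -> Optional[List[str]]:
    # NULL in, NULL out.
    if map_form is None:
        return None
    # Drop the keys and keep the values in key order, giving the list form.
    return [map_form[k] for k in sorted(map_form)]
```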
2023-02-12 17:25:46 +02:00
Avi Kivity
3a2d8175fb cql3: update_parameters: use evaluation_inputs compatible row prefetch
update_parameters::prefetch_data is used for some list updates (which
need a read-before-write to determine the key to update) and for
LWT compare-and-swap. Currently they use a custom structure for
representing a read row.

Switch to the same structure that is used in evaluation_inputs (and
in SELECT statement evaluation) so the expression machinery can be reused.

The expression representation is irregular (with different fields for
the keys and regular/static columns), so we introduce an old_row
structure to hold both the clustering key and the regular row values
for cas_request.

A nice bonus is that we can use get_non_pk_values() to read the data
into the format expected by evaluation_inputs, but on the other hand
we have to adjust get_prefetched_list() to fix up the type of
the returned list (we return it as a map, not a list, so list updates
can access the index).
2023-02-12 17:25:41 +02:00
Avi Kivity
47026b7ee0 cql3: expr: protect extract_column_value() from partial clustering keys
Partial clustering keys can exist in COMPACT STORAGE tables (though they
are exceedingly rare), and when LWT materializes a static row. Harden
extract_column_value() so it is ready for them.
2023-02-12 17:17:01 +02:00
Avi Kivity
c8d77c204f cql3: expr: extract extract_column_value() from evaluation machinery
Expression evaluation works with the evaluation_input structure to
compute values. As we move LWT column_condition towards expressions,
we'll start using evaluation_input, so provide this helper to ease
the transition.
2023-02-12 17:17:01 +02:00
Avi Kivity
721c05b7ec cql3: selection: introduce selection_from_partition_slice
Since expressions were introduced for SELECT statements, they
work with a `selection` object to represent which table columns
they can work with. Probably a neutral representation would have
been better, but that's what we have now.

LWT works with partition_slice, so introduce a
selection_from_partition_slice() helper to bridge the two worlds.
2023-02-12 17:17:01 +02:00
Avi Kivity
31ee13c0c9 cql3: expr: move check for ordering on duration types from restrictions to prepare
Both LWT IF clause and SELECT WHERE clause check that a duration type
isn't used in an ordered comparison, since duration types are unordered
(is 1mo more or less than 30d?). As a first step towards centralizing this
check, move the check from restrictions into prepare. When LWT starts using
prepare, the duplication will be removed.

The error message was changed: the word "slice" is an internal term, and
a comparison does not necessarily have to be in a restriction (which is
also an internal term).

Tests were adjusted.
2023-02-12 17:17:01 +02:00
Avi Kivity
c0b1992fc4 cql3: expr: remove restrictions oper_is_slice() in favor of expr::is_slice()
The two are functionally identical, so eliminate duplicate code.
2023-02-12 17:17:01 +02:00
Avi Kivity
036fa0891f cql3: column_condition: optimize LIKE with constant pattern after preparing
This just moves things around to put all the code we will kill in
one place.

Note the code was adjusted: before the move, it operated on
an unprepared untyped_constant; after the move it operates on
a prepared constant.
2023-02-12 17:17:01 +02:00
Avi Kivity
db2fa44a9a cql3: expr: add optimizer for LIKE with constant pattern
Compiling a pattern is expensive and so we should try to do it
at prepare time, if the pattern is a constant. Add an optimizer
that looks for such cases and replaces them with a unary function
that embeds the compiled pattern.

This isn't integrated yet with prepare_expr(), since the filtering
code isn't ready for generic expressions. Its first user will be LWT,
which contains the optimization already (filtering had it as well,
but lost it sometime during the expression rewrite).
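The idea of compiling the pattern once and embedding it in a unary function can be sketched in Python. The translation handles only the % and _ wildcards, with no escape sequences. compile_like() and optimize_like() are hypothetical names, not Scylla's optimizer.

```python
import re
from typing import Callable, Optional


def compile_like(pattern: str) -> "re.Pattern[str]":
    # Translate LIKE wildcards: '%' -> any sequence, '_' -> any single character.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def optimize_like(pattern: str) -> Callable[[Optional[str]], Optional[bool]]:
    # The expensive compilation happens once, at "prepare" time.
    compiled = compile_like(pattern)

    def matches(value: Optional[str]) -> Optional[bool]:
        # A NULL input yields NULL; otherwise the whole value must match.
        if value is None:
            return None
        return compiled.fullmatch(value) is not None

    return matches
```

Each evaluation then only runs the precompiled matcher.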

A unit test is added.
2023-02-12 17:16:58 +02:00
Avi Kivity
1959f9937c test: lib: add helper to evaluate an expression with bind variables but no table
Sometimes we want to defeat the expression optimizer's ability to
fold constant expressions. A bind variable is a convenient way to
do this, without the complexity of faking a schema and row inputs.
Add a helper to evaluate an expression with bind variable parameters,
doing all the paperwork for us.

A companion make_bind_variable() is added to likewise simplify
creating bind variables for tests.
2023-02-12 17:05:22 +02:00
Avi Kivity
899c4a7f29 cql3: column_condition: make the left-hand-side part of column_condition::raw
LWT IF conditions are collected with the left-hand-side outside the
condition structure, then moved back to the prepared condition
structure during preparation. Change that so that the raw description
also contains the left-hand-side. This makes it more similar to expressions
(which LWT conditions aspire to be).

The change is mechanical; a bit of code that used to manage the std::pair
is moved to column_condition::raw::prepare instead. The schema is now also
passed since it's needed to prepare the left-hand-side.
2023-02-12 17:05:22 +02:00
Avi Kivity
f5257533fd cql3: lwt: relax constraints on map subscripts and LIKE patterns
Previously, we rejected map subscripts that are NULL, as well as
LIKE patterns that are NULL. General SQL expression evaluation
allows NULL everywhere, and doesn't raise errors - an expression
involving NULL generally yields NULL. Change the behavior to
follow that. Since the new behavior was previously disallowed,
no one should have been relying on it and there is no compatibility
problem.

Update the tests and note it as a CQL extension.
2023-02-12 17:05:22 +02:00
Avi Kivity
b40dc49e05 cql3: expr: fix search_and_replace() for subscripts
We forgot to preserve the subscript's type, so fix that.

Also drop a leftover throw. It's dead code, immediately after a return.
2023-02-12 17:05:22 +02:00
Avi Kivity
8dda84bb0c cql3: expr: fix function evaluation with NULL inputs
Function call evaluation rejects NULL inputs unnecessarily. Functions
work well with NULL inputs. Fix by relaxing the check.

This currently has no impact because functions are not evaluated via
expressions, but via selectors.
2023-02-12 17:05:22 +02:00
Avi Kivity
ecdd49317a cql3: expr: add LWT IF clause variants of binary operators
LWT IF clause interprets equality differently from SQL (and the
rest of CQL): it thinks NULL equals NULL. Currently, it implements
binary operators all by itself so the fact that oper_t::EQ (and
friends) means something else in the rest of the code doesn't
bother it. However, we can't unify the code (in
column_condition.cc) with the rest of expression evaluation if
the meaning changes in different places.

To prepare for this, introduce a null_handling_style field to
binary_operator that defaults to `sql` but can be changed to
`lwt_nulls` to indicate this special semantic.
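Here is a minimal Python model of the two styles for the EQ operator. It uses None as NULL and returns None for a NULL result. The enumerator names come from this commit message. evaluate_eq() and the three-valued return are assumptions for illustration.

```python
from enum import Enum
from typing import Any, Optional


class null_handling_style(Enum):
    sql = "sql"
    lwt_nulls = "lwt_nulls"


def evaluate_eq(
    lhs: Optional[Any],
    rhs: Optional[Any],
    style: null_handling_style = null_handling_style.sql,
) -> Optional[bool]:
    """Return True/False, or None to represent a NULL result."""
    if lhs is None or rhs is None:
        if style is null_handling_style.lwt_nulls:
            # LWT IF: NULL = NULL is TRUE, NULL = non-NULL is FALSE.
            return lhs is None and rhs is None
        # SQL: any comparison involving NULL yields NULL.
        return None
    return lhs == rhs
```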

A few unit tests are added. LWT itself still isn't modified.
2023-02-12 17:03:03 +02:00
Alejo Sanchez
8bf2d515de test/topology: rename topology test file
Rename test_topology.py to reflect current tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:59:31 +01:00
Alejo Sanchez
11691ba7f5 test/topology: lint and type for topology tests
Fix minor lint and type hints.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:59:31 +01:00
Alejo Sanchez
49baf6789c test/topology: move topology ip tests to own file
Move slow topology IP related tests to a separate file.

Add docstrings.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:59:19 +01:00
Alejo Sanchez
3fcef63a0f test/topology: move topology test remove garbaje...
group0 members to own file

Move the slow test for removenode after a server's sudden stop, with
nodes not present in group0, to a separate file.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:48:39 +01:00
Nadav Har'El
10ca08e8ac Merge 'Sequence CDC preimage select with Paxos learn write' from Kamil Braun
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.

Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.
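The ordering constraint can be sketched with Python's asyncio. learn_decision() here is a simplified, hypothetical model with in-memory helpers, not storage_proxy code. It only shows two things: the preimage read completes before either write starts, and the two writes still run concurrently.

```python
import asyncio


async def read_row(table: dict, key):
    await asyncio.sleep(0)  # yield to the event loop, as a real read would
    return table.get(key)


async def write_row(table: dict, key, value) -> None:
    await asyncio.sleep(0)
    table[key] = value


async def write_cdc(log: list, key, preimage, value) -> None:
    await asyncio.sleep(0)
    log.append((key, preimage, value))


async def learn_decision(table: dict, key, new_value, cdc_log: list) -> None:
    # 1. Select the preimage first, so it cannot observe the update it accompanies.
    preimage = await read_row(table, key)
    # 2. Only then write the base row and the CDC log entry, concurrently.
    await asyncio.gather(
        write_row(table, key, new_value),
        write_cdc(cdc_log, key, preimage, new_value),
    )
```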

`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.

Also fix a comment in `cdc_with_lwt_test`.

Fixes #12098

Closes #12768

* github.com:scylladb/scylladb:
  test/cql-pytest: test_cdc: regression test for #12098
  test/cql: cdc_with_lwt_test: fix comment
  service: storage_proxy: sequence CDC preimage select with Paxos learn
2023-02-12 13:28:34 +02:00
Alejo Sanchez
655e1587e3 test/topology: move topology rejoin test to own file
Move slow test for rejoining a server after a sudden stop to a separate
file.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:02:47 +01:00
Alejo Sanchez
7cc669f5a5 test/topology: merge topology schema tests and...
... move them to their own file.

Schema verification tests for restart, add, and hard stop of server can
be done with the same cluster. Merge them in the same test case.

While there, move them to a separate file to be run independently as
this is a slow test.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:02:40 +01:00
Alejo Sanchez
93de79d214 test/topology: isolate topology smp params test
Move slow test for different smp parameters to its own file.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:02:32 +01:00
Alejo Sanchez
293550ca5c test/topology: move topology helpers to common file
Move helper functions to a common file ahead of splitting topology
tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-12 12:02:16 +01:00
Nadav Har'El
2653865b34 Merge 'test.py: improve test failure handling' from Kamil Braun
Improve logging by printing the cluster at the end of each test.

Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.

Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.

Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.

Closes #12652

* github.com:scylladb/scylladb:
  test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
  test/topology: don't drop random_tables keyspace after a failed test
  test/pylib: mark cluster as dirty after a failed test
  test: pylib, topology: don't perform operations after test on a dirty cluster
  test/pylib: print cluster at the end of test
2023-02-12 12:13:25 +02:00
Kamil Braun
54f85c641d test/pylib: use larger timeout for decommission/removenode
Recently we enabled RBNO by default in all topology operations. This
made the operations a bit slower (repair-based topology ops are a bit
slower than classic streaming - they do more work), and in debug mode
with a large number of concurrent tests running, they might time out.

The timeout for bootstrap was already increased before, do the same for
decommission/removenode. The previously used timeout was 300 seconds
(this is the default used by aiohttp library when it makes HTTP
requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which
is 1000 seconds.
2023-02-10 15:56:31 +01:00
Kamil Braun
fde6ad5fc0 test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT
Use a more generic name since the constant will also be used as timeout
for decommission and removenode.
2023-02-10 15:56:31 +01:00
Kamil Braun
ca4db9bb72 Merge 'test/raft: test snapshot threshold' from Alecco
Force a snapshot with schema changes while a server is down. Then verify the schema when bringing the server back up.

Closes #12726

* github.com:scylladb/scylladb:
  pytest/topology: check snapshot transfer
  raft conf error injection for snapshot
  test/pylib: one-shot error injection helper
2023-02-10 15:24:46 +01:00
Kamil Braun
540f6d9b78 test/cql-pytest: test_cdc: regression test for #12098
Perform multiple LWT inserts to different keys ensuring none of them
observes a preimage.

On my machine this test reproduces the problem more than 50% of the time
in debug mode.
2023-02-10 14:35:49 +01:00
Avi Kivity
9696ab7fae cql3: expr: change evaluate_binop_sides to return more NULL information
Currently, evaluate_binop_sides() returns std::nullopt if either
side is NULL.

Since we wish to add binary operators that do consider NULL on
each side, make evaluate_binop_sides return the original NULLs
instead (as managed_bytes_opt).

Ultimately I think evaluate_binop_sides() should disappear, but before
that we have to improve unset value checking.
2023-02-10 09:45:35 +02:00
Botond Dénes
423df263f5 Merge 'Sanitize with_sstable_directory() helper in tests' from Pavel Emelyanov
The helper wrapper facilitates the usage of sharded<sstable_directory> in several test cases, and over time the helper and its callers have come to deserve some cleanup.

Closes #12791

* github.com:scylladb/scylladb:
  sstable_directory_test: Reindent and de-multiline
  sstable_directory_test: Enlighten and rename sstable_from_existing_file
  sstable_directory_test: Remove constant parallelism parameter
2023-02-10 07:11:38 +02:00
Tomasz Grabiec
402d5fd7e3 cache: Fix empty partition entries being left in cache in some cases
Merging rows from different partition versions should preserve the LRU link of
the entry from the newer version. We need this in case we're merging two last
dummy entries where the older dummy is already unlinked from the LRU. The
newer dummy could be the last entry which is still holding the partition
entry linked in the LRU.

The mutation_partition_v2 merging didn't take the LRU link from the newer
entry, and we could end up with the partition entry not having any entries
linked in the LRU.

Introduced in f73e2c992f.

Fixes #12778

Closes #12785
2023-02-09 23:03:23 +02:00
Kamil Braun
e2064f4762 Merge 'repair: finish repair immediately on local keyspaces' from Aleksandra Martyniuk
The system keyspace uses the local replication strategy and thus
does not need to be repaired. It is possible to invoke repair
of this keyspace through the API, which leads to a runtime error since
peer_events and scylla_table_schema_history have different sharding logic.

With this change, for keyspaces with the local replication strategy,
repair_service::do_repair_start returns immediately.

Closes #12459

* github.com:scylladb/scylladb:
  test: rest_api: check if repair of system keyspace returns before corresponding task is created
  repair: finish repair immediately on local keyspaces
2023-02-09 18:44:37 +01:00
Pavel Emelyanov
52e2ad051e sstable_utils: Move the test_setup to perf/
The sstable perf test uses test_setup's ability to create a temporary
directory and clean it up, and that's the only place that uses it. Move the
remainders of test_setup into perf/ so that no unit tests attempt to
re-use it (there's test_env for that).

While at it, remove the unused _walker and _create_directory and
mark the protected members private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 17:18:04 +03:00
Pavel Emelyanov
868391a613 sstable_utils: Remove unused wrappers over test_env
Now all callers are using the test_env directly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 17:17:48 +03:00
Pavel Emelyanov
47022bf750 sstable_test: Open-code do_with_cloned_tmp_directory
The statistics_rewrite case uses the helper that creates a copy of the
provided static directory, but it's the only user of this helper. It's
better to open-code it into the test case.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 17:17:48 +03:00
Pavel Emelyanov
19c1afb20a sstable_test: Asynchronize statistics_rewrite case
It runs inside an async context and can be written in a shorter form
without deeply nested then-s.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 17:17:23 +03:00
Pavel Emelyanov
85b8bae035 tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)?
The former helper is just a wrapper over the _async version of the
latter that also creates a tempdir and calls the fn with the tempdir as an
argument. The test_env already has its own temp dir on board, so callers
can be switched to using it.

Some test cases use do_with_tmp_directory but generate chains of
futures without actually using the async context. This patch addresses
that too, so unfortunately the change is not 100% mechanical.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 17:11:31 +03:00
Anna Stuchlik
9f2724231c doc: add the new KB to the list of topics 2023-02-09 14:42:09 +01:00
Anna Stuchlik
cfdb8a8760 doc: add a new KB article about tombstone garbage collection in ICS 2023-02-09 14:36:06 +01:00
Pavel Emelyanov
f0212c7b68 sstable_directory_test: Reindent and de-multiline
Many tests using the sstable directory wrapper have broken indentation after
previous patches. Fix it. No functional changes.

Also, while at it, convert multi-line wrapper calls into one-liners; after
the previous patch they are short enough for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 16:00:53 +03:00
Pavel Emelyanov
ec02b0f706 sstable_directory_test: Enlighten and rename sstable_from_existing_file
It used to be the sstable maker for sstable::test_env / cql_test_env;
now sstables for tests are made explicitly via the sstables manager, so the
helper can be renamed to something more relevant to its current status.

Also, drop the explicit specifier from its constructors to make callers shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 15:59:23 +03:00
Pavel Emelyanov
c843f7937b sstable_directory_test: Remove constant parallelism parameter
It's always 1 (one), so just hard-code it internally.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-09 15:59:01 +03:00
Avi Kivity
fd4ee4878a Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops"
This reverts commit e7d5e508bc. It ends up
failing continuous integration tests randomly. We don't know if it's
uncovering an existing bug, or if RBNO itself is broken, but for now we
need to revert it to unblock progress.
2023-02-09 10:30:26 +02:00
Botond Dénes
b62d84fdba Merge 'Keep reshape and reshard logic in distributed loader' from Pavel Emelyanov
Currently the logic is scattered between the distributed loader and the sstable directory code, making the latter quite bloated. Keeping everything in the distributed loader makes the sstable_directory code compact and easier to patch to support an object storage backend.

Closes #12771

* github.com:scylladb/scylladb:
  sstable_directory: Rename remove_input_sstables_from_reshaping()
  sstable_directory: Make use of remove_sstables() helper
  sstable_directory: Merge output sstables collecting methods
  distributed_loader: Remove max_compaction_threshold argument from reshard()
  distributed_loader: Remove compaction_manager& argument from reshard()
  sstable_directory: Move the .reshard() to distributed_loader
  sstable_directory: Add helper to load foreign sstable
  sstable_directory: Add io-prio argument to .reshard()
  sstable_directory: Move reshard() to distributed_loader.cc
  distributed_loader: Remove compaction_manager& argument from reshape()
  sstable_directory: Move the .reshape() to distributed loader
  sstable_directory: Add helper to retrieve local sstables
  sstable_directory: Add io-prio argument to .reshape()
  sstable_directory: Move reshape() to distributed_loader.cc
2023-02-09 10:01:44 +02:00
Botond Dénes
1c333e2102 Merge 'Transport server error handling fixes' from Gusev Petr
CQL transport server error handling fixes and improvements:
  * log failed requests with `DEBUG` level for easier debugging;
  * in case of unhandled errors, deliver them to the client as `SERVER_ERROR`s;
  * fix for `protocol_error`s in case of shed big requests;
  * explicit tests have been written for the error handling problems above.

Closes #11949

* github.com:scylladb/scylladb:
  transport server: fix "request size too large" handling
  transport server: log failed requests with debug level
  transport server: fix unexpected server errors handling
  transport server: log client errors with debug level
2023-02-09 09:02:22 +02:00
Anna Stuchlik
c7778dd30b doc: related https://github.com/scylladb/scylladb/issues/12754, add the requirement to upgrade Monitoring to version 4.3
Closes #12784
2023-02-09 07:10:34 +02:00
Botond Dénes
746b009db0 Merge 'dist/debian: bump up debhelper compatibility level to 10 and cleanups' from Kefu Chai
- dist/debian: bump up debhelper compatibility level to 10
- dist/debian: drop unused Makefile variable

Closes #12723

* github.com:scylladb/scylladb:
  dist/debian: drop unused Makefile variable
  dist/debian: bump up debhelper compatibility level to 10
2023-02-09 07:04:20 +02:00
Pavel Emelyanov
40de737b36 sstable_directory: Rename remove_input_sstables_from_reshaping()
It unlinks unshared sstables, filtering some of them out. Name it
according to what it does, without mentioning reshape/reshard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-08 15:00:44 +03:00
Pavel Emelyanov
a1dc251214 sstable_directory: Make use of remove_sstables() helper
Currently it's called remove_input_sstables_from_resharding(), but it
just unlinks sstables from the given list in parallel. So rename it not
to mention reshard, and also make use of this "new" helper in
remove_input_sstables_from_reshaping(), which needs exactly the same
functionality.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-08 15:00:44 +03:00
Pavel Emelyanov
cb36f5e581 sstable_directory: Merge output sstables collecting methods
There are two of them, collecting sstables from resharding and from reshaping.
Both do the same job, except that the latter doesn't expect the list to
contain remote sstables.

This patch merges them together, with an extra boolean for the sanity
check that no remote sstable is in the list, and renames the method
so it doesn't mention reshape/reshard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-08 15:00:41 +03:00
Avi Kivity
0f15ff740d cql3: expr: simplify user/debug formatting
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alternative: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).

This is done by implementing fmt::formatter::parse() for the
expression type and using expression::printer internally.

Since we sometimes pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement types.

Uses of expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed, since
ExpressionElement:s can be formatted directly.
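
Python has a close analog of a custom fmt format specifier. An object's `__format__` method receives the text after the colon in `{obj:spec}`. The following is a toy sketch. `Expr` and its output strings are hypothetical stand-ins, not the real cql3 expression types or their printed forms.

```python
class Expr:
    """Toy expression node standing in for cql3::expr::expression."""

    def __init__(self, column: str, value: str):
        self.column = column
        self.value = value

    def __format__(self, spec: str) -> str:
        # Like fmt::formatter::parse(), the spec after ':' selects the mode.
        if spec == "user":
            return f"{self.column} = {self.value}"
        if spec in ("", "debug"):  # no spec given: the default stays debug
            return f"binary_operator({self.column}, EQ, {self.value!r})"
        raise ValueError(f"unknown format spec: {spec!r}")
```

With this, `f"{e:user}"` and `f"{e:debug}"` replace a separate printer wrapper object.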

Closes #12702
2023-02-08 12:24:58 +02:00
Petr Gusev
3263523b54 transport server: fix "request size too large" handling
Calling _read_buf.close() doesn't imply eof(); some data
may already have been read into kernel or client buffers
and will be returned the next time read() is called.
When the _server._max_request_size limit was exceeded
and the _read_buf was closed, the process_request method
finished and we started processing the next request in
connection::process. The unread data from _read_buf was
treated as the header of the next request frame, resulting
in "Invalid or unsupported protocol version" error.
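
The following toy Python model shows the framing problem. It assumes a made-up frame header of one version byte plus a 4-byte body length, which is not the real CQL frame layout. If the loop moves on while an oversized body is still unread, the body bytes get parsed as the next header.

```python
import io
import struct

# Toy frame header: 1-byte protocol version + 4-byte big-endian body length.
HEADER = struct.Struct(">BI")
VERSION = 4
MAX_REQUEST_SIZE = 8


def make_frame(body: bytes) -> bytes:
    return HEADER.pack(VERSION, len(body)) + body


def serve(stream: io.BytesIO, close_on_oversized: bool) -> list:
    outcomes = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return outcomes  # EOF
        version, length = HEADER.unpack(header)
        if version != VERSION:
            outcomes.append("invalid or unsupported protocol version")
            return outcomes
        if length > MAX_REQUEST_SIZE:
            outcomes.append("request too large")
            if close_on_oversized:
                # Stop using the connection: the unread body must never be
                # mistaken for the next frame header.
                return outcomes
            continue  # buggy path: the body bytes are still in the stream
        outcomes.append(stream.read(length).decode())
```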

The existing test_shed_too_large_request was adjusted.
It was originally written with the assumption that the data
of a large query would simply be dropped from the socket
and the connection could be used to handle the
next requests. This behaviour was changed in scylladb#8800:
now the connection is closed on the Scylla side and
can no longer be used. To check that there are no errors
in this case, we use Scylla metrics, obtained
from the Scylla Prometheus API.
2023-02-08 00:07:08 +04:00
Petr Gusev
0904f98ebf transport server: log failed requests with debug level
These logs can be helpful for debugging, e.g. if an error
was not handled correctly by the client driver, or another
error occurred while handling it.
2023-02-08 00:07:08 +04:00
Petr Gusev
a4cf509c3d transport server: fix unexpected server errors handling
If request processing ended with an error, it is worth
sending the error to the client through
make_error/write_response. Previously in this case we
just wrote a message to the log and didn't handle the
client connection in any way. As a result, the only
thing the client got in this case was a timeout error.

A new test_batch_with_error is added. It is quite
difficult to reproduce an error condition in a test,
so we use error injection instead. Passing injection_key
in the body of the request ensures that the exception
will be thrown only for this test request and
will not affect other requests that
the driver may send in the background.
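
The following is a minimal sketch of the behaviour change. It assumes a hypothetical handler callable and a simplified `(opcode, payload)` response tuple, not the real transport server API or CQL ERROR frame.

```python
import logging
from typing import Any, Callable, Tuple

log = logging.getLogger("transport")  # hypothetical logger name


def process_request(handler: Callable[[Any], Any], request: Any) -> Tuple[str, Any]:
    """Always produce a response, even if the handler fails unexpectedly."""
    try:
        return ("RESULT", handler(request))
    except Exception as exc:
        log.debug("request %r failed: %s", request, exc)
        # Before the fix the error was only logged and the client got no
        # reply, so all it eventually saw was a timeout.
        return ("SERVER_ERROR", str(exc))
```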

Closes: scylladb#12104
2023-02-08 00:07:02 +04:00
Pavel Emelyanov
73d458cf89 distributed_loader: Remove max_compaction_threshold argument from reshard()
Since the whole reshard() is now local to the distributed loader code, the
caller of the reshard helper can let this method get the threshold itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:43 +03:00
Pavel Emelyanov
25aaa45256 distributed_loader: Remove compaction_manager& argument from reshard()
It can be obtained from the table&

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:43 +03:00
Pavel Emelyanov
15547f1b5b sstable_directory: Move the .reshard() to distributed_loader
Now all the resharding logic is accumulated in distributed loader and the
sstable_directory is just the place where sstables are collected.

The changes summary is:
- add sstable_directory as argument (used to be "this")
- replace all "this" captures with &dir ones
- remove temporary namespace gap and declaration from sst-dir class

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:43 +03:00
Pavel Emelyanov
ab5f48d496 sstable_directory: Add helper to load foreign sstable
This is to remove the code duplication between .reshard() and the
existing .load_foreign_sstables() (plural form).

Coroutinize it right away.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:43 +03:00
Pavel Emelyanov
e6e65c87d5 sstable_directory: Add io-prio argument to .reshard()
Currently it gets one from this->, but the method is becoming a static one in
distributed_loader, which only has it as an argument. That's not a big deal,
as the current IO class is going to be derived from the current sched group,
so this extra arg will go away entirely some day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:41 +03:00
Pavel Emelyanov
a32d2b6d6a sstable_directory: Move reshard() to distributed_loader.cc
Just move the code and create a temporary namespace gap for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:12 +03:00
Pavel Emelyanov
1de8c85acd distributed_loader: Remove compaction_manager& argument from reshape()
It can be obtained from the table&

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:12 +03:00
Pavel Emelyanov
d734b6b7c1 sstable_directory: Move the .reshape() to distributed loader
Now all the reshaping logic is accumulated in distributed loader and the
sstable_directory is just the place where sstables are collected.

The changes summary is:
- add sstable_directory as argument (used to be "this")
- replace all "this" captures with &dir ones
- remove temporary namespace gap and declaration from sst-dir class

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:30:55 +03:00
Pavel Emelyanov
b906d34807 sstable_directory: Add helper to retrieve local sstables
There are methods to retrieve shared local sstables and foreign sstables,
so here's one more for the family.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:23:40 +03:00
Pavel Emelyanov
420fc8d4df sstable_directory: Add io-prio argument to .reshape()
Currently it gets one from this->, but the method is becoming a static one in
distributed_loader, which only has it as an argument. That's not a big deal,
as the current IO class is going to be derived from the current sched group,
so this extra arg will go away entirely some day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:22:27 +03:00
Pavel Emelyanov
a70d6017f8 sstable_directory: Move reshape() to distributed_loader.cc
Just move the code and create a temporary namespace gap for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:21:54 +03:00
Kamil Braun
97b2971bf1 test/cql: cdc_with_lwt_test: fix comment
The comment mentioned an entry that shouldn't be there (and it wasn't in
the actual expected result).
2023-02-07 16:12:18 +01:00
Kamil Braun
1ef113691a service: storage_proxy: sequence CDC preimage select with Paxos learn
`paxos_response_handler::learn_decision` was calling
`cdc_service::augment_mutation_call` concurrently with
`storage_proxy::mutate_internal`. `augment_mutation_call` was selecting
rows from the base table in order to create the preimage, while
`mutate_internal` was writing rows to the table. It was therefore
possible for the preimage to observe the update that it accompanied,
which doesn't make any sense, because the preimage is supposed to show
the state before the update.

Fix this by performing the operations sequentially. We can still perform
the CDC mutation write concurrently with the base mutation write.

`cdc_with_lwt_test` was sometimes failing in debug mode due to this bug
and was marked flaky. Unmark it.
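
The following minimal asyncio sketch shows the new ordering. A dict stands in for the base table and a list for the CDC log; both are hypothetical stand-ins for the real storage paths. The preimage select finishes before the base write starts, while the base and CDC writes still run concurrently.

```python
import asyncio


async def select_preimage(table: dict, key):
    await asyncio.sleep(0)  # stand-in for the read's I/O
    return table.get(key)


async def write_base(table: dict, key, value):
    await asyncio.sleep(0)  # stand-in for the write's I/O
    table[key] = value


async def write_cdc(cdc_log: list, key, preimage, value):
    await asyncio.sleep(0)
    cdc_log.append({"key": key, "preimage": preimage, "postimage": value})


async def learn_decision(table: dict, cdc_log: list, key, value):
    # 1. Complete the preimage select first, so it can never observe
    #    the very update it accompanies.
    preimage = await select_preimage(table, key)
    # 2. The base write and the CDC log write may still run concurrently.
    await asyncio.gather(write_base(table, key, value),
                         write_cdc(cdc_log, key, preimage, value))
```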

Fixes #12098
2023-02-07 16:12:18 +01:00
Alejo Sanchez
cf3b8d7edc pytest/topology: check snapshot transfer
Test snapshot transfer by reducing the snapshot threshold on the initial
servers (3 and 1 trailing).

Then create a table and make 3 extra schema changes (add column),
triggering at least 2 snapshots.

Then bring a new server into the cluster, which will get the schema
through a snapshot.

Then stop the initial servers and verify that the table
schema is up to date on the new server.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-07 16:09:07 +01:00
Petr Gusev
95bf8eebe0 query_ranges_to_vnodes_generator: fix for exclusive boundaries
Let the initial range passed to query_partition_key_range
be [1, 2), where 2 is the successor of 1 in terms
of ring_position order and 1 is equal to a vnode boundary.
Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)],
so we get an empty range (1, 2) and subsequently
make a data request with this empty range in
storage_proxy::query_partition_key_range_concurrent,
which is redundant.

The patch adds a check for this condition after
making a split in the main loop in process_one_range.

The patch does not attempt to handle cases where the
original ranges were empty, since this check is the
responsibility of the caller. We only take care
not to add empty ranges to the result as an
unintentional artifact of the algorithm in
query_ranges_to_vnodes_generator.
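
The following simplified Python model shows the split. It assumes integer ring positions, so that 2 is the successor of 1, and it ignores ring wrap-around. `Range` and `split_at_vnodes` are hypothetical names, not Scylla's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    start: int
    start_inclusive: bool
    end: int
    end_inclusive: bool

    def contains(self, pos: int) -> bool:
        after_start = pos > self.start or (pos == self.start and self.start_inclusive)
        before_end = pos < self.end or (pos == self.end and self.end_inclusive)
        return after_start and before_end

    def is_empty(self) -> bool:
        # First and last integer positions the range actually covers.
        first = self.start if self.start_inclusive else self.start + 1
        last = self.end if self.end_inclusive else self.end - 1
        return first > last


def split_at_vnodes(r: Range, vnode_tokens: List[int]) -> List[Range]:
    """Split r at each vnode token it contains: [a, t] then (t, ...)."""
    result = []
    current, produced_by_split = r, False
    for t in sorted(vnode_tokens):
        if current.contains(t) and t != current.end:
            result.append(Range(current.start, current.start_inclusive, t, True))
            current = Range(t, False, current.end, current.end_inclusive)
            produced_by_split = True
    # The fix: never emit an empty remainder created by our own split.
    # An empty *input* range is the caller's responsibility and is kept as-is.
    if not (produced_by_split and current.is_empty()):
        result.append(current)
    return result
```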

A test case is added in test_get_restricted_ranges.
The helper lambda check is changed so as not to limit
the number of ranges to the length of the expected
ranges; otherwise this check would pass even without
the change in process_one_range.

Fixes: #12566

Closes #12755
2023-02-07 16:02:31 +02:00
Kefu Chai
afd1221b53 commitlog: mark request_controller_timeout_exception_factory::timeout() noexcept
request_controller_timeout_exception_factory::timeout() creates an
instance of `request_controller_timed_out_error`, whose ctor is
implicitly generated by the compiler from that of timed_out_error, which is
in turn generated from the one of `std::exception`. and
`std::exception::exception` does not throw, so it's safe to
mark this factory method `noexcept`.

with this specifier, we don't need to worry about exceptions thrown
by it, and don't need to handle them in `seastar::semaphore`,
where `timeout()` is called for the customized exception.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12759
2023-02-07 14:38:54 +02:00
Botond Dénes
051da4e148 Merge 'Handle EDQUOT error just like ENOSPC' from Kefu Chai
- main: consider EDQUOT as environmental failure also
- main: use defer_verbose_shutdown() to shutdown compaction manager
- replica/table: extract should_retry() into with_retry
- replica/table: retry on EDQUOT when flushing memtable

Fixes #12626

Closes #12653

* github.com:scylladb/scylladb:
  replica/table: retry on EDQUOT when flushing memtable
  replica/table: extract should_retry() into with_retry
  main: use defer_verbose_shutdown() to shutdown compaction manager
  main: consider EDQUOT as environmental failure also
2023-02-07 14:38:36 +02:00
David Garcia
734f09aba7 docs: add flags support in multiversion
Closes #12740
2023-02-07 14:23:53 +02:00
Wojciech Mitros
02bfac0c66 uda: change the UDF used in a UDA if it's replaced
Currently, if a UDA uses a UDF that's being replaced,
the UDA will still keep using the old UDF until the
node is restarted.
This patch fixes this behavior by checking all UDAs
when replacing a UDF and updating them if necessary.

Fixes #12709
2023-02-07 12:17:52 +01:00
Nadav Har'El
3ba011c2be cql: fix empty aggregation, and add more tests
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.
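
The following is a hedged sketch of the idea. It assumes each node returns one partial value per selected aggregate, and it covers only the COUNT and MIN cases named above. `forward_aggregation` is a hypothetical name.

```python
from typing import List, Optional

# Result each supported aggregate reports over no rows at all.
ZERO_RESULTS = {"count": 0, "min": None}


def forward_aggregation(functions: List[str],
                        partial_results: List[List[Optional[int]]]) -> List[Optional[int]]:
    """Combine per-node partial results; one inner list per node."""
    if not partial_results:
        # No partition matched (e.g. WHERE p IN ()), so no node was queried.
        # Build the "zero" row instead of returning an empty result set.
        return [ZERO_RESULTS[f] for f in functions]
    merged: List[Optional[int]] = []
    for i, f in enumerate(functions):
        column = [node[i] for node in partial_results]
        if f == "count":
            merged.append(sum(column))
        elif f == "min":
            values = [v for v in column if v is not None]
            merged.append(min(values) if values else None)
        else:
            raise ValueError(f"unsupported aggregate: {f}")
    return merged
```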

The patch also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra that is still not fixed,
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP BY results in an empty result
             in Cassandra, and in one result with an empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715
2023-02-07 12:28:42 +02:00
Botond Dénes
bf7113f6dc Merge 'locator: token_metadata: improve get_address_ranges()' from Michał Chojnowski
This two-patch series aims to improve `get_address_ranges()` by eliminating cases of quadratic behavior
which were noticed to cause huge allocations, and by deduplicating the code of `get_address_ranges()`
with the almost-identical `get_ranges()`.

Refs https://github.com/scylladb/scylladb/issues/10337
Refs https://github.com/scylladb/scylladb/issues/10817
Refs https://github.com/scylladb/scylladb/issues/10836
Refs https://github.com/scylladb/scylladb/issues/10837
Fixes https://github.com/scylladb/scylladb/issues/12724

Closes #12733

* github.com:scylladb/scylladb:
  locator: token_metadata: unify get_address_ranges() and get_ranges()
  locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()
2023-02-07 12:28:41 +02:00
Botond Dénes
a01662b287 Merge 'doc: improve the general upgrade policy' from Anna Stuchlik
Related: https://github.com/scylladb/scylladb/pull/12586

This PR improves the upgrade policy added with https://github.com/scylladb/scylladb/pull/12586, according to the feedback from:

@tzach
> Upgrading from 4.6 to 5.0 is not clear; better to use 4.x to 4.y versions as an example.

and @bhalevy
> It is not completely clear that when upgrading through several versions, the whole cluster needs to be upgraded to each consecutive version, not just the rolling node.

In addition, the content is organized into sections for the sake of readability.

Closes #12647

* github.com:scylladb/scylladb:
  doc: add the information about patch releases
  doc: add the info about the minor versions
  doc: reorganize the content on the Upgrade ScyllaDB page
  doc: improve the overview of the upgrade procedure (apply feedback)
2023-02-07 12:28:41 +02:00
Nadav Har'El
c00fcc80e5 test/cql-pytest: three tests for empty clustering keys
This patch adds three additional tests for empty (e.g., empty string)
clustering keys.

The first test disproves a worry that was raised in #12561 that perhaps
empty clustering keys only seem to work, but don't get written to
sstables. The new test verifies that there is no bug: they are written
and can be read correctly.

The second and third tests reproduce issue #12749: an empty clustering
key should be allowed in a COMPACT STORAGE table only if there is a compound
(multi-column) clustering key. But as the tests demonstrate, 1. if there
is just one clustering column, Scylla gives the wrong error message, and
2. if there is a compound clustering key, Scylla doesn't allow an empty
key as it should.

As usual, all tests pass on Cassandra. The last two tests fail on
Scylla, so are marked xfail.

Refs #12561
Refs #12749

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12750
2023-02-07 12:28:41 +02:00
Petr Gusev
bd80a449d5 transport server: log client errors with debug level
Ideally, these errors should be transparently delivered
to the client, but in practice, due to various
flaws/bugs in scylla and/or the driver,
they can be lost, which enormously complicates troubleshooting.

const socket_address& get_remote_address() is needed for its
convenient conversion to a string, which includes the IP and port.
2023-02-07 13:53:38 +04:00
Wojciech Mitros
58987215dc functions: add helper same_signature method
When deciding whether two functions have the same
signature, we have to check if they have the same name
and parameter types. Additionally, if they're represented
by pointers, we need to check if any of them is a nullptr.
This logic is used multiple times, so it's extracted to
a separate function.
To use this function, the `used_by_user_aggregate` method
now takes a function instead of a name and a list of types; we
can do this because we always call it with an existing user
function (the one we're trying to drop).
The method will also be useful when we're not dropping,
but replacing, a user function.
2023-02-07 10:15:12 +01:00
Wojciech Mitros
20069372e7 uda: return aggregate functions as shared pointers
We will want to reuse the functions that we get from an aggregate
without making a deep copy, and it's only possible if we get
pointers from the aggregate instead of actual values.
2023-02-07 10:15:09 +01:00
Kefu Chai
bba03c1a55 replica/table: retry on EDQUOT when flushing memtable
retry when memtable flush fails due to EDQUOT.

the user may exceed the disk quota when scylla flushes a
memtable, and then manage to free up the necessary resources before the
next retry.

before this change, we simply `abort()` in this case.

after this change, we just keep on retrying until the service is shut down.
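
The following is a minimal Python sketch of the retry policy, under three assumptions. First, ENOSPC and EDQUOT are the retryable errno values; the real should_retry() condition may differ. Second, `stopping` is a hypothetical callable that reports shutdown. Third, it runs on Linux, where `errno.EDQUOT` is defined.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of retryable errno values for this sketch.
RETRYABLE_ERRNOS = {errno.ENOSPC, errno.EDQUOT}


def should_retry(exc: BaseException) -> bool:
    """Single place deciding whether a failure is worth retrying."""
    return isinstance(exc, OSError) and exc.errno in RETRYABLE_ERRNOS


def with_retry(fn: Callable[[], T], stopping: Callable[[], bool],
               delay_s: float = 0.0) -> T:
    """Call fn() until it succeeds, retrying only retryable failures."""
    while True:
        try:
            return fn()
        except Exception as exc:
            if not should_retry(exc) or stopping():
                raise
            # Give the administrator a chance to free up space or quota.
            time.sleep(delay_s)
```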

Fixes #12626
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-07 16:00:40 +08:00
Kefu Chai
6d017e75e0 replica/table: extract should_retry() into with_retry
* extract a lambda encapsulating the condition of whether we should retry
  upon seeing an exception when calling functions with `with_retry()`.

we apply the same check to the exception raised when performing
table-related i/o operations. in this change, the two checks are
consolidated and extracted into a single lambda, so we can add
more error codes which should be considered retryable
failures.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-07 16:00:40 +08:00
Kefu Chai
d4315245a1 main: use defer_verbose_shutdown() to shutdown compaction manager
* use `defer_verbose_shutdown()` to shutdown compaction manager

`EDQUOT` is quite similar to `ENOSPC`, in the sense that both of them
are caused by environmental issues.

before this change, `compaction_manager` filters the
ENOSPC exceptions thrown by `compaction_manager::really_do_stop()`,
so they are not propagated to caller when calling
`compaction_manager::stop()` -- only a warning message is printed
in the log. but `EDQUOT` is not handled.

after this change, the exception raised by compaction manager's
stop process is not filtered anymore and is handled by
`defer_verbose_shutdown()` instead, which is able to check the
type of exception and print out an error message in the log. so
the `ENOSPC` and `EDQUOT` errors are taken care of, and are more
visible from the user's perspective, as they are printed as errors
instead of warnings. but they are not printed using the
`compaction_manager` logger anymore. so if our testing or user's
workflow depends on this behavior, the related setting should be
updated accordingly.

Fixes #12626
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-07 16:00:40 +08:00
Kefu Chai
c3ef353e3d main: consider EDQUOT as environmental failure also
EDQUOT can be returned as the errno when the underlying filesystem
is trying to reserve necessary resources from disk for performing
i/o on behalf of the effective user, and the filesystem fails to
acquire the necessary resources. it could be inode, volume space,
or whatever resources for completing the i/o operation. but none
of them is the consequence of scylla's fault. so we should not
`abort()` at seeing this errno. instead, it should be reported
to the administrator.

in this change, EDQUOT is also considered an environmental
failure, just like EIO, EACCES and ENOSPC. they could be thrown
when stopping a server.

Fixes #12626
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-07 16:00:40 +08:00
Tomasz Grabiec
ccc8e47db1 Merge 'test/lib: introduce key_utils.hh' from Botond Dénes
We currently have two method families to generate partition keys:
* make_keys() in test/lib/simple_schema.hh
* token_generation_for_shard() in test/lib/sstable_utils.hh

Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size.
This is very restrictive and simplistic. Tests that wanted anything more complicated than that had to rely on open-coded key generation.
Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation

These two tests seem to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.

Closes #12657

* github.com:scylladb/scylladb:
  test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends
  test/lib/simple_schema: remove now unused make_keys() and friends
  test: migrate to tests::generate_partition_key[s]()
  test/lib/test_services: add table_for_tests::make_default_schema()
  test/lib: add key_utils.hh
  test/lib/random_schema.hh: value_generator: add min_size_in_bytes
2023-02-06 18:11:32 +01:00
Nadav Har'El
cc207a9f44 Merge 'uda: improve checking whether UDFs are used in UDAs in DROP statements' from Wojciech Mitros
This patch fixes 2 issues with checking whether UDFs are used in UDAs:
1) UDF types are not considered during the check, which prevents us from dropping a UDF that isn't used in any UDA but shares its name with another UDF that is.
2) the REDUCEFUNC is not considered during the check, which allows dropping a UDF even though it's used in a UDA as the REDUCEFUNC.

Additionally, tests for these issues are added.
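
The following hypothetical Python model shows the corrected check. Functions are compared by name and argument types, and SFUNC, FINALFUNC and REDUCEFUNC all count as uses. The data types are illustrative, and so is the choice that a missing (None) function never matches; neither is Scylla's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Function:
    name: str
    arg_types: Tuple[str, ...]


@dataclass(frozen=True)
class Aggregate:
    name: str
    sfunc: Function
    finalfunc: Optional[Function] = None
    reducefunc: Optional[Function] = None


def same_signature(a: Optional[Function], b: Optional[Function]) -> bool:
    # Assumption: a missing function never matches anything.
    if a is None or b is None:
        return False
    return a.name == b.name and a.arg_types == b.arg_types


def used_by_user_aggregate(func: Function, aggregates: Iterable[Aggregate]) -> bool:
    # Fix 1: compare the full signature, not just the name.
    # Fix 2: REDUCEFUNC counts as a use, alongside SFUNC and FINALFUNC.
    return any(
        same_signature(func, f)
        for agg in aggregates
        for f in (agg.sfunc, agg.finalfunc, agg.reducefunc)
    )
```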

Closes #12681

* github.com:scylladb/scylladb:
  udf: also check reducefunc to confirm that a UDF is not used in a UDA
  udf: fix dropping UDFs that share names with other UDFs used in UDAs
  pytest: add optional argument for new_function argument types
2023-02-06 19:07:26 +02:00
Kamil Braun
56c4d246ef Merge 'Introduce recent_entries_map datatype to track least recent visited entries.' from Andrii Patsula
Fixes: https://github.com/scylladb/scylladb/issues/12309

Closes #12720

* github.com:scylladb/scylladb:
  service/raft: raft_group_registry: use recent_entries_map to store rate_limits in pinger. Fixes #12309
  utils: introduce recent_entries_map datatype to track least recent visited entries.
2023-02-06 18:01:26 +01:00
Botond Dénes
a3b280ba8c Merge 'doc: document the workaround to install a non-latest ScyllaDB version' from Anna Stuchlik
This PR is related to https://github.com/scylladb/scylla-enterprise/issues/2176.
It adds a FAQ about a workaround to install a ScyllaDB version that is not the most recent patch version.
In addition, the link to that FAQ is added to the patch upgrade guides for 2021 and 2022.

Closes #12660

* github.com:scylladb/scylladb:
  doc: add the missing sudo command
  doc: replace the reduntant link with an alternative way to install a non-latest version
  doc: add the link to the FAQ about pinning to the patch upgrade guides 2022 and 2022
  doc: add a FAQ with a workaround to install a non-latest ScyllaDB version on Debian and Ubuntu
2023-02-06 17:00:39 +02:00
Kefu Chai
d0a2440023 docs: bump sphinx-sitemap to 2.5.0
`poetry install` consistently times out when resolving the
dependencies. like:

```
  Command ['/home/kefu/.cache/pypoetry/virtualenvs/scylla-1fWQLpOv-py3.9/bin/python', '-m', 'pip', 'install', '--use-pep517', '--disable-pip-version-check', '--isolated', '--no-input', '--prefix', '/home/kefu/.cache/pypoetry/virtualenvs/scylla-1fWQLpOv-py3.9', '--upgrade', '--no-deps', '/home/kefu/.cache/pypoetry/artifacts/e6/ad/ab/eca9f61c5b15fd05df7192c0e5914a9e5ac927744b1fb5f6c07a92d7a4/sphinx-sitemap-2.2.0.tar.gz'] errored with the following return code 1, and output:
  Processing /home/kefu/.cache/pypoetry/artifacts/e6/ad/ab/eca9f61c5b15fd05df7192c0e5914a9e5ac927744b1fb5f6c07a92d7a4/sphinx-sitemap-2.2.0.tar.gz
    Installing build dependencies: started
    Installing build dependencies: finished with status 'error'
    ERROR: Command errored out with exit status 2:
     command: /home/kefu/.cache/pypoetry/virtualenvs/scylla-1fWQLpOv-py3.9/bin/python /tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-37k3lwqd/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
         cwd: None
    Complete output (80 lines):
    Collecting setuptools>=40.8.0
      Downloading setuptools-67.1.0-py3-none-any.whl (1.1 MB)
    ERROR: Exception:
    Traceback (most recent call last):
      File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
        yield
      File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/urllib3/response.py", line 519, in read
        data = self._fp.read(amt) if not fp_closed else b""
      File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
        data = self.__fp.read(amt)
      File "/usr/lib64/python3.9/http/client.py", line 463, in read
        n = self.readinto(b)
      File "/usr/lib64/python3.9/http/client.py", line 507, in readinto
        n = self.fp.readinto(b)
      File "/usr/lib64/python3.9/socket.py", line 704, in readinto
        return self._sock.recv_into(b)
      File "/usr/lib64/python3.9/ssl.py", line 1242, in recv_into
        return self.read(nbytes, buffer)
      File "/usr/lib64/python3.9/ssl.py", line 1100, in read
        return self._sslobj.read(len, buffer)
    socket.timeout: The read operation timed out
```

while sphinx-sitemap 2.5.0 installs without problems. sphinx-sitemap
2.5.0 is the latest version published to pypi.

according to sphinx-sitemap's changelog at
https://github.com/jdillard/sphinx-sitemap/blob/master/CHANGELOG.rst ,
no breaking changes were introduced in between 2.2.0 and 2.5.0.

after bumping sphinx-sitemap to 2.5.0, the following commands complete
without errors:
```
poetry lock
make preview
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12705
2023-02-06 15:50:48 +02:00
Anna Stuchlik
c772563cb8 doc: add the information abou patch releases 2023-02-06 14:47:39 +01:00
Botond Dénes
cb2a129371 Merge 'Fix inefficiency when rebuilding table statistics with compaction groups' from Raphael "Raph" Carvalho
[table: Fix disk-space related metrics](529a1239a9) fixes the table's disk-space related metrics.

The second patch fixes an inefficiency in computing statistics that can be triggered with multiple compaction groups.

Closes #12718

* github.com:scylladb/scylladb:
  table: Fix inefficiency when rebuilding statistics with compaction groups
  table: Fix disk-space related metrics
2023-02-06 15:11:48 +02:00
Avi Kivity
6bc5536bd8 Revert "Update seastar submodule"
This reverts commit b4559a6992. It breaks
some raft tests.

Fixes #12741.
2023-02-06 14:56:44 +02:00
Botond Dénes
5a9f75aac6 Update tools/java submodule
* tools/java 1c4e1e7a7d...f0bab7af66 (1):
  > Fix port option in SSTableLoader
2023-02-06 14:18:52 +02:00
Wojciech Mitros
ef1dac813b udf: also check reducefunc to confirm that a UDF is not used in a UDA
When dropping a UDF we check that it is not being used in any UDA
and fail otherwise. However, we only checked the UDAs' state functions
and final functions, while a UDF may also be used as a reduce function.
This patch adds the missing checks and a test for them.
2023-02-06 13:02:54 +01:00
Wojciech Mitros
49077dd144 udf: fix dropping UDFs that share names with other UDFs used in UDAs
Currently, when dropping a function, we only check if there exists
an aggregate that uses a function with the same name as its state
function or final function. This may cause the drop to fail when the
aggregate actually uses a different UDF that merely shares the name,
even though the function being dropped is not used there at all.
This patch fixes this by checking not only the names of the UDA's
sfunc and finalfunc, but also their argument types.
2023-02-06 13:02:53 +01:00
Wojciech Mitros
8791b0faf5 pytest: add optional argument for new_function argument types
When multiple functions with the same name but different argument types
are created, the default drop statement for these functions will fail
because it does not include the argument types.
With this change, this problem can be worked around by specifying
argument types when creating the function, as this will cause the drop
statement to include them.
2023-02-06 13:02:19 +01:00
Botond Dénes
8efa9b0904 Merge 'Avoid qctx from view-builder methods of system_keyspace' from Pavel Emelyanov
The system_keyspace defines several auxiliary methods to help view_builder update the system.scylla_views_builds_in_progress and system.built_views tables. All of them use the global qctx.

It only takes adding a view_builder -> system_keyspace dependency to de-static all those helpers and let them use the query processor from it instead of qctx.

Closes #12728

* github.com:scylladb/scylladb:
  system_keysace: De-static calls that update view-building tables
  storage_service: Coroutinize mark_existing_views_as_built()
  api: Unset column_famliy endpoints
  api: Carry sharded<db::system_keyspace> reference over
  view_builder: Add system_keyspace dependency
2023-02-06 12:44:40 +02:00
Botond Dénes
e247e15ec1 Merge 'Method to create and start task manager task' from Aleksandra Martyniuk
In most cases, task manager's tasks are started just after they are
created. Thus, to reduce boilerplate required for creating and starting
tasks, tasks::task_manager::module::make_and_start_task method is added.

Repair tasks are modified to use the method where possible.

Closes #12729

* github.com:scylladb/scylladb:
  repair: use tasks::task_manager::module::make_and_start_task for repair tasks
  tasks: add task_manager::module::make_and_start_task method
2023-02-06 12:38:35 +02:00
Yaniv Kaul
9039b94790 docs: dev - how to test your tests documentation
Short paragraph on how to develop tests and ensure they are solid.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes #12746
2023-02-06 12:07:43 +02:00
Avi Kivity
1e6cc9ca61 Merge 'storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops' from Asias He
Since 97bb2e47ff (storage_service: Enable Repair Based Node Operations (RBNO) by default for replace), RBNO was enabled by default for replace ops.

After more testing, we decided to enable repair based node operations by default for all node operations.

Closes #12173

* github.com:scylladb/scylladb:
  storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops
  test: Increase START_TIMEOUT
  test: Increase max-networking-io-control-blocks
  storage_service: Check node has left in node_ops_cmd::decommission_done
  repair: Use remote dc neighbors for everywhere strategy
2023-02-06 10:42:52 +02:00
Botond Dénes
511c0123a2 Merge 'Add compaction module to task manager' from Aleksandra Martyniuk
Introduces the task manager's compaction module. This is the initial
part of integrating compaction with the task manager.

When fully integrated, the task manager will allow users to track compaction
operations and check the status and progress of each individual one. It will help
with creating an asynchronous version of the REST API calls that force compaction.

Currently, users can see via the /task_manager/list_modules API call that
compaction is one of the modules accessible through the task manager.
They won't get any additional information though, since compaction
tasks are not created yet.

A shared_ptr to compaction module is kept in compaction manager.

Closes #12635

* github.com:scylladb/scylladb:
  compaction: test: pass task_manager to compaction_manager in test environment
  compaction: create and register task manager's module for compaction
  tasks: add task_manager constructor without arguments
2023-02-06 09:25:05 +02:00
Botond Dénes
cdd8b0fa35 Merge 'SSTable set improvements' from Raphael "Raph" Carvalho
Makes sstable_set::all() interface robust, and introduces sstable_set::size() to avoid copies when retrieving set size.

Closes #12716

* github.com:scylladb/scylladb:
  treewide: Use new sstable_set::size() wherever possible
  sstables: Introduce sstable_set::size()
  sstables: Fix fragility of sstable_set::all() interface
2023-02-06 08:24:00 +02:00
Avi Kivity
f73e2c992f Merge 'Keep range tombstones with rows in memtables and cache' from Tomasz Grabiec
This series switches memtable and cache to use a new representation for mutation data,
called `mutation_partition_v2`. In this representation, range tombstone information is stored
in the same tree as rows, attached to row entries. Each entry has a new tombstone field,
which represents range tombstone part which applies to the interval between this entry and
the previous one. See docs/dev/mvcc.md for more details about the format.
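A heavily simplified sketch of the idea, assuming clustering positions are plain `int`s and a tombstone is just an optional deletion timestamp. The exact bound inclusivity and the dummy entries that Scylla uses are described in docs/dev/mvcc.md and are not reproduced here; all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical simplifications: positions are ints, tombstones are timestamps.
using position = int;
using tombstone = std::optional<int64_t>;

struct row_entry {
    // Range tombstone for the interval between the previous entry and this one.
    tombstone range_tomb;
};

// Rows and range tombstone information share one ordered tree.
using partition_v2 = std::map<position, row_entry>;

// Range tombstone covering a position that falls strictly between entries
// (precondition: `pos` is not an entry's own position). It is stored on the
// first entry after `pos`, so a single lookup in the same tree finds it.
tombstone range_tombstone_for(const partition_v2& p, position pos) {
    auto it = p.lower_bound(pos);
    if (it == p.end()) {
        return std::nullopt;  // real partitions end with a dummy entry instead
    }
    return it->second.range_tomb;
}
```

Evicting an entry in this model also drops the range tombstone information attached to it, which is what makes incremental eviction possible.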

The transient mutation object still uses the old model in order to avoid work needed to adapt
old code to the new model. It may also be a good idea to live with two models, since the
transient mutation has different requirements and thus different trade-offs can be made.
Transient mutation doesn't need to support eviction and strong exception guarantees,
so its algorithms and in-memory representation can be simpler.

This allows us to incrementally evict range tombstone information. Before this series,
range tombstones were accumulated and evicted only when the whole partition entry was evicted. This
could lead to inefficient use of cache memory.

Another advantage of the new representation is that reads don't have to look up
range tombstone information in a separate tree while reading. This leads to simpler
and more efficient readers.

There are several disadvantages too. Firstly, rows_entry is now larger by 16 bytes.
Secondly, update algorithms are more complex because they need to deoverlap range tombstone
information. Also, to handle preemption and provide strong exception guarantees, update
algorithms may need to allocate sentinel entries, which adds complexity and reduces performance.

The memtable reader was changed to use the same cursor implementation
which cache uses, to improve code reuse and reduce the risk of bugs
due to discrepancies between algorithms which deal with MVCC.

Remaining work:
  - performance optimizations to apply_monotonically() to avoid regressions
  - performance testing
  - preemption support in apply_to_incomplete (cache update from memtable)

Fixes #2578
Fixes #3288
Fixes #10587

Closes #12048

* github.com:scylladb/scylladb:
  test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction
  test: mvcc: Extract mvcc_container::allocate_in_region()
  row_cache, lru: Introduce evict_shallow()
  test: mvcc: Avoid copies of mutation under failure injection
  test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic
  mutation_partition_v2: Avoid full scan when applying mutation to non-evictable
  Pass is_evictable to apply()
  tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1
  tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint
  tests: mutation_partition_v2: Add test for exception safety of mutation merging
  tests: Add tests for the mutation_partition_v2 model
  mutation_partition_v2: Implement compact()
  cache_tracker: Extract insert(mutation_partition_v2&)
  mvcc, mutation_partition: Document guarantees in case merging succeeds
  mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically()
  mutation_partition_v2: Simplify get_continuity()
  row_cache: Distinguish dummy insertion site in trace log
  db: Use mutation_partition_v2 in mvcc
  range_tombstone_change_merger: Introduce peek()
  readers: Extract range_tombstone_change_merger
  mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots
  mvcc: partition_snapshot_row_cursor: Support digest calculation
  mutation_partition_v2: Store range tombstones together with rows
  db: Introduce mutation_partition_v2
  doc: Introduce docs/dev/mvcc.md
  db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU
  db: Print range_tombstone bounds as position_in_partition
  test: memtable_test: Relax test_segment_migration_during_flush
  test: cache_flat_mutation_reader: Avoid timestamp clash
  test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows
  test: mvcc: Fix sporadic failures due to compact_for_compaction()
  test: lib: random_mutation_generator: Produce partition tombstone less often
  test: lib: random_utils: Introduce with_probability()
  test: lib: Improve error message in has_same_continuity()
  test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker
  test: mvcc: Insert entries in the tracker
  test: mvcc_test: Do not set dummy::no on non-clustering rows
  mutation_partition: Print full position in error report in append_clustered_row()
  db: mutation_cleaner: Extract make_region_space_guard()
  position_in_partition: Optimize equality check
  mvcc: Fix version merging state resetting
  mutation_partition: apply_resume: Mark operator bool() as explicit
2023-02-05 22:33:10 +02:00
Michał Chojnowski
5edf965526 locator: token_metadata: unify get_address_ranges() and get_ranges()
get_address_ranges() and get_ranges() perform almost the same computation.
They return the same ranges -- the only difference is that
get_address_ranges() returns them in unspecified order, while get_ranges()
returns them in sorted order. Therefore the result of get_ranges() is also
a valid result for get_address_ranges(), and the two functions can be unified
to avoid code duplication. This patch does just that.
2023-02-04 22:55:08 +01:00
Michał Chojnowski
9e57b21e0c locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()
Some callees of update_pending_ranges use the variant of get_address_ranges()
which builds a hashmap of all <endpoint, owned range> pairs. For
everywhere_topology, the size of this map is quadratic in the number of
endpoints, making it big enough to cause contiguous allocations of tens of MiB
for clusters of realistic size, potentially causing trouble for the
allocator (as seen e.g. in #12724). This deserves a correction.

This patch removes the quadratic variant of get_address_ranges() and replaces
its uses with its linear counterpart.

Refs #10337
Refs #10817
Refs #10836
Refs #10837
Fixes #12724
2023-02-04 22:38:04 +01:00
Aleksandra Martyniuk
f3fa0d21ef repair: use tasks::task_manager::module::make_and_start_task for repair tasks
Use tasks::task_manager::module::make_and_start_task to create and
start repair tasks. Delete start_repair_task static function which
did this before.
2023-02-04 14:33:17 +01:00
Aleksandra Martyniuk
cb3b6cdc1a tasks: add task_manager::module::make_and_start_task method
In most cases, task manager's tasks are started just after they are
created. Thus, to reduce boilerplate required for creating and starting
tasks, make_and_start_task method is added.
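A minimal synchronous sketch of such a helper. The real task manager is built on Seastar futures and its types and signatures differ; `task`, `task_module` and their members here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task manager task.
struct task {
    bool started = false;
    virtual ~task() = default;
    void start() { started = true; run(); }
protected:
    virtual void run() {}
};

// Hypothetical stand-in for tasks::task_manager::module.
class task_module {
    std::vector<std::shared_ptr<task>> _tasks;
public:
    // Creates a task of type T, registers it and starts it, replacing the
    // separate "make" and "start" steps at every call site.
    template <typename T, typename... Args>
    std::shared_ptr<T> make_and_start_task(Args&&... args) {
        auto t = std::make_shared<T>(std::forward<Args>(args)...);
        _tasks.push_back(t);
        t->start();
        return t;
    }
    std::size_t size() const { return _tasks.size(); }
};
```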
2023-02-04 14:23:51 +01:00
Jan Ciolek
2a5ed115ca cql/query_options: add a check for missing bind marker name
There was a missing check in validation of named
bind markers.

Let's say that a user prepares a query like:
```cql
INSERT INTO ks.tab (pk, ck, v) VALUES (:pk, :ck, :v)
```
Then they execute the query, but specify only
values for `:pk` and `:ck`.

We should detect that a value for :v is missing
and throw an invalid_request_exception.

Until now there was no such check: in case of a missing variable,
invalid `query_options` were created and Scylla crashed.
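A minimal sketch of the kind of validation described above, assuming values arrive as a name-to-value map (simplified to strings). `invalid_request_exception` here is a hypothetical stand-in, and `validate_named_values` is not Scylla's actual function:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Scylla's exception type.
struct invalid_request_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Every named bind marker of the prepared statement must have a value;
// otherwise reject the request instead of building invalid query_options.
void validate_named_values(const std::vector<std::string>& marker_names,
                           const std::map<std::string, std::string>& values) {
    for (const auto& name : marker_names) {
        if (values.find(name) == values.end()) {
            throw invalid_request_exception("Missing value for bind marker :" + name);
        }
    }
}
```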

Sadly it's impossible to create a regression test
using `cql-pytest` or `boost`.

`cql-pytest` uses the python driver, which silently
ignores missing named bind variables, deciding
that the user meant to send an UNSET_VALUE for them.
When given values like `{'pk': 1, 'ck': 2}`, it will automatically
extend them to `{'pk': 1, 'ck': 2, 'v': UNSET_VALUE}`.

In `boost` I tried to use `cql_test_env`,
but it only has methods which take valid `query_options`
as a parameter. I could create a separate unit test
for the creation and validation of `query_options`,
but it wouldn't be a true end-to-end test like `cql-pytest`.

The bug was found using the rust driver,
the reproducer is available in the issue description.

Fixes: #12727

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #12730
2023-02-04 02:13:34 +02:00
Alejo Sanchez
346d02b477 raft conf error injection for snapshot
To trigger the snapshot limit behavior, provide an error injection that
sets it, to be enabled as a one-shot.

Note that this effectively changes the setting and there is no revert.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-03 22:33:33 +01:00
Pavel Emelyanov
d021aaf34d system_keysace: De-static calls that update view-building tables
There's a bunch of them, used mainly by view_builder and also by the API
and storage_service. All of them use the global qctx to do their job; now
that the callers have main-local sharded<system_keyspace> references, they
can be made non-static.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-03 21:56:54 +03:00
Pavel Emelyanov
e2f51ce43e storage_service: Coroutinize mark_existing_views_as_built()
It's a start-only method.
Making it a coroutine helps further patching.

Also restrict the call to shard 0 only; it only runs there anyway, but this
lets the code have fewer nested coroutinized lambdas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-03 21:55:10 +03:00
Andrii Patsula
e420dbf10b service/raft: raft_group_registry: use recent_entries_map to store rate_limits in pinger.
Fixes #12309
2023-02-03 19:04:51 +01:00
Andrii Patsula
c95066a410 utils: introduce recent_entries_map datatype to track least recent visited entries. 2023-02-03 19:04:32 +01:00
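A minimal sketch of a map that tracks the order in which entries were visited and evicts the least recently visited one once a capacity is reached. The capacity-based eviction policy and the `get_or_create`/`contains` interface are assumptions for illustration; the commit only states the datatype's purpose:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; capacity is assumed to be > 0.
template <typename K, typename V>
class recent_entries_map {
    using list_t = std::list<std::pair<K, V>>;  // front = most recently visited
    list_t _entries;
    std::unordered_map<K, typename list_t::iterator> _index;
    std::size_t _capacity;
public:
    explicit recent_entries_map(std::size_t capacity) : _capacity(capacity) {}

    // Returns the entry for `key` (default-constructing it if absent) and
    // marks it as the most recently visited.
    V& get_or_create(const K& key) {
        auto it = _index.find(key);
        if (it != _index.end()) {
            // splice() moves the node without invalidating iterators.
            _entries.splice(_entries.begin(), _entries, it->second);
            return it->second->second;
        }
        if (_entries.size() >= _capacity && !_entries.empty()) {
            _index.erase(_entries.back().first);  // evict least recently visited
            _entries.pop_back();
        }
        _entries.emplace_front(key, V{});
        _index.emplace(key, _entries.begin());
        return _entries.front().second;
    }

    bool contains(const K& key) const { return _index.count(key) != 0; }
    std::size_t size() const { return _entries.size(); }
};
```

Bounding the map this way keeps per-endpoint state, such as the pinger's rate limits, from growing without limit.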
Pavel Emelyanov
b347a0cf0b api: Unset column_famliy endpoints
The API calls in question will use system keyspace, that starts before
(and thus stops after) and nowadays indirectly uses database instance
that also starts earlier (and also stops later), so this avoids
potential dangling references.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-03 18:59:28 +03:00
Pavel Emelyanov
eac2e453f2 api: Carry sharded<db::system_keyspace> reference over
There's the column_family/get_built_indexes call that calls a system
keyspace method to fetch data from scylla_views_builds_in_progress
table, so the system keyspace reference will be needed in the API
handler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-03 18:57:43 +03:00
Pavel Emelyanov
bbbeba103b view_builder: Add system_keyspace dependency
The view builder updates system.scylla_views_builds_in_progress and
.built_views tables and thus needs the system keyspace instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-03 18:55:58 +03:00
Aleksandra Martyniuk
12789adb95 compaction: test: pass task_manager to compaction_manager in test environment
Each instance of compaction manager should have its compaction module pointer
initialized. All constructors get a task_manager reference with which
the module is created.
2023-02-03 15:15:11 +01:00
Raphael S. Carvalho
5a784c3c6d treewide: Use new sstable_set::size() wherever possible
That's the preferred alternative because it's zero copy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-03 10:38:04 -03:00
Raphael S. Carvalho
909d1975af sstables: Introduce sstable_set::size()
Preferred alternative to sstable_set->all()->size(), which may
involve copying elements from a single set, or from multiple ones
if compound_sstable_set is used.
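A simplified sketch of why `size()` avoids copies, assuming an sstable is just an `int` id, `std::shared_ptr` stands in for Seastar's `lw_shared_ptr`, and the underlying sets of a compound set are disjoint. These classes are hypothetical and much smaller than Scylla's:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

using sstable_list = std::unordered_set<int>;  // an sstable is an int id here

struct sstable_set {
    virtual ~sstable_set() = default;
    virtual std::shared_ptr<const sstable_list> all() const = 0;
    virtual std::size_t size() const = 0;
};

struct simple_sstable_set : sstable_set {
    std::shared_ptr<sstable_list> ssts = std::make_shared<sstable_list>();
    std::shared_ptr<const sstable_list> all() const override { return ssts; }
    std::size_t size() const override { return ssts->size(); }
};

struct compound_sstable_set : sstable_set {
    std::vector<std::shared_ptr<sstable_set>> sets;
    // all() has to materialize a new list, copying every element.
    std::shared_ptr<const sstable_list> all() const override {
        auto ret = std::make_shared<sstable_list>();
        for (auto& s : sets) {
            auto part = s->all();
            ret->insert(part->begin(), part->end());
        }
        return ret;
    }
    // size() just sums the sizes of the (assumed disjoint) underlying sets.
    std::size_t size() const override {
        std::size_t n = 0;
        for (auto& s : sets) {
            n += s->size();
        }
        return n;
    }
};
```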

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-03 10:38:00 -03:00
Asias He
e7d5e508bc storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops
Since 97bb2e47ff (storage_service: Enable
Repair Based Node Operations (RBNO) by default for replace), RBNO was
enabled by default for replace ops.

After more testing, we decided to enable repair based node operations by
default for all node operations.
2023-02-03 21:15:08 +08:00
Asias He
fc60484422 test: Increase START_TIMEOUT
CI machines have been observed to run the test slowly. Increase the
timeout for adding servers.
2023-02-03 21:15:08 +08:00
Aleksandra Martyniuk
47ef689077 compaction: create and register task manager's module for compaction
As an initial part of integration of compaction with task manager, compaction
module is added. Compaction module inherits from tasks::task_manager::module
and shared_ptr to it is kept in compaction manager. No compaction tasks are
created yet.
2023-02-03 13:52:30 +01:00
Aleksandra Martyniuk
6233823cc7 tasks: add task_manager constructor without arguments
Sometimes, e.g. for tests, we may need to create task_manager
without main-specific arguments.
2023-02-03 13:52:30 +01:00
Aleksandra Martyniuk
8cb319030a test: rest_api: check if repair of system keyspace returns before corresponding task is created 2023-02-03 13:35:13 +01:00
Aleksandra Martyniuk
aab704d255 repair: finish repair immediately on local keyspaces
The system keyspace uses the local replication strategy and thus
does not need to be repaired. It is possible to invoke repair
of this keyspace through the API, which leads to a runtime error since
peer_events and scylla_table_schema_history have different sharding logic.

For keyspaces with the local replication strategy, repair_service::do_repair_start
returns immediately.
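A minimal sketch of the early return, assuming a simplified keyspace description. `replication_strategy_type`, `keyspace` and this `do_repair_start` signature are hypothetical; only the behavior (return before creating a repair task) follows the commit:

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified types.
enum class replication_strategy_type { local, simple, network_topology, everywhere };

struct keyspace {
    std::string name;
    replication_strategy_type strategy;
};

// Returns the id of the started repair, or 0 if there is nothing to repair.
int do_repair_start(const keyspace& ks, int& next_id) {
    if (ks.strategy == replication_strategy_type::local) {
        // Locally replicated data (e.g. the "system" keyspace) is never
        // replicated to other nodes, so there is nothing to synchronize.
        return 0;
    }
    return next_id++;  // placeholder for creating the real repair task
}
```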
2023-02-03 13:35:13 +01:00
Kamil Braun
61dfc9c10f Merge 'docs: extend the warning on using "nodetool removenode"' from Anna Stuchlik
This PR extends the description of using `nodetool removenode` to remove an unavailable node, as requested in https://github.com/scylladb/scylla-enterprise/issues/2338.

Closes #12410

* github.com:scylladb/scylladb:
  docs: improve the warning and add a comment to update/remove the information in the future
  doc: extend the information on removing an unavailable node
  docs: extend the warning on the Remove a Node page
2023-02-03 12:00:17 +01:00
Kamil Braun
d991f71910 test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
`TopologyTest`s (used by `topology/` suite and friends) already relied
on the `is_dirty` flag stored in `ScyllaCluster` thanks to
`ScyllaClusterManager` (which passes the flag when returning a cluster
to the pool).

But `PythonTest`s (cql-pytest/ suite) and `CQLApprovalTest`s (cql/
suite) had different ways to decide whether a cluster should be
recycled. For example, `PythonTest` would recycle a cluster if
`after_test` raised an exception. This depended on a post-condition
check made by `after_test`: it would query the number of keyspaces and
throw an exception if it was different than when the test started. If
the cluster (which for `PythonTest` is always single-node) was dead,
this query would fail.

However, we modified the behavior of `after_test` in earlier commits -
it no longer performs the post-condition check on dirty clusters. So
it's also no longer reliable to use the exception raised by `after_test`
to decide that we should recycle the cluster.

Unify the behavior of `PythonTest` and `CQLApprovalTest` with what
`TopologyTest` does - using the `is_dirty` flag to decide that we should
recycle a cluster. Thanks to earlier commits, this flag is set to `True`
whenever a test fails, so it should cover most cases where we want to
recycle a cluster. (The only case not currently covered is if a
non-dirty cluster crashes after we perform the keyspace post-condition
check, which seems quite improbable.)

Note that this causes us to recycle clusters more often in these tests:
previously, when a `PythonTest` or `CQLApprovalTest` failed, but the
cluster was still alive and the post-condition check passed, we would
use the cluster for the next test. Now we recycle a cluster whenever a
test that used it fails.
2023-02-03 11:49:35 +01:00
Kamil Braun
8442cccd37 test/topology: don't drop random_tables keyspace after a failed test
After a failed test, the cluster might be down so dropping the
random_tables keyspace might be impossible. The cluster will be marked
dirty so it doesn't matter that we leave any garbage there.

Note: we already drop only if the cluster is not marked as dirty, and we
mark the cluster as dirty after a failed test. However, marking the
cluster as dirty after a failed test happens at the end of the `manager`
fixture and the `random_tables` fixture depends on the `manager`
fixture, so at the end of the `random_tables` fixture the cluster still
wasn't marked as dirty. Hence the fixture must access the
pytest-provided `request` fixture where we store a flag whether the test
has failed.
2023-02-03 11:49:35 +01:00
Anna Stuchlik
84e2178fe9 docs: improve the warning and add a comment to update/remove the information in the future 2023-02-03 09:33:07 +01:00
Botond Dénes
c270c305c0 Merge 'Allow entire test suite to run with multiple compaction groups' from Raphael "Raph" Carvalho
New test/lib/scylla_test_case.hh, introduced in "tests: Add command line options for Scylla unit tests",
allows extension of the command line options provided by Seastar testing framework.
It allows all boost tests to process additional options without changing a single line of code.

Patch "test: Add x-log2-compaction-groups to Scylla test command line options" builds on that, allowing
all test cases to run with N compaction groups. Again, without changing a line of code in the tests.

Now all you have to do is:
./build/dev/test/boost/sstable_compaction_test -- --smp 1 --x-log2-compaction-groups 1
./test.py --mode=dev --x-log2-compaction-groups 1 --verbose

And it will run the test cases with as many groups as you wish.

./test.py passes successfully with parameter --x-log2-compaction-groups 1.

Closes #12369

* github.com:scylladb/scylladb:
  test.py: Add option to run scylla tests with multiple compaction groups
  test: Add x-log2-compaction-groups to Scylla test command line options
  test: Enable Scylla test command line options for boost tests
  tests: Add command line options for Scylla unit tests
  replica: table: Add debug log for number of compaction groups
  test: sstable_compaction_test: Fix indentation
  test: sstable_compaction_test: Make it work with compaction groups
  test: test_bloom_filter: Fix it with multiple compaction groups
  test: memtable_test: Fix it with multiple compaction groups
2023-02-03 06:35:15 +02:00
Kefu Chai
d2e3a60428 dist/debian: drop unused Makefile variable
`job` was introduced back in 782ebcece4,
so we could consume the option specified in the DEB_BUILD_OPTIONS
environment variable. but now that we always repackage
the artifacts prebuilt in the relocatable package, we don't build
them anymore when packaging debian packages. see
9388f3d626. and `job` is no longer
passed to `ninja`.

so, in this change, `job` is removed from debian/rules as well, as
it is not used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-03 11:18:51 +08:00
Kefu Chai
75eaee040b dist/debian: bump up debhelper compatibility level to 10
to silence the warnings from dh tools, like
```
 dh: warning: Compatibility levels before 10 are deprecated (level 9 in use)
    dh_clean
 dh_clean: warning: Compatibility levels before 10 are deprecated (level 9 in use)
```

see
https://manpages.debian.org/testing/debhelper/debhelper-compat-upgrade-checklist.7.en.html
for the changes between v9 and v10. none of them applies to
our use case.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-03 11:04:43 +08:00
Raphael S. Carvalho
55a8421e3d table: Fix inefficiency when rebuilding statistics with compaction groups
Whenever any compaction group has its SSTable set updated, table's
rebuild_statistics() is called and it inefficiently iterates through
SSTable set of all compaction groups.

Now each compaction group keeps track of its statistics, such that
table's rebuild_statistics() only needs to sum them up.
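A minimal sketch of per-group statistics summed by the table, assuming two counters and an SSTable set reduced to a list of sizes. `group_stats`, `compaction_group` and `table_statistics` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical statistics kept per compaction group.
struct group_stats {
    uint64_t live_disk_space_used = 0;
    uint64_t live_sstable_count = 0;
};

struct compaction_group {
    std::vector<uint64_t> sstable_sizes;  // simplified SSTable set
    group_stats stats;

    // Only the group whose set changed rescans its own SSTables.
    void rebuild_statistics() {
        stats = {};
        for (auto sz : sstable_sizes) {
            stats.live_disk_space_used += sz;
            ++stats.live_sstable_count;
        }
    }
};

// The table sums per-group totals: O(groups), not O(all SSTables).
group_stats table_statistics(const std::vector<compaction_group>& groups) {
    group_stats total;
    for (const auto& g : groups) {
        total.live_disk_space_used += g.stats.live_disk_space_used;
        total.live_sstable_count += g.stats.live_sstable_count;
    }
    return total;
}
```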

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-02 17:10:11 -03:00
Raphael S. Carvalho
529a1239a9 table: Fix disk-space related metrics
the total disk space used metric reports the amount of disk space
ever used, which is wrong. It should report the size of all sstables
in use plus the ones waiting to be deleted.
live disk space used, by this definition, shouldn't account for the
ones waiting to be deleted,
and live sstable count shouldn't account for sstables waiting to
be deleted either.

Fix all that.
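A minimal sketch of the three metrics as defined above, assuming each sstable carries its size and a "waiting to be deleted" flag. `sstable_info`, `disk_metrics` and `compute_metrics` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct sstable_info {
    uint64_t bytes;
    bool pending_deletion;  // compacted away, waiting to be deleted
};

struct disk_metrics {
    uint64_t total_disk_space_used = 0;  // in use + waiting for deletion
    uint64_t live_disk_space_used = 0;   // in use only
    uint64_t live_sstable_count = 0;     // in use only
};

disk_metrics compute_metrics(const std::vector<sstable_info>& ssts) {
    disk_metrics m;
    for (const auto& s : ssts) {
        m.total_disk_space_used += s.bytes;
        if (!s.pending_deletion) {
            m.live_disk_space_used += s.bytes;
            ++m.live_sstable_count;
        }
    }
    return m;
}
```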

Fixes #12717.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-02 16:38:45 -03:00
Raphael S. Carvalho
55cd163392 sstables: Fix fragility of sstable_set::all() interface
all() was returning lw_shared_ptr<sstable_list>, which allowed the caller
to modify the sstable set's content and mess everything up.
sstable_set is supposed to be modified only through the insert and erase
functions.
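One way to close this hole is to hand out a pointer to const. The sketch below assumes `std::shared_ptr` in place of Seastar's `lw_shared_ptr` and an sstable reduced to an `int` id; it is a simplified illustration, not the actual patch:

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

using sstable_list = std::unordered_set<int>;  // an sstable is an int id here

class sstable_set {
    std::shared_ptr<sstable_list> _all = std::make_shared<sstable_list>();
public:
    // The only ways to modify the set.
    void insert(int sst) { _all->insert(sst); }
    void erase(int sst) { _all->erase(sst); }

    // Read-only view: callers can no longer bypass insert()/erase().
    std::shared_ptr<const sstable_list> all() const { return _all; }
};
```

With this signature, a call such as `s.all()->insert(3)` fails to compile instead of silently corrupting the set.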

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-02 15:46:08 -03:00
Alejo Sanchez
9ceb6aba81 test/pylib: one-shot error injection helper
The existing helper, an async context manager, only worked for non-one-shot
error injections. Fix it and add another helper for one-shot without a
context manager.

Fix tests using the previous helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-02 16:37:21 +01:00
Kamil Braun
a9dbd89478 test/pylib: mark cluster as dirty after a failed test
We don't expect the cluster to be functioning at all after a failed
test. The whole cluster might have crashed, for example. In these
situations the framework would report multiple errors (one for the
actual failure, another for a failed post-condition check because the
cluster was down) which would only obscure the report and make debugging
harder. It's also not safe in general to reuse the cluster in another
test - if the previous test failed, we should not assume that it's in a
valid state.

Therefore, mark the cluster as dirty after a failed test. This will let
us recycle the cluster based on the dirty flag and it will disable
post-condition check after a failed test (which is only done on
non-dirty clusters).

To implement this in topology tests, we use the
`pytest_runtest_makereport` hook which executes after a test finishes
but before fixtures finish. There we store a test-failed flag in a stash
provided by pytest, then access the flag in the `manager` fixture.
2023-02-02 16:35:55 +01:00
Kamil Braun
977375d13f test: pylib, topology: don't perform operations after test on a dirty cluster
`after_test` would count keyspaces and check that the number is the same
as before the test started. The `random_tables` fixture after a test
would drop the keyspace that it created before the test.

These steps are done to ensure that the cluster is ready to be reused
for the next steps. If the cluster is dirty, it cannot be reused anyway,
so the steps are unnecessary. They might also be impossible in general
- a dirty cluster might be completely dead. For example, the attempts to
drop a keyspace from `random_tables` would cause confusing errors
if a test failed when it tried to restart a node while all nodes
were down, making it harder to find the 'real' failure.

Therefore don't perform these operations if the cluster is dirty.
2023-02-02 15:59:02 +01:00
Kamil Braun
f4b56cddde test/pylib: print cluster at the end of test
- print the cluster used by the test in `after_test`
- if cluster setup fails in `before_test`, print the cluster together
  with the exception (`after_test` is not executed if `before_test`
  fails)
2023-02-02 15:59:02 +01:00
Anna Stuchlik
f4c5cdf21b doc: add the info about the minor versions 2023-02-02 14:16:40 +01:00
Avi Kivity
f5fd0769b2 Merge 'cql3: expr: don't pass empty evaluation_inputs in is_one_of' from Jan Ciołek
`evaluation_inputs` is a struct which contains data needed to evaluate expressions - values of columns, bind variables and other data.
`is_one_of()` is a function used to evaluate `IN` restrictions. It checks whether the LHS is one of the elements on the RHS list.

Generally when evaluating expressions we get the `evaluation_inputs` as an argument and we should pass them along to any functions that evaluate subexpressions.

`is_one_of()` got the inputs as an argument, but didn't pass them along to `equal()`, instead it creates new empty `evaluation_inputs{}` and gives that to `equal()`.

At first [I thought this was a bug](https://github.com/scylladb/scylladb/pull/12356#discussion_r1084300969) - with missing information there could be a crash if `equal()` tried to evaluate an expression with a `bind_variable`.
It turns out that in this particular case `equal()` won't use the `evaluation_inputs` at all. The LHS and RHS passed to it are just constant values, which were already evaluated to serialized bytes before calling `evaluate()`, so there is no bug.

It's still better to pass the inputs argument along if possible. If in the future `equal()` required these inputs for some reason, missing inputs could lead to an unexpected crash.
I couldn't find any tests that would detect this case, so such a bug could stay undetected until an unhappy user finds it because their cluster crashed.
I added some tests to make sure that it's covered from now on.

Closes #12701

* github.com:scylladb/scylladb:
  cql-pytest: test filtering using list with bind variable
  test/expr_test: test <int_value> IN (123, ?, 456)
  cql3: expr: don't pass empty evaluation_inputs in is_one_of
2023-02-02 11:40:20 +02:00
Botond Dénes
9efbcfa190 Merge 'test/alternator: tests for Limit parameter of ListStreams operation' from Nadav Har'El
The first patch in this series enables a previously-skipped test for what happens with Limit=0. The test passes.
The second patch adds an xfailing test for very large Limit.

Closes #12625

* github.com:scylladb/scylladb:
  test/alternator: xfailing test for huge Limit in ListStreams
  alternator/test: un-skip test of zero Limit in ListStreams
2023-02-02 07:02:28 +02:00
Asias He
6d7b4a896e test: Increase max-networking-io-control-blocks
The number is too low in the test and we saw

rpc: Connection is closed error

Increase the number to the default 1000.
2023-02-02 11:11:22 +08:00
Asias He
693d71984f storage_service: Check node has left in node_ops_cmd::decommission_done
In test with ring delay zero, it is possible that when the
node_ops_cmd::decommission_done is received, the nodes remained in the
cluster haven't learned the LEFT status for the leaving node yet.

To guarantee that when the decommission RESTful API returns, all the nodes
participating in the decommission operation have learned the LEFT status, a
check is added to node_ops_cmd::decommission_done in this patch.

After this patch, the decommission tests which start multiple
decommission in a loop with ring delay zero in
test/topology/test_topology.py passes.
2023-02-02 11:11:22 +08:00
Asias He
e2e5017c54 repair: Use remote dc neighbors for everywhere strategy
Consider:

- Bootstrap n1 in dc 1
- Create ks with EverywhereStrategy
- Bootstrap n2 in dc 2

Since n2 is the first node in dc2, there are no local dc nodes to
sync data from. In this case, n2 should sync data with the node in dc 1,
even though that node is in a remote dc.
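
A minimal Python model of the neighbor choice described above, with hypothetical names (the real logic lives in Scylla's C++ repair code): prefer peers in the local DC, and fall back to remote-DC peers when the node is the first one in its DC.

```python
from typing import Dict, List


def repair_neighbors(local_dc: str, self_node: str,
                     nodes_by_dc: Dict[str, List[str]]) -> List[str]:
    """Hypothetical model: pick peers to sync from for an EverywhereStrategy keyspace."""
    # Model assumption: local-DC peers are preferred when they exist.
    local = [n for n in nodes_by_dc.get(local_dc, []) if n != self_node]
    if local:
        return local
    # The node is the first one in its DC: fall back to remote-DC peers.
    return [n for dc, nodes in sorted(nodes_by_dc.items()) if dc != local_dc
            for n in nodes if n != self_node]
```

In the scenario above, n2 bootstrapping into an otherwise empty dc2 gets n1 from dc1 as its only neighbor.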
2023-02-02 11:10:50 +08:00
Raphael S. Carvalho
e3923a9caf test.py: Add option to run scylla tests with multiple compaction groups
The tests can now optionally run with multiple groups via option
--x-log2-compaction-groups.

This includes boost tests and the ones which run against either
one (e.g. cql) or many instances (e.g. topology).

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:17:16 -03:00
Raphael S. Carvalho
f510cab5f0 test: Add x-log2-compaction-groups to Scylla test command line options
Now any boost test can run with multiple compaction groups by default,
without any change in the boost test cases whatsoever.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
3c5afb2d5c test: Enable Scylla test command line options for boost tests
We have enabled the command line options without changing a
single line of code; we only had to replace the old include
with scylla_test_case.hh.

Next step is to add the x-log2-compaction-groups option, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
a2c60b6cf5 tests: Add command line options for Scylla unit tests
Scylla unit tests are limited to command line options defined by
Seastar testing framework.

For extending the set of options, Scylla unit tests can now
include test/lib/scylla_test_case.hh instead of seastar/testing/test_case.hh,
which will "hijack" the entry point and will process the command line
options, then feed the remaining options into seastar testing entry
point.
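
The option split can be sketched in Python with `argparse.parse_known_args()`: consume the Scylla-specific option and hand everything else on untouched. This is only an analogy for what scylla_test_case.hh does in C++; `split_scylla_options()` and `compaction_groups()` are hypothetical names.

```python
import argparse
from typing import List, Tuple


def split_scylla_options(argv: List[str]) -> Tuple[int, List[str]]:
    """Hypothetical: pull out Scylla-only options, return the rest for Seastar."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--x-log2-compaction-groups", type=int, default=0)
    known, remaining = parser.parse_known_args(argv)
    return known.x_log2_compaction_groups, remaining


def compaction_groups(log2: int) -> int:
    # For X groups the option is log2(X), e.g. 3 -> 8 groups.
    return 1 << log2
```

Unknown options such as `--random-seed 42` end up in the remaining list and would be forwarded to the Seastar entry point.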

This is what it looks like when asking for help:

Scylla tests additional options:
  --help                              Produces help message
  --x-log2-compaction_groups arg (=0) Controls static number of compaction
                                      groups per table per shard. For X groups,
                                      set the option to log (base 2) of X.
                                      Example: Value of 3 implies 8 groups.

Running 1 test case...
App options:
  -h [ --help ]                         show help message
  --help-seastar                        show help message about seastar options
  --help-loggers                        print a list of logger names and exit
  --random-seed arg                     Random number generator seed
  --fail-on-abandoned-failed-futures arg (=1)
                                        Fail the test if there are any
                                        abandoned failed futures

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
8988795b08 replica: table: Add debug log for number of compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
a7ddedb998 test: sstable_compaction_test: Fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
c455e43f49 test: sstable_compaction_test: Make it work with compaction groups
Tests using replica::table::add_sstable_and_update_cache() cannot
rely on the sstable being added to a single compaction group, if
the test was forced to run with multiple groups.

Additionally let's remove try_flush_memtable_to_sstable() which
is restricted to a single group, allowing the entire test to now
pass with multiple groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
c25d8614a9 test: test_bloom_filter: Fix it with multiple compaction groups
With many compaction groups, the data:filter size ratio becomes small
with a small number of keys.

Test is adjusted to run another check with more keys if efficiency
is higher than expected, but not lower.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Raphael S. Carvalho
2d2460046b test: memtable_test: Fix it with multiple compaction groups
With compaction groups, automatic flushing may not pick the user
table. Fix it by using explicit flush.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Botond Dénes
34cdcaffae reader_concurrency_semaphore: un-bless permits when they become inactive
When the memory consumption of the semaphore reaches the configured
serialize threshold, all permits except the blessed one are blocked from
consuming any more memory. This ensures that past this limit, only one
permit at a time can consume memory.
Such a blessed permit can be registered inactive. Before this patch, it
would still retain its blessed status when doing so. This could result
in this permit being re-queued for admission if it was evicted in the
meanwhile, potentially resulting in a complete deadlock of the semaphore:
* admission queue permits cannot be admitted because there is no memory
* admitted permits are all queued on memory, as none of them are blessed

This patch strips the blessed status from the permit when it is
registered as inactive. It also adds a unit test to verify this happens.
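
A toy Python model of the rule this patch enforces (hypothetical classes, not the reader_concurrency_semaphore API): past the serialize threshold only the blessed permit may consume memory, and registering that permit as inactive must drop its blessing.

```python
from typing import Optional


class Permit:
    def __init__(self, name: str) -> None:
        self.name = name
        self.inactive = False


class SerializingSemaphore:
    """Hypothetical model of the 'blessed permit' rule past the serialize threshold."""

    def __init__(self) -> None:
        self.blessed: Optional[Permit] = None

    def may_consume_memory(self, permit: Permit) -> bool:
        # Past the threshold only one permit (the blessed one) may consume memory.
        if self.blessed is None:
            self.blessed = permit
        return self.blessed is permit

    def register_inactive(self, permit: Permit) -> None:
        permit.inactive = True
        # The fix: an inactive permit must not keep the blessing, otherwise
        # every other permit stays blocked on memory.
        if self.blessed is permit:
            self.blessed = None
```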

Fixes: #12603

Closes #12694
2023-02-01 21:02:17 +02:00
Botond Dénes
693c22595a sstables/sstable: validate_checksums(): force-check EOF
EOF is only guaranteed to be set if one tried to read past the end of the
file. So when checking for EOF, also try to read some more. This
should force the EOF flag into a correct value. We can then check that
the read yielded 0 bytes.
This should ensure that `validate_checksums()` will not falsely declare
the validation to have failed.
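
A minimal Python model of the check, assuming a hypothetical `ChunkReader` whose `eof` flag, like the stream in this commit, is only set once a read comes up short:

```python
class ChunkReader:
    """Hypothetical input stream whose eof flag is only set after a read hits the end."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self.eof = False

    def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        if len(chunk) < n:
            self.eof = True
        return chunk


def fully_consumed(reader: ChunkReader) -> bool:
    # Force the EOF flag to a correct value by attempting one more read,
    # then require that the extra read yielded 0 bytes.
    extra = reader.read(1)
    return reader.eof and len(extra) == 0
```

After reading exactly all the data, `eof` is still False; only the extra read makes the flag trustworthy.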

Fixes: #11190

Closes #12696
2023-02-01 20:52:46 +02:00
Nadav Har'El
69517040f7 Merge 'alterator::streams: Sort tables in list_streams to ensure no duplicates' from Calle Wilund
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never generate duplicates in a paged listing here. Can obviously miss things if they are added between paged calls and end up with a "smaller" UUID/ARN, but that is to be expected.
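
A rough Python sketch of why sorting helps paging, assuming plain string IDs stand in for table UUIDs/ARNs and `list_streams_page()` is a hypothetical helper: with a stable order, resuming strictly after `ExclusiveStartStreamArn` never returns an entry twice.

```python
from typing import List, Optional, Tuple


def list_streams_page(table_ids: List[str], limit: int = 100,
                      exclusive_start: Optional[str] = None
                      ) -> Tuple[List[str], Optional[str]]:
    """Hypothetical paging model: return one page plus the key to resume from."""
    ordered = sorted(table_ids)  # stable order across paged calls
    if exclusive_start is not None:
        ordered = [t for t in ordered if t > exclusive_start]
    page = ordered[:limit]
    # Only hand out a resume key if there is something left to list.
    last = page[-1] if len(ordered) > limit else None
    return page, last
```

The default `limit=100` mirrors the default mentioned in this merge; everything else is illustrative.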

Closes #12614

* github.com:scylladb/scylladb:
  alternator::streams: Special case single table in list_streams
  alternator::streams: Only sort tables iff limit < # tables or ExclusiveStartStreamArn set
  alternator::streams: Set default list_streams limit to 100 as per spec
  alterator::streams: Sort tables in list_streams to ensure no duplicates
2023-02-01 19:47:16 +02:00
Wojciech Mitros
86c61828e6 udt: disallow dropping a user type used in a user function
Currently, nothing prevents us from dropping a user type
used in a user function, even though doing so may make us
unable to use the function correctly.
This patch prevents this behavior by checking all function
argument and return types when executing a drop type statement
and preventing it from completing if the type is referenced
by any of them.
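
A simplified Python model of the new check, with hypothetical names and types represented as plain strings (a fuller check would also need to look inside collection and nested types):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UserFunction:
    name: str
    arg_types: List[str]
    return_type: str


def check_drop_type(type_name: str, functions: Iterable[UserFunction]) -> None:
    """Hypothetical model: refuse DROP TYPE while any function references the type."""
    for f in functions:
        if type_name in f.arg_types or f.return_type == type_name:
            raise ValueError(
                f"Cannot drop user type {type_name}: it is used by function {f.name}")
```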

Closes #12680
2023-02-01 18:53:29 +02:00
Kefu Chai
53366db6c6 build: disable Seastar's io_uring backend again
this partially reverts 49157370bc

according to the reports in #12173, at least two developers ran into
test failures which are correlated with the latest Seastar change,
which enables the io_uring backend by default. they are using linux
kernel 6.0.12 and 6.1.7. it's also reported that reverting
commit eedca15f16c3b6eae3d3d8af9510624a93f5d186 in seastar
helps. that very commit enables io_uring by default. although we
are not able to identify the exact root cause of the failures in #12173
at this moment, ruling out io_uring as a potential cause should
help with further investigation.

in this change, io_uring backend is disabled when building Seastar.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12689
2023-02-01 17:36:07 +02:00
Jan Ciolek
ed568f3f70 cql-pytest: test filtering using list with bind variable
Add tests which test filtering using IN restriction
with a list which contains a bind variable.

There are other cql-pytest tests which
test IN lists with a bind variable,
but it looks like they don't do filtering.

IN restrictions on primary key columns
are handled in a special way to generate
the right ranges.

These tests hit a different code path as
filtering uses `expr::evaluate()`.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-02-01 16:30:09 +01:00
Jan Ciolek
9eb6746a67 test/expr_test: test <int_value> IN (123, ?, 456)
Add tests which test evaluating the IN restriction
with a list which contains a bind variable.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-02-01 16:29:32 +01:00
Jan Ciolek
286599fe8b cql3: expr: don't pass empty evaluation_inputs in is_one_of
evaluation_inputs is a struct which contains
data needed to evaluate expressions - values
of columns, bind variables and other data.

is_one_of() is a function used to evaluate
IN restrictions. It checks whether the LHS
is one of elements on the RHS list.

Generally when evaluating expressions we get
the evaluation_inputs{} as an argument and
we should pass them along to any functions
that evaluate subexpressions.

is_one_of() got the inputs as an argument,
but didn't pass them along to equal(),
instead it creates new empty evaluation_inputs{}
and gives that to equal().

At first I thought this was a bug - with missing
information there could be a crash if equal()
tried to evaluate an expression with a bind_variable.

It turns out that in this particular case equal()
won't use the evaluation_inputs{} at all.
The LHS and RHS passed to it are just constant values,
which were already evaluated to serialized bytes
before calling evaluate().

It's still better to pass the inputs argument along
if possible. If in the future equal() required
these inputs for some reason, missing inputs
could lead to an unexpected crash.
I couldn't find any tests that would detect this case,
so such a bug could stay undetected until an unhappy user
finds it because their cluster crashed.
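
A small Python model of the point made here, using hypothetical stand-ins for the cql3::expr types: `is_one_of()` forwards its `inputs` to `equal()`, so an RHS element that is a bind variable (as in `<int_value> IN (123, ?, 456)`) can still be resolved, while an empty `EvaluationInputs()` would make the lookup fail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class BindVariable:
    index: int


@dataclass
class EvaluationInputs:
    bind_values: Dict[int, int] = field(default_factory=dict)


Expr = Union[int, BindVariable]


def evaluate(e: Expr, inputs: EvaluationInputs) -> int:
    if isinstance(e, BindVariable):
        # With empty inputs this lookup raises KeyError: the kind of
        # failure that passing the inputs along avoids.
        return inputs.bind_values[e.index]
    return e


def equal(lhs: Expr, rhs: Expr, inputs: EvaluationInputs) -> bool:
    return evaluate(lhs, inputs) == evaluate(rhs, inputs)


def is_one_of(lhs: Expr, rhs_list: List[Expr], inputs: EvaluationInputs) -> bool:
    # Forward the caller's inputs instead of creating EvaluationInputs().
    return any(equal(lhs, elem, inputs) for elem in rhs_list)
```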

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-02-01 16:20:24 +01:00
Avi Kivity
b4559a6992 Update seastar submodule
* seastar 943c09f869...ef24279f03 (6):
  > Merge 'util/print_safe, reactor: use concept for type constraints and refactory ' from Kefu Chai
  > Right align the memory diagnostics
  > Merge 'Add an API for the metrics layer to manipulate metrics dynamically.' from Amnon Heiman
  > semaphore: assert no outstanding units when moved
  > build: do not populate package registry by default
  > build: stop detecting concepts support

Closes #12695
2023-02-01 17:19:49 +02:00
Kamil Braun
40142a51d0 test: topology: wait for token ring/group 0 consistency after decommission
There was a check for immediate consistency after a decommission
operation has finished in one of the tests, but it turns out that also
after decommission it might take some time for token ring to be updated
on other nodes. Replace the check with a wait.

Also do the wait in another test that performs a sequence of
decommissions. We won't attempt to start another decommission until
every node learns that the previously decommissioned node has left.
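
The wait can be sketched as a small asyncio polling helper. `wait_until()` and the predicate used with it are hypothetical, not the test suite's actual helpers:

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(pred: Callable[[], Awaitable[bool]], timeout: float,
                     period: float = 0.1) -> None:
    """Hypothetical helper: poll pred() until it holds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not await pred():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met before the deadline")
        await asyncio.sleep(period)
```

In a test, `pred` would check whether every remaining node's view of the token ring no longer contains the decommissioned node.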

Closes #12686
2023-02-01 16:49:22 +02:00
Raphael S. Carvalho
1b2140e416 compaction: Fix inefficiency when updating LCS backlog tracker
LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker
is calling STCS tracker's replace_sstables() with empty arguments
even when higher levels (> 0) *only* had sstables replaced.
This unnecessary call to STCS tracker will cause it to recompute
the L0 backlog, yielding the same value as before.

As LCS has a fragment size of 0.16G on higher levels, we may be
updating the tracker multiple times during incremental compaction,
which operates on SSTables on higher levels.

Inefficiency is fixed by only updating the STCS tracker if any
L0 sstable is being added or removed from the table.

This may be fixing a quadratic behavior during boot or refresh,
as new sstables are loaded one by one.
Higher levels have a substantially higher number of sstables,
therefore updating the STCS tracker only when level 0 changes
significantly reduces the number of times the L0 backlog is recomputed.
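
A minimal Python model of the fix, with hypothetical classes and sstables represented as `(name, level)` pairs: the L0 (STCS) tracker is only consulted when an L0 sstable is added or removed.

```python
from typing import List, Tuple

SSTable = Tuple[str, int]  # (name, level)


class StcsBacklogTracker:
    def __init__(self) -> None:
        self.recomputations = 0

    def replace_sstables(self, old: List[SSTable], new: List[SSTable]) -> None:
        self.recomputations += 1  # stands in for recomputing the L0 backlog


class LcsBacklogTracker:
    """Hypothetical model: L0 is tracked by an STCS tracker, higher levels separately."""

    def __init__(self) -> None:
        self.l0 = StcsBacklogTracker()

    def replace_sstables(self, old: List[SSTable], new: List[SSTable]) -> None:
        old_l0 = [s for s in old if s[1] == 0]
        new_l0 = [s for s in new if s[1] == 0]
        # The fix: skip the STCS tracker when no L0 sstable is added or removed.
        if old_l0 or new_l0:
            self.l0.replace_sstables(old_l0, new_l0)
        # Per-level backlog bookkeeping for levels > 0 is omitted here.
```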

Refs #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12676
2023-02-01 15:19:07 +02:00
Michael Hollander
5d1e40bc18 Added missing full stop to SimpleSnitch paragraph
Closes #12692
2023-02-01 13:21:49 +02:00
Nadav Har'El
132af20057 Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
free up space in the pool (using `steal`), stop the cluster, then get a
new cluster from the pool.

Between the `steal` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager`) would start, because there was free
space in the pool.

This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.

Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.

The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)`
and a `destroy` function passed to the pool (now the pool is responsible
for stopping the cluster and releasing its IPs).
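
A simplified asyncio sketch of the idea, not the actual `test/pylib/pool.py` code (the class and its internals here are assumptions): a semaphore slot is held for each checked-out cluster, `put(is_dirty=True)` destroys the cluster and gives the slot back, while `replace_dirty()` destroys and rebuilds without ever releasing the slot, so a waiting run cannot slip in between.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """Hypothetical sketch of a bounded object pool, not test/pylib/pool.py itself."""

    def __init__(self, max_size: int, build: Callable[[], Awaitable[T]],
                 destroy: Callable[[T], Awaitable[None]]) -> None:
        self._build = build
        self._destroy = destroy
        self._free: List[T] = []
        # One slot per checked-out object; this is what bounds concurrency.
        self._slots = asyncio.Semaphore(max_size)

    async def get(self) -> T:
        await self._slots.acquire()
        if self._free:
            return self._free.pop()
        try:
            return await self._build()
        except BaseException:
            self._slots.release()
            raise

    async def put(self, obj: T, is_dirty: bool = False) -> None:
        try:
            if is_dirty:
                await self._destroy(obj)
            else:
                self._free.append(obj)
        finally:
            # Releasing the slot lets a waiting get() from another run proceed.
            self._slots.release()

    async def replace_dirty(self, obj: T) -> T:
        # Destroy the dirty object and build a replacement while still holding
        # the slot, so a concurrent get() cannot grab it in between.
        try:
            await self._destroy(obj)
            return await self._build()
        except BaseException:
            self._slots.release()
            raise
```

With a pool of size 1, a second `get()` keeps waiting across `replace_dirty()` and only proceeds once the replacement is `put()` back.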

Fixes #11757

Closes #12549

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
  test/pylib: pool: introduce `replace_dirty`
  test/pylib: pool: replace `steal` with `put(is_dirty=True)`
2023-02-01 12:37:39 +02:00
Anna Stuchlik
b346778ae8 doc: add the missing sudo command 2023-02-01 10:43:39 +01:00
Nadav Har'El
681a066923 test/pylib: put UNIX-domain socket in /tmp
The "cluster manager" used by the topology test suite uses a UNIX-domain
socket to communicate between the cluster manager and the individual tests.
The socket is currently located in the test directory but there is a
problem: In Linux the length of the path used as a UNIX-domain socket
address is limited to just a little over 100 bytes. In Jenkins run, the
test directory names are very long, and we sometimes go over this length
limit and the result is that test.py fails creating this socket.

In this patch we simply put the socket in /tmp instead of the test
directory. We only need to do this change in one place - the cluster
manager, as it already passes the socket path to the individual tests
(using the "--manager-api" option).
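
A minimal Python sketch of the approach, assuming hypothetical `make_manager_socket_path()` and `listen_unix()` helpers: create a fresh directory under /tmp and put the socket there, keeping the path well under the 108-byte `sun_path` limit on Linux.

```python
import os
import socket
import tempfile


def make_manager_socket_path() -> str:
    """Hypothetical helper: place the socket under /tmp instead of the test dir."""
    # A fresh /tmp directory keeps the path short no matter how deeply
    # nested the test directory is.
    sock_dir = tempfile.mkdtemp(prefix="manager-", dir="/tmp")
    return os.path.join(sock_dir, "api")


def listen_unix(path: str) -> socket.socket:
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)  # raises OSError if the path is too long
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```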

Tested by cloning Scylla in a very long directory name.
A test like ./test.py --mode=dev test_concurrent_schema fails before
this patch, and passes with it.

Fixes #12622

Closes #12678
2023-02-01 12:37:35 +03:00
Botond Dénes
325246ab2a Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik
Related https://github.com/scylladb/scylladb/issues/12658.

This PR fixes the bug in the upgrade guides for the released versions.

Closes #12679

* github.com:scylladb/scylladb:
  doc: fix the service name in the upgrade guide for patch releases versions 2022
  doc: fix the service name in the upgrade guide from 2021.1 to 2022.1
2023-02-01 12:37:35 +03:00
Anna Stuchlik
2be131da83 doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673
2023-02-01 12:37:35 +03:00
Botond Dénes
d8073edbb7 Merge 'cql3, locator: call fmt::format_to() explicitly and include used headers' from Kefu Chai
these fixes address the FTBFS of scylla with GCC-13.

Closes #12669

* github.com:scylladb/scylladb:
  cql3/stats: include the used header.
  cql3, locator: call fmt::format_to() explicitly
2023-02-01 12:37:35 +03:00
Pavel Emelyanov
d065f9f82e sstables: The generation_type is not formattable
If TOC writing hits TOC file conflict it tries to throw an exception
with sstable generation in it. However, generation_type is not
formattable at all, let alone with the {:d} option.

This bug generates an obscure 'fmt::v9::format_error (invalid type
specifier)' error in unknown location making the debugging hard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12671
2023-02-01 12:37:35 +03:00
Kefu Chai
186ceea009 cql3/selection: construct string_view using char* not size
before this change, we construct a sstring from a comma expression,
which evaluates to the return value of `name.size()`, but what we
expect is `sstring(const char*, size_t)`.

in this change

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

the issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666
2023-02-01 12:37:35 +03:00
David Garcia
616bf26422 docs: add opensource flag
Closes #12656
2023-02-01 12:37:35 +03:00
Anna Stuchlik
e81b586d6a Merge branch 'scylladb:master' into anna-pinning-workaround 2023-02-01 10:36:44 +01:00
Anna Stuchlik
11a59bcc76 doc: fix the service name in the upgrade guide for patch releases versions 2022 2023-01-31 11:04:21 +01:00
Anna Stuchlik
71ae644d40 doc: fix the service name in the upgrade guide from 2021.1 to 2022.1 2023-01-31 10:46:44 +01:00
Kefu Chai
58b4dc5b9a cql3/stats: include the used header.
otherwise `uint64_t` won't be found when compiling with GCC-13.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-30 21:50:23 +08:00
Kefu Chai
ccc03dd1ec cql3, locator: call fmt::format_to() explicitly
since format_to() is defined in both the fmt and std namespaces,
without specifying which one to use, we'd fail to build with the
standard library which implements std::format_to(). yes, we are
`using namespace std` somewhere.

this change should address the FTBFS with GCC-13.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-30 21:50:11 +08:00
Warren Krewenki
8655a8be19 docs: Update suggested AWS instance types in benchmark tips
The list of suggested instances had a misspelling of c5d, and didn't include the i4i instances recommended by https://www.scylladb.com/2022/05/09/scylladb-on-the-new-aws-ec2-i4i-instances-twice-the-throughput-lower-latency/

Closes #12664
2023-01-30 14:10:18 +02:00
Botond Dénes
c927eea1d5 Merge 'table: trim ranges for compaction group cleanup' from Benny Halevy
This series contains the following changes for trimming the ranges passed to cleanup a compaction group to the compaction group owned token_range.

table: compaction_group_for_token: use signed arithmetic
Fixes #12595

table: make_compaction_groups: calculate compaction_group token ranges
table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
Fixes #12594

Closes #12598

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
  table: make_compaction_groups: calculate compaction_group token ranges
  dht: range_streamer: define logger as static
2023-01-30 13:11:28 +02:00
Anna Stuchlik
64cc4c8515 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655
2023-01-30 12:45:20 +02:00
Michał Chojnowski
fa7e904cd6 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.
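
A small Python model of the accounting bug and one way to avoid it (hypothetical classes, not the commitlog code): read the segment size before `remove_file()` resets it.

```python
class Segment:
    """Hypothetical model of a commitlog segment file."""

    def __init__(self, size: int) -> None:
        self.known_size = size

    def remove_file(self) -> None:
        # Mirrors the behavior described in this commit: removal resets known_size.
        self.known_size = 0


class SegmentManager:
    def __init__(self) -> None:
        self.total_size_on_disk = 0

    def add(self, seg: Segment) -> None:
        self.total_size_on_disk += seg.known_size

    def delete_buggy(self, seg: Segment) -> None:
        seg.remove_file()
        self.total_size_on_disk -= seg.known_size  # subtracts 0: space leaks

    def delete_fixed(self, seg: Segment) -> None:
        size = seg.known_size  # capture the size before remove_file() resets it
        seg.remove_file()
        self.total_size_on_disk -= size
```

With the buggy order, `total_size_on_disk` only ever grows, which matches the ever-rising metric described above.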

Fixes #12645

Closes #12646
2023-01-30 12:20:04 +02:00
Botond Dénes
71ad0dff2b test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends 2023-01-30 05:03:42 -05:00
Botond Dénes
a03c11234d test/lib/simple_schema: remove now unused make_keys() and friends 2023-01-30 05:03:42 -05:00
Botond Dénes
4ad3ba52b0 test: migrate to tests::generate_partition_key[s]()
Use the newly introduced key generation facilities, instead of the
old inflexible alternatives and hand-rolled code.
Most of the migrations are mechanic, but there are two tests that
were tricky to migrate:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation

These two tests seems to depend on generated keys all being of the same
size. This makes some sense in the case of the key count estimation
test, but makes no sense at all to me in the case of the sstable run
test.
2023-01-30 05:03:42 -05:00
Botond Dénes
84c94881b3 test/lib/test_services: add table_for_tests::make_default_schema()
Creates the default schema, used in the default constructor of
table_for_tests. Allows for getting the default schema without creating
an instance first.
2023-01-30 05:03:42 -05:00
Botond Dénes
61f28d3ab2 test/lib: add key_utils.hh
Contains methods to generate partition and clustering keys. In the case
of the former, one can specify the shard to generate keys for.
We currently have some methods to generate these but they are not
generic. Therefore the tests are littered by open-coded variants.
The methods introduced here are completely generic: they can generate
keys for any schema.
2023-01-30 05:03:42 -05:00
Anna Stuchlik
0294b426b9 doc: replace the reduntant link with an alternative way to install a non-latest version 2023-01-30 10:01:17 +01:00
Botond Dénes
04ca710a95 test/lib/random_schema.hh: value_generator: add min_size_in_bytes
Allow caller to specify the minimum size in bytes of the generated
value. Only really works with string-like types (and collections of
these).
Also fixed max size enforcement for strings: before this patch, the
provided max size was divided by wide string size, instead of the
char width of the actual string type the value is generated for.
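
To illustrate the byte-to-character conversion this fix is about, here is a minimal Python sketch (a hypothetical helper, not random_schema.hh's API): it turns the min/max byte sizes into character-count bounds using the width of the string type actually being generated.

```python
from typing import Tuple


def char_count_bounds(min_bytes: int, max_bytes: int,
                      char_width: int) -> Tuple[int, int]:
    """Hypothetical helper: byte-size bounds -> character-count bounds."""
    # Divide by the width of the string type being generated (e.g. 1 for
    # single-byte code units), not by the width of some other char type.
    min_chars = -(-min_bytes // char_width)  # ceil, so we reach min_bytes
    max_chars = max_bytes // char_width      # floor, so we never exceed max_bytes
    if min_chars > max_chars:
        raise ValueError("no string length satisfies both bounds")
    return min_chars, max_chars
```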
2023-01-30 01:11:31 -05:00
Avi Kivity
5d914adcef Merge 'view: row_lock: lock_ck: find or construct row_lock under partition lock' from Benny Halevy
Since we're potentially searching the row_lock in parallel to acquiring the read_lock on the partition, we're racing with row_locker::unlock that may erase the _row_locks entry for the same clustering key, since there is no lock to protect it up until the partition lock has been acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock _after_ the partition lock has been acquired to make sure we're synchronously starting the read/write lock function on it, without yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance that wasn't needed before if the lock for it was already found, but the view building is not on the hot path so we can tolerate that.

This is required on top of 5007ded2c1 as seen in https://github.com/scylladb/scylladb/issues/12632 which is closely related to #12168 but demonstrates a different race causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12639

* github.com:scylladb/scylladb:
  view: row_lock: lock_ck: try_emplace row_lock entry
  view: row_lock: lock_ck: find or construct row_lock under partition lock
2023-01-29 18:38:14 +02:00
Warren Krewenki
2b7a7e52f4 docs: Missing closing quote in example query
Closes #12663
2023-01-29 11:50:11 +02:00
Tomasz Grabiec
c9c476afd7 test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
80de99cb1b test: mvcc: Extract mvcc_container::allocate_in_region() 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
7bb975eb22 row_cache, lru: Introduce evict_shallow()
Will be used by MVCC tests which don't want to (and can't) deal with the
row_cache as the container but work with the partition_entry directly.

Currently, rows_entry::on_evicted() assumes that it's embedded in
row_cache and would segfault when trying to evict the containing
partition entry which is not embedded in row_cache. The solution is to
call evict_shallow() from mvcc_tests, which does not attempt to evict
the containing partition_entry.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
f2832046e9 test: mvcc: Avoid copies of mutation under failure injection
Speeds up the test a bit because we avoid the copy when converting to
mutation_partition_v2 in apply().
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
b8980f68f0 test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
d02d668777 mutation_partition_v2: Avoid full scan when applying mutation to non-evictable
For non-evictable snapshots all ranges are continuous so there is no
need to apply the continuity flag to the previous interval if the
source mutation has the interval marked as continuous.

Without this, applying a single row mutation to a memtable would
involve scanning the existing version for the range before the row's
key. This makes population quadratic.

This is made worse by the fact that this scan will happen in the
background if preempted, which exposes a scheduling problem. The
mutation cleaner worker which merges versions in the background will
not keep up with the incoming writes. This will lead to explosion of
partition versions, which makes reads (e.g. memtable flush) very
slow. The read will have to refresh the iterator heap, which has an
iterator for each version, across every preemption point, because
cleaning invalidates iterators.

The same could happen before the v2 representation, but for much less
typical workloads, e.g. applying lots of mutations with a single range
tombstone covering existing rows.

The problem was hit in index_with_paging_test in debug mode. It's less
likely to happen in release mode where preemption is not triggered as
often.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
bc35fa7696 Pass is_evictable to apply() 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
2b5e7a684b tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
81b1b2ee55 tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint
The test measured copying of the mutation object, but verified the
measurement against mutation_partition::external_memory_usage(). So
anything allocated on the mutation object level would cause the test
to (incorrectly) fail. Fix that by copying only the mutation_partition
part.

Currently not a problem, because the partition_key is stored in the
in-line storage. Would become a problem once inline storage is
reduced.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
f172336b32 tests: mutation_partition_v2: Add test for exception safety of mutation merging 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
919ff433d1 tests: Add tests for the mutation_partition_v2 model 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
cec9b2d114 mutation_partition_v2: Implement compact()
For convenience, will be used in unit tests.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
4317999ca4 cache_tracker: Extract insert(mutation_partition_v2&) 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
c7f7377ea3 mvcc, mutation_partition: Document guarantees in case merging succeeds
It's not obvious that invariants for partial merge do not hold for a
completed merge.

This is due to the fact that an empty source partition, which is
always empty after merge, is always fully continuous.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
8ae78ffebd mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically()
Will be useful in testing to exhaustively test preemption scenarios.
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
8883ac30cf mutation_partition_v2: Simplify get_continuity() 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
d9e27abe87 row_cache: Distinguish dummy insertion site in trace log 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
026f8cc1e7 db: Use mutation_partition_v2 in mvcc
This patch switches memtable and cache to use mutation_partition_v2,
and all affected algorithms accordingly.

The memtable reader was changed to use the same cursor implementation
which cache uses, for improved code reuse and reducing risk of bugs
due to discrepancy of algorithms which deal with MVCC.

Range tombstone eviction in cache has now fine granularity, like with
rows.

Fixes #2578
Fixes #3288
Fixes #10587
2023-01-27 21:56:28 +01:00
Tomasz Grabiec
ccf3a13648 range_tombstone_change_merger: Introduce peek()
Returns the current tombstone without affecting state.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
42f5a7189d readers: Extract range_tombstone_change_merger 2023-01-27 19:15:39 +01:00
Tomasz Grabiec
6b7473be53 mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots
This is a prerequisite for using the cursor in memtable readers.

Non-evictable snapshots are those which live in memtables. Unlike
evictable snapshots, they don't have a dummy entry at position after
all clustering rows. In evictable snapshots, lookup always finds an
entry, not so with non-evictable snapshots. The cursor was not
prepared for this case; this patch handles it.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
091ad8f6ee mvcc: partition_snapshot_row_cursor: Support digest calculation
Prerequisite for using in memtable reader.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
195b40315a mutation_partition_v2: Store range tombstones together with rows
This patch changes mutation_partition_v2 to store range tombstone
information together with rows.

This mainly affects the version merging algorithm,
mutation_partition_v2::apply_monotonically().

Setting continuity can no longer drop a dummy entry unconditionally,
since the dummy may be a boundary of a range tombstone.

Memtable/cache is not switched yet.

Refs #10587
Refs #3288
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
7e6056b3cc db: Introduce mutation_partition_v2
Intended to be used in memtable/cache, as opposed to the old
mutation_partition which will be intended to be used as temporary
object.

The two will have different trade-offs regarding memory efficiency and
algorithms.

In this commit there is no change in logic, the class is mostly
copied. Some methods which are not needed on the v2 model were removed
from the interface.

Logic changes will be introduced in later commits.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
806f698272 doc: Introduce docs/dev/mvcc.md
This extracts information which was there in row_cache.md, but is
relevant to MVCC in general.

It also makes adaptations and reflects the upcoming changes in this
series related to switching to the new mutation_partition_v2 model:

 - continuity in evictable snapshots can now overlap. This is needed
   to represent range tombstone information, which is linked to
   continuity information.

 - description of range tombstone representation was added
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
27882ff19e db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU 2023-01-27 19:15:39 +01:00
Tomasz Grabiec
a574a1cc4e db: Print range_tombstone bounds as position_in_partition
It is now the standard representation, having replaced bound_view.

Will be consistent with how range tombstone bounds are represented in
mutation_partition_v2 (as rows_entry::position()).
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
40719c600c test: memtable_test: Relax test_segment_migration_during_flush
Partition version merging can now insert sentinels, which may
temporarily increase unspooled memory. It is no longer true that
unspooled monotonically decreases, which the test verified.  Relax it,
and only verify that unspooled is smaller than real dirty.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
31bcc3b861 test: cache_flat_mutation_reader: Avoid timestamp clash
api::new_timestamp() is not monotonic. In
test_single_row_and_tombstone_not_cached_single_row_range1, we
generate a deletion and an insertion in the deleted range. If they
get the same timestamp, the inserted row will be covered.

This will surface after cache starts to compact rows with range tombstones.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
25683449e4 test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows
When inserting range tombstones, the test uses api::new_timestamp(),
but when inserting rows, it uses a fixed timestamp of 1. This will be
problematic once rows get compacted with range tombstones: all rows
would get compacted away, which is not expected by the test. To fix
this, let's use the same timestamp source as range tombstones. This
way rows will get a later timestamp.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
71057412ed test: mvcc: Fix sporadic failures due to compact_for_compaction()
compact_for_compaction() will perform cell expiration based on
gc_clock::now(), which introduces sporadic mismatches due to expiry
status of a row marker.

Drop this; we can rely on the compaction done by is_equal_to_compacted().
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
f908713290 test: lib: random_mutation_generator: Produce partition tombstone less often
This tombstone has a high chance of obliterating all data, which will
make tests which involve partition version merging not very
interesting. The result will be an empty partition with a
tombstone. Reduce its frequency, so that in MVCC there is a
significant chance of having live data in the combined entry where
individual versions come from the generator.
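The frequency reduction pairs naturally with the with_probability() helper introduced in the neighbouring commit 3bf8052be4. Below is a minimal Python sketch of that idea. The test library itself is C++, so `with_probability`, `maybe_partition_tombstone` and the 0.1 probability here are illustrative assumptions, not the actual test/lib code.

```python
import random

def with_probability(p, rng=random):
    """Return True with probability p (0.0 <= p <= 1.0)."""
    # random() is uniform over [0.0, 1.0), so this is True for a fraction p of draws.
    return rng.random() < p

def maybe_partition_tombstone(rng, p=0.1):
    """Hypothetical generator step: emit a partition tombstone only rarely."""
    return "partition_tombstone" if with_probability(p, rng) else None
```

With a low p, most generated versions carry no partition tombstone, so merging several of them still leaves live data to check.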
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
3bf8052be4 test: lib: random_utils: Introduce with_probability() 2023-01-27 19:15:38 +01:00
Tomasz Grabiec
c386874e18 test: lib: Improve error message in has_same_continuity() 2023-01-27 19:15:38 +01:00
Tomasz Grabiec
08f68c5f20 test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker 2023-01-27 19:15:38 +01:00
Tomasz Grabiec
5aa8cb56a8 test: mvcc: Insert entries in the tracker
evictable snapshots must have all entries added to the
tracker. Partition version merging assumes this. Until now this was
benign, but it will start to trigger asserts in mutation_partition_v2.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
9d38997971 test: mvcc_test: Do not set dummy::no on non-clustering rows
This will trigger an assert in apply_monotonically() later in the
series, where this row would be merged with a dummy at the same
position. This row must not be marked as non-dummy, there is an
assumption that non-clustering positions are all dummies. There can't
be two entries with the same position and a different dummy status.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
f79072638d mutation_partition: Print full position in error report in append_clustered_row()
std::prev(i) can be dummy.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
6a305666a4 db: mutation_cleaner: Extract make_region_space_guard()
Will be used in more places.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
833e2a8d30 position_in_partition: Optimize equality check
We can avoid key comparison if bound weights don't match.
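A rough Python sketch of the short-circuit, assuming a simplified position made of a key and an integer bound weight (partition regions are ignored, and the `Position` class is hypothetical, not Scylla's position_in_partition):

```python
class Position:
    """Simplified clustering position: bound_weight -1 = before key, 0 = at key, 1 = after key."""

    def __init__(self, key, bound_weight=0):
        self.key = key
        self.bound_weight = bound_weight

    def __eq__(self, other):
        if not isinstance(other, Position):
            return NotImplemented
        # Cheap integer check first: positions with different weights are never
        # equal, so the potentially expensive key comparison can be skipped.
        if self.bound_weight != other.bound_weight:
            return False
        return self.key == other.key
```

The key comparison only runs when the weights already agree.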
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
95b509afcd mvcc: Fix version merging state resetting
Upon entry to merge_partition_versions() we skip over versions which
are not referenced in order to start merging from the oldest
unreferenced version, which is good for performance. Later, we
reallocate version merging state if we detected such a move, so that
we don't reuse state allocated for a different version pair than
before. This check was using version_no, the counter of skipped
versions to detect this. But this only makes sense if each
merge_partition_versions() uses the same version pointer as a base. In
fact it doesn't, if we skip, we advance _version, so the skip is
persisted in the snapshot. It's enough to discard the version merging
state when we do that.

This shouldn't have effect on existing code base, since there is
currently no way to trigger the version skipping logic.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
1c4b5b0b6b mutation_partition: apply_resume: Mark operator bool() as explicit 2023-01-27 19:15:38 +01:00
Anna Stuchlik
70480184ab doc: add the link to the FAQ about pinning to the patch upgrade guides 2022 and 2022 2023-01-27 18:06:54 +01:00
Anna Stuchlik
31515f7604 doc: add a FAQ with a workaround to install a non-latest ScyllaDB version on Debian and Ubuntu 2023-01-27 17:49:00 +01:00
Botond Dénes
84a69b6adb db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts
We currently don't clean up the system_distributed.view_build_status
table after removed nodes. This can cause false-positive check for
whether view update generation is needed for streaming.
The proper fix is to clean up this table, but that will be more
involved, and even when done, it might not be immediate. So until then
and to be on the safe side, filter out entries belonging to unknown
hosts from said table.

Fixes: #11905
Refs: #11836

Closes #11860
2023-01-27 17:12:45 +03:00
Botond Dénes
e2c9cdb576 mutation_compactor: only pass consumed range-tombstone-change to validator
Currently all consumed range tombstone changes are unconditionally
forwarded to the validator, even if they are shadowed by a higher-level
tombstone and/or purgeable. This can result in a situation where a range
tombstone change was seen by the validator but not passed to the
consumer. The validator expects the range tombstone change to be closed
by end-of-partition but the end fragment won't come as the tombstone was
dropped, resulting in a false-positive validation failure.
Fix by passing to the validator only those tombstones that are actually
passed to the consumer too.

Fixes: #12575

Closes #12578
2023-01-27 14:03:45 +01:00
Nadav Har'El
b99b83acdd docs/alternator: fix links to open issues
The docs/alternator/compatibility.md file links to various open issues
on unimplemented features. One of the links was to an already-closed
issue. Replace it by a link to an open issue that was missing.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12649
2023-01-27 14:29:57 +02:00
Pavel Emelyanov
1f9f819c8c table: Remove unused column_family_directory() overload
There's another one that accepts explicit basedir first argument and
that's used by the rest of the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12643
2023-01-27 14:17:41 +02:00
Nadav Har'El
f873884b50 test/alternator: unskip test which works on modern Scylla
We had one test test_gsi.py::test_gsi_identical that didn't work on KA/LA
sstables due to #6157, so it was skipped. Today, Scylla no longer supports
writing these old sstable formats, so the test can no longer run against
them and should pass. And indeed it does, and the
"skip" marker can be removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12651
2023-01-27 14:10:07 +02:00
Botond Dénes
d358d4d9e9 Merge 'Configure sstable_test_env with tempdir' from Pavel Emelyanov
Today's sstable_test_env starts with a default-configured db::config and, thus, sstables_manager. Test cases that run in this env always create their own tempdir to store sstable files in. Upcoming patches make the sstables manager and friends fully control the data-dir path in order to support object storage for sstables cleanly, and this behavior of the tests gets in the way of that work.

That said, this PR configures sstable_test_env with a tempdir and pins down the cases using it to stick to that directory, rather than to a custom one.

Closes #12641

* github.com:scylladb/scylladb:
  test: Use tempdir from sstable_test_env
  test: Add tmpdir to sstable test env
  test: Keep db::config as unique pointer
2023-01-27 13:59:12 +02:00
Avi Kivity
df09bf2670 tools: toolchain: dbuild: pass NOFILE limit from host to container
The leak sanitizer has a bug [1] where, if it detects a leak, it
forks something, and before that, it closes all files (instead of
using close_range like a good citizen).

Docker tends to create containers with the NOFILE limit (number of
open files) set to 1 billion.

The resulting 1 billion close() system calls is incredibly slow.

Work around that problem by passing the host NOFILE limit.
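dbuild itself is a shell script. The Python sketch below only illustrates the idea: read the host's RLIMIT_NOFILE and forward it through Docker's `--ulimit nofile=SOFT:HARD` option. The helper name is hypothetical and this is not dbuild's actual code.

```python
import resource

def docker_nofile_args():
    """Build `docker run` arguments that copy the host NOFILE limit into the container."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # A sanitizer that close()s every possible fd then loops `soft` times,
    # instead of ~1 billion times with Docker's default limit.
    return ["--ulimit", f"nofile={soft}:{hard}"]
```

For example, `["docker", "run", *docker_nofile_args(), image]` would start the container with the host's limits.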

[1] https://github.com/llvm/llvm-project/issues/59112

Closes #12638
2023-01-27 13:56:35 +02:00
Benny Halevy
d2893f93cb view: row_lock: lock_ck: try_emplace row_lock entry
Use the same method as the two-level lock at the
partition level.  try_emplace will either use
an existing entry, if found, or create a new
entry otherwise.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-27 13:51:48 +02:00
Benny Halevy
4b5e324ecb view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching for the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance
even if a row_lock entry already exists, that wasn't needed before.
It only slows us down (a bit) when there is contention and the lock
already existed when we want to go locking. In the fast path there
is no contention and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-27 13:51:46 +02:00
Kamil Braun
fa9cf81af2 test: topology: verify that group 0 and token ring are consistent
After topology changes like removing a node, verify that the set of
group 0 members and token ring members is the same.

Modify `get_token_ring_host_ids` to only return NORMAL members. The
previous version which used the `/storage_service/host_id` endpoint
might have returned non-NORMAL members as well.

Fixes: #12153

Closes #12619
2023-01-27 14:21:14 +03:00
Avi Kivity
f719de3357 Update seastar submodule
* seastar d41af8b59...943c09f86 (20):
  > reactor: disable io_uring on older kernels if not enough lockable memory is available
  > demos/tcp_sctp_client_demo: use user-defined literal for sizes
  > core/units: add user-defined literal for IEC prefixes
  > core/units: include what we use
  > coroutine/exception: do not include core/coroutine.hh
  > seastar/coroutine: drop std-coroutine.hh
  > core/bitops.hh: add type constraits to templates
  > apps/iotune: s/condition == false/!condition/
  > core/metrics_api: s/promehteus/prometheus/
  > reactor: make io_uring the default backend if available
  > tests: connect_test: use 127.0.0.1 for connect refused test
  > reactor: use aio to implement reactor_backend_uring::read()
  > future: schedule: get_available_state_ref under SEASTAR_DEBUG
  > rpc: client_info: add retrieve_auxiliary_opt
  > Merge 'Make http requests with content-length header and generated body' from Pavel Emelyanov
  > Merge 'Ensure logger doesn't allocate' from Travis Downs
  > http, httpd: optimize header field assignment
  > sstring: operator<< std::unordered_map: delete stray space char
  > Dump memory diagnostics at error level on abort
  > Fix CLI help for memory diagnostics dump

Closes #12650
2023-01-26 22:19:24 +02:00
Anna Stuchlik
6ef33f8aae doc: reorganize the content on the Upgrade ScyllaDB page 2023-01-26 13:37:27 +01:00
Botond Dénes
d7ed92bb42 Merge 'Reduce the number of table::make_sstable() overloads' from Pavel Emelyanov
There are several helpers to make an sstable for the table and two with most of the arguments are only used by tests. This PR leaves table with just one arg-less call thus making it easier to patch further.

Closes #12636

* github.com:scylladb/scylladb:
  table: Shrink sstables making API
  tests: Use sstables manager to make sstables
  distributed_loader: Add helpers to make sstables for reshape/reshard
2023-01-26 14:25:21 +02:00
Anna Stuchlik
29536cb064 doc: improve the overview of the upgrade procedure (apply feedback) 2023-01-26 13:09:08 +01:00
Kamil Braun
5eadea301e Merge 'pytest: start after ungraceful stop' from Alecco
If a server is stopped suddenly (i.e. not graceful), schema tables might
be in an inconsistent state. Add a test case and enable Scylla
configuration option (force_schema_commit_log) to handle this.

Fixes #12218

Closes #12630

* github.com:scylladb/scylladb:
  pytest: test start after ungraceful stop
  test.py: enable force_schema_commit_log
2023-01-26 12:08:33 +01:00
Kamil Braun
3eabe04f5d test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
put the old cluster into the pool with `is_dirty=True`, then get a new
cluster from the pool.

Between the `put` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager`) would start, because there was free
space in the pool.

This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.

Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.

Fixes #11757
2023-01-26 11:58:00 +01:00
Kamil Braun
b5ef57ecc2 test/pylib: pool: introduce replace_dirty
Used to atomically return a dirty object to the pool and then use the
space freed by this object to get another object. Unlike
`put(is_dirty=True)` followed by `get`, a concurrent waiter cannot take
away our space from us.

A piece of `get` was refactored to a private function `_build_and_get`,
this piece is also used in `replace_dirty`.
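The slot accounting behind `put(is_dirty=True)` and `replace_dirty` can be sketched with an asyncio.Condition. This is a minimal, hypothetical sketch, not test/pylib/pool.py: `_total` counts reserved slots, and `replace_dirty` never releases its slot, so a concurrent `get()` cannot take it.

```python
import asyncio

class Pool:
    """Bounded pool of expensive objects such as clusters (hypothetical sketch)."""

    def __init__(self, max_size, build, destroy):
        self._max_size = max_size
        self._build = build          # async () -> obj
        self._destroy = destroy      # async (obj) -> None, e.g. stop cluster, release IPs
        self._free = []              # idle objects that can be reused
        self._total = 0              # reserved slots: idle + borrowed + being built
        self._cond = asyncio.Condition()

    async def get(self):
        async with self._cond:
            await self._cond.wait_for(
                lambda: self._free or self._total < self._max_size)
            if self._free:
                return self._free.pop()
            self._total += 1         # reserve a slot before building
        return await self._build_and_get()

    async def _build_and_get(self):
        # The caller already holds a reserved slot.
        try:
            return await self._build()
        except BaseException:
            await self._release_slot()
            raise

    async def _release_slot(self):
        async with self._cond:
            self._total -= 1
            self._cond.notify()

    async def put(self, obj, is_dirty=False):
        if is_dirty:
            try:
                await self._destroy(obj)
            finally:
                await self._release_slot()
        else:
            async with self._cond:
                self._free.append(obj)
                self._cond.notify()

    async def replace_dirty(self, obj):
        # Destroy obj but keep its slot, then build the replacement in that slot.
        try:
            await self._destroy(obj)
        except BaseException:
            await self._release_slot()
            raise
        return await self._build_and_get()
```

By contrast, `put(obj, is_dirty=True)` followed by `get()` briefly frees the slot and notifies a waiter, which is exactly the window that let extra concurrent runs start.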
2023-01-26 11:58:00 +01:00
Kamil Braun
858803cc2c test/pylib: pool: replace steal with put(is_dirty=True)
The pool usage was kind of awkward previously: if the user of a pool
decided that a previously borrowed object should no longer be used,
it was their responsibility to destroy the object (releasing associated
resources and so on) and then call `steal()` on the pool to free space
for a new object.

Change the interface. Now the `Pool` constructor obtains a `destroy`
function in addition to the `build` function. The user calls the
function `put` to return both objects that are still usable and those
that aren't. For the latter, they set `is_dirty=True`. The pool will
'destroy' the object with the provided function, which could mean e.g.
releasing associated resources.

For example, instead of:
```
if self.cluster.is_dirty:
    self.clusters.stop()
    self.clusters.release_ips()
    self.clusters.steal()
else:
    self.clusters.put(self.cluster)
```
we can now use:
```
self.clusters.put(self.cluster, is_dirty=self.cluster.is_dirty)
```
(assuming that `self.clusters` is a pool constructed with a `destroy`
function that stops the cluster and releases its IPs.)

Also extend the interface of the context manager obtained by
`instance()` - the user must now pass a flag `dirty_on_exception`. If
the context manager exits due to an exception and that flag was `True`,
the object will be considered dirty. The dirty flag can also be set
manually on the context manager. For example:
```
async with (cm := pool.instance(dirty_on_exception=True)) as server:
    cm.dirty = await run_test(test, server)
    # It will also be considered dirty if run_test throws an exception
```
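One way to build such a context manager is a small class with a public `dirty` attribute. The example above assigns `cm.dirty`, which a plain `@asynccontextmanager` generator would not support. This is a hypothetical sketch, not the actual test/pylib implementation. It only assumes the pool offers `get()` and `put(obj, is_dirty=...)`.

```python
class PoolInstance:
    """Borrow an object from a pool for the duration of an `async with` block."""

    def __init__(self, pool, dirty_on_exception):
        self._pool = pool
        self._dirty_on_exception = dirty_on_exception
        self.dirty = False       # may be set by the user inside the block
        self._obj = None

    async def __aenter__(self):
        self._obj = await self._pool.get()
        return self._obj

    async def __aexit__(self, exc_type, exc, tb):
        # An exception marks the object dirty only if the caller asked for it.
        if exc_type is not None and self._dirty_on_exception:
            self.dirty = True
        await self._pool.put(self._obj, is_dirty=self.dirty)
        return False             # never swallow the exception
```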
2023-01-26 11:58:00 +01:00
Pavel Emelyanov
dd307d8a42 test: Use tempdir from sstable_test_env
The test cases in sstable_directory_test use a temporary directory that
differs from the one the sstables manager is started with. Fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 11:47:06 +03:00
Pavel Emelyanov
0c3799db71 test: Add tmpdir to sstable test env
This adds the test/lib's tmpdir instance _and_ configures the
data_file_directories with this path. This makes sure sstables manager
and the rest of the test use the same directory for sstables. For now
it doesn't change anything, but it helps upcoming patches.

(A neat side effect of this change is that sstable_test_env is now
 configured the same way as cql_test_env does)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 11:47:06 +03:00
Pavel Emelyanov
9f4efd6b6f table: Shrink sstables making API
Currently there are four helpers, this patch makes it just two and one
of them becomes private to the table, thus making the API small and neat
(and easy to patch further).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 10:47:39 +03:00
Pavel Emelyanov
fd559f3b81 tests: Use sstables manager to make sstables
This test uses two many-args helpers from the table class to create sstables
with desired parameters. The table API in question is not used by any
other code but these few places, so it's better to open-code it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 10:47:39 +03:00
Pavel Emelyanov
bfddfb8927 distributed_loader: Add helpers to make sstables for reshape/reshard
This kills two birds with one stone. First, it factors out (quite a lot
of) common arguments that are passed to table.make_sstable(). Second, it
makes the helpers call sstable manager with extended args making it
possible to remove those wrappers from table class later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 10:47:39 +03:00
Botond Dénes
ba26770376 tools/schema_loader: data_dictionary_impl:try_find_table(): also check ks name
Although the number of keyspaces should mostly be 1 here, and thus the
chance of two tables from different keyspaces colliding is minuscule, it
is not zero. Better be safe than sorry, so match the keyspace name too
when looking up a table.

Closes #12627
2023-01-25 22:04:07 +02:00
Raphael S. Carvalho
87ee547120 table: Fix quadratic behavior when inserting sstables into tracker on schema change
Each time backlog tracker is informed about a new or old sstable, it
will recompute the static part of the backlog, whose complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, which results in O(N^2) complexity.
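A toy Python model makes the arithmetic concrete. It assumes, as the message states, that each replace_sstables() call recomputes a static part costing one unit per tracked sstable. Batching all sstables into a single call is assumed here as the natural fix; the class and helpers are hypothetical, not the compaction backlog tracker.

```python
class BacklogTrackerSketch:
    """Toy tracker: recomputing the static backlog costs O(#sstables)."""

    def __init__(self):
        self.sstables = []
        self.work = 0            # total work units spent on recomputation

    def _recompute_static_backlog(self):
        self.work += len(self.sstables)

    def replace_sstables(self, old, new):
        for s in old:
            self.sstables.remove(s)
        self.sstables.extend(new)
        self._recompute_static_backlog()


def per_sstable(n):
    t = BacklogTrackerSketch()
    for i in range(n):
        t.replace_sstables([], [i])   # one call per sstable: 1 + 2 + ... + n
    return t.work


def batched(n):
    t = BacklogTrackerSketch()
    t.replace_sstables([], list(range(n)))  # one call for all sstables: n
    return t.work
```

For 100 sstables, `per_sstable(100)` spends 5050 units (n(n+1)/2) while `batched(100)` spends 100.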

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12593
2023-01-25 21:57:33 +02:00
Botond Dénes
bdd4b25c61 scylla-gdb.py: scylla memory: remove 'sstable reads' from semaphore names
This phrase is inaccurate and unnecessary. We know all lines in the
printout are for reads and they are semaphores: no need to repeat this
information on each line.
Example:

  Read Concurrency Semaphores:
    read:              0/100,             0/     41901096, queued: 0
    streaming:         0/ 10,             0/     41901096, queued: 0
    system:            0/ 10,             0/     41901096, queued: 0

Closes #12633
2023-01-25 21:55:27 +02:00
Nadav Har'El
f4f2d608d7 dbuild: fix path in example in README
The dbuild README has an example how to enable ccache, and required
modifying the PATH. Since recently, our docker image includes
required commands (cxxbridge) in /usr/local/bin, so the build will
fail if that directory isn't also in the path - so add it in the
example.

Also use the opportunity to fix the "/home/nyh" in one example to
"$HOME".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12631
2023-01-25 21:54:44 +02:00
Pavel Emelyanov
9ccae1be18 test: Keep db::config as unique pointer
The goal is to make it possible to make config with custom-initialized
options in test_env::impl's constructor initializer list (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-25 19:38:47 +03:00
Kamil Braun
a0ff33e777 test/pylib: scylla_cluster: don't leak server if stopping it fails
`ScyllaCluster.server_stop` had this piece of code:
```
        server = self.running.pop(server_id)
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        self.stopped[server_id] = server
```

We observed `stop_gracefully()` failing due to a server hanging during
shutdown. We then ended up in a state where neither `self.running` nor
`self.stopped` had this server. Later, when releasing the cluster and
its IPs, we would release that server's IP - but the server might have
still been running (all servers in `self.running` are killed before
releasing IPs, but this one wasn't in `self.running`).

Fix this by popping the server from `self.running` only after
`stop_gracefully`/`stop` finishes.

Make an analogous fix in `server_start`: put `server` into
`self.running` *before* we actually start it. If the start fails, the
server will be considered "running" even though it isn't necessarily,
but that is OK - if it isn't running, then trying to stop it later will
simply do nothing; if it is actually running, we will kill it (which we
should do) when clearing after the cluster; and we don't leak it.
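A condensed sketch of both fixes, using the attribute and method names from the snippet above (`running`, `stopped`, `stop_gracefully`, `stop`). `ClusterSketch` and the server's `start()` method are hypothetical, and the real `server_start` may look up the server differently.

```python
class ClusterSketch:
    def __init__(self):
        self.running = {}
        self.stopped = {}

    async def server_stop(self, server_id, gracefully):
        server = self.running[server_id]
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        # Forget the running server only after it has actually stopped;
        # if stopping raised, it stays in self.running and is killed later.
        del self.running[server_id]
        self.stopped[server_id] = server

    async def server_start(self, server_id):
        server = self.stopped.pop(server_id)
        # Track the server as running *before* starting it, so a failed
        # start never leaves it in neither dict.
        self.running[server_id] = server
        await server.start()
```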

Closes #12613
2023-01-25 16:58:02 +02:00
Alejo Sanchez
878cb45c24 pytest: test start after ungraceful stop
Test case for a start of a server after it was stopped suddenly (instead
of gracefully). This could cause commitlog flush issues.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-25 14:49:27 +01:00
Alejo Sanchez
ccbd89f0cd test.py: enable force_schema_commit_log
To handle a start after an ungraceful stop, enable the separate schema
commit log from server start.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-25 14:49:27 +01:00
Kamil Braun
5c886e59de Merge 'Enable Raft by default in new clusters' from Kamil Braun
New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster.

People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade.

Also update the docs: cluster creation and node addition procedures.

Fixes #12572.

Closes #12585

* github.com:scylladb/scylladb:
  docs: mention `consistent_cluster_management` for creating cluster and adding node procedures
  conf: enable `consistent_cluster_management` by default
2023-01-25 14:09:38 +01:00
Benny Halevy
82011fc489 dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const
Its _it member keeps state about the current range.
Although it's modified by the method, this is an implementation
detail that is irrelevant to the caller, hence mark the
belongs_to_current_node method as const (and noexcept while
at it).

This allows the caller, cleanup_compaction, to use it from
inside a const method, without having to mark
its respective member as mutable too.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12634
2023-01-25 14:52:21 +02:00
Alexey Novikov
ce96b472d3 prevent populating cache with expired rows from sstables
change row purge condition for compacting_reader to remove all expired
rows to avoid read performance problems when there are many expired
tombstones in row cache

Refs #2252

Closes #12565
2023-01-25 12:59:40 +01:00
Kamil Braun
5bc7f0732e Merge 'test.py: manual cluster pool handling for Python suite' from Alecco
From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid
using `async with` and access the `Pool` of clusters with
`get()`/`put()`.

Closes #12612

* github.com:scylladb/scylladb:
  test.py: manual cluster handling for PythonSuite
  test.py: stop cluster if PythonSuite fails to start
  test.py: minor fix for failed PythonSuite test
2023-01-24 17:37:55 +01:00
Nadav Har'El
b28818db06 Merge 'Make regexes in types.cc static and remove unnecessary tolower transform' from Marcin Maliszkiewicz
- makes all regexes static

If making regex compilation static
for uuid_type_impl and timeuuid_type_impl helps then it should
also help for timestamp_type and simple_date_type.

-  remove unnecessary tolower transform in simple_date_type_impl::from_sstring

The following function uses only decimal and '-' characters (see date_re). They are not
affected by tolower call in any way.

Aditionally std::strtoll supports "0x" prefixes but also accepts
upper case version "0X" so it's also not affected by tolower call.

get_simple_date_time only casts strings to integer types using
boost:lexical_cast so also not affected by tolower.

Finally, serialize only uses str to include it in an exception text
so tolower doesn't affect it in a positive way. It's even better
that input is displayed to the user as it was, not converted to lower
case.

Closes #12621

* github.com:scylladb/scylladb:
  types: remove unnecessary tolower transform in simple_date_type_impl::from_sstring
  types: make all regexes static
2023-01-24 16:13:59 +02:00
Pavel Emelyanov
f6e8b64334 snitch: Use set_my_dc_and_rack() on all shards
Most snitch drivers set _my_dc and _my_rack by direct assignment,
thus skipping the sanity checks for dc/rack being empty. On other shards
they call the set_my_dc_and_rack() helper, which warns about an empty value
and replaces it with some defaults.

It's better to use the helper on all shards in order to have the same
dc/rack values everywhere.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12524
2023-01-24 14:17:06 +02:00
Nadav Har'El
55558e1bd7 test/alternator: check operation on invalid TableName
Issue #12538 suggested that maybe Alternator shouldn't bother reporting an
invalid table name in item operations like PutItem, and that it's enough
to report that the table doesn't exist. But the test added in this patch
shows that DynamoDB, like Alternator, reports the invalid table name in
this case - not just that the table doesn't exist.

That should make us think twice before acting on issue #12538. If we do
what this issue recommended, this test will need to be fixed (e.g., to
accept as correct both types of errors).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12608
2023-01-24 14:14:39 +02:00
Kefu Chai
4a0134a097 db: system_keyspace: take the reserved_memory into account
before this change, we returned the total memory managed by Seastar
in the "total" field of system.memory. but this value only reflects
the memory managed by Seastar's allocator. if
`reserve_additional_memory` is set when starting app_template,
Seastar's memory subsystem reserves a chunk of memory of the
specified size for the system, and takes the remaining memory. since
f05d612da8, we set this value to 50MB for the wasmtime runtime. hence
the dtest `TestRuntimeInfoTable.test_default_content` fails: the
test expects the size passed via the `--memory` option to be
identical to the value reported by system.memory's "total" field.

after this change, the "total" field takes the memory reserved
for wasm udf into account. the "total" field should reflect the total
size of memory used by Scylla, no matter how a certain portion
of the allocated memory is used.

Fixes #12522
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12573
2023-01-24 14:07:44 +02:00
Anna Stuchlik
3cbe657b24 doc: fixes https://github.com/scylladb/scylla-docs/issues/3706, v2 of https://github.com/scylladb/scylladb/pull/11638, add a note about performance penalty in non-frozen connections vs frozen connections and UDT, add a link to the blog post about performance
Closes #12583
2023-01-24 13:16:58 +02:00
Nadav Har'El
158be3604d test/alternator: xfailing test for huge Limit in ListStreams
DynamoDB Streams limits the "Limit" parameter of ListStreams to 100 -
anything larger will result in an error. Scylla doesn't necessarily
need to uphold the same limit, but we should uphold *some* limit, as
not having any limit can result (in the theoretical case of a huge
number of tables with streams enabled) in an unbounded response size.

So here we add a test to check that a Limit of 100,000 is not allowed.
It passes on DynamoDB (in fact, any number higher than 100 will be
enough there) but fails on Alternator, so it is marked "xfail".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-01-24 12:38:18 +02:00
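A minimal Python sketch of the kind of bound this test expects, assuming DynamoDB's range of 1 to 100 for Limit and a default of 100 when it is omitted. The function name is hypothetical; this is not Alternator's code, and Scylla is free to pick a different upper bound:

```python
# Hypothetical validation of the ListStreams "Limit" parameter.
# Assumes DynamoDB's documented range (1..100) and a default of 100.
DEFAULT_LIMIT = 100
MAX_LIMIT = 100


def effective_list_streams_limit(limit=None):
    """Return the page size to use, or raise ValueError for an invalid Limit."""
    if limit is None:
        return DEFAULT_LIMIT
    if not isinstance(limit, int) or limit < 1 or limit > MAX_LIMIT:
        raise ValueError(f"Limit must be between 1 and {MAX_LIMIT}, got {limit!r}")
    return limit
```

With a bound like this, both Limit=100000 and Limit=0 are rejected, so the response size stays bounded no matter how many tables have streams enabled.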
Nadav Har'El
3beafd8441 alternator/test: un-skip test of zero Limit in ListStreams
We had a skipped test on how Alternator handles Limit=0 for ListStreams
which should be reported as an error. We had to skip it because boto3
did us a "favor" of discovering this parameter error before ever sending
it to the server. We discovered long ago how to avoid this client-side
checking in boto3, but only used it for the "dynamodb" fixture and
forgot to copy the same trick to the "dynamodbstreams" fixture - and
in this patch we do, and can run this test successfully.

While at it, also copy the extended timeout configuration we had in
the dynamodb fixture also to the dynamodbstreams fixture. There is
no reason why it should be different.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-01-24 12:38:18 +02:00
Alejo Sanchez
f236d518c6 test.py: manual cluster handling for PythonSuite
Instead of complex async with logic, use manual cluster pool handling.

Revert the discard() logic in Pool from a recent commit.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:38:17 +01:00
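The manual get()/put() flow can be sketched as follows. The Pool class and run_with_cluster() are hypothetical simplifications, not test.py's actual code: a healthy cluster is put back for reuse, while a broken one is stopped and its slot is released without returning it (steal()).

```python
import asyncio


class Pool:
    """Hypothetical bounded pool with explicit get()/put()/steal()."""

    def __init__(self, max_size, build):
        self._build = build            # async factory for new objects
        self._free = []                # healthy objects ready for reuse
        self._slots = asyncio.Semaphore(max_size)

    async def get(self):
        await self._slots.acquire()
        if self._free:
            return self._free.pop()
        try:
            return await self._build()
        except BaseException:
            self._slots.release()      # don't leak the slot if building fails
            raise

    def put(self, obj):
        """Return a healthy object for reuse."""
        self._free.append(obj)
        self._slots.release()

    def steal(self):
        """Give up a slot without returning the object (e.g. a broken cluster)."""
        self._slots.release()


async def run_with_cluster(pool, test):
    cluster = await pool.get()
    try:
        ok = await test(cluster)
    except Exception:
        ok = False
    if ok:
        pool.put(cluster)
    else:
        await cluster.stop()           # stop first, then free the pool slot
        pool.steal()
    return ok
```

Compared to `async with`, the explicit calls make it obvious which path recycles the cluster and which one discards it.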
Alejo Sanchez
a6059e4bb7 test.py: stop cluster if PythonSuite fails to start
If cluster fails to start, stop it.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:36:49 +01:00
Alejo Sanchez
dec0c1d9f6 test.py: minor fix for failed PythonSuite test
Even though a test can't fail both before and after, make the logic
explicit in case the code changes in the future.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:36:49 +01:00
Kefu Chai
232c73a077 doc: add PREVIEW_HOST Make variable
add a Make variable named `PREVIEW_HOST` so it can be overridden like
```
make preview PREVIEW_HOST=$(hostname -I | cut -d' ' -f 1)
```
it allows developers to preview the documentation when the host building
it is not localhost.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12589
2023-01-24 12:27:33 +02:00
Botond Dénes
cfaec4428b Merge 'Remove qctx from system_keyspace::increment_and_get_generation()' from Pavel Emelyanov
It's a simple helper used during boot that can use the query processor from sharded<system_keyspace>

Closes #12587

* github.com:scylladb/scylladb:
  system_keyspace: De-static system_keyspace::increment_and_get_generation
  system_keyspace: Fix indentation after previous patch
  system_keyspace: Coroutinize system_keyspace::increment_and_get_generation
2023-01-24 12:17:12 +02:00
Marcin Maliszkiewicz
f4de64957b types: remove unnecessary tolower transform in simple_date_type_impl::from_sstring
The function only deals with decimal digits and '-' characters (see date_re). They are not
affected by the tolower call in any way.

Additionally, std::strtoll supports "0x" prefixes but also accepts the
upper case version "0X", so it's also not affected by the tolower call.

get_simple_date_time only casts strings to integer types using
boost::lexical_cast, so it is also not affected by tolower.

Finally, serialize only uses str to include it in an exception text
so tolower doesn't affect it in a positive way. It's even better
that input is displayed to the user as it was, not converted to lower
case.
2023-01-24 10:50:13 +01:00
Calle Wilund
a079c3dbbe alternator::streams: Special case single table in list_streams
Avoid iterating all tables (at least multiple times).
2023-01-24 09:14:33 +00:00
Calle Wilund
9412d8f259 alternator::streams: Only sort tables iff limit < # tables or ExclusiveStartStreamArn set
Avoid sorting for requests that will be answered immediately.
2023-01-24 08:48:20 +00:00
Avi Kivity
49157370bc build: don't force-disable io_uring in Seastar
The reasons for force-disabling are doubly wrong: we now
use liburing from Fedora 37, which is sufficiently recent,
and the auto-detection code will disable io_uring if a
sufficiently recent version isn't present.

Closes #12620
2023-01-24 10:32:00 +02:00
Calle Wilund
9886788a46 alternator::streams: Set default list_streams limit to 100 as per spec
The AWS docs say so.
2023-01-24 08:24:42 +00:00
Kamil Braun
54170749b8 service/raft: raft_group0: prevent double abort
There was a small chance that we called `timeout_src.request_abort()`
twice in the `with_timeout` function, first by timeout and then by
shutdown. `abort_source` fails on an assertion in this case. Fix this.

Fixes: #12512

Closes #12514
2023-01-23 21:32:21 +01:00
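One way to avoid the second request_abort() is to check whether an abort was already requested before issuing one. The Python sketch below models an abort source with a hypothetical class that asserts on a double abort, as described above; it illustrates the guard, not the actual change in raft_group0.

```python
class AbortSource:
    """Hypothetical model of an abort source that refuses a second abort."""

    def __init__(self):
        self._requested = False

    def abort_requested(self):
        return self._requested

    def request_abort(self):
        # Mirrors the failing assertion described in the commit message.
        assert not self._requested, "abort already requested"
        self._requested = True


def request_abort_once(src):
    """Request an abort unless one was already requested (e.g. by the timeout)."""
    if not src.abort_requested():
        src.request_abort()
```

The check-then-act is only safe because nothing yields between the check and the call; that holds in a cooperative, single-threaded model like a Seastar shard.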
Marcin Maliszkiewicz
76c1d0e5d3 types: make all regexes static
If making regex compilation static for uuid_type_impl and
timeuuid_type_impl helps then it should also help for timestamp_type
and simple_date_type.
2023-01-23 20:37:32 +01:00
Nadav Har'El
634c3d81f5 Merge 'doc: add the general upgrade policy' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-docs/issues/3968

This PR adds the information that an upgrade to each successive major version is required to upgrade from an old ScyllaDB version.

Closes #12586

* github.com:scylladb/scylladb:
  docs: remove repetition
  doc: add the general upgrade policy to the uprage page
2023-01-23 18:34:59 +02:00
Benny Halevy
008ca37d28 sstable_directory: reindent reshard
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-23 17:30:05 +02:00
Benny Halevy
792bc58fce sstable_directory: coroutinize reshard
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-23 17:29:49 +02:00
Nadav Har'El
ccc2c6b5dd Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability.

Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection.

Closes #12588

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: better logging for timeout on server startup
  test/pylib: scylla_cluster: use less expensive query to check for CQL availability
2023-01-23 17:00:52 +02:00
Kamil Braun
8a1ea6c49f test/pylib: scylla_cluster: better logging for timeout on server startup
Waiting for server startup is a multi-step procedure: after we start the
actual process, we will:
- try to obtain the Host ID (by querying a REST API endpoint)
- then try to connect a CQL session
- then try to perform a CQL query

The steps are repeated every 0.1 seconds until we reach a timeout (the
Host ID step is skipped if we previously managed to obtain it).

On timeout we'd only get a generic "failed to start server" message; it
wouldn't say what we managed to do and what not.

For example, on one of the failed jobs on Jenkins I observed this
timeout error. Looking at the logs of the server, it turned out that the
server printed the "initialization completed" message more than 2
minutes before the actual timeout happened. So for 2 minutes, the test
framework either couldn't obtain the Host ID, or couldn't establish a
CQL connection, or couldn't perform a CQL query, but I wasn't able to
determine fully which one of these was the case.

Improve the code by printing whether we managed to get the Host ID of
the server and if so - whether we managed to connect to CQL.
2023-01-23 15:59:42 +01:00
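A minimal Python sketch of a polling loop that remembers how far it got, so the timeout error can say which step failed. The callables get_host_id, connect_cql and run_query are hypothetical stand-ins for the REST and CQL checks, not test/pylib's API; clock and sleep are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_for_startup(get_host_id, connect_cql, run_query, timeout,
                     interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Poll until the server answers a CQL query; report progress on timeout.

    get_host_id() returns a Host ID or None, connect_cql() returns a session
    or None, and run_query(session) returns True once a query succeeds.
    """
    host_id = None
    cql_connected = False
    deadline = clock() + timeout
    while clock() < deadline:
        if host_id is None:  # skip this step once the Host ID is known
            host_id = get_host_id()
        if host_id is not None:
            session = connect_cql()
            cql_connected = session is not None
            if cql_connected and run_query(session):
                return host_id
        sleep(interval)
    raise TimeoutError(
        "failed to start server: "
        f"host_id={'obtained' if host_id is not None else 'not obtained'}, "
        f"cql_connected={cql_connected}")
```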
Kamil Braun
0e591606a5 test/pylib: scylla_cluster: use less expensive query to check for CQL availability
The previous CQL query used a range scan which is very inefficient, even
for local tables.

Also add a comment explaining why we need this query.
2023-01-23 15:59:05 +01:00
Avi Kivity
3f887fa24b Merge 'doc: remove duplicatiom of the ScyllaDB ports (table)' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/12605#event-8328930604

This PR removes the duplicated content (the file with the table was included twice) and reorganizes the content in the Networking section.

Closes #12615

* github.com:scylladb/scylladb:
  doc: fix the broken link
  doc: replace Scylla with ScyllaDB
  doc: remove duplication in the Networking section (the table of ports used by ScyllaDB
2023-01-23 16:27:06 +02:00
Anna Stuchlik
30f3ee6138 doc: fix the broken link 2023-01-23 14:43:07 +01:00
Anna Stuchlik
1dd0fb8c2d doc: replace Scylla with ScyllaDB 2023-01-23 14:40:36 +01:00
Anna Stuchlik
d881b3c498 doc: remove duplication in the Networking section (the table of ports used by ScyllaDB 2023-01-23 14:39:01 +01:00
Calle Wilund
da8adb4d26 alterator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables by ID. This should ensure we never
generate duplicates in a paged listing. It can obviously miss tables that
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.
2023-01-23 11:41:40 +00:00
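Why sorting prevents duplicates can be shown with a small Python sketch of paging with an exclusive start key. The list_page() function is hypothetical and uses plain strings in place of table UUIDs/ARNs; it is not Alternator's implementation.

```python
import bisect


def list_page(table_ids, limit, exclusive_start=None):
    """Return one page of IDs plus the key to resume from (None on the last page).

    Sorting gives every call the same order, so resuming strictly after the
    last returned ID cannot repeat an ID that was already listed.
    """
    ids = sorted(table_ids)
    start = 0 if exclusive_start is None else bisect.bisect_right(ids, exclusive_start)
    page = ids[start:start + limit]
    more = start + limit < len(ids)
    return page, (page[-1] if more else None)
```

The caveat from the commit message falls out directly: a table created between calls whose ID sorts before the resume key is never returned.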
Benny Halevy
1123565eb0 table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
This is needed to clean up tokens in sstables that are not owned
by the compaction group. This may happen in the future
after a compaction group split, if the sstables in the
original compaction_group are copied or linked to
the split compaction_groups.

Fixes #12594

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Benny Halevy
95a8e0b21d table: make_compaction_groups: calculate compaction_group token ranges
Add dht::split_token_range_msb that returns a token_range_vector
with ranges split using a given number of most-significant bits.

When creating the table's compaction groups, use dht::split_token_range_msb
to calculate the token_range owned by each compaction_group.

Refs #12594

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
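A minimal Python sketch of splitting the token space by most-significant bits, assuming signed 64-bit tokens and inclusive (first, last) bounds for simplicity. The real dht::split_token_range_msb returns a dht::token_range_vector whose bound semantics may differ; this is an illustration, not Scylla's code.

```python
TOKEN_MIN = -(2**63)
TOKEN_MAX = 2**63 - 1


def split_token_range_msb(most_significant_bits):
    """Split the full signed 64-bit token space into 2**bits contiguous ranges.

    Returns inclusive (first, last) token pairs, ordered by token.
    """
    shift = 64 - most_significant_bits
    ranges = []
    for i in range(1 << most_significant_bits):
        # Work in the unbiased space [0, 2**64), then shift back to signed tokens.
        first = (i << shift) - 2**63
        last = ((i + 1) << shift) - 1 - 2**63
        ranges.append((first, last))
    return ranges
```

For example, one bit yields two halves: all negative tokens, then all non-negative ones.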
Benny Halevy
912b56ebcf dht: range_streamer: define logger as static
dht::logger can't be global in this case,
as it's too generic, but should be static
to range_streamer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Nadav Har'El
54f174a1f4 Merge 'test.py: handle broken clusters for Python suite' from Alecco
If the after test check fails (is_after_test_ok is False), discard the cluster and raise exception so context manager (pool) does not recycle it.

Ignore exception re-raised by the context manager.

Fixes #12360

Closes #12569

* github.com:scylladb/scylladb:
  test.py: handle broken clusters for Python suite
  test.py: Pool discard method
2023-01-22 19:58:12 +02:00
Benny Halevy
8009585e7d table: compaction_group_for_token: use signed arithmetic
Add and use dht::compaction_group_of that computes the
compaction_group index by unbiasing the token,
similar to dht::shard_of.

This way, all tokens in `_compaction_groups[i]` are ordered
before `_compaction_groups[j]` iff i < j.

Fixes #12595

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12599
2023-01-22 11:27:07 +02:00
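The unbiasing idea can be sketched in Python, assuming signed 64-bit tokens: adding 2**63 maps the token space onto [0, 2**64) while preserving order, so the top bits give a group index that grows with the token. The naive variant, which reinterprets the signed token as unsigned, is shown only for contrast; neither function is Scylla's code.

```python
def unbias(token):
    """Map a signed 64-bit token onto [0, 2**64) while preserving order."""
    return token + 2**63


def compaction_group_of(most_significant_bits, token):
    """Group index = top `most_significant_bits` bits of the unbiased token."""
    return unbias(token) >> (64 - most_significant_bits)


def naive_group_of(most_significant_bits, token):
    """Reinterpret the signed token as unsigned: negative tokens land in high groups."""
    return (token & (2**64 - 1)) >> (64 - most_significant_bits)
```

With one bit, compaction_group_of puts -1 in group 0 and 0 in group 1, while the naive variant does the opposite, breaking the "group i before group j iff i < j" ordering. (In C++ a shift by 64 is undefined, so the zero-bits case would need special handling there.)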
Pavel Emelyanov
be2ad2fe99 system_keyspace: De-static system_keyspace::increment_and_get_generation
It's only called on cluster-join from storage_service which has the
local system_keyspace reference and it's already started by that time.

This allows removing a few more occurrences of global qctx.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-20 17:24:22 +03:00
Pavel Emelyanov
4c4f8aa3e1 system_keyspace: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-20 17:24:22 +03:00
Pavel Emelyanov
b0edc07339 system_keyspace: Coroutinize system_keyspace::increment_and_get_generation
Just unroll the fn().then({ fn2().then().then(); }); chain.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-20 17:24:10 +03:00
Botond Dénes
ebc100f74f types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will lose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail, as only frozen UDTs are
allowed in primary key components.

Fixes: #12576

Closes #12579
2023-01-20 15:50:58 +02:00
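The fix amounts to looking through the reversed wrapper before classifying the type. A minimal Python sketch with hypothetical stand-in classes (not Scylla's abstract_type hierarchy):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TupleType:
    element_types: tuple


@dataclass(frozen=True)
class ReversedType:
    underlying: object


def is_tuple(t):
    """True if t is a tuple type, looking through any reversed<> wrappers."""
    while isinstance(t, ReversedType):
        t = t.underlying
    return isinstance(t, TupleType)
```

Without the unwrapping loop, `ReversedType(TupleType(...))` would fall into the default case and be reported as not a tuple.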
Anna Stuchlik
0a91578875 docs: remove repetition 2023-01-20 14:45:59 +01:00
Anna Stuchlik
2c357a7007 doc: add the general upgrade policy to the uprage page 2023-01-20 14:43:26 +01:00
Botond Dénes
7f9b39009c reader_concurrency_semaphore_test: leak test: relax iteration limit
This test creates random dummy reads and simulates a query with them.
The test works in terms of iterations (ticks), advancing each simulated
read in each iteration. To prevent infinite runtime, an iteration limit
of 100 was added to detect a non-converging test and kill it. This limit
proved too strict, however, and in this patch we bump it to 1000 to
prevent some unlucky seed making this test fail, as seen recently in CI.

Closes #12580
2023-01-20 15:39:13 +02:00
Kamil Braun
050614f34d docs: mention consistent_cluster_management for creating cluster and adding node procedures 2023-01-20 13:29:25 +01:00
Kamil Braun
b0313e670b conf: enable consistent_cluster_management by default
Raft will be turned on by default in new clusters.

Fixes #12572
2023-01-20 13:29:06 +01:00
Botond Dénes
0d64f327e1 Merge 'gdb: Introduce 'scylla range-tombstones' command' from Tomasz Grabiec
Prints and validates range tombstones in a given container.

Currently supported containers:

 - mutation_partition

Example:
```
    (gdb) scylla range-tombstones $mp
    {
      start: ['a', 'b'],
      kind: bound_kind::excl_start,
      end: ['a', 'b'],
      kind: bound_kind::incl_end,
      t: {timestamp = 1672546889091665, deletion_time = {__d = {__r = 1672546889}}}
    }
    {
      start: ['a', 'b'],
      kind: bound_kind::excl_start,
      end: ['a', 'c'],
      kind: bound_kind::incl_end,
      t: {timestamp = 1673731764010123, deletion_time = {__d = {__r = 1673731764}}}
    }
```

Closes #12571

* github.com:scylladb/scylladb:
  gdb: Introduce 'scylla range-tombstones'
  gdb: Introduce 'scylla set-schema'
  gdb: Extract purse_bytes() in managed_bytes_printer
2023-01-20 11:21:34 +02:00
Nadav Har'El
3d78dbd9f2 test/cql-pytest: regression tests for null lookup in local SI
We noticed that old branches of Scylla had problems with looking up a
null value in a local secondary index - hanging or crashing. This patch
includes tests to reproduce these bugs. The tests pass on current
master - apparently this bug has already been fixed, but we didn't
have a regression test for it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12570
2023-01-19 23:58:33 +02:00
Alejo Sanchez
51e84508ee test.py: handle broken clusters for Python suite
If the after test check fails (!is_after_test_ok), discard the cluster
and raise exception so context manager (pool) does not recycle it.

Ignore Pool exception re-raised by the context manager.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-19 21:43:50 +01:00
Alejo Sanchez
c886a05b37 test.py: Pool discard method
Add a context manager discard() method to tell it to discard the object.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-19 21:43:45 +01:00
Avi Kivity
b4d91d87db Merge 'build: fix build problems in Nix development environment' from Piotr Grabowski
This PR fixes three problems that prevented/could prevent a successful build in ScyllaDB's Nix development environment.

The first commit adds a missing `abseil-cpp` dependency to Nix devenv, as this dependency is now required after 8635d2442.

The second commit bumps the version of Lua from 5.3 to 5.4, as after 9dd5107919 a 4-argument version of `lua_resume` (only available in Lua 5.4) is used in the ScyllaDB codebase.

The third commit explicitly adds `rustc` to Nix devenv dependencies. This places `rustc` from nixpkgs on the `PATH`, preventing `cargo` from executing `rustc` installed globally on the system (see the commit message for additional reasoning).

After those changes, ScyllaDB can be successfully built in both `nix-shell .` and `nix develop .` environments.

Closes #12568

* github.com:scylladb/scylladb:
  build: explicitly add rustc to Nix devenv
  build: bump Lua version (5.3 -> 5.4) in Nix devenv
  build: add abseil-cpp dependency to Nix devenv
2023-01-19 21:52:37 +02:00
Tomasz Grabiec
95547162c0 gdb: Introduce 'scylla range-tombstones'
Prints and validates range tombstones in a given container.

Currently supported containers:
    - mutation_partition

Example:

    (gdb) scylla range-tombstones $mp
    {
      start: ['a', 'b'],
      kind: bound_kind::excl_start,
      end: ['a', 'b'],
      kind: bound_kind::incl_end,
      t: {timestamp = 1672546889091665, deletion_time = {__d = {__r = 1672546889}}}
    }
    {
      start: ['a', 'b'],
      kind: bound_kind::excl_start,
      end: ['a', 'c']
      kind: bound_kind::incl_end,
      t: {timestamp = 1673731764010123, deletion_time = {__d = {__r = 1673731764}}}
    }
2023-01-19 19:58:13 +01:00
Tomasz Grabiec
f759b35596 gdb: Introduce 'scylla set-schema'
Sets the current schema to be used by schema-aware commands.

Setting the schema allows some commands and printers to interpret
schema-dependent objects and present them in a more friendly form.

Some commands require schema to work, for example to sort keys, and
will fail otherwise.
2023-01-19 19:58:13 +01:00
Tomasz Grabiec
797bc7915d gdb: Extract purse_bytes() in managed_bytes_printer 2023-01-19 19:58:13 +01:00
Kamil Braun
2f84e820fd test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler in a catch clause that returns
the exception message.

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563
2023-01-19 17:47:13 +02:00
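The wrapping can be sketched as a Python decorator. This is a hypothetical simplification: it returns a (status, body) tuple instead of a real HTTP response object, and the handler name is made up for illustration.

```python
import asyncio
import functools


def return_error_details(handler):
    """Wrap an async handler so an exception becomes a 500 response with its message."""
    @functools.wraps(handler)
    async def wrapper(request):
        try:
            return await handler(request)
        except Exception as exc:  # deliberately broad: this is test-only tooling
            return 500, f"{type(exc).__name__}: {exc}"
    return wrapper


@return_error_details
async def failing_handler(request):
    raise RuntimeError("Failed to start server")


if __name__ == "__main__":
    print(asyncio.run(failing_handler(None)))  # (500, 'RuntimeError: Failed to start server')
```

Catching everything is acceptable here only because the endpoints serve the test framework itself, where leaking details is the point.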
Kamil Braun
3ed3966f13 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IP after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564
2023-01-19 17:46:46 +02:00
Piotr Grabowski
4068efa173 build: explicitly add rustc to Nix devenv
Before this patch, "cargo" was the only Rust toolchain dependency in Nix
development environment. Due to the way "cargo" tool is packaged in Nix,
"cargo" would first try to use "rustc" from PATH (for example some
version already installed globally on OS). If it didn't find any, it
would fall back to "rustc" from nixpkgs.

There are issues with such an approach:
- "rustc" installed globally on the system could be old.
- the goal of having a Nix development environment is that such
  environment is separate from the programs installed globally on the
  system and the versions of all tools are pinned (via flake.lock).

Fix this problem by adding rustc to nativeBuildInputs in default.nix.
After this patch, "rustc" from nixpkgs is present on the PATH
(potentially overriding "rustc" already installed on the system), so
"cargo" can correctly use it.

You can validate this behavior experimentally by adding a fake failing
rustc before entering the Nix development environment:

  mkdir fakerustc
  echo '#!/bin/bash' >> fakerustc/rustc
  echo 'exit 1' >> fakerustc/rustc
  chmod +x fakerustc/rustc 
  export PATH=$(pwd)/fakerustc:$PATH

  nix-shell .
2023-01-19 15:53:49 +01:00
Piotr Grabowski
1b8a6b160e build: bump Lua version (5.3 -> 5.4) in Nix devenv
A recent commit (9dd5107919) started using a 4-argument version of
lua_resume, which is only available in Lua 5.4. This caused build
problems when trying to build Scylla in Nix development environment:

  tools/lua_sstable_consumer.cc:1292:19: error: no matching function for call to 'lua_resume'
              ret = lua_resume(l, nullptr, nargs, &nresults);
                    ^~~~~~~~~~
  /nix/store/wiz3xb19x2pv7j3hf29rbafm4s5zp2kx-lua-5.3.6/include/lua.h:290:15: note: candidate function not viable: requires 3 arguments, but 4 were provided
  LUA_API int  (lua_resume)     (lua_State *L, lua_State *from, int narg);
                ^
  1 error generated.

Fix the problem by bumping the version of Lua from 5.3 to 5.4 in
default.nix. Since "lua54Packages.lua" was added to nixpkgs fairly
recently (NixOS/nixpkgs#207862), flake.lock is updated to get the newest
version of nixpkgs (updated using "nix flake update" command).
2023-01-19 15:53:49 +01:00
Marcin Maliszkiewicz
7230841431 alternator: unify json streaming heuristic
The main assumption here is that if is_big is good enough for
the BatchGetItem operation, it should work well also for Scan,
Query and GetRecords. And it's easier to maintain more unified
code.

Additionally, the documentation of 'future<> print', which is used for
streaming, suggests that streaming has quite a big overhead. Since it
seems the only motivation for streaming was to reduce the contiguous
allocation size below some threshold, we should not stream when this
threshold is not exceeded.

Closes #12164
2023-01-19 16:40:43 +02:00
Anna Stuchlik
20f7848661 docs: add a missing redirection for the Cqlsh page
This PR is not related to any reported issue in the repo.
I've just discovered a broken link in the university caused by a
missing redirection.

Closes #12567
2023-01-19 16:37:58 +02:00
Piotr Grabowski
fbc042ff02 build: add abseil-cpp dependency to Nix devenv
After commit 8635d2442, the abseil submodule was removed in favor of
using pre-built abseil distribution. Installation of abseil-cpp was
added to install-dependencies.sh and dbuild image, but no change was
made to the Nix development environment, which resulted in error
while executing ./configure.py (while in Nix devenv):

  Package absl_raw_hash_set was not found in the pkg-config search path.
  Perhaps you should add the directory containing `absl_raw_hash_set.pc'
  to the PKG_CONFIG_PATH environment variable
  No package 'absl_raw_hash_set' found

Fix the issue by adding "abseil-cpp" to buildInputs in default.nix.
2023-01-19 15:03:55 +01:00
Nadav Har'El
18be50582d test/cql-pytest: add tests for behavior of unset values
Recently, commit 0b418fa made the checking for "unset" values more
centralized and more robust, and, as the tests added in this patch
show, the situation is good (in particular, #10358 is
solved).

The tests in this patch check that the behavior of "unset" values in
the CQL v4 protocol matches Cassandra's behavior and its documentation,
and how it compares to our wishes of how we want unset values to behave.

One of these tests fails on Cassandra (we consider this a Cassandra bug).
One test fails on Scylla because it doesn't yet support arithmetic
expressions (Refs #2693).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12534
2023-01-19 15:48:07 +02:00
Nadav Har'El
9433108158 Merge 'Allow transient list values to contain NULLs' from Avi Kivity
The CQL protocol and specification call for lists with NULLs in
some places. For example, the statement:

```cql
UPDATE tab
SET x = 3
WHERE pk = 4
IF y IN (1, 2, NULL)
```

has a list `(1, 2, NULL)` that contains NULL. Although the syntax is tuple-like, the value is a list;
consider the same statement as a prepared statement:

```cql
UPDATE tab
SET x = :x
WHERE pk = :pk
IF y IN :y_values
```

`:y_values` must have a list type, since the number of elements is unknown.

Currently, this is done with special paths inside LWT that bypass normal
evaluation, but if we want to unify those paths, we must allow NULLs in
lists (except in storage). This series does that.

Closes #12411

* github.com:scylladb/scylladb:
  test: materialized view: add test exercising synthetic empty-type columns
  cql3: expr: relax evaluate_list() to allow allow NULL elements
  types: allow lists with NULL
  test: relax NULL check test predicate
  cql3, types: validate listlike collections (sets, lists) for storage
  types: make empty type deserialize to non-null value
2023-01-19 15:15:16 +02:00
Botond Dénes
d661d03057 Merge 'main, test: integrate perf tools into scylla' from Kefu Chai
the following tests are integrated into the scylla executable:

- perf_fast_forward
- perf_row_cache_update
- perf_simple_query
- perf_sstable

before this change
```console
$ size build/release/scylla
   text    data     bss     dec     hex filename
82284664         288960  335897 82909521        4f11951 build/release/scylla
$ ls -l build/release/scylla
-rwxrwxr-x 1 kefu kefu 1719672112 Jan 19 17:51 build/release/scylla
```
after this change
```console
$ size build/release/scylla
   text    data     bss     dec     hex filename
84349449         289424  345257 84984130        510c142 build/release/scylla
$ ls -l build/release/scylla
-rwxrwxr-x 1 kefu kefu 1774204800 Jan 19 17:52 build/release/scylla
```

Fixes #12484

Closes #12558

* github.com:scylladb/scylladb:
  main: move perf_sstable into scylla
  main: move perf_row_cache_update into scylla
  test: perf_row_cache_update: add static specifier to local functions
  main: move perf_fast_forward into scylla
  main: move perf_simple_query into scylla
  test: extract debug::the_database out
  main: shift the args when checking exec_name
  main: extract lookup_main_func() out
2023-01-19 15:01:30 +02:00
Kamil Braun
147dd73996 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.

Closes #12560
2023-01-19 14:26:57 +02:00
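The fix can be sketched in Python with a hypothetical stand-in Cluster class (not test/pylib's ScyllaCluster): mark the cluster dirty before rethrowing the saved start exception, and only recycle clean clusters.

```python
class Cluster:
    """Hypothetical stand-in for a pooled test cluster."""

    def __init__(self, start_exception=None):
        self.start_exception = start_exception  # saved if booting failed
        self.is_dirty = False

    def before_test(self, name):
        if self.start_exception is not None:
            # Mark dirty before rethrowing so the pool won't hand this
            # cluster to the next test.
            self.is_dirty = True
            raise self.start_exception


def return_to_pool(free_clusters, cluster):
    """Recycle only clean clusters; dirty ones are dropped (and should be stopped)."""
    if not cluster.is_dirty:
        free_clusters.append(cluster)
```

This way only one test is blamed for the boot failure instead of every test that happens to draw the broken cluster afterwards.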
Marcin Maliszkiewicz
4c33791f96 alternator: eliminate regexes from the hot path
This decreases the whole alternator::get_table cpu time by 78%
(from 2.8 us to 0.6 us on my cpu).

In perf_simple_query it decreases allocs/op by 1.6% (by removing 4 allocations)
and increases median tps by 3.4%.

Raw results from running:

./build/release/test/perf/perf_simple_query_g --smp 1 \
         --alternator forbid --default-log-level error \
         --random-seed=1235000092 --duration=180 --write

Before the patch:

median 46903.65 tps (197.2 allocs/op,  12.1 tasks/op,  170886 insns/op, 0 errors)
median absolute deviation: 210.15
maximum: 47354.59
minimum: 42535.63

After the patch:

median 48484.76 tps (194.1 allocs/op,  12.1 tasks/op,  168512 insns/op, 0 errors)
median absolute deviation: 317.32
maximum: 49247.69
minimum: 44656.38

Closes #12445
2023-01-19 13:23:24 +02:00
Avi Kivity
9029b8dead test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542
2023-01-19 11:14:05 +01:00
Kefu Chai
7f5bb19d1f main: move perf_sstable into scylla
* configure.py:
  - include `test/perf/perf_sstable` and its dependencies in scylla_perfs
* test/perf/perf_sstable.cc: change `main()` to
  `perf::scylla_sstable_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_sstable_main()`
* main.cc:
  - dispatch "perf-sstable" subcommand to
    `perf::scylla_sstable_main`

before this change, we had a tool at `test/perf/perf_sstable`
for running performance tests by exercising sstable-related operations.

after this change, `test/perf/perf_sstable` is integrated
into `scylla` as a subcommand, so we can run `scylla perf-sstable
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:52 +08:00
Kefu Chai
240f2c6f00 main: move perf_row_cache_update into scylla
* configure.py:
  - include `test/perf/perf_row_cache_update.cc` in scylla_perfs
* main.cc:
  - dispatch "perf-row-cache-update" subcommand to
    `perf::scylla_row_cache_update_main`
* test/perf/perf_row_cache_update.cc: change `main()` to
  `perf::scylla_row_cache_update_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_row_cache_update_main()`

before this change, we had a tool at `test/perf/perf_row_cache_update`
for running performance tests by updating the row cache.

after this change, `test/perf/perf_row_cache_update` is integrated
into `scylla` as a subcommand, so we can run `scylla perf-row-cache-update
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:46 +08:00
Kefu Chai
4e390b9a05 test: perf_row_cache_update: add static specifier to local functions
now that these functions are only used within the same compilation unit,
they don't need external linkage, so let's hide them using `static`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:46 +08:00
Kefu Chai
228ccdc1c7 main: move perf_fast_forward into scylla
* configure.py:
  - include `test/perf/perf_fast_forward.cc` in scylla_perfs
* main.cc:
  - dispatch "perf-fast-forward" subcommand to
    `perf::scylla_fast_forward_main`
* test/perf/perf_fast_forward.cc: change `main()` to
  `perf::scylla_fast_forward_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_fast_forward_main()`

before this change, we had a tool at `test/perf/perf_fast_forward`
for running performance tests by fast-forwarding the reader.

after this change, `test/perf/perf_fast_forward` is integrated
into `scylla` as a subcommand, so we can run `scylla perf-fast-forward
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:40 +08:00
Kefu Chai
09de031cab main: move perf_simple_query into scylla
* configure.py:
  - include scylla_perfs in scylla
  - move 'test/lib/debug.cc' down scylla_perfs, as the latter uses
    `debug::the_database`
  - also link `scylla` against seastar_testing_libs, because we
    use the helpers in `test/lib/random_utils.hh` for generating
    random numbers / sequences in `perf_simple_query.cc`, and
    `random_utils.hh` references `seastar::testing::local_random_engine`
    as a local RNG. but `seastar::testing::local_random_engine`
    is included in `libseastar_testing.a` or
    `libseastar_perf_testing.a`. since we already have the rules for
    linking against `libseastar_testing.a`, let's just reuse them,
    and link `scylla` against this new dependency.

* main.cc:
  - dispatch "perf-simple-query" subcommand to
    `perf::scylla_simple_query_main`
* test/perf/perf_simple_query.cc: change `main()` to
  `perf::scylla_simple_query_main()`
* test/perf/entry_point.hh: define the main function entries
  so `main.cc` can find them. it's quite like how we collect
  the entries in `tools/entry_point.hh`

before this change, we had a tool at `test/perf/perf_simple_query`
for running performance tests by sending simple queries to a single-node
cluster.

after this change, `test/perf/perf_simple_query` is integrated
into `scylla` as a subcommand, so we can run `scylla perf-simple-query
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:30 +08:00
Kefu Chai
c65692a13a test: extract debug::the_database out
we want to integrate some perf tests into the scylla executable, so we
can run them on a regular basis. but `test/lib/cql_test_env.cc`
shares `debug::the_database` with `main.cc`, so we cannot just
compile them into a single binary without changing them.

before this change, both `test/lib/cql_test_env.cc`
and `main.cc` defined `debug::the_database`.

after this change, `debug::the_database` is extracted into
`debug.cc`, so it compiles into a separate compilation unit,
and both scylla and the tests using the seastar testing framework
link against `debug.cc` via `scylla_core`. this paves the way for
integrating into scylla the tests that link against
`test/lib/cql_test_env.cc`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:23 +08:00
Nadav Har'El
0ff0c80496 test/cql-pytest: un-xfail tests for UNSET values
Commit 0b418fa improved the error detection of unset values in
inappropriate CQL statements, and some of the unit tests translated
from Cassandra started to pass, so this patch removes their "xfail"
mark.

In a couple of places Scylla's error message is worded differently
from Cassandra, so the test was modified to look for a shorter
string common to both implementations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12553
2023-01-19 07:47:08 +02:00
Kefu Chai
6a3b19b53d test/perf: replace "std::cout <<" with fmt::print()
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12559
2023-01-19 07:45:13 +02:00
Avi Kivity
aab5954cfb Merge 'reader_concurrency_semaphore: add more layers of defense against OOM' from Botond Dénes
The reader concurrency semaphore has no mechanism to limit the memory consumption of already admitted reads. Once the collective memory consumption of all admitted reads is above the limit, all it can do is to not admit any more. Sometimes this is not enough, and the memory consumption of the already admitted reads balloons to the point of OOMing the node. This pull request offers a solution: it introduces two more layers of defense on top of this, a soft and a hard limit. Both are multipliers applied to the semaphore's normal memory limit.
When the soft limit threshold is surpassed, all readers but one are blocked via a new blocking `request_memory()` call, which is used by the `tracked_file_impl`. The reader allowed to proceed is chosen effectively at random: it is the first reader which happens to request memory after the limit is surpassed. This is very simple, and it should avoid situations where the algorithm choosing the reader to proceed picks a reader which will then always time out.
When the hard limit threshold is surpassed, `reader_concurrency_semaphore::consume()` starts throwing `std::bad_alloc`. This again will result in eliminating whichever reader was unlucky enough to request memory at the right moment.

With this, the semaphore is now effectively enforcing an upper bound for memory consumption, defined by the hard limit.
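
To make the three tiers concrete, here is a minimal single-threaded sketch of the policy described above. It is a toy model under stated assumptions, not the actual `reader_concurrency_semaphore`: the class `toy_semaphore` and its members are hypothetical, "blocking" is modelled as `request_memory()` returning `false`, and the two multipliers only mirror the `{serialize,kill}_limit_multiplier` idea in spirit.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <optional>

// Hypothetical toy model of a semaphore with a base memory limit plus
// soft ("serialize") and hard ("kill") limits derived from multipliers.
// Not ScyllaDB code: no futures, no real blocking, single-threaded.
class toy_semaphore {
public:
    toy_semaphore(std::size_t memory_limit, double serialize_multiplier, double kill_multiplier)
        : _limit(memory_limit)
        , _serialize_limit(static_cast<std::size_t>(memory_limit * serialize_multiplier))
        , _kill_limit(static_cast<std::size_t>(memory_limit * kill_multiplier)) {}

    // Base limit: new reads are admitted only while below it.
    bool can_admit() const { return _consumed < _limit; }

    // Hard limit: refuse to grow past it by throwing std::bad_alloc,
    // which fails whichever reader happened to ask at that moment.
    void consume(std::size_t bytes) {
        if (_consumed + bytes > _kill_limit) {
            throw std::bad_alloc();
        }
        _consumed += bytes;
    }

    // Return memory; once back under the soft limit, forget the chosen reader.
    void signal(std::size_t bytes) {
        assert(bytes <= _consumed);
        _consumed -= bytes;
        if (_consumed <= _serialize_limit) {
            _chosen.reset();
        }
    }

    // Soft limit: while consumption is above it, only the first reader that
    // asks (the "chosen" one) may proceed; everyone else gets `false`,
    // standing in for "wait until memory is released".
    bool request_memory(int reader_id, std::size_t bytes) {
        if (_consumed > _serialize_limit) {
            if (!_chosen) {
                _chosen = reader_id;
            }
            if (*_chosen != reader_id) {
                return false;
            }
        }
        consume(bytes);
        return true;
    }

private:
    std::size_t _limit;
    std::size_t _serialize_limit;
    std::size_t _kill_limit;
    std::size_t _consumed = 0;
    std::optional<int> _chosen;
};
```

With a base limit of 100 and multipliers of 2 and 4, admission stops at 100, only one reader keeps allocating above 200, and `consume()` refuses to go past 400.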

Refs: https://github.com/scylladb/scylladb/issues/11927

Closes #11955

* github.com:scylladb/scylladb:
  test: reader_concurrency_semaphore_test: add tests for semaphore memory limits
  reader_permit: expose operator<<(reader_permit::state)
  reader_permit: add id() accessor
  reader_concurrency_semaphore: add foreach_permit()
  reader_concurrency_semaphore: document the new memory limits
  reader_concurrency_semaphore: add OOM killer
  reader_concurrency_semaphore: make consume() and signal() private
  test: stop using reader_concurrency_semaphore::{consume,signal}() directly
  reader_concurrency_semaphore: move consume() out-of-line
  reader_permit: consume(): make it exception-safe
  reader_permit: resource_units::reset(): only call consume() if needed
  reader_concurrency_semaphore: tracked_file_impl: use request_memory()
  reader_concurrency_semaphore: add request_memory()
  reader_concurrency_semaphore: wrap wait list
  reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters
  test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardoced buffer size
  reader_permit: add make_new_tracked_temporary_buffer()
  reader_permit: add get_state() accessor
  reader_permit: resource_units: add constructor for already consumed res
  reader_permit: resource_units: remove noexcept qualifier from constructor
  db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier
  scylla-gdb.py: scylla-memory: extract semaphore stats formatting code
  scylla-gdb.py: fix spelling of "graphviz"
2023-01-18 17:02:55 +02:00
Avi Kivity
9a54cb5deb Merge 'cql3/expr: make it possible to prepare binary_operator' from Jan Ciołek
`prepare_expression` takes an unprepared CQL expression straight from the parser output and prepares it. Preparation consists of various type checks that are needed to ensure that the expression is correct and to reason about it.

While `prepare_expression` supports a number of different types of expressions, until now it was impossible to prepare a `binary_operator`. Eventually we would like to be able to prepare all kinds of expressions, so this PR adds the missing support for `binary_operator`.

Closes #12550

* github.com:scylladb/scylladb:
  expr_test: test preparing binary_operator with NULL RHS
  expr_test: test preparing IS NOT NULL binary_operator
  expr_test: test preparing binary_operator with LIKE
  expr_test: test preparing binary_operator with CONTAINS KEY
  expr_test: test preparing binary_operator with CONTAINS
  expr_test: test preparing binary_operator with IN
  expr_test: test preparing binary_operator with =, !=, <, <=, >, >=
  expr_test: use make_*_untyped function in existing tests
  expr_test_utils: add utilities to create untyped_constant
  expr_test_utils: add make_float_* and make_double_*
  cql3: expr: make it possible to prepare binary_operator using prepare_expression
  cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators
  cql3: expr: pass non-empty keyspace name in prepare_binary_operator
  cql3: expr: take reference to schema in prepare_binary_operator
2023-01-18 16:55:18 +02:00
Jenkins Promoter
75a3dd2fc8 release: prepare for 5.3.0-dev 2023-01-18 16:22:41 +02:00
Kefu Chai
965443d6be main: shift the args when checking exec_name
instead of introducing yet another variable for tracking the
status, update the args right away, for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-18 22:22:10 +08:00
Kefu Chai
835cd9bfc9 main: extract lookup_main_func() out
refactor main() to extract lookup_main_func() out, so we find
the main_func in a table instead of using a lengthy if-then-else
clause.

as the list of dispatch candidates grows, the if-then-else chain
becomes harder to follow. so in this change, the code looking
up the main_func is extracted into a dedicated function for
better readability.
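
A table-driven lookup of this kind, combined with shifting argv as soon as a subcommand matches, can be sketched as follows. This is an illustration under assumptions, not the code in `main.cc`: `main_func_type`, `default_main`, the stub entry points, the `std::pair` return type and `dispatch()` are all hypothetical, and the assumption is that "shifting the args" means the subcommand sees its own name as `argv[0]`.

```cpp
#include <cassert>
#include <functional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical entry-point signature and stand-in entry points; the real
// functions live in main.cc, tools/ and test/perf/.
using main_func_type = std::function<int(int, char**)>;

int default_main(int, char**) { return 0; }
int perf_simple_query_main(int, char**) { return 1; }
int perf_sstable_main(int, char**) { return 2; }

// Table-driven lookup: the subcommand name in argv[1] selects the entry point.
// The bool reports whether argv[1] was recognized as a subcommand.
std::pair<main_func_type, bool> lookup_main_func(int argc, char** argv) {
    static const std::vector<std::pair<std::string_view, main_func_type>> table = {
        {"perf-simple-query", perf_simple_query_main},
        {"perf-sstable", perf_sstable_main},
    };
    if (argc > 1) {
        const std::string_view name = argv[1];
        for (const auto& [cmd, func] : table) {
            if (cmd == name) {
                return {func, true};
            }
        }
    }
    return {default_main, false};
}

// Shift the args right away when a subcommand matches, so the chosen entry
// point sees the subcommand name as its argv[0].
int dispatch(int argc, char** argv) {
    auto [func, is_subcommand] = lookup_main_func(argc, argv);
    if (is_subcommand) {
        --argc;
        ++argv;
    }
    return func(argc, argv);
}
```

Adding a new subcommand then only means adding one row to the table.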

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-18 22:22:10 +08:00
Avi Kivity
71bbd7475c Update seastar submodule
* seastar 8889cbc198...d41af8b592 (14):
  > Merge 'Perf stall detector related improvements' from Travis Downs
Ref #8828, #7882, #11582 (may help make progress)
  > build: pass HEAPPROF definition to src/core/reactor.cc too
  > Limit memory address space per core to 64GB when hwloc is not available
  > build: revert use pkg_search_module(.. IMPORTED_TARGET ..) changes
  > Fix missing newlines in seastar-addr2line
  > Use an integral type for uniform_int_distribution
  > Merge 'tls_test: use a dedicated https server for testing' from Kefu Chai
  > build: use ${CMAKE_BINARY_DIR} when running 'cmake --build ..'
  > build: do not set c-ares_FOUND with PARENT_SCOPE
  > reactor: drop unused member function declaration
  > sstring: refactor to_sstring() using fmt::format_to()
  > http: delay input stream close until responses sent
  > build: enable non-library targets using default option value
  > Merge 'sstring: specialize uninitialize_string() and use resize_and_overwrite if available' from Kefu Chai

Closes #12509
2023-01-18 15:50:57 +02:00
Jan Ciolek
ae0e955b90 expr_test: test preparing binary_operator with NULL RHS
Make sure that preparing binary_operator works properly
when the RHS is NULL.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:46 +01:00
Jan Ciolek
65b8a09409 expr_test: test preparing IS NOT NULL binary_operator
Add unit tests which check that preparing binary_operators
which represent IS NOT NULL works as expected

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:46 +01:00
Jan Ciolek
5b3e6769f1 expr_test: test preparing binary_operator with LIKE
Add unit tests which check that preparing binary_operators
with the LIKE operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
e876496f7f expr_test: test preparing binary_operator with CONTAINS KEY
Add unit tests which check that preparing binary_operators
with the CONTAINS KEY operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
c6d2e1a03e expr_test: test preparing binary_operator with CONTAINS
Add unit tests which check that preparing binary_operators
with the CONTAINS operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
6b147ecaea expr_test: test preparing binary_operator with IN
Add unit tests which check that preparing binary_operators
with the IN operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
669d791250 expr_test: test preparing binary_operator with =, !=, <, <=, >, >=
Add unit tests which check that preparing binary_operators
with basic comparison operations works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
60803d12a9 expr_test: use make_*_untyped function in existing tests
Use the newly introduced convenience methods that create
untyped_constant in existing tests.

This will make the code more readable by removing
visual clutter that came with the previous overly
verbose code.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
819390f9fe expr_test_utils: add utilities to create untyped_constant
expression tests often need to create instances of untyped_constant.
Creating them by hand is tedious because the required code is overly verbose.
Having convenience functions for it speeds up test writing.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
362bf7f534 expr_test_utils: add make_float_* and make_double_*
Add utilities to create float and double values in tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
da3c07955a cql3: expr: make it possible to prepare binary_operator using prepare_expression
prepare_expression didn't allow preparing binary_operators,
so this commit implements it.

If prepare_binary_operator is unable to infer
the types it will fail with an exception instead
of returning std::nullopt, but we can live with
that for now.

Preparing binary_operators inside the WHERE
clause is currently more complicated than just
calling prepare_binary_operator. Preparation
of the WHERE clause is done inside statement_restrictions
constructor. It's done by iterating over all binary_operators,
validating them and then preparing. The validation contains
additional checks with custom error messages.
Preparation has to be done after validation,
because otherwise the error messages will change
and some tests will start failing.
Because of that we can't just call prepare_expression
on the WHERE clause yet.

It's still useful to have the ability to prepare
binary_operators using prepare_expression.
In cases where we know that the WHERE clause is valid,
we can just call prepare_expression and be done with it.

Once the grammar is fully relaxed, the artificial constraints
checked by the validation code will be removed and
it will be possible to prepare the whole WHERE clause
using just prepare_expression.

prepare_expression does a bit more than
prepare_binary_operator. In case where
both sides of the binary_operator are known
it will evaluate the whole binary_operator
to a constant value.

Query analysis code is NOT ready
to encounter constant boolean values inside
the WHERE clause, so for the WHERE we still use
prepare_binary_operator which doesn't
evaluate the binary_operator to a
constant value.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:43 +01:00
Jan Ciolek
5f8b1a1a60 cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators
When preparing a binary operator we first prepare the LHS,
which gives us information about its type and allows
us to infer the desired type of RHS.

Then the RHS is prepared with the expectation that it
is compatible with the inferred type.

This is enough for all types of operations apart
from IS NOT NULL.

For IS NOT NULL we should also check that the RHS value
is actually null. It's not enough to check that
RHS is of right type.

Before this change preparing `int_col IS NOT 123`
would end in success, which is wrong.

The missing check doesn't cause any real problems,
it's impossible for the user to produce such input
because the parser will reject it.
Still it's better to have the check because
in the future the grammar might get more relaxed
and the parser could become more generic,
making it possible to write such things.

It would be better to introduce unary_operators,
but that's a bigger change.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:43 +01:00
Jan Ciolek
703e9f21ff cql3: expr: pass non-empty keyspace name in prepare_binary_operator
For some reason we passed an empty keyspace name
to prepare_expression when preparing the LHS
of a binary operator.

This doesn't look correct. We have keyspace
name available from the schema_ptr so let's use that.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:43 +01:00
Jan Ciolek
9a0c5789a2 cql3: expr: take reference to schema in prepare_binary_operator
prepare_binary_operator takes a schema_ptr,
but it would be useful to take a reference to schema instead.
Every schema_ptr can be easily converted to a reference
so there is no loss of functionality.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:40 +01:00
Nadav Har'El
48e2d6a541 Merge 'utils: throw error on malformed input in base64 decode' from Marcin Maliszkiewicz
Several cases were fixed in these patches, all related to processing of malformed base64 data. The main purpose was to bring the alternator implementation closer to what DynamoDB does. We now:
- Throw an error when padding is missing during base64 decoding
- Throw an error when base64 data is malformed
- In alternator, when invalid base64 data is fetched from the DB (as opposed to being part of the user's request), exclude such rows during filtering
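
The first two rules can be illustrated with a minimal strict decoder, assuming the standard RFC 4648 alphabet: the input length must be a multiple of 4, every character must be in the alphabet, and `=` may only appear as trailing padding. `strict_base64_decode` and `base64_value` are hypothetical names, not the functions in `utils/`, and the sketch throws `std::invalid_argument`, which may differ from the exception type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Map a character of the standard base64 alphabet to its 6-bit value, or -1.
int base64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Strict decoding: reject missing padding, characters outside the alphabet,
// and '=' anywhere except the last one or two positions of the input.
std::string strict_base64_decode(std::string_view in) {
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("base64: missing padding or truncated input");
    }
    std::string out;
    out.reserve(in.size() / 4 * 3);
    for (std::size_t i = 0; i < in.size(); i += 4) {
        const bool last_group = (i + 4 == in.size());
        std::uint32_t group = 0;
        int padding = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const char c = in[i + j];
            int value = 0;
            if (c == '=') {
                if (!last_group || j < 2) {
                    throw std::invalid_argument("base64: misplaced padding");
                }
                ++padding;
            } else {
                if (padding > 0) {
                    throw std::invalid_argument("base64: data after padding");
                }
                value = base64_value(c);
                if (value < 0) {
                    throw std::invalid_argument("base64: invalid character");
                }
            }
            group = (group << 6) | static_cast<std::uint32_t>(value);
        }
        // Each 4-character group yields 3 bytes, minus one per '=' of padding.
        out.push_back(static_cast<char>((group >> 16) & 0xFF));
        if (padding < 2) out.push_back(static_cast<char>((group >> 8) & 0xFF));
        if (padding < 1) out.push_back(static_cast<char>(group & 0xFF));
    }
    return out;
}
```

Under these rules `"Zg=="` decodes to `"f"`, while `"Zg"` (no padding) and `"Zm9*"` (invalid character) are rejected.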

Additionally some small code quality improvements:
- avoid unnecessary type conversions in calls to `rjson::from_string` functions
- avoid some copy constructions in calls to `rjson::from_string` functions

Fixes https://github.com/scylladb/scylladb/issues/6487

Closes #11944

* github.com:scylladb/scylladb:
  alternator: evaluate expressions as false for stored malformed binary data
  rjson: avoid copy constructors in from_string calls when possible
  alternator: remove unused parameters from describe_items func
  utils: throw error on malformed input in base64 decode
  utils: throw error on missing padding in base64 decode
2023-01-18 12:40:57 +02:00
Avi Kivity
561f4ca057 test: materialized view: add test exercising synthetic empty-type columns
Materialized views inject synthetic empty-type columns in some conditions.
Since we just touched empty-type serialization/deserialization, add a
test to exercise it and make sure it still works.
2023-01-18 10:38:24 +02:00
Avi Kivity
04925a7b29 cql3: expr: relax evaluate_list() to allow allow NULL elements
Tests are similarly relaxed. A test is added in lwt_test to show
that insertion of a list with NULL is still rejected, though we
allow NULLs in IF conditions.

One test is changed from a list of longs to a list of ints, to
prevent churn in the test helper library.
2023-01-18 10:38:24 +02:00
Avi Kivity
390a0ca47b types: allow lists with NULL
Allow transient lists that contain NULL throughout the
evaluation machinery. This makes it possible to evaluate things
like `IF col IN (1, 2, NULL)` without hacks, once LWT conditions
are converted to expressions.

A few tests are relaxed to accommodate the new behavior:
 - cql_query_test's test_null_and_unset_in_collections is relaxed
   to allow `WHERE col IN ?`, with the variable bound to a list
   containing NULL; now it's explicitly allowed
 - expr_test's evaluate_bind_variable_validates_no_null_in_list was
   checking generic lists for NULLs, and was similarly relaxed (and
   renamed)
 - expr_test's evaluate_bind_variable_validates_null_in_lists_recursively
   was similarly relaxed to allow NULLs.
2023-01-18 10:38:24 +02:00
Avi Kivity
00145f9ada test: relax NULL check test predicate
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.
2023-01-18 10:38:24 +02:00
Avi Kivity
5f8540ecfa cql3, types: validate listlike collections (sets, lists) for storage
Lists allow NULL in some contexts (bind variables for LWT "IN ?"
conditions), but not in most others. Currently, the implementation
just disallows NULLs in list values, and the cases where it is allowed
are hacked around. To reduce the special cases, we'll allow lists
to have NULLs, and just restrict them for storage. This is similar
to how scalar values can be NULL, but not when they are part of a
partition key.
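
The intended split between evaluation and storage can be illustrated with a toy model, assuming a list is represented as `std::vector<std::optional<int>>` with NULL as an empty optional. `list_value`, `contains()` and `validate_for_storage()` are hypothetical helpers, not ScyllaDB functions, and `contains()` deliberately treats a NULL element as simply not matching, which is a simplification.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical transient list value: an empty optional stands for NULL.
using list_value = std::vector<std::optional<int>>;

// Evaluation context (e.g. `col IN (1, 2, NULL)`): NULL elements are allowed
// and, in this simplified model, just never match.
bool contains(const list_value& list, int x) {
    for (const auto& element : list) {
        if (element && *element == x) {
            return true;
        }
    }
    return false;
}

// Storage context: a list (or set) about to be written must not hold NULLs,
// analogous to rejecting a NULL partition key component.
void validate_for_storage(const list_value& list) {
    for (const auto& element : list) {
        if (!element) {
            throw std::invalid_argument("NULL is not allowed inside a stored collection");
        }
    }
}
```

The point of the design is that only the storage path needs the check; evaluation code can treat lists uniformly.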

To prepare for the transition, identify the locations where lists
(and sets, which share the same storage) are stored as frozen
values and add a NULL check there. Non-frozen lists already have the
check. Since sets share the same format as lists, apply the same to
them.

No actual checks are done yet, since NULLs are impossible. This
is just a stub.
2023-01-18 10:38:24 +02:00
Avi Kivity
da4abccf89 types: make empty type deserialize to non-null value
The empty type is used internally to implement CQL sets on top
of multi-cell maps. The map's key (an atomic cell) represents the
set value, and the map's value is discarded. Since it's unneeded
we use an internal "empty" type.

Currently, it is deserialized into a `data_value` object representing
a NULL. Since it's discarded, it really doesn't matter.

However, with the impending change to allow NULLs in lists,
it does matter:

 1. the coordinator sets the 'collections_as_maps' flag for LWT
    requests since it wants list indexes (this affects sets too).
 2. the replica responds by serializing a set as a map.
 3. since we start allow NULL collection values, we now serialize
    those NULLs as NULLs.
 4. the coordinator deserializes the map, and complains about NULL
    values, since those are not supported.

The solution is simple: deserialize the empty value as a non-NULL
object. We create an empty empty_type_representation and add the
scaffolding needed. Serialization and deserialization are already
coded; they were just never called for NULL values (which were serialized
with size 0, in collections, rather than size -1, luckily).

A unit test is added.
2023-01-18 10:38:24 +02:00
Tomasz Grabiec
563998b69a Merge 'raft: improve group 0 reconfiguration failure handling' from Kamil Braun
Make it so that failures in `removenode`/`decommission` don't lead to reduced availability, and any leftovers in group 0 can be removed by `removenode`:
- In `removenode`, make the node a non-voter before removing it from the token ring. This removes the possibility of having a group 0 voting member which doesn't correspond to a token ring member. We can still be left with a non-voter, but that doesn't reduce the availability of group 0.
- As above but for `decommission`.
- Make it possible to remove group 0 members that don't correspond to token ring members from group 0 using `removenode`.
- Add an API to query the current group 0 configuration.

Fixes #11723.

Closes #12502

* github.com:scylladb/scylladb:
  test: test_topology: test for removing garbage group 0 members
  test/pylib: move some utility functions to util.py
  db: system_keyspace: add a virtual table with raft configuration
  db: system_keyspace: improve system.raft_snapshot_config schema
  service: storage_service: better error handling in `decommission`
  service: storage_service: fix indentation in removenode
  service: storage_service: make `removenode` work for group 0 members which are not token ring members
  service/raft: raft_group0: perform read_barrier in wait_for_raft
  service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode
  test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove
  service/raft: raft_group0: link to Raft docs where appropriate
  service/raft: raft_group0: more logging
  service/raft: raft_group0: separate function for checking and waiting for Raft
2023-01-17 21:23:15 +01:00
Kamil Braun
d134c458e5 test/pylib: increase timeout when waiting for cluster before test
Increase the timeout from default 5 minutes to 10 minutes.
Sent as a workaround for #12546 to unblock next promotions.

Closes #12547
2023-01-17 21:03:09 +02:00
Kamil Braun
4f1c317bdc test: test_raft_upgrade: stop servers gracefully in test_recovery_after_majority_loss
This test is frequently failing due to a timeout when we try to restart
one of the nodes. The shutdown procedure apparently hangs when we try to
stop the `hints_manager` service, e.g.:
```
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:22:56,997 [shard 0] hints_manager - Stopped
```
Observe the 5-minute delay at the end.

There is a known issue about `hints_manager` stop hanging: #8079.

Now, for some reason, this is the only test case that is hitting this
issue. We don't completely understand why. There is one significant
difference between this test case and others: this is the only test case
which kills 2 (out of 3) servers in the cluster and then tries to
gracefully shut down the last server. There's a hypothesis that the last
server gets stuck trying to send hints to the killed servers. We weren't
able to prove/falsify it yet. But if it's true, then this patch will:
- unblock next promotions,
- give us some important information when we see that the issue stops
  appearing.
In the patch we shut down all servers gracefully instead of killing them,
like we do in the other test cases.

Closes #12548
2023-01-17 20:51:09 +02:00
Pavel Emelyanov
4f415413d2 raft: Fix non-existing state_machine::apply_entry in docs
The docs mention that method, but it doesn't exist. Instead, the
state_machine interface defines plain .apply() one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12541
2023-01-17 12:53:05 +01:00
Kamil Braun
5545547d07 test: test_topology: test for removing garbage group 0 members
Verify that `removenode` can remove group 0 members which are not token
ring members.
2023-01-17 12:28:00 +01:00
Kamil Braun
c959ec455a test/pylib: move some utility functions to util.py
They were used in test_raft_upgrade, but we want to use them in other
test files too.
2023-01-17 12:28:00 +01:00
Kamil Braun
a483915c62 db: system_keyspace: add a virtual table with raft configuration
Add a new virtual table `system.raft_state` that shows the currently
operating Raft configuration for each present group. The schema is the
same as `system.raft_snapshot_config` (the latter shows the config from
the last snapshot). In the future we plan to add more columns to this
table, showing more information (like the current leader and term),
hence the generic name.

Adding the table requires some plumbing of
`sharded<raft_group_registry>&` through function parameters to make it
accessible from `register_virtual_tables`, but it's mostly
straightforward.

Also added some APIs to `raft_group_registry` to list all groups and
find a given group (returning `nullptr` if one isn't found, not throwing
an exception).
2023-01-17 12:28:00 +01:00
Kamil Braun
2bfe85ce9b db: system_keyspace: improve system.raft_snapshot_config schema
Remove the `ip_addr` column which was not used. IP addresses are not
part of Raft configuration now and they can change dynamically.

Swap the `server_id` and `disposition` columns in the clustering key, so
when querying the configuration, we first obtain all servers with the
current disposition and then all servers with the previous disposition
(note that a server may appear both in current and previous).
2023-01-17 12:28:00 +01:00
Kamil Braun
c3ed82e5fb service: storage_service: better error handling in decommission
Improve the error handling in `decommission` in case `leave_group0`
fails, informing the user what they should do (i.e. call `removenode` to
get rid of the group 0 member), and allowing decommission to finish; it
does not make sense to let the node continue to run after it leaves the
token ring. (And I'm guessing it's also not safe. Or maybe impossible.)
2023-01-17 12:28:00 +01:00
Kamil Braun
beb0eee007 service: storage_service: fix indentation in removenode 2023-01-17 12:28:00 +01:00
Kamil Braun
aba33dd352 service: storage_service: make removenode work for group 0 members which are not token ring members
Due to failures we might end up in a situation where we have a group 0
member which is not a token ring member: a decommission/removenode
which failed after leaving/removing a node from the token ring but
before leaving / removing a node from group 0.

There was no way to get rid of such a group 0 member. A node that left
the token ring must not be allowed to run further (or it can cause data
loss, data resurrection and maybe other fun stuff), so we can't run
decommission a second time (even if we tried, it would just say that
"we're not a member of the token ring" and abort). And `removenode`
would also not work, because it proceeds only if the node requested to
be removed is a member of the token ring.

We modify `removenode` so it can run in this situation and remove the
group 0 member. The parts of `removenode` related to token ring
modification are now conditioned on whether the node was a member of the
token ring. The final `remove_from_group0` step is in its own branch. Some
minor refactors were necessary. Some log messages were also modified so
it's easier to understand which messages correspond to the "token movement"
part of the procedure.

The `make_nonvoter` step happens only if token ring removal happens,
otherwise we can skip directly to `remove_from_group0`.

We also move `remove_from_group0` outside the "try...catch",
fixing #11723. The "node ops" part of the procedure is related strictly
to token ring movement, so it makes sense for `remove_from_group0` to
happen outside.

Indentation is broken in this commit for easier reviewability, fixed in
the following commit.

Fixes: #11723
2023-01-17 12:28:00 +01:00
Kamil Braun
ec2cd29e42 service/raft: raft_group0: perform read_barrier in wait_for_raft
Right now wait_for_raft is called before performing group 0
configuration changes. We also want to call it before checking for
membership, and for that it's desirable to have the most recent information,
hence the read_barrier call. In the existing use cases it's not strictly
necessary, but it doesn't hurt.
2023-01-17 12:28:00 +01:00
Kamil Braun
db734cd74f service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode
removenode currently works roughly like this:
1. stream/repair data so it ends up on new replica sets (calculated
   without the node we want to remove)
2. remove the node from the token ring
3. remove the node from group 0 configuration.

If the procedure fails after step 2 but before step 3 finishes,
we're in trouble: the cluster is left with an additional voting group 0
member, which reduces group 0's availability, and there is no way to
remove this member because `removenode` no longer considers it to be
part of the cluster (it consults the token ring to decide).

Improve this failure scenario by including a new step at the beginning:
make the node a non-voter in group 0 configuration. Then, even if we
fail after removing the node from the token ring but before removing it
from group 0, we'll only be left with a non-voter which doesn't reduce
availability.

We make a similar change for `decommission`: between `unbootstrap()` (which
streams data) and `leave_ring()` (which removes our tokens from the
ring), become a non-voter. The difference here is that we don't become a
non-voter at the beginning, but only after streaming/repair. In
`removenode` it's desirable to make the node a non-voter as soon as
possible because it's already dead. In decommission it may be desirable
for us to remain a voter if we fail during streaming because we're still
alive and functional in that case.

In a later commit we'll also make it possible to retry `removenode` to
remove a node that is only a group 0 member and not a token ring member.
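The reason for moving `make_nonvoter` to the front can be illustrated with a toy model. This is a sketch only: the names `group0_member_state`, `fail_at` and the free function `removenode` are hypothetical, and each step is reduced to a flag flip. The real procedure streams data and talks to Raft and gossip, and it has many more failure modes. The model shows that a failure between token ring removal and group 0 removal now leaves behind only a non-voter.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical view of the node being removed, as seen by the cluster.
struct group0_member_state {
    bool in_token_ring = true;
    bool in_group0 = true;
    bool voter = true;
};

// Where (if anywhere) the toy procedure should fail.
enum class fail_at { none, before_ring_removal, before_group0_removal };

// New ordering for removenode: become a non-voter first, then stream/repair
// and leave the token ring, and only then leave group 0.
void removenode(group0_member_state& n, fail_at failure) {
    n.voter = false;                         // make_nonvoter
    if (failure == fail_at::before_ring_removal) {
        throw std::runtime_error("streaming/repair failed");
    }
    n.in_token_ring = false;                 // remove tokens from the ring
    if (failure == fail_at::before_group0_removal) {
        throw std::runtime_error("group 0 removal failed");
    }
    n.in_group0 = false;                     // remove_from_group0
}
```

In `decommission` the non-voter step would instead come after streaming, as explained in the commit message, so a node that fails during streaming keeps its vote.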
2023-01-17 12:28:00 +01:00
Kamil Braun
1eee349a17 test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove
The test would create a scenario where one node was down while the others
started the Raft upgrade procedure. The procedure would get stuck, but
it was possible to `removenode` the downed node using one of the alive
nodes, which would unblock the Raft upgrade procedure.

This worked because:
1. the upgrade procedure starts by ensuring that all peers can be
   contacted,
2. `removenode` starts by removing the node from the token ring.

After removing the node from the token ring, the upgrade procedure
becomes able to contact all peers (the peers set no longer contains the
down node). At the end, after removing the node from the token ring,
`removenode` would actually get stuck for a while, waiting for the
upgrade procedure to finish before removing the peer from group 0.
After the upgrade procedure finished, `removenode` would also finish.
(so: first the upgrade procedure waited for removenode, then removenode
waited for the upgrade procedure).

We want to modify the `removenode` procedure and include a new step
before removing the node from the token ring: making the node a
non-voter. The purpose is to improve the possible failure scenarios.
Previously, if the `removenode` procedure failed after removing the node
from the token ring but before removing it from group 0, the cluster
would contain a 'garbage' group 0 member which is a voter - reducing
group 0's availability. If the node is made a non-voter first, then this
failure will not be as big of a problem, because the leftover group 0
member will be a non-voter.

However, to correctly perform group 0 operations including making
someone a nonvoter, we must first wait for the Raft upgrade procedure to
finish (or at least wait until everyone joins group 0). Therefore by
including this 'make the node a non-voter' step at the beginning of
`removenode`, we make it impossible to remove a token ring member in the
middle of the upgrade procedure, on which the test case relied. The test
case would get stuck waiting for the `removenode` operation to finish,
which would never finish because it would wait for the upgrade procedure
to finish, which would not finish because of the dead peer.

We remove the test case; it was "lucky" to pass in the first place. We
have a dedicated mechanism for handling dead peers during Raft upgrade
procedure: the manual Raft group 0 RECOVERY procedure. There are other
test cases in this file which are using that procedure.
2023-01-17 12:28:00 +01:00
Kamil Braun
4f0801406e service/raft: raft_group0: link to Raft docs where appropriate
Resolve some TODOs.
2023-01-17 12:28:00 +01:00
Kamil Braun
2befbaa341 service/raft: raft_group0: more logging
Make the logs in leave_group0 consistent with logs in
remove_from_group0.
2023-01-17 12:28:00 +01:00
Kamil Braun
77dc1c4c70 service/raft: raft_group0: separate function for checking and waiting for Raft
leave_group0 and remove_from_group0 functions both start with the
following steps:
- if Raft is disabled or in RECOVERY mode, print a simple log message
  and abort
- if Raft cluster feature flag is not yet enabled, print a complex log
  message and abort
- wait for Raft upgrade procedure to finish
- then perform the actual group 0 reconfiguration.

Refactor these preparation steps to a separate function,
`wait_for_raft`. This reduces code duplication; the function will also
be used in more operations later (becoming a nonvoter or turning another
server into a nonvoter).

We also change the API so that the preparation function is called from
outside by the caller before they call the reconfiguration function.
This is because in later commits, some of the call sites (mainly
`removenode`) will want to check explicitly whether Raft is enabled and
wait for Raft's availability, then perform a sequence of steps related
to group 0 configuration depending on the result.

Also add a private function `raft_upgrade_complete()` which we use to
assert that Raft is ready to be used.
2023-01-17 12:27:58 +01:00
Wojciech Mitros
5f45b32bfa forward_service: prevent heap use-after-free of forward_aggregates
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.

To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().

Fixes #12528

Closes #12533
2023-01-17 13:25:57 +02:00
Botond Dénes
8ea128cc27 test: reader_concurrency_semaphore_test: add tests for semaphore memory limits 2023-01-17 05:27:04 -05:00
Botond Dénes
ec1c615029 reader_permit: expose operator<<(reader_permit::state) 2023-01-17 05:27:04 -05:00
Botond Dénes
78583b84f1 reader_permit: add id() accessor
Effectively returns the address of the underlying permit impl as an
`uintptr_t`. This can be used to determine the identity of the permit.
2023-01-17 05:27:04 -05:00
Botond Dénes
7f8469db27 reader_concurrency_semaphore: add foreach_permit()
Allows iterating over all permits.
2023-01-17 05:27:04 -05:00
Botond Dénes
4c70b58993 reader_concurrency_semaphore: document the new memory limits 2023-01-17 05:27:04 -05:00
Botond Dénes
edb32cb171 reader_concurrency_semaphore: add OOM killer
When the collective memory consumption of all readers goes above
$kill_limit_multiplier * $memory_limit, consume() will throw
std::bad_alloc(), instantly unwinding the read that is unlucky enough
to have requested the last bytes of memory. This should help in situations
where there are some problematic partitions, either because of large
cells or because they are scattered in too many sstables. Currently
nothing prevents such reads from bringing down the entire node via OOM.
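The kill-limit rule can be sketched with a simplified accounting struct. `toy_semaphore` and its fields are hypothetical stand-ins, not the real `reader_concurrency_semaphore`, which tracks far more state. This sketch also assumes the rejected request is not accounted, which the commit message does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical, heavily simplified memory accounting that models only the
// kill limit described above.
struct toy_semaphore {
    int64_t memory_limit;
    int64_t kill_limit_multiplier;
    int64_t consumed = 0;

    // Account `bytes` more memory. If the total would exceed
    // kill_limit_multiplier * memory_limit, throw std::bad_alloc so that only
    // the requesting read unwinds, instead of the whole node running out of memory.
    void consume(int64_t bytes) {
        if (consumed + bytes > kill_limit_multiplier * memory_limit) {
            throw std::bad_alloc();
        }
        consumed += bytes;
    }

    // Return memory previously accounted with consume().
    void signal(int64_t bytes) { consumed -= bytes; }
};
```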
2023-01-17 05:27:04 -05:00
Botond Dénes
81e2a2be7d reader_concurrency_semaphore: make consume() and signal() private
Using this API is quite dangerous as any mistakes can lead to leaking
resources from the semaphore. Also, soon we will tie this API closer to
permits, so they won't be as generic. Make them private so we don't have
to worry about correct usage. All external users are patched away
already.
2023-01-17 05:27:04 -05:00
Botond Dénes
ab18e7b178 test: stop using reader_concurrency_semaphore::{consume,signal}() directly
These methods will soon be retired (made private) so migrate away from
them. Consume memory through a permit instead. It is also safer this
way: all memory consumed through the permit is guaranteed to be released
when the permit is destroyed at the latest.
2023-01-17 05:27:04 -05:00
Botond Dénes
8f9e8aafdf reader_concurrency_semaphore: move consume() out-of-line
It's about to get a little bit more complex.
2023-01-17 05:27:04 -05:00
Botond Dénes
e4ef28284b reader_permit: consume(): make it exception-safe
reader_concurrency_semaphore::consume() will soon throw.
2023-01-17 05:27:04 -05:00
Botond Dénes
029269af42 reader_permit: resource_units::reset(): only call consume() if needed
reset() is called from the destructor with null resources. Calling
consume() can be avoided in this case, and in fact avoiding it is required,
as consume() is soon going to throw in some cases.
2023-01-17 05:27:04 -05:00
Botond Dénes
dd9a0a16e6 reader_concurrency_semaphore: tracked_file_impl: use request_memory()
Use the recently added `request_memory()` to acquire the memory units for
the I/O. This allows blocking all but one reader when memory
consumption grows too high.
2023-01-17 05:27:04 -05:00
Botond Dénes
9ed5d861be reader_concurrency_semaphore: add request_memory()
A possibly blocking request for more memory. If the collective memory
consumption of all reads goes above
$serialize_limit_multiplier * $memory_limit, this request will block for
all but one reader (the first requester). As long as this situation
persists, that is, as long as memory consumption stays above that limit,
only this one reader is allowed to make progress. This should help rein
in the memory consumption of reads in situations where it used to
balloon without constraints.
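The admission decision behind `request_memory()` can be modelled synchronously. In this sketch, `toy_memory_gate`, `try_request()` and `release()` are hypothetical names. The real method returns a future and queues waiters instead of returning `false`. Above the serialize limit, the first requester becomes the only reader allowed to proceed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, synchronous model of the serialize-limit admission rule.
struct toy_memory_gate {
    int64_t memory_limit;
    int64_t serialize_limit_multiplier;
    int64_t consumed = 0;
    std::optional<uint64_t> blessed; // the one reader allowed to progress

    // Returns true if reader `id` may take `bytes` now, false if it must wait.
    bool try_request(uint64_t id, int64_t bytes) {
        const bool over_limit = consumed > serialize_limit_multiplier * memory_limit;
        if (!over_limit) {
            blessed.reset();          // below the limit: no serialization
        } else {
            if (!blessed) {
                blessed = id;         // first requester gets to continue
            }
            if (*blessed != id) {
                return false;         // everybody else blocks
            }
        }
        consumed += bytes;
        return true;
    }

    void release(int64_t bytes) { consumed -= bytes; }
};
```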
2023-01-17 05:27:04 -05:00
Gleb Natapov' via ScyllaDB development
15ebd59071 lwt: upgrade stored mutations to the latest schema during prepare
Currently they are upgraded during learn on a replica. There are two
problems with this. First, the column mapping may not exist on a replica
if it missed this particular schema (because it was down, for instance)
and the mapping history is not part of the schema. In this case "Failed
to look up column mapping for schema version" will be thrown. Second, the
LWT request coordinator may not have the schema for the mutation either
(because it was already freed from the registry), and when a replica
tries to retrieve the schema from the coordinator, the retrieval will fail,
causing the whole request to fail with "Schema version XXXX not found".

Both of those problems can be fixed by upgrading stored mutations
during prepare on the node they are stored on. Upgrading a mutation
requires its column mapping, which is guaranteed to be present on the
node storing the mutation, since having the corresponding schema
available is a prerequisite for storing it. After that, the mutation
is processed using the latest schema, which will be available on all nodes.

Fixes #10770

Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>
2023-01-17 11:14:46 +01:00
Raphael S. Carvalho
f2f839b9cc compaction: LCS: don't reshape all levels if only a single breaks disjointness
LCS reshape is compacting all levels if a single one breaks
disjointness. That's unnecessary work because rewriting that single
level is enough to restore disjointness. If multiple levels break
disjointness, they'll each be reshaped in its own iteration, so
reducing operation time for each step and disk space requirement,
as input files can be released incrementally.
Incremental compaction is not applied to reshape yet, so we need to
avoid "major compaction", to avoid the space overhead.
But space overhead is not the only problem, the inefficiency, when
deciding what to reshape when overlapping is detected, motivated
this patch.
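The per-level selection can be sketched as follows. This assumes sstables are represented only by a hypothetical `sst_range` of first/last keys (integers here), and it skips level 0, where overlap is allowed by design in LCS. It is an illustration of the idea, not the compaction strategy's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical key range covered by one sstable.
struct sst_range { int64_t first; int64_t last; };

// A level is disjoint if, sorted by first key, no sstable starts at or
// before the previous sstable's last key.
bool is_disjoint(std::vector<sst_range> level) {
    std::sort(level.begin(), level.end(),
              [](const sst_range& a, const sst_range& b) { return a.first < b.first; });
    for (std::size_t i = 1; i < level.size(); ++i) {
        if (level[i].first <= level[i - 1].last) {
            return false;
        }
    }
    return true;
}

// Select only the levels (1 and up) that break disjointness, instead of
// reshaping every level as soon as one of them is broken.
std::vector<std::size_t> levels_to_reshape(const std::vector<std::vector<sst_range>>& levels) {
    std::vector<std::size_t> result;
    for (std::size_t lvl = 1; lvl < levels.size(); ++lvl) {
        if (!is_disjoint(levels[lvl])) {
            result.push_back(lvl);
        }
    }
    return result;
}
```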

Fixes #12495.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12496
2023-01-17 09:55:15 +02:00
Michał Chojnowski
9e17564c70 types: add some missing explicit instantiations
Some functions defined by a template in types.cc are used in other
translation units (via `cql3/untyped_result_set.hh`), but aren't
explicitly instantiated. Therefore their linking can fail, depending
on inlining decisions. (I experienced this when playing with compiler
options).
Fix that.

Closes #12539
2023-01-17 10:46:01 +02:00
Nadav Har'El
5bf94ae220 cql: allow disabling of USING TIMESTAMP sanity checking
As requested by issue #5619, commit 2150c0f7a2
added a sanity check for USING TIMESTAMP - the number specified in the
timestamp must not be more than 3 days into the future (when viewed as
a number of microseconds since the epoch).

This sanity checking helps avoid some annoying client-side bugs and
mis-configurations, but some users genuinely want to use arbitrary
or futuristic-looking timestamps and are hindered by this sanity check
(which Cassandra doesn't have, by the way).

So in this patch we add a new configuration option, restrict_future_timestamp.
If set to "true", futuristic timestamps (more than 3 days into the future)
are forbidden. The "true" setting is the default (as has been the case
since #5619). Setting this option to "false" will allow using any 64-bit
integer as a timestamp, as is allowed in Cassandra (and was allowed in
Scylla prior to #5619).

The error message in the case where a futuristic timestamp is rejected
now mentions the configuration parameter that can be used to disable this
check (this, and the option's name "restrict_*", is similar to other
so-called "safe mode" options).
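The check itself amounts to a simple comparison in microseconds since the epoch. In this sketch, the function name `validate_using_timestamp` and the error text are hypothetical. Only the 3-day threshold and the option's on/off behavior come from the description above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <stdexcept>

// Hypothetical sketch of the USING TIMESTAMP sanity check. Timestamps are
// microseconds since the epoch; when restrict_future_timestamp is true,
// anything more than 3 days ahead of `now` is rejected.
void validate_using_timestamp(int64_t ts_micros,
                              std::chrono::system_clock::time_point now,
                              bool restrict_future_timestamp) {
    if (!restrict_future_timestamp) {
        return; // any 64-bit value is accepted, as in Cassandra
    }
    using namespace std::chrono;
    const int64_t now_micros = duration_cast<microseconds>(now.time_since_epoch()).count();
    const int64_t max_ahead = duration_cast<microseconds>(hours(24 * 3)).count();
    if (ts_micros > now_micros + max_ahead) {
        throw std::invalid_argument(
            "USING TIMESTAMP is more than 3 days in the future "
            "(set restrict_future_timestamp to false to allow it)");
    }
}
```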

This patch also includes a test, which works in Scylla and Cassandra,
with either setting of restrict_future_timestamp, checking the right
thing in all these cases (the futuristic timestamp can either be written
and read, or can't be written). I used this test to manually verify that
the new option works, defaults to "true", and when set to "false" Scylla
behaves like Cassandra.

Fixes #12527

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12537
2023-01-16 23:18:56 +02:00
Kefu Chai
114f30016a main: use std::shift_left() to consume tool name
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12536
2023-01-16 21:01:34 +02:00
Nadav Har'El
feef3f9dda test/cql-pytest: test more than one restriction on same clustering column
Cassandra refuses a request with more than one relation to the same
clustering column, for example

    DELETE FROM tbl WHERE p = ? and c = ? AND c > ?

complains that

    c cannot be restricted by more than one relation if it includes an Equal

But it produces different error messages depending on the operators
and even on their order.

Currently, Scylla doesn't consider such requests an error. Whether or
not we should be compatible with Cassandra here is discussed in
issue #12472. But as long as we do accept these queries, we should be
sure we do the right thing: "WHERE c = 1 AND c > 2" should match
nothing, "WHERE c = 1 AND c > 0" should match the matches of c = 1,
and so on. This patch adds a test to verify that these requests indeed
yield correct results. The test is scylla_only because, as explained
above, Cassandra doesn't support these requests at all.

Refs #12472

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12498
2023-01-16 20:41:16 +02:00
Kefu Chai
86b451d45c SCYLLA-VERSION-GEN: remove unnecessary bashism
remove unnecessary bashism, so that this script can be interpreted
by a POSIX shell.

/bin/sh is specified in the shebang line. on debian derivatives,
/bin/sh is dash, which is POSIX compliant. but this script is
written in the bash dialect.

before this change, we could run into following build failure
when building the tree on Debian:

[7/904] ./SCYLLA-VERSION-GEN
./SCYLLA-VERSION-GEN: 37: [[: not found

after this change, the build is able to proceed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12530
2023-01-16 20:34:01 +02:00
Avi Kivity
0b418fa7cf cql3, transport, tests: remove "unset" from value type system
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.

Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.

This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.

This is what this long patch does. It performs the following:

(general)

1. unset is removed from the possible values of cql3::raw_value and
   cql3::raw_value_view.

(external->cql3)

2. query_options is fortified with a vector of booleans,
   unset_bind_variable_vector, where each boolean corresponds to a bind
   variable index and is true when it is unset.
3. To avoid churn, two compatibility structs are introduced:
   cql3::raw_value{,_view}_vector_with_unset, which can be constructed
   from a std::vector<raw_value{,_view}>, which is what most callers
   have. They can also be constructed with explicit unset vectors, for
   the few cases they are needed.

(cql3->variables)

4. query_options::get_value_at() now throws if the requested bind variable
   is unset. This replaces all the throwing checks in expression evaluation
   and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
   unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
   an expression, and can be queried whether an unset is present. Two
   conditions are checked: the expression must be a singleton bind
   variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
   new subclasses of cql3::operation. cql3::operation_no_unset_support
   ignores unsets completely. cql3::operation_skip_if_unset checks if
   an operand is unset (luckily all operations have at most one operand that
   tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
   to check for should_skip_operation(). These are the loops around
   operations in update_statement and delete_statement, and the checks
   for unset in attributes (LIMIT and PER PARTITION LIMIT).

(tests)

9. Many unset tests are removed. It's now impossible to enter an
   unset value into the expression evaluation machinery (there's
   just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
   since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
   Since unsets are now checked very early, we don't know the context
   where they happen. It would be possible to reintroduce it (by adding
   a format string parameter to cql3::unset_operation_guard), but it
   seems not to be worth the effort. Usage of unsets is rare, and it is
   explicit (at least with the Python driver, an unset cannot be
   introduced by ommission).

I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.

Closes #12517
2023-01-16 21:10:56 +02:00
Marcin Maliszkiewicz
6f055ca5f9 alternator: evaluate expressions as false for stored malformed binary data

We'll try to distinguish the case when data comes from storage rather
than from a user request. Such an attribute can be used in expressions,
and when it can't be decoded, it should make the expression evaluate to
false, simply excluding the row during a filtered query or scan.

Note that this change focuses on binary type, for other types we
may have some inconsistencies in the implementation.
2023-01-16 15:15:27 +01:00
Marcin Maliszkiewicz
bcbaccc143 rjson: avoid copy constructors in from_string calls when possible
This function copies the value anyway, so there is no need for an extra copy.
2023-01-16 15:15:26 +01:00
Kamil Braun
7510144fba Merge 'Add replace-node-first-boot option' from Benny Halevy
Allow replacing a node given its Host ID rather than its ip address.

This series adds a replace_node_first_boot option to db/config
and makes use of it in storage_service.

The new option takes priority over the legacy replace_address* options.
When the latter are used, a deprecation warning is printed.

Documentation updated respectively.

And a cql unit_test is added.

Ref #12277

Closes #12316

* github.com:scylladb/scylladb:
  docs: document the new replace_node_first_boot option
  dist/docker: support --replace-node-first-boot
  db: config: describe replace_address* options as deprecated
  test: test_topology: test replace using host_id
  test: pylib: ServerInfo: add host_id
  storage_service: get rid of get_replace_address
  storage_service: is_replacing: rely directly on config options
  storage_service: pass replacement_info to run_replace_ops
  storage_service: pass replacement_info to booststrap
  storage_service: join_token_ring: reuse replacement_info.address
  storage_service: replacement_info: add replace address
  init: do not allow cfg.replace_node_first_boot of seed node
  db: config: add replace_node_first_boot option
2023-01-16 15:08:31 +01:00
Marcin Maliszkiewicz
668fffb6c5 alternator: remove unused parameters from describe_items func 2023-01-16 14:36:23 +01:00
Marcin Maliszkiewicz
86dc1bfdb1 utils: throw error on malformed input in base64 decode
We already fixed the case of missing padding, but there is also
a more generic one where the input to the decode function contains
non-base64 characters.

This is mostly done for Alternator's benefit: it should discard
the request containing such data and return an HTTP 400 error.

Additionally, a harmless integer overflow during integer casting
is fixed here. Commit 2d33a3f attempted to fix it, but since we also
implicitly cast to uint8_t, the problem persisted.
2023-01-16 14:36:23 +01:00
Marcin Maliszkiewicz
f53c0fd0fc utils: throw error on missing padding in base64 decode
This is done to bring Alternator's behavior more on par with DynamoDB.
Decode function is used there when processing user requests containing binary
item values. We will now discard improperly formed user input with 400 http error.

It also makes it more consistent as some of our other base64 functions
may have assumed padding is present.

The patch should not break other usages of base64 functions as the only one is
in db/hints where the code already throws std::runtime_error.
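A strict decoder of the kind described here, which rejects missing padding and also the non-base64 characters handled by the follow-up commit, can be sketched as follows. This is an illustrative implementation, not the one in `utils`. The names and exception type are assumptions. Characters are handled as `unsigned char` so that no negative index or narrowing surprise can occur.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
int base64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Strict decoding: the length must be a multiple of 4 (padded input), '='
// may only be the last one or two characters, and everything else must be
// in the base64 alphabet. Any violation throws.
std::string strict_base64_decode(std::string_view in) {
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("base64: missing padding");
    }
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        const bool last_group = (i + 4 == in.size());
        std::uint32_t acc = 0;
        int pad = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const auto c = static_cast<unsigned char>(in[i + j]);
            if (c == '=' && last_group && j >= 2) {
                ++pad;
                acc <<= 6;
                continue;
            }
            if (pad > 0) { // a data character after '='
                throw std::invalid_argument("base64: malformed padding");
            }
            const int v = base64_value(c);
            if (v < 0) {
                throw std::invalid_argument("base64: invalid character");
            }
            acc = (acc << 6) | static_cast<std::uint32_t>(v);
        }
        out.push_back(static_cast<char>((acc >> 16) & 0xFF));
        if (pad < 2) out.push_back(static_cast<char>((acc >> 8) & 0xFF));
        if (pad < 1) out.push_back(static_cast<char>(acc & 0xFF));
    }
    return out;
}
```

In Alternator, a thrown error like this would then be mapped to the HTTP 400 response mentioned above.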

Fixes #6487
2023-01-16 14:36:23 +01:00
Michał Sala
bbbe12af43 forward_service: fix timeout support in parallel aggregates
`forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (that came from local steady clock
`seastar::lowres_clock`). The time point was produced on one node and
later compared against another node's `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced with different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeout.

To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
Representation to which both time point types serialize is the same
(64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using logic suggested
by @avikivity:
    - using steady_clock is just broken, so we aren't taking anything
        from users by breaking it further
    - once all nodes are upgraded, it magically starts to work
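The difference between the two clocks can be illustrated with `std::chrono::system_clock`, used here as a stand-in for Seastar's `lowres_system_clock`. The helper names are hypothetical. A wall-clock deadline survives the 64-bit nanosecond round trip and remains meaningful on the receiving node, assuming the nodes' clocks are reasonably synchronized. A steady clock's epoch is arbitrary per node (often boot time), so the same number would mean something different there.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using wall_clock = std::chrono::system_clock;

// Serialize a deadline as a 64-bit count of nanoseconds since the epoch.
int64_t serialize_deadline(wall_clock::time_point tp) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(tp.time_since_epoch()).count();
}

// Rebuild the deadline on the receiving node.
wall_clock::time_point deserialize_deadline(int64_t ns) {
    return wall_clock::time_point(
        std::chrono::duration_cast<wall_clock::duration>(std::chrono::nanoseconds(ns)));
}

// The receiver compares the deadline against its own wall clock, which is
// meaningful because all nodes share the Unix epoch (modulo clock skew).
bool expired(wall_clock::time_point deadline, wall_clock::time_point now) {
    return now >= deadline;
}
```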

Closes #12529
2023-01-16 12:08:13 +02:00
Botond Dénes
3d9ab1d9eb Merge 'Get recursive tasks' statuses with task manager api call' from Aleksandra Martyniuk
The PR adds an api call allowing to get the statuses of a given
task and all its descendants.

The parent-child tree is traversed in BFS order and the list of
statuses is returned to user.
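A breadth-first traversal of the parent-child tree, as described here, can be sketched over a hypothetical in-memory task map. The types `toy_task` and `statuses_recursive` are illustrative only. The real API call works on task manager objects and full status structures, not strings.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical task: a status string plus the ids of its children.
struct toy_task {
    std::string status;
    std::vector<int> children;
};

// Collect the statuses of `root` and all its descendants in BFS order.
std::vector<std::string> statuses_recursive(const std::unordered_map<int, toy_task>& tasks, int root) {
    std::vector<std::string> result;
    std::queue<int> pending;
    pending.push(root);
    while (!pending.empty()) {
        const int id = pending.front();
        pending.pop();
        const toy_task& t = tasks.at(id);
        result.push_back(t.status);
        for (int child : t.children) {
            pending.push(child);
        }
    }
    return result;
}
```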

Closes #12317

* github.com:scylladb/scylladb:
  test: add test checking recursive task status
  api: get task statuses recursively
  api: change retrieve_status signature
2023-01-16 11:44:50 +02:00
Botond Dénes
969beebe5f reader_concurrency_semaphore: wrap wait list
The wait list will become two lists soon. To keep callers simple (as if
there was still one list) we wrap it with a wrapper which abstracts this
away.
2023-01-16 02:05:27 -05:00
Botond Dénes
8658cfc066 reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters
Propagate the recently added
reader_concurrency_semaphore_{serialize,kill}_limit_multiplier config items
to the semaphore. Not used yet.
2023-01-16 02:05:27 -05:00
Botond Dénes
24d4b484f2 test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardoced buffer size
In `dma_read_bulk()`, use the `range_size` passed as parameter and have
the callers pass meaningful sizes. We got away with callers passing 0
and using a hard-coded size internally because the tracking file wrapper
used the size of the returned buffer as the basis for memory tracking.
This will soon not be the case and instead the passed-in size will be
used, so this has to be fixed.
2023-01-16 02:05:27 -05:00
Botond Dénes
8b0afc28d4 reader_permit: add make_new_tracked_temporary_buffer()
A separate method for callers of make_tracked_temporary_buffer() who
are creating new empty tracked buffers of a certain size.
make_tracked_temporary_buffer() is about to be changed to be more
targeted at callers who call it with pre-consumed memory units.
2023-01-16 02:05:27 -05:00
Botond Dénes
397266f420 reader_permit: add get_state() accessor 2023-01-16 02:05:27 -05:00
Botond Dénes
87e2bf90b9 reader_permit: resource_units: add constructor for already consumed res 2023-01-16 02:05:27 -05:00
Botond Dénes
d2cfc25494 reader_permit: resource_units: remove noexcept qualifier from constructor
It won't be noexcept soon. Also make it exception safe.
2023-01-16 02:05:27 -05:00
Botond Dénes
7eb093899a db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier
Will be propagated to reader concurrency semaphores. Not wired in yet.
2023-01-16 02:05:27 -05:00
Botond Dénes
a019dbaa34 scylla-gdb.py: scylla-memory: extract semaphore stats formatting code
So it can be shared for the 3 semaphores, instead of repeating the same
open-coded method for each of them.
2023-01-16 02:05:27 -05:00
Botond Dénes
15d6d34cfa scylla-gdb.py: fix spelling of "graphviz" 2023-01-16 02:05:27 -05:00
Tzach Livyatan
073f0f00c6 Add Scylla Summit 2023 in the top banner
Closes #12519
2023-01-16 08:05:20 +02:00
Avi Kivity
5a07641b95 Update python3 submodule (license file fix)
* tools/python3 548e860...279b6c1 (1):
  > create-relocatable-package: s/pyhton3-libs/python3-libs/
2023-01-15 17:59:27 +02:00
Benny Halevy
de3142e540 docs: document the new replace_node_first_boot option
And mention that replacing a node using the legacy
replace_addr* options is deprecated.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:41:44 +02:00
Benny Halevy
d4f1563369 dist/docker: support --replace-node-first-boot
And mention that replace_address_first_boot is deprecated

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:09 +02:00
Benny Halevy
1577aa8098 db: config: describe replace_address* options as deprecated
The replace_address options are still supported,
but mention in their description that they are now deprecated
and that the user should use replace_node_first_boot instead.

While at it fix a typo in ignore_dead_nodes_for_replace

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:09 +02:00
Benny Halevy
90faeedb77 test: test_topology: test replace using host_id
Add test cases exercising the --replace-node-first-boot option
by replacing nodes using their host_id rather
than ip address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:09 +02:00
Benny Halevy
7d0d9e28f1 test: pylib: ServerInfo: add host_id
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:07 +02:00
Benny Halevy
db2b76beb5 storage_service: get rid of get_replace_address
It is unused now.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:34:29 +02:00
Benny Halevy
17f70e4619 storage_service: is_replacing: rely directly on config options
Rather than on get_replace_address, before we remove the latter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:34:29 +02:00
Benny Halevy
7282d58d11 storage_service: pass replacement_info to run_replace_ops
So it won't need to call get_replace_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:34:09 +02:00
Benny Halevy
08598e4f64 storage_service: pass replacement_info to booststrap
So it won't need to call get_replace_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:30:48 +02:00
Benny Halevy
b863f7a75f storage_service: join_token_ring: reuse replacement_info.address
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:30:48 +02:00
Benny Halevy
add2f209b8 storage_service: replacement_info: add replace address
Populate replacement_info.address in prepare_replacement_info
as a first step towards getting rid of get_replace_address().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:30:48 +02:00
Benny Halevy
75c8a5addc init: do not allow cfg.replace_node_first_boot of seed node
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:30:48 +02:00
Benny Halevy
32e79185d4 db: config: add replace_node_first_boot option
For replacing a node given its (now unique) Host ID.

The existing options for replace_address*
will be deprecated in the following patches
and eventually we will stop supporting them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:30:48 +02:00
Tomasz Grabiec
abc43f97c9 Merge 'Simplify some Raft tables' from Kamil Braun
Rename `system.raft_config` to `system.raft_snapshot_config` to make it clearer
what the table stores.

Remove the `my_server_id` partition key column from
`system.raft_snapshot_config` and a corresponding column from
`system.raft_snapshots` which would store the Raft server ID of the local node.
It's unnecessary: all servers running on a given node in different groups will
use the same ID, the Raft ID of the node, which is equal to its Host ID. There
will never be multiple servers running in a single Raft group on the same node.

Closes #12513

* github.com:scylladb/scylladb:
  db: system_keyspace: remove (my_)server_id column from RAFT_SNAPSHOTS and RAFT_SNAPSHOT_CONFIG
  db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config'
2023-01-13 00:23:21 +01:00
Botond Dénes
4e41e7531c docs/dev/debugging.md: recommend open-coredump.sh for opening coredumps
Leave the guide for opening coredumps manually in place, though, since the script might not work
in all cases.
Also update the version example, since we changed what development versions
look like.

Closes #12511
2023-01-12 19:30:59 +02:00
Botond Dénes
ab8171ffd5 open-coredump.sh: handle dev versions
Such as 5.2.0~dev, which really means master. Don't try to check out
branch-5.2 in this case, since it doesn't exist yet; check out master instead.
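
A minimal Python sketch of the version-to-branch mapping described above. The real logic lives in the shell script open-coredump.sh. The helper name `branch_for_version` is hypothetical, and so is the assumption that release versions look like `X.Y.Z` and live on `branch-X.Y`.

```python
def branch_for_version(version: str) -> str:
    """Map a Scylla version string to the git branch to check out (hypothetical helper)."""
    # Development versions such as "5.2.0~dev" are built from master;
    # the matching release branch (e.g. branch-5.2) does not exist yet.
    if "~dev" in version:
        return "master"
    # Assumption: release versions look like "X.Y.Z" and map to "branch-X.Y".
    major, minor = version.split(".")[:2]
    return f"branch-{major}.{minor}"


print(branch_for_version("5.2.0~dev"))  # master
print(branch_for_version("5.1.3"))      # branch-5.1
```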

Closes #12510
2023-01-12 19:28:58 +02:00
Kamil Braun
be390285b6 db: system_keyspace: remove (my_)server_id column from RAFT_SNAPSHOTS and RAFT_SNAPSHOT_CONFIG
A single node will run a single Raft server in any given Raft group,
so this column is not necessary.
2023-01-12 16:48:50 +01:00
Kamil Braun
bed555d1e5 db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config'
Make it clear that the table stores the snapshot configuration, which is
not necessarily the currently operating configuration (the last one
appended to the log).

In the future we plan to have a separate virtual table for showing the
currently operating configuration, perhaps we will call it
`system.raft_config`.
2023-01-12 16:21:26 +01:00
Botond Dénes
f87e3993ef Merge 'configure.py: a bunch of clean-up changes' from Michał Chojnowski
The planned integration of cross-module optimizations in scylladb/scylladb-enterprise requires several changes to `configure.py`. To minimize the divergence between the `configure.py`s of both repositories, this series upstreams some of these changes to scylladb/scylladb.

The changes mostly remove dead code and fix some traps for the unaware.

Closes #12431

* github.com:scylladb/scylladb:
  configure.py: prevent deduplication of seastar compile options
  configure.py: rename clang_inline_threshold()
  configure.py: rework the seastar_cflags variable
  configure.py: hoist the pkg_config() call for seastar-testing.pc
  configure.py: unify the libs variable for tests and non-tests
  configure.py: fix indentation
  configure.py: remove a stale code path for .a artifacts
2023-01-12 16:40:02 +02:00
Wojciech Mitros
082bfea187 rust: use depfile and Cargo.lock to avoid building rust when unnecessary
Currently, we call cargo build every time we build scylla, even
when no rust files have been changed.
This is avoided by adding a depfile to the ninja rule for the rust
library.
The depfile is generated by default during cargo build,
but it uses the full paths of all dependencies that it includes,
while we use relative paths. This is fixed by specifying
CARGO_BUILD_DEP_INFO_BASEDIR='.', which strips the current
path prefix from all generated paths.
Instead of using 'always' to decide when to run the cargo
build, a dependency on Cargo.lock is added in addition to the
depfile. As a result, the rust files are recompiled not only
when the source files included in the depfile are modified,
but also when some rust dependency is updated.
Cargo may produce an old cached file as the build result even
when Cargo.lock was recently updated. Because of that,
the build result may be older than the Cargo.lock file even
if the build was just performed. This may cause ninja to rebuild
the file every subsequent time. To avoid this, we 'touch' the
build result, so that its last modification time is up to date.
Because the dependency on Cargo.lock was added, the new build
command does not modify it. Instead, the developer must
update it when modifying the dependencies; the docs are updated
to reflect that.
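
A simplified Python model of why the `touch` step is needed. It assumes ninja's basic rule: an output is dirty when any of its inputs (here Cargo.lock) has a newer modification time. Real ninja also consults its build log, so treat this as an illustration only. The function name is hypothetical.

```python
def is_dirty(output_mtime: float, input_mtimes) -> bool:
    """Simplified ninja rule: rebuild when any input is newer than the output."""
    return any(mtime > output_mtime for mtime in input_mtimes)


cargo_lock_mtime = 100.0

# Cargo reused a cached artifact (mtime 50) even though Cargo.lock changed at 100,
# so the output still looks stale right after the build.
print(is_dirty(50.0, [cargo_lock_mtime]))   # True: rebuilt on every invocation

# Touching the output after the build moves its mtime past Cargo.lock.
print(is_dirty(101.0, [cargo_lock_mtime]))  # False
```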

Closes #12489

Fixes #12508
2023-01-12 14:44:11 +02:00
Kefu Chai
77baea2add docs/architecture: fix typo of SyllaDB
s/SyllaDB/ScyllaDB/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12505
2023-01-12 12:25:53 +02:00
Michał Chojnowski
1ff4abef4a configure.py: prevent deduplication of seastar compile options
In its infinite wisdom, CMake deduplicates the options passed
to `target_compile_options`, making it impossible to pass options which require
duplication, such as -mllvm.
Passing e.g.
`-mllvm;-pgso=false;-mllvm;-inline-threshold=2500` invokes the compiler with
`-mllvm -pgso=false -inline-threshold=2500`, breaking the options.

As a workaround, CMake added the `SHELL:` syntax, which makes it possible to
pass the list of options not as a CMake list, but as a shell-quoted string.
Let's use it, so we can pass multiple -mllvm options.
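
The failure mode can be modelled in a few lines of Python. This is a sketch, not CMake's implementation. It makes two assumptions:

* Repeated options keep only their first occurrence, consistent with the `-option A -option B` becoming `-option A B` example in CMake's `target_compile_options` documentation.
* A `SHELL:` entry is deduplicated as a single unit and only afterwards split using shell quoting.

```python
import shlex


def dedup_keep_first(options):
    """Drop repeated options, keeping the first occurrence (models CMake's de-duplication)."""
    seen = set()
    result = []
    for opt in options:
        if opt not in seen:
            seen.add(opt)
            result.append(opt)
    return result


def expand(options):
    """De-duplicate, then expand SHELL:-prefixed entries into separate arguments."""
    args = []
    for opt in dedup_keep_first(options):
        if opt.startswith("SHELL:"):
            args.extend(shlex.split(opt[len("SHELL:"):]))
        else:
            args.append(opt)
    return args


# Plain CMake list: the second -mllvm is dropped, breaking the pairing.
print(expand(["-mllvm", "-pgso=false", "-mllvm", "-inline-threshold=2500"]))
# ['-mllvm', '-pgso=false', '-inline-threshold=2500']

# SHELL: groups survive de-duplication as whole units.
print(expand(["SHELL:-mllvm -pgso=false", "SHELL:-mllvm -inline-threshold=2500"]))
# ['-mllvm', '-pgso=false', '-mllvm', '-inline-threshold=2500']
```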
2023-01-12 11:24:10 +01:00
Michał Chojnowski
85facefe45 configure.py: rename clang_inline_threshold()
There's a global variable (the CLI argument) with the same name.
Rename one of the two to avoid accidental mixups.
2023-01-12 11:24:10 +01:00
Michał Chojnowski
d9de78f6d3 configure.py: rework the seastar_cflags variable
The name of this variable is misleading. What it really does is pass flags to
static libraries compiled by us, not just to seastar.
We will need this capability to implement cross-artifact optimizations in our
build.
We will also need to pass linker flags, and we will need to vary those flags
depending on the build mode.

This patch splits the seastar_cflags variable into per-mode lib_cflags and
lib_ldflags variables. It shouldn't change the resulting build.ninja for now,
but will be needed by later planned patches.
2023-01-12 11:24:10 +01:00
Michał Chojnowski
ee462a9d3c configure.py: hoist the pkg_config() call for seastar-testing.pc
Put the pkg_config() for seastar-testing.pc in the same area as the call
for seastar.pc, outside of the loop.
This is a cosmetic change aimed at making following commits cleaner.
2023-01-12 11:24:10 +01:00
Michał Chojnowski
c9aeeeae11 configure.py: unify the libs variable for tests and non-tests
This is a cosmetic change aimed at making the following commits in the same area
cleaner.
2023-01-12 11:24:09 +01:00
Michał Chojnowski
10ac881ef1 configure.py: fix indentation
Fix indentation after the preceding commit.
2023-01-12 11:23:32 +01:00
Michał Chojnowski
be419adaf8 configure.py: remove a stale code path for .a artifacts
Scylla hasn't had `.a` artifacts for a long time (since the Urchin days,
I believe), and the piece of code responsible for them is stale and untested.
Remove it.
2023-01-12 11:22:49 +01:00
Botond Dénes
8a86f8d4ef gdbinit: add ignore clause for SIG35
Another real-time signal often raised in scylla, making debugging a live
process annoying.

Closes #12507
2023-01-12 12:13:04 +02:00
Avi Kivity
7a8a442c1e transport: drop some dead code around v1 and v2 protocols
In 424dbf43f ("transport: drop cql protocol versions 1 and 2"),
we dropped support for protocols 1 and 2, but some code remains
that checks for those versions. It is now dead code, so remove it.

Closes #12497
2023-01-12 12:52:19 +02:00
Avi Kivity
4de2524a42 build: update toolchain for scylla-driver package
Pull updated scylla-driver package, fixing an IP change related
bug [1].

[1] https://github.com/scylladb/python-driver/issues/198

Closes #12501
2023-01-11 22:16:35 +02:00
Nadav Har'El
7192283172 Merge 'doc: add the upgrade guide for ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/12315

This PR adds the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2.
Instead of adding separate guides per platform, I've merged the information to create one platform-agnostic guide, similar to what we did for [OSS->OSS](https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/) and [Enterprise->Enterprise](https://github.com/scylladb/scylladb/pull/12339) guides.

Closes #12450

* github.com:scylladb/scylladb:
  doc: add the new upgrade guide to the toctree and fix its name
  docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2
2023-01-11 21:01:34 +02:00
Avi Kivity
cb2cb8a606 utils: small_vector: mark throw_out_of_range() const
It can be called from the const version of small_vector::at.

Closes #12493
2023-01-11 20:58:53 +02:00
Nadav Har'El
04d6402780 docs: cql-extensions.md: explain our NULL handling
Our handling of NULLs in expressions is different from Cassandra's,
and more uniform. For example, the filter "WHERE x = NULL" is an
error in Cassandra, but supported in Scylla. Let's explain how and why.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12494
2023-01-11 20:56:50 +02:00
Wojciech Mitros
95031074a5 configure: fix the order of rust header generation
Currently, no rule enforces that the cxx.h rust header
is generated before compiling the .cc files generated
from rust. This patch adds this dependency.

Closes #12492
2023-01-11 16:55:53 +02:00
Botond Dénes
210738c9ce Merge 'test.py: improve logging' from Kamil Braun
Make it easy to see which clusters are operated on by which tests in which build modes and so on.
Add some additional logs.

These improvements would have saved me a lot of debugging time if I had had them last week, and we would have had https://github.com/scylladb/scylladb/pull/12482 much sooner.

Closes #12483

* github.com:scylladb/scylladb:
  test.py: harmonize topology logs with test.py format
  test/pylib: additional logging during cluster setup
  test/pylib: prefix cluster/manager logs with the current test name
  test/pylib: pool: pass *args and **kwargs to the build function from get()
  test.py: include mode in ScyllaClusterManager logs
2023-01-11 16:32:56 +02:00
Aleksandra Martyniuk
fcb3f76e78 test: add test checking recursive task status
REST API test checking whether the task manager API returns recursive tasks'
statuses properly in BFS order.
2023-01-11 12:34:17 +01:00
Aleksandra Martyniuk
6b79c92cb7 api: get task statuses recursively
Sometimes to debug some task manager module, we may want to inspect
the whole tree of descendants of some task.

To make it easier, an api call getting a list of statuses of the requested
task and all its descendants in BFS order is added.
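
A Python sketch of collecting a task's status together with the statuses of all its descendants in BFS order. The `Task` structure with `status` and `children` fields is hypothetical. The real task manager is implemented in C++ and exposed over the REST API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task node: an id, a status string and child tasks.
    id: str
    status: str
    children: list = field(default_factory=list)


def statuses_recursive(root: Task) -> list:
    """Return (id, status) pairs for root and all its descendants, in BFS order."""
    result = []
    queue = deque([root])
    while queue:
        task = queue.popleft()
        result.append((task.id, task.status))
        # FIFO queue: children are visited only after all tasks at the current depth.
        queue.extend(task.children)
    return result


tree = Task("a", "running", [
    Task("b", "done", [Task("d", "done")]),
    Task("c", "running"),
])
print(statuses_recursive(tree))
# [('a', 'running'), ('b', 'done'), ('c', 'running'), ('d', 'done')]
```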
2023-01-11 12:34:06 +01:00
Konstantin Osipov
f3440240ee test.py: harmonize topology logs with test.py format
We need millisecond resolution in the log to be able to
correlate test log with test.py log and scylla logs. Harmonize
the log format for tests which actively manage scylla servers.
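
With Python's standard `logging` module, timestamps of the form `HH:MM:SS.mmm` can be produced as sketched below. The exact format string test.py uses is not shown here, so this one is an assumption. The converter is set to UTC only to make the example reproducible; a real setup would more likely keep local time.

```python
import logging
import time


def make_formatter() -> logging.Formatter:
    # %(msecs)03d appends milliseconds, which a strftime-style datefmt cannot express.
    formatter = logging.Formatter(fmt="%(asctime)s.%(msecs)03d %(levelname)s> %(message)s",
                                  datefmt="%H:%M:%S")
    formatter.converter = time.gmtime  # UTC, so the example output is reproducible
    return formatter


record = logging.LogRecord("test", logging.INFO, "example.py", 1, "starting server", None, None)
record.created, record.msecs = 0.531, 531.0  # fixed timestamp for the example
line = make_formatter().format(record)
print(line)  # 00:00:00.531 INFO> starting server
```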
2023-01-11 10:09:42 +01:00
Kamil Braun
79712185d5 test/pylib: additional logging during cluster setup
This would have saved me a lot of debugging time.
2023-01-11 10:09:42 +01:00
Kamil Braun
4f7e5ee963 test/pylib: prefix cluster/manager logs with the current test name
The log file produced by test.py combines logs coming from multiple
concurrent test runs. Each test has its own log file as well, but this
"global" log file is useful when debugging problems with topology tests,
since many events related to managing clusters are stored there.

Make the logs easier to read by including information about the test case
that's currently performing operations such as adding new servers to
clusters and so on. This includes the mode, test run name and the name
of the test case.

We do this by using custom `Logger` objects (instead of calling
`logging.info` etc. which uses the root logger) with `LoggerAdapter`s
that include the prefixes. A bit of boilerplate 'plumbing' through
function parameters is required but it's mostly straightforward.

This doesn't apply to all events, e.g. boost test cases which don't
set up a "real" Scylla cluster. These events don't have additional
prefixes.

Example:
```

17:41:43.531 INFO> [dev/topology.test_topology.1] Cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) adding server...
17:41:43.531 INFO> [dev/topology.test_topology.1] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-10...
17:41:43.603 INFO> [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10...
17:41:43.614 INFO> [dev/topology.test_topology.2] Cluster ScyllaCluster(name: 7a497fce-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(2, 127.40.246.2, f59d3b1d-efbb-4657-b6d5-3fa9e9ef786e), ScyllaServer(5, 127.40.246.5, 9da16633-ce53-4d32-8687-e6b4d27e71eb), ScyllaServer(9, 127.40.246.9, e60c69cd-212d-413b-8678-dfd476d7faf5), stopped: ) adding server...
17:41:43.614 INFO> [dev/topology.test_topology.2] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-11...
17:41:43.670 INFO> [dev/topology.test_topology.2] starting server at host 127.40.246.11 in scylla-11...
```
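
A minimal sketch of the `LoggerAdapter` approach using only the standard library. The adapter class, the logger name and the `[mode/test_run]` prefix format are assumptions modelled on the example output above. This is not the actual test/pylib code.

```python
import io
import logging


class PrefixAdapter(logging.LoggerAdapter):
    """Prepend a fixed "[prefix]" to every message logged through this adapter."""

    def process(self, msg, kwargs):
        return f"[{self.extra['prefix']}] {msg}", kwargs


stream = io.StringIO()
logger = logging.getLogger("test.pylib.example")  # a named logger instead of the root logger
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False  # keep the example output away from the root logger's handlers

log = PrefixAdapter(logger, {"prefix": "dev/topology.test_topology.1"})
log.info("starting server at host %s in %s...", "127.40.246.10", "scylla-10")
print(stream.getvalue(), end="")
# [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10...
```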
2023-01-11 10:09:39 +01:00
Avi Kivity
de0c31b3b6 cql3: query_options: simplify batch query_options constructor
The batch constructor uses an unnecessarily complicated template,
where in fact it only accepts vector<vector<raw_value | raw_value_view>>.

Simplify the constructor to allow exactly that. Delete some confusing
comments around it.

Closes #12488
2023-01-11 07:54:54 +02:00
Kamil Braun
2bda0f9830 test/pylib: pool: pass *args and **kwargs to the build function from get()
This will be used to specify a custom logger when building new clusters
before starting tests, making it easy to pinpoint which tests are
waiting for clusters to be built and what's happening to these
particular clusters.
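
A sketch of a pool whose `get()` forwards extra arguments to the build function, so that a caller can pass, for example, a per-test logger. `Pool` and `build_cluster` are hypothetical, simplified synchronous stand-ins for illustration, not the test/pylib implementation.

```python
from typing import Any, Callable


class Pool:
    """Simplified pool (hypothetical): reuses returned items, builds new ones on demand."""

    def __init__(self, build: Callable[..., Any]):
        self._build = build
        self._free = []  # items returned via put(), available for reuse

    def get(self, *args: Any, **kwargs: Any) -> Any:
        if self._free:
            # Reused items are returned as-is; the arguments only matter for new builds.
            return self._free.pop()
        # Forward caller-supplied arguments (e.g. a logger) to the build function.
        return self._build(*args, **kwargs)

    def put(self, item: Any) -> None:
        self._free.append(item)


def build_cluster(logger=None):
    # Hypothetical build function; records which logger it was given.
    return {"logger": logger}


pool = Pool(build_cluster)
cluster = pool.get(logger="test_topology.1")
print(cluster)  # {'logger': 'test_topology.1'}
```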
2023-01-10 17:41:54 +01:00
Kamil Braun
ff2c030bf9 test.py: include mode in ScyllaClusterManager logs
The logs often mention the test run and the current test case in a given
run, such as `test_topology.1` and
`test_topology.1::test_add_server_add_column`. However, if we run
test.py in multiple modes, the different modes might be running the same
test case and the logs become confusing. To disambiguate, prefix the
test run/case names with the mode name.

Example:
```
Leasing Scylla cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) for test dev/topology.test_topology.1::test_add_server_add_column
```
2023-01-10 17:41:54 +01:00
Wojciech Mitros
e558c7d988 functions: initialize aggregates on scylla start
Currently, UDAs can't be reused if Scylla has been
restarted since they were created. This is
caused by the missing initialization of saved
UDAs, which should have inserted them into the
cql3::functions::functions::_declared map, which
should store all (user-)created functions and
aggregates.

This patch adds the missing implementation in a way
that's analogous to the method of inserting UDF to
the _declared map.

Fixes #11309
2023-01-10 17:44:18 +02:00
Wojciech Mitros
d1b809754c database: wrap lambda coroutines used as arguments in coroutine::lambda
Using lambda coroutines as arguments can lead to a use-after-free.
Currently, the way these lambdas were used in do_parse_schema_tables
did not lead to such a problem, but it's better to be safe and wrap
them in coroutine::lambda(), so that they can't lead to this problem
as long as we ensure that the lambda finishes in the
do_parse_schema_tables() statement (for example using co_await).

Closes #12487
2023-01-10 17:24:52 +02:00
Nadav Har'El
0edb090c67 test/cql-pytest: add simple tests for SELECT DISTINCT
This patch adds a few simple functional tests for the SELECT DISTINCT
feature and how it interacts with other features, especially GROUP BY.

2 of the 5 new tests are marked xfail, and reproduce one old and one
newly-discovered issue:

Refs #5361: LIMIT doesn't work when using GROUP BY (the test here uses
            LIMIT and GROUP BY together with SELECT DISTINCT, so the
            LIMIT isn't honored).

Refs #12479: SELECT DISTINCT doesn't refuse GROUP BY with clustering
             column.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12480
2023-01-10 13:29:26 +02:00
Michał Radwański
dcab289656 boost/mvcc_test: use failure_injecting_allocation_strategy where it is meant to
In test_apply_is_atomic, a basic form of exception testing is used.
There is failure_injecting_allocation_strategy, which however is not
used for any allocation, since for some reason,
`with_allocator(r.allocator()` is used instead of
`with_allocator(alloc`. Fix that.

Closes #12354
2023-01-10 12:01:36 +01:00
Tomasz Grabiec
ebcd736343 cache: Fix undefined behavior when populating with non-full keys
Regression introduced in 23e4c8315.

view_and_holder position_in_partition::after_key() triggers undefined
behavior when the key was not full because the holder is moved, which invalidates the view.

Fixes #12367

Closes #12447
2023-01-10 12:51:54 +02:00
Jan Ciolek
8d7e35caef cql3: expr: remove reference to temporary in get_rhs_receiver
The function underlying_type() returns a data_type by value,
but the code assigned it to a reference.

At first I was sure this is an error
(assigning temporary value to a reference), but it turns out
that this is most likely correct due to C++ lifetime
extension rules.

I think it's better to avoid such unintuitive tricks.
Assigning to value makes it clearer that the code
is correct and there are no dangling references.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #12485
2023-01-10 09:42:49 +02:00
Raphael "Raph" Carvalho
407c7fdaf2 docs: Fix command to create a symbolic link to relocatable pkg dir
Closes #12481
2023-01-10 07:09:14 +02:00
Kamil Braun
822410c49b test/pylib: scylla_cluster: release IPs when cluster is no longer needed
With sufficiently many test cases we would eventually run out of IP
addresses, because IPs (which are leased from a global host registry)
would only be released at the end of an entire test suite.

In fact we already hit this during next promotions, causing much pain
indeed.

Release IPs when a cluster, after being marked dirty, is stopped and
thrown away.

Closes #12482
2023-01-10 06:59:41 +02:00
Avi Kivity
e71e1dc964 Merge 'tools/scylla-sstable: add lua scripting support' from Botond Dénes
Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes.
For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available.

Example:
```lua
function new_stats(key)
    return {
        partition_key = key,
        total = 0,
        partition = 0,
        static_row = 0,
        clustering_row = 0,
        range_tombstone_change = 0,
    };
end

total_stats = new_stats(nil);

function inc_stat(stats, field)
    stats[field] = stats[field] + 1;
    stats.total = stats.total + 1;
    total_stats[field] = total_stats[field] + 1;
    total_stats.total = total_stats.total + 1;
end

function on_new_sstable(sst)
    max_partition_stats = new_stats(nil);
    if sst then
        current_sst_filename = sst.filename;
    else
        current_sst_filename = nil;
    end
end

function consume_partition_start(ps)
    current_partition_stats = new_stats(ps.key);
    inc_stat(current_partition_stats, "partition");
end

function consume_static_row(sr)
    inc_stat(current_partition_stats, "static_row");
end

function consume_clustering_row(cr)
    inc_stat(current_partition_stats, "clustering_row");
end

function consume_range_tombstone_change(crt)
    inc_stat(current_partition_stats, "range_tombstone_change");
end

function consume_partition_end()
    if current_partition_stats.total > max_partition_stats.total then
        max_partition_stats = current_partition_stats;
    end
end

function on_end_of_sstable()
    if current_sst_filename then
        print(string.format("Stats for sstable %s:", current_sst_filename));
    else
        print("Stats for stream:");
    end
    print(string.format("\t%d fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes",
        total_stats.total,
        total_stats.partition,
        total_stats.static_row,
        total_stats.clustering_row,
        total_stats.range_tombstone_change));
    print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes",
        max_partition_stats.total,
        max_partition_stats.partition_key,
        max_partition_stats.static_row,
        max_partition_stats.clustering_row,
        max_partition_stats.range_tombstone_change));
end
```
Running this script will yield the following:
```
$ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db
Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db:
        397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes
        Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes
```

Fixes: https://github.com/scylladb/scylladb/issues/9679

Closes #11649

* github.com:scylladb/scylladb:
  tools/scylla-sstable: consume_reader(): improve pause heuristics
  test/cql-pytest/test_tools.py: add test for scylla-sstable script
  tools: add scylla-sstable-scripts directory
  tools/scylla-sstable: remove custom operation
  tools/scylla-sstable: add script operation
  tools/sstable: introduce the Lua sstable consumer
  dht/i_partitioner.hh: ring_position_ext: add weight() accessor
  lang/lua: export Scylla <-> lua type conversion methods
  lang/lua: use correct lib name for string lib
  lang/lua: fix typo in aligned_used_data (meant to be user_data)
  lang/lua: use lua_State* in Scylla type <-> Lua type conversions
  tools/sstable_consumer: more consistent method naming
  tools/scylla-sstable: extract sstable_consumer interface into own header
  tools/json_writer: add accessor to underlying writer
  tools/scylla-sstable: fix indentation
  tools/scylla-sstable: export mutation_fragment_json_writer declaration
  tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
  tools/scylla-sstable: extract json writing logic from json_dumper
  tools/scylla-sstable: extract json_writer into its own header
  tools/scylla-sstable: use json_writer::DataKey() to write all keys
  tools/scylla-types: fix use-after-free on main lambda captures
2023-01-09 20:54:42 +02:00
Raphael S. Carvalho
05ffb024bb replica: Kill table::calculate_shard_from_sstable_generation()
Inferring shard from generation is long gone. We still use it in
some scripts, but that's no longer needed in Scylla, when loading
the SSTables, and it also conflicts with ongoing work of UUID-based
generations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12476
2023-01-09 20:17:57 +02:00
Takuya ASADA
548c9e36a1 main: add tcp_timestamps sanity check
Check net.ipv4.tcp_timestamps and show a warning message when it's not set to 1.
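
An equivalent check can be sketched in Python by reading the sysctl value from procfs. Scylla's own check is part of its startup code (main), so this is an illustration only. The helper names are hypothetical, and the value check is split out so it can be tested without a real `/proc`.

```python
import logging
from pathlib import Path

TCP_TIMESTAMPS_PATH = "/proc/sys/net/ipv4/tcp_timestamps"


def tcp_timestamps_ok(value: str) -> bool:
    """Return True if the sysctl value read from procfs is exactly 1."""
    return value.strip() == "1"


def check_tcp_timestamps(path: str = TCP_TIMESTAMPS_PATH) -> None:
    """Log a warning unless net.ipv4.tcp_timestamps is set to 1."""
    try:
        value = Path(path).read_text()
    except OSError as e:
        logging.warning("cannot read %s: %s", path, e)
        return
    if not tcp_timestamps_ok(value):
        logging.warning("net.ipv4.tcp_timestamps is set to %s, expected 1", value.strip())
```

Calling `check_tcp_timestamps()` once at startup only logs a warning. It never aborts, which matches the "show warning" behavior described above.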

Fixes #12144

Closes #12199
2023-01-09 19:08:21 +02:00
Nadav Har'El
d6e6820f33 Merge 'Drop support for cql binary protocols versions 1 and 2' from Avi Kivity
The CQL binary protocol version 3 was introduced in 2014. All Scylla
versions support it, as do Cassandra versions 2.1 and newer.

Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer
use 32-bit collection sizes.
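
To make the size difference concrete, here is a Python sketch of encoding a collection's element count. Per the CQL native protocol specifications, v1 and v2 use a 2-byte `[short]` count, while v3 and newer use a 4-byte `[int]`. The helper name is hypothetical.

```python
import struct


def encode_collection_size(n: int, protocol_version: int) -> bytes:
    """Encode a collection element count as the CQL binary protocol does (big-endian)."""
    if protocol_version < 3:
        # v1/v2: unsigned 16-bit [short], so at most 65535 elements.
        return struct.pack(">H", n)
    # v3 and newer: signed 32-bit [int].
    return struct.pack(">i", n)


print(encode_collection_size(3, 2).hex())  # 0003
print(encode_collection_size(3, 4).hex())  # 00000003
try:
    encode_collection_size(70_000, 2)  # does not fit in 16 bits
except struct.error as e:
    print("v2 cannot encode 70000 elements:", e)
```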

Unfortunately, we implemented support for multiple serialization formats
very intrusively, by pushing the format everywhere. This avoids the need
to re-serialize (sometimes) but is quite obnoxious. It's also likely to be
broken, since it's almost untested and it's too easy to write
cql_serialization_format::internal() instead of propagating the client
specified value.

Since protocols 1 and 2 have been obsolete for 9 years, just drop them. It's
easy to verify that they are no longer in use on a running system by
examining the `system.clients` table before upgrade.

Fixes #10607

Closes #12432

* github.com:scylladb/scylladb:
  treewide: drop cql_serialization_format
  cql: modification_statement: drop protocol check for LWT
  transport: drop cql protocol versions 1 and 2
2023-01-09 18:52:41 +02:00
Botond Dénes
bd42da6e69 tools/scylla-sstable: consume_reader(): improve pause heuristics
The consume loop had some heuristics in place to determine whether after
pausing, the consumer wishes to skip just the partition or the remaining
content of the sstable. This heuristic was flawed, so replace it with a
non-heuristic method: track the last consumed fragment and look at this
to determine what should be done.
2023-01-09 09:46:57 -05:00
Botond Dénes
1d222220e0 test/cql-pytest/test_tools.py: add test for scylla-sstable script
To test the script operation, we use some of the example scripts from
the example directory. Namely, dump.lua and slice.lua. These two scripts
together have a very good coverage of the entire script API. Testing
their functionality therefore also provides a good coverage of the lua
bindings. A further advantage is that since both scripts dump output in
identical format to that of the data-dump operation, it is trivial to do
a comparison against this already tested operation.
A targeted test is written for the sstable skip functionality of the
consumer API.
2023-01-09 09:46:57 -05:00
Botond Dénes
ace42202df tools: add scylla-sstable-scripts directory
To be the home of example scripts for scylla-sstable. For now only a
README.md is added describing the directory's purpose and with links to
useful resources.
One example script is added in this patch, more will come later.
2023-01-09 09:46:57 -05:00
Botond Dénes
7b40463f29 tools/scylla-sstable: remove custom operation
We now have a script operation, the custom operation (poor man's script
operation) has no reason to exist anymore.
2023-01-09 09:46:57 -05:00
Botond Dénes
e5071fdeab tools/scylla-sstable: add script operation
Loads the script from the specified path, then feeds the mutation
fragment stream to it. For now only Lua scripts are supported for the
simple reason that Lua is easy to write bindings for, it is simple and
lightweight and more importantly we already have Lua included in the
Scylla binary as it is used as the implementation language for UDF/UDA.
We might consider WASM support in the future, but for now we don't have
any language support in WASM available.
2023-01-09 09:46:57 -05:00
Botond Dénes
9dd5107919 tools/sstable: introduce the Lua sstable consumer
The Lua sstable consumer loads a script from the specified path then
feeds the mutation fragment stream to the script via the
sstable_consumer methods, each method of which the script is allowed to
define, effectively overloading the virtual method in Lua.
This allows for very wide and flexible customization opportunities for
what to extract from sstables and how to process and present them,
without the need to recompile the scylla-sstable tool.
2023-01-09 09:46:57 -05:00
Botond Dénes
50b155e706 dht/i_partitioner.hh: ring_position_ext: add weight() accessor 2023-01-09 09:46:57 -05:00
Botond Dénes
8699fe5001 lang/lua: export Scylla <-> lua type conversion methods
Currently hidden in lang/lua.cc, declare these in a header so others can
use it.
2023-01-09 09:46:57 -05:00
Botond Dénes
e9a52837cf lang/lua: use correct lib name for string lib
AFAIK the mistake had no real consequence, but still it is nicer to have
it correct.
2023-01-09 09:46:57 -05:00
Botond Dénes
76663d7774 lang/lua: fix typo in aligned_used_data (meant to be user_data) 2023-01-09 09:46:57 -05:00
Botond Dénes
943fc3b6f3 lang/lua: use lua_State* in Scylla type <-> Lua type conversions
Instead of the lua_slice_state which is local to this file. We want to
reuse the Scylla type <-> Lua type conversion functions but for that
they have to use the more generic lua_State*. No functionality or
convenience is lost with the switch, the code didn't make use of the
other fields bundled in lua_slice_state.
2023-01-09 09:46:57 -05:00
Botond Dénes
8045751867 tools/sstable_consumer: more consistent method naming
Use `consume_` consistently across the entire interface, instead of having
some methods with `on_` and others with `consume_` prefixes.
2023-01-09 09:46:57 -05:00
Botond Dénes
8e117501ac tools/scylla-sstable: extract sstable_consumer interface into own header
So it can be used in code outside scylla-sstable.cc. This source file is
quite large already, and as we have yet another large chunk of code to
add, we want to add it in a separate file.
2023-01-09 09:46:57 -05:00
Botond Dénes
9b1c486051 tools/json_writer: add accessor to underlying writer 2023-01-09 09:46:57 -05:00
Botond Dénes
cfb5afbe9b tools/scylla-sstable: fix indentation
Left broken by previous patches.
2023-01-09 09:46:57 -05:00
Botond Dénes
d42b0bb5d5 tools/scylla-sstable: export mutation_fragment_json_writer declaration
To json_writer.hh. Method definitions are left in scylla-sstable.cc.
Indentation is left broken, will be fixed by the next patch.
2023-01-09 09:46:57 -05:00
Botond Dénes
517135e155 tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
There is no point in the former implementing said interface. For one it
is a futurized interface, which is not needed for something writing to
the stdout. Rename the methods to follow the naming convention of rjson
writers more closely.
2023-01-09 09:46:57 -05:00
Botond Dénes
0ee1c6ca57 tools/scylla-sstable: extract json writing logic from json_dumper
We want to split this class into two parts: one with the actual logic
converting mutation fragments to json, and a wrapper over this one,
which implements the sstable_consumer interface.
As a first step, we extract the class as is (no changes) and just forward
all calls from the now-empty wrapper to it.
2023-01-09 09:46:57 -05:00
Botond Dénes
55ef0ed421 tools/scylla-sstable: extract json_writer into its own header
Other source files will want to use it soon.
2023-01-09 09:46:57 -05:00
Botond Dénes
8623818a8d tools/scylla-sstable: use json_writer::DataKey() to write all keys
This method was renamed from its previous name of PartitionKey. Since in
json partition keys and clustering keys look alike, with the only
difference being that the former may also have a token, it makes sense to have
a single method to write them (with an optional token parameter). This
was the case at some point, json_dumper::write_key() taking this role.
However at a later point, json_writer::PartitionKey() was introduced and
now the code uses both. Standardize on the latter and give it a more
generic name.
2023-01-09 09:46:57 -05:00
Botond Dénes
602fca0a12 tools/scylla-types: fix use-after-free on main lambda captures
The main lambda of scylla-types, the one passed to app_template::run()
was recently made a coroutine. app_template::run() however doesn't keep
this lambda alive, and hence after the first suspension point, accessing
the lambda's captures triggers use-after-free.
The simple fix is to convert the coroutine into a continuation chain.
2023-01-09 09:46:57 -05:00
Tomasz Grabiec
f97268d8f2 row_cache: Fix violation of the "oldest version are evicted first" when evicting last dummy
Consider the following MVCC state of a partition:

   v2: ==== <7> [entry2] ==== <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

Where === means a continuous range and --- means a discontinuous range.

After two LRU items are evicted (entry1 and entry2), we will end up with:

   v2: ---------------------- <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

This will cause readers to incorrectly think there are no rows before
entry <9>, because the range is continuous in v1, and continuity of a
snapshot is a union of continuous intervals in all versions. The
cursor will see the interval before <9> as continuous and the reader
will produce no rows.
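To make the continuity rule above concrete, here is a minimal sketch using a deliberately simplified, hypothetical model (not the actual row_cache or MVCC types): each version only records whether the interval just before entry <9> is continuous, and the snapshot's view is the OR over all versions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model (not the real MVCC types): per version, we
// only track whether the interval just before entry <9> is continuous.
struct version_continuity {
    bool continuous_before_key;
};

// Continuity of a snapshot is the union of the continuous intervals of all
// its versions: the interval counts as continuous if ANY version says so.
bool snapshot_continuous(const std::vector<version_continuity>& versions) {
    for (const auto& v : versions) {
        if (v.continuous_before_key) {
            return true;
        }
    }
    return false;
}
```

With v2 discontinuous but v1 still continuous (because the last dummy in v1 kept its continuity), the sketch reports the interval as continuous, which is the incorrect view described above; once only v2 remains, it reports it as discontinuous.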

This is only temporary, because current MVCC merging rules are such
that the flag on the latest entry wins, so we'll end up with this once
v1 is no longer needed:

   v2: ---------------------- <9> ===== <last dummy>

...and the reader will go to sstables to fetch the evicted rows before
entry <9>, as expected.

The bug is in rows_entry::on_evicted(), which treats the last dummy
entry in a special way: it doesn't evict it and, by omission, doesn't
clear its continuity.

The situation is not easy to trigger because it requires a certain
eviction pattern concurrent with multiple reads of the same partition
in different versions, i.e. across memtable flushes.

Closes #12452
2023-01-09 16:10:52 +02:00
Avi Kivity
1bb1855757 Merge 'replica/database: fix read related metrics' from Botond Dénes
Sstable read related metrics have been broken for a long time now. First, the introduction of inactive reads (https://github.com/scylladb/scylladb/issues/1865) diluted this metric, as it now also counted inactive reads (contrary to the metric's name). Then, after moving the semaphore in front of the cache (3d816b7c1), this metric became completely broken, as it now counts all kinds of reads: disk, in-memory and inactive ones too.
This series aims to remedy this:
* `scylla_database_active_reads` is fixed to only include active reads.
* `scylla_database_active_reads_memory_consumption` is renamed to `scylla_database_reads_memory_consumption` and its description is brought up-to-date.
* `scylla_database_disk_reads` is added to track current reads that have gone to disk.
* `scylla_database_sstables_read` is added to track the number of sstables read currently.

Fixes: https://github.com/scylladb/scylladb/issues/10065

Closes #12437

* github.com:scylladb/scylladb:
  replica/database: add disk_reads and sstables_read metrics
  sstables: wire in the reader_permit's sstable read count tracking
  reader_concurrency_semaphore: add disk_reads and sstables_read stats
  replica/database: fix active_reads_memory_consumption_metric
  replica/database: fix active_reads metric
2023-01-09 12:18:49 +02:00
Pavel Emelyanov
e20738cd7d azure_snitch: Handle empty zone returned from IMDS
The Azure metadata API may sometimes return an empty zone. If that
happens, shard 0 gets an empty string as its rack, but propagates
UNKNOWN_RACK to other shards.

An empty zone in the response should be handled regardless.
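One way to keep all shards consistent is to normalize the zone before it is used as a rack name, so that shard 0 and the other shards see the same value. A minimal sketch, assuming a hypothetical placeholder string for UNKNOWN_RACK (its real value is not shown here) and assuming that mapping an empty zone to it is the desired behavior:

```cpp
#include <cassert>
#include <string>

// Hypothetical placeholder standing in for UNKNOWN_RACK; the real constant's
// value is not shown in this log.
const std::string unknown_rack_placeholder = "UNKNOWN_RACK";

// Map the zone reported by the metadata service to a rack name. An empty
// response is normalized once, so every shard ends up with the same rack.
std::string rack_from_zone(const std::string& zone) {
    return zone.empty() ? unknown_rack_placeholder : zone;
}
```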

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12274
2023-01-09 11:57:45 +02:00
Nadav Har'El
2d845b6244 test/cql-pytest: a test for more than one equality in WHERE
Cassandra refuses a request with more than one equality relation to the
same column, for example

    DELETE FROM tbl WHERE partitionKey = ? AND partitionKey = ?

It complains that

    partitionkey cannot be restricted by more than one relation if it
    includes an Equal

Currently, Scylla doesn't consider such requests an error. Whether or
not we should be compatible with Cassandra here is discussed in
issue #12472. But as long as we do accept this query, we should be
sure we do the right thing: "WHERE p = 1 AND p = 2" should match
nothing (not the first or the last value being tested), and "WHERE p = 1
AND p = 1" should match the same rows as p = 1. This patch adds a test
to verify that these requests indeed yield correct results. The
test is scylla_only because, as explained above, Cassandra doesn't
support this feature at all.
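The expected semantics can be sketched with a hypothetical helper (not Scylla's restrictions code) that treats several equality relations on the same column as a conjunction: a key matches only if it equals every value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the partition keys that satisfy
// "p = v1 AND p = v2 AND ..." for all values in eq_values.
std::vector<std::int64_t> select_where_all_equal(const std::vector<std::int64_t>& partition_keys,
                                                 const std::vector<std::int64_t>& eq_values) {
    std::vector<std::int64_t> result;
    for (std::int64_t p : partition_keys) {
        // A row matches only if it satisfies every "p = value" relation.
        bool matches = std::all_of(eq_values.begin(), eq_values.end(),
                                   [p](std::int64_t v) { return p == v; });
        if (matches) {
            result.push_back(p);
        }
    }
    return result;
}
```

With keys {1, 2, 3}, "p = 1 AND p = 2" yields nothing, while "p = 1 AND p = 1" yields just {1}.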

Refs #12472

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12473
2023-01-09 11:56:39 +02:00
Anna Stuchlik
b61515c871 doc: replace Scylla with ScyllaDB on the menu tree and major links; related: https://github.com/scylladb/scylla-docs/issues/3962
Closes #12456
2023-01-09 08:39:50 +02:00
Avi Kivity
42575340ba Update seastar submodule
* seastar ca586cfb8d...8889cbc198 (14):
  > http: request_parser: fix grammar ambiguity in field_content
Fixes #12468
  > sstring: use fold expression to simply copy_str_to()
  > sstring: use fold expression to simply str_len()
  > metrics: capture by move in make_function()
  > metrics: replace homebrew is_callable<> with is_invocable_v<>
  > reactor: use std::move() to avoid copy.
  > reactor: remove redundant semicolon.
  > reactor: use mutable to make std::move() work.
  > build: install liburing explicitly on ArchLinux.
  > reactor: use a for loop for submitting ios
  > metrics: add spaces around '='
  > parallel utils: align concept with implementation
  > reactor: s/resize(0)/clear()/
  > reactor: fix a typo in comment

Closes #12469
2023-01-08 18:56:00 +02:00
Alejo Sanchez
d632e1aa7a test/pytest: add missing import, remove unused import
Add the missing import of time and remove an unused name import.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12446
2023-01-08 17:38:46 +02:00
Avi Kivity
5ffe4fee6d Merge 'Remove legacy half reverse' from Michał Radwański
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside the partition would be sorted (descending, as all
the mutation fragments would be) according to their end (but range
tombstones would still be stored according to their start bound).

As it turns out, mutation::consume, when called with the legacy_half_reverse
option, produces an invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.

In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.

As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:

The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains

```
if (reverse == consume_in_reverse::yes) {
    while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
        co_await yield();
    }
} else {
    while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
        co_await yield();
    }
}
```

So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts away the range tombstone
changes, and thus the only difference between consume_in_reverse::no
and consume_in_reverse::yes was that the former was ordered by increasing
clustering key and the latter by decreasing clustering key. This property
is maintained if we switch to the consume_in_reverse::yes format.

Refs: #12353

Closes #12453

* github.com:scylladb/scylladb:
  mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse
  mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes
  mutation: move consume_in_reverse def to mutation_consumer.hh
2023-01-08 15:42:00 +02:00
Botond Dénes
c4688563e3 sstables: track decompressed buffers
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.

Fixes: #12448

Closes #12454
2023-01-08 15:34:28 +02:00
Kamil Braun
b77df84543 test: test_topology: make test_nodes_with_different_smp less hacky
The test would use a trick to start a separate Scylla cluster from the
one provided originally by the test framework. This is not supported by
the test framework and may cause unexpected problems.

Change the test to perform regular node operations. Instead of starting
a fresh cluster of 3 nodes, we join the first of these nodes to the
original framework-provided cluster, then decommission the original
nodes, then bootstrap the other 2 fresh nodes.

Also add some logging to the test.

Refs: #12438, #12442

Closes #12457
2023-01-08 15:33:17 +02:00
Avi Kivity
02c9968e73 Merge 'Add WASM UDF implementation in Rust' from Wojciech Mitros
This series adds the implementation and usage of rust wasmtime bindings.

The WASM UDFs introduced by this patch are interruptible and use memory allocated using the seastar allocator.

This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end.
After this patch, `libwasmtime.a` can be removed from the toolchain.
This patch also removes the workaround for https://github.com/scylladb/scylladb/issues/9387, but it hasn't been tested on ARM yet; if the ARM test causes issues, I'll revert this part of the change.

Closes #11351

* github.com:scylladb/scylladb:
  build: remove references to unused c bindings of wasmtime
  test: assert that WASM allocations can fail without crashing
  wasm: limit memory allocated using mmap
  wasm: add configuration options for instance cache and udf execution
  test: check that wasmtime functions yield
  wasm: use the new rust bindings of wasmtime
  rust: add Wasmtime bindings
  rust: add build profiles more aligned with ninja modes
  rust: adjust build according to cxxbridge's recommendations
  tools: toolchain: dbuild: prepare for sharing cargo cache
2023-01-08 15:31:09 +02:00
Nadav Har'El
f5cda3cfc3 test/cql-pytest: add more tests for "timestamp" column type
In issue #3668, a discussion spanning several years theorized that several
things are wrong with the "timestamp" type. This patch begins by adding
several tests that demonstrate that Scylla is in fact behaving correctly,
and mostly identically to Cassandra, except for one esoteric error-handling
case.

However, after eliminating the red herrings, we are left with the real
issue that prompted opening #3668, which is a duplicate of issues #2693
and #2694, and this patch also adds a reproducer for that. The issue is
that Cassandra 4 added support for arithmetic expressions on values,
and durations can be added to or subtracted from timestamps, for example:

        '2011-02-03 04:05:12.345+0000' - 1d

is a valid timestamp - and we don't currently support this syntax.
So the new test - which passes on Cassandra 4 and fails on Scylla
(or Cassandra 3) is marked xfail.

Refs #2693
Refs #2694

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12436
2023-01-08 15:00:49 +02:00
Michał Chojnowski
08b3a9c786 configure: don't reduce parsers' optimization level to 1 in release
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.

Fix this by only applying the -O1 to debug modes.

Fixes #12463

Closes #12460
2023-01-06 18:04:36 +02:00
Wojciech Mitros
903c4874d0 build: remove references to unused c bindings of wasmtime
Before the changes introducing the new wasmtime bindings, we relied
on a downloaded static library, libwasmtime.a. Now that the bindings
are introduced, we no longer rely on it, so all references to
it can be removed.
2023-01-06 14:07:29 +01:00
Wojciech Mitros
996a942e05 test: assert that WASM allocations can fail without crashing
The main source of big allocations in the WASM UDF implementation
is the WASM Linear Memory. We do not want Scylla to crash even if
a memory allocation for the WASM Memory fails, so we assert that
an exception is thrown instead.

The wasmtime runtime does not actually fail on an allocation failure
(assuming the memory allocator does not abort and returns nullptr
instead - which our seastar allocator does). What happens then
depends on the failed allocation handling of the code that was
compiled to WASM. If the original code threw an exception or aborted,
the resulting WASM code will trap. To make sure that we can handle
the trap, we need to allow wasmtime to handle SIGILL signals, because
that is what is used to carry information about WASM traps.

The new test uses a special WASM Memory allocator that fails after
n allocations, and the allocations include both memory growth
instructions in WASM, as well as growing memory manually using the
wasmtime API.
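The idea of an allocator that fails after n allocations can be sketched as follows. The class name and interface are hypothetical and not the allocator used by the actual test; the sketch only shows the "succeed n times, then return nullptr" behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical test helper: succeeds for the first `limit` allocations and
// then reports failure by returning nullptr instead of aborting.
class failing_allocator {
    std::size_t _remaining;
public:
    explicit failing_allocator(std::size_t limit) : _remaining(limit) {}

    void* allocate(std::size_t size) {
        if (_remaining == 0) {
            return nullptr; // simulated allocation failure
        }
        --_remaining;
        return std::malloc(size);
    }

    void deallocate(void* p) { std::free(p); }
};
```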

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2023-01-06 14:07:29 +01:00
Wojciech Mitros
f05d612da8 wasm: limit memory allocated using mmap
The wasmtime runtime allocates memory for the executable code of
the WASM programs using mmap and not the seastar allocator. As
a result, the memory that Scylla actually uses becomes not only
the memory preallocated for the seastar allocator but the sum of
that and the memory allocated for executable codes by the WASM
runtime.
To keep limiting the memory used by Scylla, we measure how much
memory the WASM programs use, and if they use too much, compiled
WASM UDFs (modules) that are currently not in use are evicted to
make room.
To evict a module it is required to evict all instances of this
module (the underlying implementation of modules and instances uses
shared pointers to the executable code). For this reason, we add
reference counts to modules. Each instance using a module is a
reference. When an instance is destroyed, a reference is removed.
If all references to a module are removed, the executable code
for this module is deallocated.
The eviction of a module is actually achieved by eviction of all
its references. When we want to free memory for a new module we
repeatedly evict instances from the wasm_instance_cache using its
LRU strategy until some module loses all its instances. This
process may not succeed if the instances currently in use (so not
in the cache) use too much memory - in this case the query also
fails. Otherwise the new module is added to the tracking system.
This strategy may evict some instances unnecessarily, but evicting
modules should not happen frequently, and any more efficient
solution requires an even bigger intervention into the code.
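A much simplified, hypothetical model of this scheme follows (the names are illustrative and do not come from the wasm_instance_cache code). Each module counts the instances referencing it, idle instances sit in an LRU list, and loading a new module evicts idle instances oldest-first until enough executable-code memory is freed or nothing evictable remains. For brevity, it assumes module names are unique and that loading a module creates one in-use instance.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

class module_tracker {
    std::map<std::string, std::size_t> _refs;      // module -> live instance count
    std::map<std::string, std::size_t> _code_size; // module -> executable code bytes
    std::list<std::string> _cache;                 // idle instances by module, front = LRU
    std::size_t _used = 0;
    std::size_t _limit;

    // Drop one instance reference; free the module's code with the last one.
    void unref(const std::string& m) {
        if (--_refs[m] == 0) {
            _used -= _code_size[m];
            _refs.erase(m);
            _code_size.erase(m);
        }
    }
public:
    explicit module_tracker(std::size_t limit) : _limit(limit) {}

    std::size_t used() const { return _used; }

    // Load a module with one in-use instance, evicting idle instances in LRU
    // order to make room. Some evictions may turn out to be unnecessary.
    bool load_module(const std::string& m, std::size_t code_size) {
        while (_used + code_size > _limit && !_cache.empty()) {
            std::string victim = _cache.front();
            _cache.pop_front();
            unref(victim);
        }
        if (_used + code_size > _limit) {
            return false; // instances still in use hold too much: the query fails
        }
        _used += code_size;
        _code_size[m] = code_size;
        _refs[m] = 1;
        return true;
    }

    // Return an in-use instance to the cache; it keeps its module reference.
    void release_to_cache(const std::string& m) { _cache.push_back(m); }
};
```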
2023-01-06 14:07:29 +01:00
Wojciech Mitros
b8d28a95bf wasm: add configuration options for instance cache and udf execution
Different users may require different limits for their UDFs. This
patch allows them to configure the size of the wasm instance cache,
the maximum size of individual instances stored in the cache, the
time after which the instances are evicted, the fuel that all wasm
UDFs are allowed to consume before yielding (to control
latency), the fuel that wasm UDFs are allowed to consume in total
(to allow longer computations in the UDF without them being mistaken
for an infinite loop) and the hard limit on the size of UDFs
that are executed (to avoid large allocations).
2023-01-06 14:07:27 +01:00
Wojciech Mitros
3214f5c2db test: check that wasmtime functions yield
The new implementation for WASM UDFs allows executing the UDFs
in pieces. This commit adds a test asserting that the UDF is in fact
divided and that each of the execution segments takes no longer than
1ms.
2023-01-06 14:05:53 +01:00
Wojciech Mitros
3146807192 wasm: use the new rust bindings of wasmtime
This patch replaces all dependencies on the wasmtime
C++ bindings with our new ones.
The wasmtime.hh and wasm_engine.hh files are deleted.
The libwasmtime.a library is no longer required by
configure.py. The SCYLLA_ENABLE_WASMTIME macro is
removed and wasm udfs are now compiled by default
on all architectures.
In terms of implementation, most of the code using
wasmtime was moved to the Rust source files. The
remaining code uses names from the new bindings
(which are mostly unchanged). Most wasmtime objects
are now stored as a rust::Box<>, to make them compatible
with Rust lifetime requirements.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2023-01-06 14:05:53 +01:00
Wojciech Mitros
50b24cf036 rust: add Wasmtime bindings
The C++ bindings provided by wasmtime are lacking a crucial
capability: asynchronous execution of the wasm functions.
This forces us to stop the execution of the function after
a short time to prevent increasing the latency. Fortunately,
this feature is implemented in the native language
of Wasmtime - Rust. Support for Rust was recently added to
scylla, so we can implement the async bindings ourselves,
which is done in this patch.

The bindings expose all the objects necessary for creating
and calling wasm functions. The majority of code implemented
in Rust is a translation of code that was previously present
in C++.

Types exported from Rust are currently required to be defined
by the same crate that contains the bridge using them, so
wasmtime types can't be exported directly. Instead, for each
class that was supposed to be exported, a wrapper type is
created, where its first member is the wasmtime class. Note
that the members are not visible from C++ anyway, the
difference only applies to Rust code.

Aside from wasmtime types and methods, two additional types
are exported with some associated methods.
- The first one is ValVec, which is a wrapper for a rust Vec
of wasmtime Vals. The underlying vector is required by
wasmtime methods for calling wasm functions. By having it
exported we avoid multiple conversions from a Val wrapper
to a wasmtime Val, as would be required if we exported a
rust Vec of Val wrappers (the rust Vec itself does not
require wrappers if the type it contains is already wrapped)
- The second one is Fut. This class represents a computation
that may or may not be ready. We're currently using it
to control the execution of wasm functions from C++. This
class exposes one method: resume(), which returns a bool
that signals whether the computation is finished or not.
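The resume() protocol can be sketched as a driver loop. `counting_fut` below is a hypothetical stand-in for the exported Fut (it simply finishes after a fixed number of slices), and the `yield` callback marks the point where the C++ side would give control back to the reactor; neither is part of the actual bindings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Fut: each resume() runs one bounded slice of
// work and returns true once the whole computation is finished.
class counting_fut {
    std::size_t _remaining_slices;
public:
    explicit counting_fut(std::size_t slices) : _remaining_slices(slices) {}
    bool resume() {
        if (_remaining_slices > 0) {
            --_remaining_slices;
        }
        return _remaining_slices == 0;
    }
};

// Drive a Fut-like object to completion, handing control back to the
// caller (via `yield`) between slices. Returns the number of slices run.
template <typename Fut, typename YieldFn>
std::size_t drive_to_completion(Fut& fut, YieldFn&& yield) {
    std::size_t slices = 1;
    while (!fut.resume()) {
        yield();
        ++slices;
    }
    return slices;
}
```

A computation split into three slices is resumed three times, with two yields in between.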

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2023-01-06 14:05:53 +01:00
Wojciech Mitros
33c97de25c rust: add build profiles more aligned with ninja modes
A cargo profile is created for each of the build modes: dev, debug,
sanitize, release and coverage. The names of the cargo profiles are
prefixed by "rust-" because cargo does not allow separate "dev"
and "debug" profiles.

The main difference between the profiles is their optimization level,
which corresponds to the levels used in configure.py. Debug info
is stripped only in dev mode, and only this mode uses
"incremental" compilation to speed it up.
2023-01-06 14:05:53 +01:00
Wojciech Mitros
4d7858e66d rust: adjust build according to cxxbridge's recommendations
Currently, the rust build system in Scylla creates a separate
static library for each included rust package. This could cause
duplicate symbol issues when linking against multiple libraries
compiled from rust.

This issue is fixed in this patch by creating a single static library
to link against, which combines all rust packages implemented in
Scylla.

The Cargo.lock for the combined build is now tracked, so that all
users of the same scylla version also use the same versions of
imported rust modules.

Additionally, the rust package implementation and usage
docs are modified to be compatible with the build changes.

This patch also adds a new header file 'rust/cxx.hh' that contains
definitions of additional rust types available in c++.
2023-01-06 14:05:53 +01:00
Avi Kivity
eeaa475de9 tools: toolchain: dbuild: prepare for sharing cargo cache
Rust's cargo caches downloaded sources in ~/.cargo. However dbuild
won't provide access to this directory since it's outside the source
directory.

Prepare for sharing the cargo cache between the host and the dbuild
environment by:
 - Creating the cache if it doesn't already exist. This is likely if
   the user only builds in a dbuild environment.
 - Propagating the cache directory as a mounted volume.
 - Respecting the CARGO_HOME override.
2023-01-06 14:05:53 +01:00
Avi Kivity
6868dcf30b tools: toolchain: drop s390x from prepare script architecture list
It's been a long while since we built ScyllaDB for s390x, and in
fact the last time I checked it was broken on the ragel parser
generator generating bad source files for the HTTP parser. So just
drop it from the list.

I kept s390x in the architecture mapping table since it's still valid.

Closes #12455
2023-01-06 09:08:01 +02:00
Michał Radwański
1fbf433966 mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside the partition would be sorted (descending, as all
the mutation fragments would be) according to their end (but range
tombstones would still be stored according to their start bound).

As it turns out, mutation::consume, when called with the legacy_half_reverse
option, produces an invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.

In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.

As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:

The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains

```
if (reverse == consume_in_reverse::yes) {
    while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
        co_await yield();
    }
} else {
    while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
        co_await yield();
    }
}
```

So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts away the range tombstone
changes, and thus the only difference between consume_in_reverse::no
and consume_in_reverse::yes was that the former was ordered by increasing
clustering key and the latter by decreasing clustering key. This property
is maintained if we switch to the consume_in_reverse::yes format.
2023-01-05 18:48:55 +01:00
Botond Dénes
2612f98a6c Merge 'Abort repair tasks' from Aleksandra Martyniuk
Aborting repair operations is fully managed by the task manager.
Repair tasks are aborted:
- on shutdown; top level repair tasks subscribe to the global abort source. On shutdown all tasks are aborted recursively
- through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to the node_ops_info abort source
- with the task manager api (top level tasks are abortable)
- with the storage_service api and on failure; these cases were modified to be aborted in the same way as the ones above.
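The recursive abort on shutdown can be illustrated with a hypothetical task tree. This is not the task manager's API, only the propagation rule: aborting a task aborts all of its descendants, while in this sketch aborting a child leaves its parent untouched.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical task tree node, not the task manager's actual types.
struct task_node {
    bool aborted = false;
    std::vector<std::unique_ptr<task_node>> children;

    task_node& add_child() {
        children.push_back(std::make_unique<task_node>());
        return *children.back();
    }

    // Mark this task aborted and propagate the abort to all descendants.
    void abort() {
        aborted = true;
        for (auto& child : children) {
            child->abort();
        }
    }
};
```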

Closes #12085

* github.com:scylladb/scylladb:
  repair: make top level repair tasks abortable
  repair: unify a way of aborting repair operations
  repair: delete sharded abort source from node_ops_info
  repair: delete unused node_ops_info from data_sync_repair_task_impl
  repair: delete redundant abort subscription from shard_repair_task_impl
  repair: add abort subscription to data sync task
  tasks: abort tasks on system shutdown
2023-01-05 15:21:35 +01:00
Avi Kivity
cc6010b512 Merge 'Make restore_replica_count abortable' from Benny Halevy
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.

Fixes #12429

Closes #12430

* github.com:scylladb/scylladb:
  storage_service: restore_replica_count: demote status_checker related logging to debug level
  storage_service: restore_replica_count: allow aborting removenode_with_repair
  storage_service: coroutinize restore_replica_count
  storage_service: restore_replica_count: undefer stop_status_checker
  storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification
  storage_service: restore_replica_count: coroutinize status_checker
2023-01-05 15:21:35 +01:00
Kamil Braun
09da661eeb Merge 'raft: replace experimental raft option with dedicated flag' from Gleb Natapov
Unlike other experimental features, we want Raft to be opt-in even
after it leaves experimental mode. For that we need a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.

* 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev:
  raft: replace experimental raft option with dedicated flag
  main: move supervisor notification about group registry start where it actually starts
2023-01-05 15:21:35 +01:00
Anna Stuchlik
44e6f18d1b doc: add the new upgrade guide to the toctree and fix its name 2023-01-05 14:13:33 +01:00
Anna Stuchlik
0ad2e3e63a docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2 2023-01-05 13:30:10 +01:00
Aleksandra Martyniuk
dcb91457da api: change retrieve_status signature
Sometimes we may need task status to be nothrow move constructible.
httpd::task_manager_json::task_status does not satisfy this requirement.

retrieve_status returns future<full_task_status> instead of future<task_status>
to provide an intermediate struct with better properties. An argument
is passed by reference to prevent the necessity to copy foreign_ptr.
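The property in question can be checked at compile time with `std::is_nothrow_move_constructible_v`. The structs below are hypothetical (they are not full_task_status or task_status): an aggregate of standard containers gets a noexcept move constructor implicitly, while a type with a user-provided move constructor not marked noexcept does not. (Seastar futures, for example, require their value types to be nothrow move constructible.)

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical status struct: std::string and std::vector have noexcept
// move constructors, so the implicit move constructor is noexcept too.
struct full_status_sketch {
    std::string id;
    std::string state;
    std::vector<std::string> children;
};

// A type whose move constructor is user-provided and not noexcept.
struct throwing_move {
    throwing_move() = default;
    throwing_move(throwing_move&&) {} // may throw, as far as the type system knows
};

static_assert(std::is_nothrow_move_constructible_v<full_status_sketch>);
static_assert(!std::is_nothrow_move_constructible_v<throwing_move>);
```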
2023-01-05 13:28:51 +01:00
Kamil Braun
df72536fc5 Merge 'docs: add the upgrade guide for Enterprise from 2022.1 to 2022.2' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/12314

This PR adds the upgrade guide for ScyllaDB Enterprise - from version
2022.1 to 2022.2.  Using this opportunity, I've replaced "Scylla" with
"ScyllaDB" in the upgrade-enterprise index file.

In previous releases, we added several upgrade guides - one per platform
(and version). In this PR, I've merged the information for different
platforms to create one generic upgrade guide. It is similar to what
@kbr- added for the Open Source upgrade guide from 5.0 to 5.1. See
https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/.

Closes #12339

* github.com:scylladb/scylladb:
  docs: add the info about minor release
  docs: add the new upgade guide 2022.1 to 2022.2 to the index and the toctree
  docs: add the index file for the new upgrage guide from 2022.1 to 2022.2
  docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2
  docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2
2023-01-04 18:07:00 +01:00
Benny Halevy
086546f575 storage_service: restore_replica_count: demote status_checker related logging to debug level
the status_checker is not the main line of business
of restore_replica_count; starting and stopping it
does not seem to deserve info level logging, which
might have been useful in the past to debug issues
surrounding that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
3879ee1db8 storage_service: restore_replica_count: allow aborting removenode_with_repair
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.

Fixes #12429

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
afece5bdc4 storage_service: coroutinize restore_replica_count
and unwrap the async thread started for streaming.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
d1eadc39c1 storage_service: restore_replica_count: undefer stop_status_checker
Now that all exceptions in the rest of the function
are swallowed, just execute the stop_status_checker
deferred action serially before returning, on the
way to coroutinizing restore_replica_count (since
we can't co_await status_checker inside the deferred
action).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
788ecb738d storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification
On the way to coroutinizing restore_replica_count,
extract awaiting stream_async and send_replication_notification
into try/catch blocks so we can later undefer stop_status_checker.

The exception is still returned as an exceptional future
which is logged by the caller as warning.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:02:42 +02:00
Benny Halevy
b54d121dfd storage_service: restore_replica_count: coroutinize status_checker
There is no need to start a thread for the status_checker;
it can be implemented using a background coroutine.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:02:20 +02:00
Botond Dénes
1d273a98b9 readers/multishard: shard_reader::close() silence read-ahead timeouts
Timeouts are benign, especially on a read-ahead that turned out not to be
needed at all. They just introduce noise in the logs, so silence them.

Fixes: #12435

Closes #12441
2023-01-04 16:10:09 +02:00
Anna Stuchlik
9216b657c8 doc: fix the version in the comment on removing the note 2023-01-04 14:01:33 +01:00
Kamil Braun
4268b1bbc2 Merge 'raft: raft_group0, register RPC verbs on all shards' from Gusev Petr
raft_group0 used to register RPC verbs only on shard 0. This worked on
clusters with the same --smp setting on all nodes, since RPCs in this
case are processed on the same shard as the calling code, and
raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added to identify the
problem. Since --smp can only be specified via the command line, a
corresponding parameter was added to the ManagerClient.server_add
method. It allows overriding the default parameters set by the
SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting
individual items.

Fixes: #12252

Closes #12374

* github.com:scylladb/scylladb:
  raft: raft_group0, register RPC verbs on all shards
  raft: raft_append_entries, copy entries to the target shard
  test.py, allow to specify the node's command line in test
2023-01-04 11:11:21 +01:00
Marcin Maliszkiewicz
61a9816bad utils/rjson: enable inlining in rapidjson library
Due to the lack of the NDEBUG macro, inlining was disabled. Inlining is
important for parsing and printing performance.

Testing with perf_simple_query shows that it reduced the instruction count by around
7000 insns/op, thus increasing median tps by 4.2% for the alternator frontend.

Because the inlined functions are called for every character
in the JSON, the gain scales with request/response size. When the
default write size is increased by around 7x (from ~180 to ~1255
bytes), the median tps increases by 12%.

Running:
./build/release/test/perf/perf_simple_query_g --smp 1 \
                                --alternator forbid --default-log-level error \
                                --random-seed=1235000092 --duration=60 --write

Results before the patch:

median 46011.50 tps (197.1 allocs/op,  12.1 tasks/op,  170989 insns/op,        0 errors)
median absolute deviation: 296.05
maximum: 46548.07
minimum: 42955.49

Results after the patch:

median 47974.79 tps (197.1 allocs/op,  12.1 tasks/op,  163723 insns/op,        0 errors)
median absolute deviation: 303.06
maximum: 48517.53
minimum: 44083.74

The change affects both json parsing and printing.

Closes #12440
2023-01-04 10:27:35 +02:00
Michał Jadwiszczak
83bb77b8bb test/boost/cql_query_test: enable parallelized_aggregation
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.

Closes #12409
2023-01-04 10:11:25 +02:00
Anna Stuchlik
c4d779e447 doc: Fix https://github.com/scylladb/scylla-doc-issues/issues/854 - update the procedure to update topology strategy when nodes are on different racks
Closes #12439
2023-01-04 09:50:10 +02:00
Avi Kivity
2739ac66ed treewide: drop cql_serialization_format
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).

A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.

Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.

One test checking the 16-bit serialization format is removed.
2023-01-03 19:54:13 +02:00
Avi Kivity
654b96660a cql: modification_statement: drop protocol check for LWT
CQL protocol 1 did not support LWT, but since we no longer support
protocol 1, we can drop the check and the supporting get_protocol_version()
helper.
2023-01-03 19:51:57 +02:00
Avi Kivity
424dbf43f3 transport: drop cql protocol versions 1 and 2
Version 3 was introduced in 2014 (Cassandra 2.1) and was supported
in the very first version of Scylla (2a7da21481 "CQL binary protocol").

Cassandra 3.0 (2015) dropped protocols 1 and 2 as well.
It's safe enough to drop them now, 9 years after the introduction of v3
and 7 years after Cassandra stopped supporting them.

Dropping them allows dropping cql_serialization_format, which causes
quite a lot of pain, and is probably broken. It will be dropped in the
following patch.
2023-01-03 19:47:49 +02:00
Avi Kivity
f600ad5c1b Update seastar submodule
* seastar 3db15b5681...ca586cfb8d (28):
  > reactor: trim returned buffer to received number of bytes
  > util/process: include used header
  > build: drop unused target_include_directories()
  > build: use BUILD_IN_SOURCE instead chdir <SOURCE_DIR>
  > build: specify CMake policy CMP0135 to new
  > tests: only destroy allocated pending connections
  > build: silence the output when generating private keys
  > tests, httpd: Limit loopback connection factory sharding
  > lw_shared_ptr: Add nullptr_t comparing operators
  > noncopyable_function: Add concept for (Func func) constructor
  > reactor: add process::terminate() and process::kill()
  > Merge 'tests, include: include headers without ".." in path' from Kefu Chai
  > build: customize toolset for building Boost
  > build: use different toolset base on specified compiler
  > allocator: add an option to reserve additional memory for the OS
  > Merge 'build: pass cflags and ldflags to cooking.sh' from Kefu Chai
  > build: build static library of cryptopp
  > gate: add gate holders debugging
  > build: detect debug build of yaml-cpp also
  > build: do not use pkg_search_module(IMPORTED_TARGET) for finding yaml-cpp
  > build: bump yaml-cpp to 0.7.0 in cooking_recipe
  > build: bump cryptopp to 8.7.0 in cooking_recipe
  > build: bump boost to 1.81.0 in cooking_recipe
  > build: bump fmtlib to 9.1.0 in cooking_recipe
  > shared_ptr: add overloads for fmt::ptr()
  > chunked_fifo: const_iterator: use the base class ctor
  > build: s/URING_LIBARIES/URING_LIBRARIES/
  > build: export the full path of uring with URING_LIBRARIES

Closes #12434
2023-01-03 17:58:31 +02:00
Alejo Sanchez
889acf710c test/python: increase CQL connection timeout for test_ssl

In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12408
2023-01-03 17:10:46 +02:00
Nadav Har'El
1c96d2134f docs,alternator: link to issue about missing ACL feature
The alternator compatibility.md document mentions the missing ACL
(access control) feature, but unlike other missing features we
forgot to link to the open issue about this missing feature.
So let's add that link.

Refs #5047.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12399
2023-01-03 16:50:33 +02:00
Kamil Braun
fc57626afa Merge 'docs: remove auto_bootstrap option from the documentation' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/12318

This PR removes all occurrences of the `auto_bootstrap` option in the docs.
In most cases, I've simply removed the option name and its definition, but sometimes additional changes were necessary:
- In node-joined-without-any-data.rst, I removed the `auto_bootstrap` option as one of the causes of the problem.
- In rebuild-node.rst, I removed the first step in the procedure (enabling the `auto_bootstrap` option).
- In admin.rst, I removed the section about manual bootstrapping, since it's based on setting `auto_bootstrap` to false, which is no longer possible.

Closes #12419

* github.com:scylladb/scylladb:
  docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstrapping section
  docs: remove the auto_bootstrap option from the procedure to replace a dead node
  docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data
  docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume
  docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC
2023-01-03 15:44:00 +01:00
Botond Dénes
e4d5b2a373 replica/database: add disk_reads and sstables_read metrics
Track the current number of reads that have gone to disk and the current
number of sstables being read by all such reads, respectively.
2023-01-03 09:37:29 -05:00
Botond Dénes
2acfa950d7 sstables: wire in the reader_permit's sstable read count tracking
Hook in the relevant methods when creating and destroying sstable
readers.
2023-01-03 09:37:29 -05:00
Botond Dénes
2c0de50969 reader_concurrency_semaphore: add disk_reads and sstables_read stats
Also add the infrastructure to reader_permit to update them. The
infrastructure is not wired in yet.
These metrics will be used to count the number of reads that have gone to
disk and the number of sstables currently being read, respectively.
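A minimal Python model of the bookkeeping these two stats suggest, assuming a read counts toward `disk_reads` while it has at least one sstable reader open, and `sstables_read` counts open sstable readers across all reads. `SemaphoreStats`, `Permit` and the `on_*` methods are hypothetical names for illustration, not the C++ `reader_permit` API:

```python
class SemaphoreStats:
    """Hypothetical stand-in for the semaphore's stats."""

    def __init__(self):
        self.disk_reads = 0     # reads that currently have >= 1 sstable reader open
        self.sstables_read = 0  # sstable readers currently open, over all reads


class Permit:
    """Hypothetical per-read permit that reports sstable readers to the stats."""

    def __init__(self, stats):
        self._stats = stats
        self._open_sstables = 0

    def on_start_sstable_read(self):
        if self._open_sstables == 0:
            self._stats.disk_reads += 1  # first sstable reader: this read hit disk
        self._open_sstables += 1
        self._stats.sstables_read += 1

    def on_finish_sstable_read(self):
        assert self._open_sstables > 0
        self._open_sstables -= 1
        self._stats.sstables_read -= 1
        if self._open_sstables == 0:
            self._stats.disk_reads -= 1  # last sstable reader of this read closed


stats = SemaphoreStats()
p1, p2 = Permit(stats), Permit(stats)
p1.on_start_sstable_read()
p1.on_start_sstable_read()
p2.on_start_sstable_read()
print(stats.disk_reads, stats.sstables_read)  # 2 3
```

Counting per permit keeps `disk_reads` at "reads", not "sstables": one read touching many sstables still counts once.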
2023-01-03 09:37:29 -05:00
Botond Dénes
dcd2deb5af replica/database: fix active_reads_memory_consumption_metric
Rename to reads_memory_consumption and drop the "active" from the
description as well. This metric tracks the memory consumption of all
reads: active or inactive. We don't even currently have a way to track
the memory consumption of only active reads.
Drop the part of the description which explains the interaction with
other metrics: this part is outdated, and the new interactions are much
more complicated than a metric description can explain.
Also ask the semaphore to calculate the memory amount, instead of doing
it in the metric itself.
2023-01-03 09:25:47 -05:00
Petr Gusev
8417840647 raft: raft_group0, register RPC verbs on all shards
raft_group0 used to register RPC verbs only on shard 0.
This worked on clusters with the same --smp setting on
all nodes, since RPCs in this case are (usually)
processed on the same shard as the calling code,
and raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added
to identify the problem.

Fixes: #12252
2023-01-03 17:04:07 +03:00
Anna Stuchlik
00ef20c3df docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstrapping section 2023-01-03 14:48:01 +01:00
Anna Stuchlik
b7d62b2fc7 docs: remove the auto_bootstrap option from the procedure to replace a dead node 2023-01-03 14:47:55 +01:00
Anna Stuchlik
bc62e61df1 docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data 2023-01-03 14:46:38 +01:00
Anna Stuchlik
1602f27cd7 docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume 2023-01-03 14:45:08 +01:00
Botond Dénes
929481ea9c replica/database: fix active_reads metric
This metric has been broken for a long time, since inactive reads were
introduced. As calculated currently, it includes all permits that passed
admission, including inactive reads. On the other hand, it excludes
permits created bypassing admission.
Fix by using the newly introduced (in this patch)
reader_concurrency_semaphore::active_reads() as the basis of this
metric: this now includes all permits (reads) that are currently active,
excluding waiters and inactive reads.
2023-01-03 08:12:25 -05:00
Petr Gusev
7725e03a09 raft: raft_append_entries, copy entries to the target shard
If an append_entries RPC was received on a non-zero shard, we may
need to pass it to shard zero (or, potentially, some other shard).
The problem is that raft::append_request contains entries in the form
of raft::log_entry_ptr == lw_shared_ptr<log_entry>, which doesn't
support cross-shard reference counting. In debug mode it contains
a special ref-counting facility debug_shared_ptr_counter_type,
which resorts to on_internal_error if it detects such a case.

To solve this, we just copy log entries to the target shard if it
isn't equal to the current one. In most cases, if the --smp setting
is the same on all nodes, the RPC will be handled on shard zero,
so there will be no overhead.
2023-01-03 15:25:00 +03:00
Petr Gusev
1c23390f12 test.py, allow to specify the node's command line in test
An optional parameter cmdline has been added to
the ManagerClient.server_add method.
It allows you to override the default parameters
set by the SCYLLA_CMDLINE_OPTIONS variable
by changing, adding or deleting individual
items. To change or add a parameter, just specify
its name and value one after the other.
To remove a parameter, use the special keyword
__remove__ as its value. To set a parameter
without a value (such as --overprovisioned),
use the special keyword __missing__ as the value.
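A minimal sketch of the merge rule described above, assuming the override list is a flat sequence of name/value pairs and every option starts with `--`. The helpers `parse_options` and `apply_overrides` and the sample defaults are hypothetical; this is not the test.py implementation:

```python
REMOVE = "__remove__"    # drop the option from the default command line
MISSING = "__missing__"  # emit the option as a bare flag, without a value


def parse_options(args):
    """Parse ['--a', '1', '--flag', ...] into an ordered {name: value-or-None}."""
    opts = {}
    i = 0
    while i < len(args):
        name = args[i]
        if i + 1 < len(args) and not args[i + 1].startswith("--"):
            opts[name] = args[i + 1]
            i += 2
        else:
            opts[name] = None  # option given without a value
            i += 1
    return opts


def apply_overrides(defaults, overrides):
    """Apply name/value pairs from `overrides` on top of `defaults`."""
    if len(overrides) % 2:
        raise ValueError("overrides must be name/value pairs")
    opts = parse_options(defaults)
    for name, value in zip(overrides[::2], overrides[1::2]):
        if value == REMOVE:
            opts.pop(name, None)
        elif value == MISSING:
            opts[name] = None
        else:
            opts[name] = value  # change an existing option or add a new one
    result = []
    for name, value in opts.items():
        result.append(name)
        if value is not None:
            result.append(value)
    return result


defaults = ["--smp", "2", "--memory", "1G", "--developer-mode", "1"]
print(apply_overrides(defaults, ["--smp", "1", "--memory", REMOVE,
                                 "--overprovisioned", MISSING]))
# ['--smp', '1', '--developer-mode', '1', '--overprovisioned']
```

Keeping the parsed options in an insertion-ordered dict means changed options stay in place and added ones are appended at the end.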
2023-01-03 15:24:54 +03:00
Nadav Har'El
eb85f136c8 cql-pytest: document how to write new cql-pytest tests
Add to test/cql-pytest/README.md an explanation of the philosophy
of the cql-pytest test suite, and some guidelines on how to write
good tests in that framework.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12400
2023-01-03 12:13:22 +02:00
Anna Stuchlik
994bc33147 docs: fix the command on the Manager-Monitoring Integration troubleshooting page
Closes #12375
2023-01-03 11:41:16 +02:00
Anna Stuchlik
9d17d812c0 docs: Fix https://github.com/scylladb/scylla-doc-issues/issues/870, update the nodetool rebuild command
Closes #12416
2023-01-03 11:40:40 +02:00
Gleb Natapov
1688163233 raft: replace experimental raft option with dedicated flag
Unlike other experimental features, we want Raft to remain optional even
after it leaves experimental mode. For that we need a separate
option to enable it. This patch adds the boolean option
"consistent-cluster-management" for that.
2023-01-03 11:15:11 +02:00
Gleb Natapov
29060cc235 main: move supervisor notification about group registry start where it actually starts
99fe580068 moved the raft_group_registry::start call a bit later, but
forgot to move the supervisor notification call. Do it now.
2023-01-03 11:09:30 +02:00
Botond Dénes
2ef71e9c70 Merge 'Improve verbosity of task manager api' from Aleksandra Martyniuk
The PR introduces changes to task manager api:
- extends tasks' list returned with get_tasks with task type,
   keyspace, table, entity, and sequence number
- extends status returned with get_task_status and wait_task
   with a list of children's ids

Closes #12338

* github.com:scylladb/scylladb:
  api: extend status in task manager api
  api: extend get_tasks in task manager api
2023-01-03 10:39:41 +02:00
Botond Dénes
82101b786d Merge 'docs: document scylla-api-client' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/11999.

This PR adds a description of scylla-api-cli.

Closes #12392

* github.com:scylladb/scylladb:
  docs: fix the description of the system log POST example
  docs: update the curl tool name
  docs: describe how to use the scylla-api-client tool
  docs: fix the scylla-api-client tool name
  docs: document scylla-api-cli
2023-01-03 10:30:04 +02:00
Benny Halevy
63c2cdafe8 sstables: index_reader: close(index_bound&) reset current_list
When closing _lower_bound and *_upper_bound
in the final close() call, they are currently left with
an engaged current_list member.

If the index_reader uses a _local_index_cache,
it is evicted with evict_gently which will, rightfully,
see the respective pages as referenced, and they won't be
evicted gently (only later when the index_reader is destroyed).

Reset index_bound.current_list on close(index_bound&)
to free up the reference.

Ref #12271

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12370
2023-01-02 16:42:33 +01:00
Avi Kivity
767b7be8be Merge 'Get rid of handle_state_replacing' from Benny Halevy
Since [repair: Always use run_replace_ops](2ec1f719de), nodes no longer publish HIBERNATE state so we don't need to support handling it.

Replace is now always done using node operations (using repair or streaming),
so nodes are never expected to change status to HIBERNATE.

Therefore storage_service:handle_state_replacing is not needed anymore.

This series gets rid of it and updates the documentation related to STATUS:HIBERNATE accordingly.

Fixes #12330

Closes #12349

* github.com:scylladb/scylladb:
  docs: replace-dead-node: get rid of hibernate status
  storage_service: get rid of handle_state_replacing
2023-01-02 13:35:29 +02:00
Gleb Natapov
28952d32ff storage_service: move leave_ring outside of unbootstrap()
We want to reuse unbootstrap() without the leave_ring call.

Message-Id: <20221228144944.3299711-17-gleb@scylladb.com>
2023-01-02 12:03:29 +02:00
Gleb Natapov
229cef136d raft: add trace logging to raft::server::start
Allows to see initial state of the server during start.

Message-Id: <20221228144944.3299711-15-gleb@scylladb.com>
2023-01-02 11:57:53 +02:00
Gleb Natapov
96453ff75f service: raft: improve group0_state_machine::apply logging
Trace how many entries are applied as well.

Message-Id: <20221228144944.3299711-14-gleb@scylladb.com>
2023-01-02 11:57:16 +02:00
Gleb Natapov
dbd5b97201 storage_service: improve logging in update_pending_ranges() function
We pass the reason for the change. Log it as well.

Message-Id: <20221228144944.3299711-11-gleb@scylladb.com>
2023-01-02 11:54:03 +02:00
Gleb Natapov
04ab673359 messaging: check that a node knows its own topology before accessing it
We already check whether the remote node's topology is missing before
creating a connection, but the local node's topology can be missing too
when we use raft to manage it. Raft needs to be able to create
connections before the topology is known.

Message-Id: <20221228144944.3299711-7-gleb@scylladb.com>
2023-01-02 11:53:14 +02:00
Gleb Natapov
6f104982e1 topology: use std::erase_if on std::map instead of ad-hoc loop
std::erase_if for std::map is available since C++20, so we can use it here.

Message-Id: <20221228144944.3299711-6-gleb@scylladb.com>
2023-01-02 11:45:52 +02:00
Gleb Natapov
84eb5924ac system_keyspace: remove redundant include
storage_proxy.hh is included twice

Message-Id: <20221228144944.3299711-4-gleb@scylladb.com>
2023-01-02 11:39:22 +02:00
Gleb Natapov
5182543df2 raft: fix typo in read_barrier logging
The log message reports the applied index, not the appended one.

Message-Id: <20221228144944.3299711-3-gleb@scylladb.com>
2023-01-02 11:38:47 +02:00
Gleb Natapov
5a96751534 storage_service: remove start_leaving since it is no longer used
Message-Id: <20221228144944.3299711-2-gleb@scylladb.com>
2023-01-02 11:37:48 +02:00
Raphael S. Carvalho
b4e4bbd64a database_test: Reduce x_log2_compaction_group values to avoid timeout
database_test is timing out because it has to run the tests calling
do_with_cql_env_and_compaction_groups 3x, once for each compaction group
setting. Reduce it to 2 settings instead of 3 when running in debug mode.

Refs #12396.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12421
2023-01-01 13:56:18 +02:00
Raphael S. Carvalho' via ScyllaDB development
a7c4a129cb sstables: Bump row_reads metrics for mx version
The metric was always 0 even though rows were processed by the mx reader.

Fixes #12406.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20221227220202.295790-1-raphaelsc@scylladb.com>
2022-12-30 18:38:30 +01:00
Anna Stuchlik
601aeb924a docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC 2022-12-30 13:10:06 +01:00
Anna Stuchlik
705b347d36 doc: extend the information about the recommended RF on the Tracing page 2022-12-30 11:30:20 +01:00
Avi Kivity
8635d24424 build: drop abseil submodule, replace with distribution abseil
This lets us carry fewer things and rely on the distribution
for maintenance.

The frozen toolchain is updated. Incidental updates include clang 15.0.6
and a pytest version that no longer needs workarounds.

Closes #12397
2022-12-28 19:02:23 +02:00
Avi Kivity
eced91b575 Revert "view: coroutinize maybe_mark_view_as_built"
This reverts commit ac2e2f8883. It causes
a regression ("std::bad_variant_access in load_view_build_progress").

Commit 2978052113 (a reindent) is also reverted as part of
the process.

Fixes #12395
2022-12-28 15:36:05 +02:00
Anna Stuchlik
6d70665185 doc: extend the information on removing an unavailable node 2022-12-28 13:19:58 +01:00
Anna Stuchlik
f95c6423c1 docs: extend the warning on the Remove a Node page 2022-12-28 13:16:36 +01:00
Nadav Har'El
200bc82913 test/cql-pytest: exit immediately if Scylla is down
In commit acfa180766 we added to
test/cql-pytest a mechanism to detect when Scylla crashes in the middle
of a test function - in which case we report the culprit test and exit
immediately to avoid having a hundred more tests report that they failed
as well just because Scylla was down.

However, if Scylla was *never* up (e.g., if the user ran "pytest" without
ever running Scylla), we still report hundreds of tests as having failed,
which is confusing and not helpful.

So with this patch, if a connection cannot be made to Scylla at all,
the test exits immediately, explaining what went wrong, not blaming
any specific test:

    $ pytest
    ...
    ! _pytest.outcomes.Exit: Cannot connect to Scylla at --host=localhost --port=9042 !
    ============================ no tests ran in 0.55s =============================
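A minimal sketch of the fail-fast idea: probe Scylla once and abort the whole session with a single message if nothing is listening. It uses a plain TCP probe of the CQL port as a stand-in for opening a driver session, so it needs only the standard library; `scylla_is_reachable` and `require_scylla` are hypothetical names, and in a conftest.py the `exit_fn` would be `pytest.exit`:

```python
import socket
import sys


def scylla_is_reachable(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def require_scylla(host, port, exit_fn=sys.exit):
    """Abort the whole run once, instead of letting every test fail."""
    if not scylla_is_reachable(host, port):
        exit_fn(f"Cannot connect to Scylla at --host={host} --port={port}")
```

Checking once up front turns hundreds of identical connection failures into one actionable message.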

Beyond being a helpful reminder for a developer who runs "pytest" without
having started Scylla first (or using test/cql-pytest/run or test.py to
start Scylla easily), this patch is also important when running tests
through test.py if it reuses an instance of Scylla that crashed during an
earlier pytest file's run.

This patch does not fix test.py - it can still try to run pytest with
a dead Scylla server without checking. But at least with this patch
pytest will notice this problem immediately and won't report hundreds of
test functions having failed. The only failure the user will see is
the last test, the one which crashed Scylla, which makes this failure
easy to find instead of hiding it among hundreds of spurious failures.

Fixes #12360

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12401
2022-12-28 13:04:28 +02:00
Anna Stuchlik
d0db1a27c3 docs: fix the description of the system log POST example 2022-12-28 11:25:54 +01:00
Anna Stuchlik
b7ec99b10b docs: update the curl tool name 2022-12-28 10:33:07 +01:00
Asias He
b9e5e340aa streaming: Enable offstrategy for all classic streaming based node ops
This patch enables offstrategy compaction for all classic streaming
based node ops. We can use this method because tables are streamed one
after another. As long as there is still streamed data for a given
table, we update the automatic trigger timer. When all the streaming has
finished, the trigger timer will timeout and fire the offstrategy
compaction for the given table.
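The per-table trigger described above behaves like a debounce timer. A minimal model with an explicit clock; the class name, the 60-second delay and the `tick()` polling are assumptions for illustration, not the ScyllaDB implementation:

```python
class OffstrategyTrigger:
    """Hypothetical per-table debounce timer: fire once the table goes quiet."""

    def __init__(self, delay, on_fire):
        self._delay = delay
        self._on_fire = on_fire
        self._deadline = None  # None means the timer is not armed

    def on_streamed_data(self, now):
        # Every batch of streamed data pushes the deadline further out.
        self._deadline = now + self._delay

    def tick(self, now):
        # Called periodically; fires at most once per quiet period.
        if self._deadline is not None and now >= self._deadline:
            self._deadline = None
            self._on_fire()


fired = []
trigger = OffstrategyTrigger(delay=60, on_fire=lambda: fired.append("compact"))
for t in (0, 10, 20, 30):  # streaming keeps re-arming the timer
    trigger.on_streamed_data(t)
    trigger.tick(t)
trigger.tick(80)  # only 50s since the last data: nothing yet
trigger.tick(90)  # 60s of quiet: off-strategy compaction fires
print(fired)      # ['compact']
```

Because the deadline only moves forward while data keeps arriving, compaction waits until the table's streaming is over, which is why tables streamed one after another each get a single off-strategy pass.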

In my testing with this patch, rebuild is 3x faster. There was no compaction
in the middle of the streaming. The streamed sstables are compacted
together after streaming is done.

Time Before:
INFO  2022-11-25 10:06:08,213 [shard 0] range_streamer - Rebuild
succeeded, took 67 seconds, nr_ranges_remaining=0

Time After:
INFO  2022-11-25 09:42:50,943 [shard 0] range_streamer - Rebuild
succeeded, took 23 seconds, nr_ranges_remaining=0

Compaction Before:
88 sstables were written -> 88 sstables were added into main set

Compaction After:
88 sstables written -> after off-strategy compaction, 2 sstables were added into main set

Closes #11848
2022-12-28 11:12:02 +02:00
Michał Chojnowski
5e79d6b30b tasks: task_manager: move invoke_on_task<> to .hh
invoke_on_task is used in translation units where its definition is not
visible, yet it has no explicit instantiations. If the compiler decides to
inline the definition everywhere instead of instantiating it implicitly,
linking invoke_on_task will fail. (It happened to me when I turned up
inline-threshold.) Fix that by moving the definition to the header.

Closes #12387
2022-12-28 10:55:43 +02:00
Alejo Sanchez
d408b711e3 test/python: increase CQL connection timeouts
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.

These timeout values are the same as used in topology tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12405
2022-12-28 10:06:33 +02:00
Anna Stuchlik
39ade2f5a5 docs: describe how to use the scylla-api-client tool 2022-12-27 14:46:16 +01:00
Anna Stuchlik
2789501023 docs: fix the scylla-api-client tool name 2022-12-27 14:28:27 +01:00
Alejo Sanchez
1bfe234133 test/pylib: API get/set logger level of Scylla server
Provide helpers to get and set the logger level of Scylla servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12394
2022-12-25 13:58:43 +02:00
Anna Stuchlik
ea7e23bf92 docs: fix the option name from compaction to compression on the Data Definition page
Fixes the option name in the "Other table options" table on the Data Definition page.

Fixes #12334

Closes #12382
2022-12-25 11:24:56 +02:00
Botond Dénes
b0d95948e1 mutation_compactor: reset stop flag on page start
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous page could cause the new page,
and subsequently the entire query, to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.
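A toy Python model of the bug, assuming a compactor object that is reused across pages and a row that becomes empty after compaction (modelled as `None`). `PageCompactor`, `fetch_page` and the `reset_stop` switch are hypothetical; the switch only exists to show the behavior before and after the fix:

```python
class PageCompactor:
    """Toy model of a compactor reused across pages of one partition."""

    def __init__(self, page_size):
        self._page_size = page_size
        self._rows = []
        self._stop = False

    def start_new_page(self, reset_stop=True):
        self._rows = []
        if reset_stop:  # the fix: without it, _stop leaks into the next page
            self._stop = False

    def consume(self, row):
        """Consume one row; return True to ask the producer to stop the page."""
        if row is not None:  # None: a row emptied by a covering tombstone
            self._rows.append(row)
            if len(self._rows) == self._page_size:
                self._stop = True
        return self._stop

    def rows(self):
        return list(self._rows)


def fetch_page(compactor, rows, reset_stop):
    compactor.start_new_page(reset_stop)
    for row in rows:
        if compactor.consume(row):
            break
    return compactor.rows()


for reset in (False, True):
    c = PageCompactor(page_size=2)
    first = fetch_page(c, ["r1", "r2"], reset)
    second = fetch_page(c, [None, "r3", "r4"], reset)
    print(reset, first, second)
# False ['r1', 'r2'] []
# True ['r1', 'r2'] ['r3', 'r4']
```

Without the reset, the stale flag stops the second page on its very first (empty) row, so the page comes back empty.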

This commit also adds a unit test which reproduces the bug.

Fixes: #12361

Closes #12384
2022-12-24 13:52:45 +02:00
Takuya ASADA
642d035067 docker: prevent hostname -i failure when server address is specified
On some Docker instance configurations, hostname resolution does not
work, so our script fails on startup because we use hostname -i to
construct cqlshrc.
To prevent the error, we can use --rpc-address or --listen-address
for the address, since it should be the same.
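A minimal Python sketch of that fallback order, assuming the container entrypoint has the raw scylla arguments as a list and can call `hostname -i` through a callback. The function `cqlsh_address` and its argument handling are hypothetical, not the actual Docker script:

```python
def cqlsh_address(args, hostname_i):
    """Pick the address to write into cqlshrc (hypothetical helper).

    args:       scylla command-line arguments passed to the container.
    hostname_i: callable returning the output of `hostname -i`; consulted
                only as a last resort, because it fails when name
                resolution is broken.
    """
    for opt in ("--rpc-address", "--listen-address"):
        for i, arg in enumerate(args):
            if arg == opt and i + 1 < len(args):
                return args[i + 1]
            if arg.startswith(opt + "="):
                return arg.split("=", 1)[1]
    return hostname_i()


def broken_hostname_i():
    raise RuntimeError("hostname -i: name resolution failed")


print(cqlsh_address(["--listen-address", "10.0.0.5"], broken_hostname_i))  # 10.0.0.5
```

Since an explicit address short-circuits the lookup, the broken `hostname -i` is never called when the user already told the container which address to use.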

Fixes #12011

Closes #12115
2022-12-24 13:52:16 +02:00
Asias He
d819d98e78 storage_service: Ignore dropped table for repair_updater
In case a table is dropped, we should ignore it in the repair_updater,
since we cannot update the off-strategy trigger for a dropped table.

Refs #12373

Closes #12388
2022-12-24 13:48:25 +02:00
Raphael S. Carvalho
67ebd70e6e compaction_manager: Fix reactor stalls during periodic submissions
Every 1 hour, compaction manager will submit all registered table_state
for a regular compaction attempt, all without yielding.

This can potentially cause a reactor stall if there are 1000s of table
states, as compaction strategy heuristics will run on behalf of each,
and processing all buckets and picking the best one is not cheap.
This problem can be magnified with compaction groups, as each group
is represented by a table state.

This might appear in dashboards as periodic stalls, every 1h, misleading
the investigator into believing that the problem is caused by a
periodic (cron-like) job.

This is fixed by piggybacking on the compaction reevaluation loop, which
can yield between each submission attempt if needed.
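A Python asyncio analogue of the fix, assuming cooperative scheduling similar in spirit to Seastar's: submit table states one at a time and yield to the event loop between them. `submit_all`, `yield_every` and the `submit` callback are hypothetical; for simplicity the sketch yields unconditionally, whereas the loop described above yields only when needed:

```python
import asyncio


async def submit_all(table_states, submit, yield_every=1):
    """Submit each table state for compaction, yielding to the event loop.

    Without the `await asyncio.sleep(0)`, a pass over thousands of table
    states would hold the event loop for the whole pass (a "stall").
    """
    for i, ts in enumerate(table_states, start=1):
        submit(ts)  # stands in for running the strategy heuristics on one table state
        if i % yield_every == 0:
            await asyncio.sleep(0)  # let other tasks run between submissions


async def main():
    log = []

    async def other_work():
        for _ in range(3):
            log.append("other")
            await asyncio.sleep(0)

    other = asyncio.create_task(other_work())
    await submit_all(["t1", "t2", "t3"], log.append)
    await other
    return log


print(asyncio.run(main()))
```

Here `other_work` gets to run between submissions rather than only after the whole pass has finished.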

Fixes #12390.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12391
2022-12-24 13:43:16 +02:00
Anna Stuchlik
74fd776751 docs: document scylla-api-cli 2022-12-23 11:27:37 +01:00
Benny Halevy
8797958dfc schema: operator<<: print also tombstone_gc_options
They are currently missing from the printout
when a table is created, but they are essential
to understanding the mode in which tombstones are
garbage-collected in the table. gcGraceSeconds alone
is no longer enough since the introduction of
tombstone_gc_option in a8ad385ecd.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12381
2022-12-22 16:40:18 +02:00
Anna Stuchlik
7e8977bf2d docs: add the info about minor release 2022-12-22 10:26:33 +01:00
Nadav Har'El
ef2e5675ed materialized views, test: add tests for CLUSTERING ORDER BY
In issue #10767, concerns were raised that the CLUSTERING ORDER BY
clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition.

The tests in this patch try to explore the different ways in which
CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allow
us to compare Scylla's behavior to Cassandra's, and to common sense.

The tests discover that the CLUSTERING ORDER BY feature in materialized
views generally works as expected, but there are *three* differences
between Scylla and Cassandra in this feature. We consider two differences
to be bugs (and hence the test is marked xfail) and one a Scylla extension:

1. When a base table has a reverse-order clustering column and this
   clustering column is used in the materialized view, in Cassandra
   the view's clustering order inherits the reversed order. In Scylla,
   the view's clustering order reverts to the default order.
   Arguably, both behaviors can be justified, but usually when in doubt
   we should implement Cassandra's behavior - not pick a different
   behavior, even if the different behavior is also reasonable. So
   this test (test_mv_inherit_clustering_order()) is marked "xfail",
   and a new issue was created about this difference: #12308.

   If we want to fix this behavior to match Cassandra's we should also
   consider backward compatibility - what happens if we change this
   behavior in Scylla now, after we had the opposite behavior in
   previous releases? We may choose to enshrine Scylla's Cassandra-
   incompatible behavior here - and document this difference.

2. The CLUSTERING ORDER BY should, as its name suggests, only list
   clustering columns. In Scylla, specifying other things, like regular
   columns, partition-key columns, or non-existent columns, is silently
   ignored, whereas it should result in an Invalid Request error (as it
   does in Cassandra). So test_mv_override_clustering_order_error()
   is marked "xfail".
   This is the difference already discovered in #10767.

3. When a materialized view has several clustering columns, Cassandra
   requires that a CLUSTERING ORDER BY clause, if present, specify
   the order of *all* clustering columns. Scylla, in contrast,
   allows the user to override the order of only *some* of these columns,
   and the rest get the default order. I consider this to be a
   legitimate Scylla extension, and not a compatibility bug, so I marked
   the test with "scylla_only", and no issue was opened about it.
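A toy Python model of the three differences listed above, as observed by these tests (including the two behaviors treated as bugs). The function `view_clustering_order` and its parameters are hypothetical; it only resolves ASC/DESC per view clustering column and says nothing about how either database implements this:

```python
ASC, DESC = "ASC", "DESC"


def view_clustering_order(view_ck, base_order, clause, scylla=True):
    """Resolve a materialized view's clustering order (toy model).

    view_ck:    the view's clustering columns, in order.
    base_order: {column: ASC|DESC} for the base table's clustering columns.
    clause:     {column: ASC|DESC} from CLUSTERING ORDER BY (may be empty).
    scylla:     True models the Scylla behavior described, False Cassandra's.
    """
    unknown = set(clause) - set(view_ck)
    if unknown and not scylla:
        # Difference 2: Cassandra rejects non-clustering columns; Scylla ignores them.
        raise ValueError(f"not clustering columns: {sorted(unknown)}")
    if clause and not scylla and set(clause) != set(view_ck):
        # Difference 3: Cassandra requires all clustering columns to be listed.
        raise ValueError("CLUSTERING ORDER BY must list all clustering columns")
    order = {}
    for col in view_ck:
        if col in clause:
            order[col] = clause[col]
        elif not scylla:
            order[col] = base_order.get(col, ASC)  # Difference 1: inherit base order
        else:
            order[col] = ASC  # Scylla: falls back to the default order
    return order


base = {"c1": DESC, "c2": ASC}
print(view_clustering_order(["c1", "c2"], base, {}, scylla=True))
# {'c1': 'ASC', 'c2': 'ASC'}
print(view_clustering_order(["c1", "c2"], base, {}, scylla=False))
# {'c1': 'DESC', 'c2': 'ASC'}
print(view_clustering_order(["c1", "c2"], base, {"c2": DESC}, scylla=True))
# {'c1': 'ASC', 'c2': 'DESC'}
```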

Refs #10767
Refs #12308

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12307
2022-12-22 09:48:16 +02:00
Nadav Har'El
6d2e146aa6 test/cql-pytest.py: add scylla_inject_error() utility
This patch adds scylla_inject_error(), a context manager which tests
can use to temporarily enable some error injection while some test
code is running. It can be used to write tests that artificially
inject certain errors instead of trying to reach the elaborate (and
often requiring precise timing or high amounts of data) situation where
they occur naturally.

The error-injection API is Scylla-specific (it uses the Scylla REST API)
and does not work on "release"-mode builds (all other modes are supported),
so when Cassandra or a release-mode build is being tested, tests which
use scylla_inject_error() get skipped.

Example usage:

```python
    from rest_api import scylla_inject_error
    with scylla_inject_error(cql, "injection_name", one_shot=True):
        # do something here
        ...
```
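The pattern behind such a helper, sketched with the server calls abstracted away: the injection is enabled on entry and disabled in a `finally` clause, so a failing test body does not leave it armed. `inject_error` and its `enable`/`disable` callbacks are hypothetical; the real helper in rest_api talks to the Scylla REST API and also handles skipping:

```python
from contextlib import contextmanager


@contextmanager
def inject_error(enable, disable, name, one_shot=False):
    """Enable an error injection for the duration of a `with` block.

    enable/disable are callables that talk to the server (for Scylla, REST
    API calls); they are passed in so this sketch assumes no endpoint.
    """
    enable(name, one_shot)
    try:
        yield
    finally:
        disable(name)  # always clean up, even if the test body raised


calls = []
with inject_error(lambda n, o: calls.append(("enable", n, o)),
                  lambda n: calls.append(("disable", n)),
                  "injection_name", one_shot=True):
    calls.append(("body",))
print(calls)
# [('enable', 'injection_name', True), ('body',), ('disable', 'injection_name')]
```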

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12264
2022-12-22 09:39:10 +02:00
Nadav Har'El
01f0644b22 Merge 'scylla-gdb.py: introduce scylla get-config-value' from Botond Dénes
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:

    (gdb) scylla get-config-value compaction_static_shares
    value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart

Closes #12362

* github.com:scylladb/scylladb:
  scylla-gdb.py: add scylla get-config-value gdb command
  scylla-gdb.py: extract $downcast_vptr logic to standalone method
  test: scylla-gdb/run: improve diagnostics for failed tests
2022-12-21 18:38:23 +02:00
Aleksandra Martyniuk
599fce16cf repair: make top level repair tasks abortable 2022-12-21 11:52:58 +01:00
Aleksandra Martyniuk
e77de463e4 repair: unify a way of aborting repair operations 2022-12-21 11:52:53 +01:00
Aleksandra Martyniuk
f56e886127 repair: delete sharded abort source from node_ops_info
Sharded abort source in node_ops_info is no longer needed since
its functionality is provided by task manager's tasks structure.
2022-12-21 11:37:03 +01:00
Aleksandra Martyniuk
18efe0a4e8 repair: delete unused node_ops_info from data_sync_repair_task_impl 2022-12-21 11:28:30 +01:00
Aleksandra Martyniuk
ee13a5dde8 api: extend status in task manager api
Status of tasks returned with get_task_status and wait_task is extended
with the list of ids of child tasks.
2022-12-21 10:54:56 +01:00
Aleksandra Martyniuk
697af4ccf2 api: extend get_tasks in task manager api
The stats of each task in the list returned from the tm::get_task API call
are extended with info about: task type, keyspace, table, entity,
and sequence number.
2022-12-21 10:54:50 +01:00
Michał Chojnowski
19049150ef configure.py: remove --static, --pie, --so
These options have been nonsense since 2017.
--pie and --so are ignored, --static disables (sic!) static linking of
libraries.
Remove them.

Closes #12366
2022-12-21 11:01:56 +02:00
Botond Dénes
29d49e829e scylla-gdb.py: add scylla get-config-value gdb command
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:
    (gdb) scylla get-config-value compaction_static_shares
    value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart
2022-12-21 03:05:56 -05:00
Botond Dénes
0cdb89868a scylla-gdb.py: extract $downcast_vptr logic to standalone method
So it can be reused by regular python code.
2022-12-21 03:05:56 -05:00
Botond Dénes
24022c19a6 test: scylla-gdb/run: improve diagnostics for failed tests
By instructing gdb to print the full python stack in case of errors.
2022-12-21 03:05:56 -05:00
Michał Chojnowski
d9269abf5b sstables: index_reader: always evict the local cache gently
Due to an oversight, the local index cache wasn't evicted gently
when _upper_bound existed. This is a source of reactor stalls.
Fix that.

Fixes #12271

Closes #12364
2022-12-20 18:23:27 +02:00
Michał Radwański
e7fbcd6c9d mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes
The consume_in_reverse::legacy_half_reverse format is soon to be phased
out. This commit starts treating frozen_mutations from replicas for
reversed queries so that they are consumed with consume_in_reverse::yes.
2022-12-20 17:05:02 +01:00
Benny Halevy
1adb2bff18 mutation: move consume_in_reverse def to mutation_consumer.hh
To be used also by frozen_mutation consumer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-20 16:23:10 +01:00
Avi Kivity
bb731b4f52 Merge 'docs: move documentation of tools online' from Botond Dénes
Currently the scylla tools (`scylla-types` and `scylla-sstable`) have documentation in two places: high level documentation can be found at `docs/operating-scylla/admin-tools/scylla-{types,sstable}.rst`, while low level, more detailed documentation is embedded in the tool itself. This is especially pronounced for `scylla-sstable`, which only has a short description of its operations online, all details being found only in the command-line help.
We want to move away from this model, such that all documentation can be found online, with the command-line help being reserved to documenting how the various switches and flags work, on top of a short description of the operation and a link to the detailed online docs.

Closes #12284

* github.com:scylladb/scylladb:
  tool/scylla-sstable: move documentation online
  docs: scylla-sstable.rst: add sstable content section
  docs: scylla-{sstable,types}.rst: drop Syntax section
2022-12-20 17:04:47 +02:00
Avi Kivity
3fce43124a Merge 'Static compaction groups' from Raphael "Raph" Carvalho
Allows static configuration of number of compaction groups per table per shard.

To bootstrap the project, a config option x_log2_compaction_groups was added, which controls both the number of groups and the partitioning within a shard.

With a value of 0 (default), it means 1 compaction group, therefore all tokens go there.
With a value of 3, it means 8 compaction groups, and 3 most-significant-bits of tokens being used to decide which group owns the token.
And so on.
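
The token-to-group mapping described above can be modelled as follows. This is a minimal Python sketch, not ScyllaDB code. It assumes tokens are signed 64-bit integers that are first shifted into the unsigned range so their most significant bits follow token order. `to_unsigned()` and `group_of()` are hypothetical names.

```python
# Hypothetical model of static compaction-group partitioning.

def to_unsigned(token: int) -> int:
    """Map a signed 64-bit token onto [0, 2**64) while preserving order."""
    return token + 2**63

def group_of(token: int, log2_groups: int) -> int:
    """Return the group owning `token` out of 2**log2_groups groups,
    chosen by the log2_groups most significant bits of the token."""
    if log2_groups == 0:
        return 0  # a single group owns every token (explicit for clarity)
    return to_unsigned(token) >> (64 - log2_groups)

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
print([group_of(t, 3) for t in (INT64_MIN, -1, 0, INT64_MAX)])  # [0, 3, 4, 7]
```

With a value of 3 the token ring is split into 8 contiguous, equally sized slices, one per group.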

It's still missing:
- integration with repair / streaming
- integration with reshard / reshape.

perf/perf_simple_query --smp 1 --memory 1G

BEFORE
-----
median 61358.55 tps ( 71.1 allocs/op,  12.2 tasks/op,   56375 insns/op,        0 errors)
median 61322.80 tps ( 71.1 allocs/op,  12.2 tasks/op,   56391 insns/op,        0 errors)
median 61058.58 tps ( 71.1 allocs/op,  12.2 tasks/op,   56386 insns/op,        0 errors)
median 61040.94 tps ( 71.1 allocs/op,  12.2 tasks/op,   56381 insns/op,        0 errors)
median 61118.40 tps ( 71.1 allocs/op,  12.2 tasks/op,   56379 insns/op,        0 errors)

AFTER
-----
median 61656.12 tps ( 71.1 allocs/op,  12.2 tasks/op,   56486 insns/op,        0 errors)
median 61483.29 tps ( 71.1 allocs/op,  12.2 tasks/op,   56495 insns/op,        0 errors)
median 61638.05 tps ( 71.1 allocs/op,  12.2 tasks/op,   56494 insns/op,        0 errors)
median 61726.09 tps ( 71.1 allocs/op,  12.2 tasks/op,   56509 insns/op,        0 errors)
median 61537.55 tps ( 71.1 allocs/op,  12.2 tasks/op,   56491 insns/op,        0 errors)

Closes #12139

* github.com:scylladb/scylladb:
  test: mutation_test: Test multiple compaction groups
  test: database_test: Test multiple compaction groups
  test: database_test: Adapt it to compaction groups
  db: Add config for setting static number of compaction groups
  replica: Introduce static compaction groups
  test: sstable_test: Stop referencing single compaction group
  api: compaction_manager: Stop a compaction type for all groups
  api: Estimate pending tasks on all compaction groups
  api: storage_service: Run maintenance compactions on all compaction groups
  replica: table: Adapt assertion to compaction groups
  replica: database: stop and disable compaction on behalf of all groups
  replica: Introduce table::parallel_foreach_table_state()
  replica: disable auto compaction on behalf of all groups
  replica: table: Rework compaction triggers for compaction groups
  replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups
  replica: Adapt table::rebuild_statistics() to compaction groups
  replica: table: Perform major compaction on behalf of all groups
  replica: table: Perform off-strategy compaction on behalf of all groups
  replica: table: Perform cleanup compaction on behalf of all groups
  replica: Extend table::discard_sstables() to operate on all compaction groups
  replica: table: Create compound sstable set for all groups
  replica: table: Set compaction strategy on behalf of all groups
  replica: table: Return min memtable timestamp across all groups
  replica: Adapt table::stop() to compaction groups
  replica: Adapt table::clear() to compaction groups
  replica: Adapt table::can_flush() to compaction groups
  replica: Adapt table::flush() to compaction groups
  replica: Introduce parallel_foreach_compaction_group()
  replica: Adapt table::set_schema() to compaction groups
  replica: Add memtables from all compaction groups for reads
  replica: Add memtable_count() method to compaction_group
  replica: table: Reserve reader list capacity through a callback
  replica: Extract addition of memtables to reader list into a new function
  replica: Adapt table::occupancy() to compaction groups
  replica: Adapt table::active_memtable() to compaction groups
  replica: Introduce table::compaction_groups()
  replica: Preparation for multiple compaction groups
  scylla-gdb: Fix backward compatibility of scylla_memtables command
2022-12-20 17:04:47 +02:00
Avi Kivity
623be22d25 Merge 'sstables: allow bypassing min max position metadata loading' from Botond Dénes
Said mechanism broke tools and tests to some extent: the read it executes at sstable load time means that if the sstable is broken enough to fail this read, it will fail to load. This prevents diagnostic tools from loading and examining it, and prevents tests from producing broken sstables for testing purposes.

Closes #12359

* github.com:scylladb/scylladb:
  sstables: allow bypassing first/last position metadata loading
  sstables: sstable::{load,open_data}(): fix indentation
  sstables: coroutinize sstable::open_data()
  sstables: sstable::open_data(): use clear_gently() to clear token ranges
  sstables: coroutinize sstable::load()
2022-12-20 17:04:47 +02:00
Aleksandra Martyniuk
60e298fda1 repair: change utils::UUID to node_ops_id
Type of the id of node operations is changed from utils::UUID
to node_ops_id. This way the id of node operations would be easily
distinguished from the ids of other entities.
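
In C++ the benefit of a dedicated id type is a compile-time check. A rough Python analogue is sketched below and makes no claim about how `node_ops_id` is actually defined. It wraps the raw UUID in a dedicated frozen dataclass, so ids of different entities never compare equal even when they share the same UUID. `NodeOpsId` and `TaskId` are hypothetical names.

```python
import uuid
from dataclasses import dataclass

# Hypothetical strong-typedef model: each kind of id gets its own type.
@dataclass(frozen=True)
class NodeOpsId:
    value: uuid.UUID

@dataclass(frozen=True)
class TaskId:
    value: uuid.UUID

raw = uuid.uuid4()
print(NodeOpsId(raw) == NodeOpsId(raw))  # True
print(NodeOpsId(raw) == TaskId(raw))     # False: different kinds of id
```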

Closes #11673
2022-12-20 17:04:47 +02:00
Avi Kivity
88a1fbd72f Update seastar submodule
* seastar 3a5db04197...3db15b5681 (27):
  > build: get the full path of c-ares
  > build: unbreak pkgconfig output
  > http: Add 206 Partial Content response code
  > http: Carry integer content_length on reply
  > tls_test: drop duplicated includes
  > tls_test: remove duplicated test case
  > reactor: define __NR_pidfd_open if not defined
  > sockets: Wait on socket peer closing the connection
  > tcp: Close connection when getting RST from server
  > Merge 'Enhance rpc tester with delays, timeouts and verbosity' from Pavel Emelyanov
  > Merge 'build: use pkg_search_module(.. IMPORTED_TARGET ..) ' from Kefu Chai
  > build: define GnuTLS_{LIBRARIES, INCLUDE_DIRS} only if GnuTLS is found
  > build: use pkg_search_module(.. IMPORTED_TARGET ..)
  > addr2line: extend asan regex
  > abort_source: move-assign operator: call base class unlink
  > coroutine: correct syntax error in doxygen comment
  > demo: Extend http connection demo with https
  > test: temporarily disable warning for tests triggering warnings
  > tests/unit/coroutine: Include <ranges>
  > sstring: Document why sstring exists at all
  > test: log error when read/write to pipe fails
  > test: use executables in /bin
  > tests: spawn_test: use BOOST_CHECK_EQUAL() for checking equality of temporary_buffer
  > docker: bump up to clang {14,15} and gcc {11,12}
  > shared_ptr: ignore false alarm from GCC-12
  > build: check for fix of CWG2631
  > circleci: use versioned container image

Closes #12355
2022-12-20 17:04:47 +02:00
Botond Dénes
3c8949d34c sstables: allow bypassing first/last position metadata loading
When loading an sstable. Tests and tools might want to do this to be
able to load a damaged sstable to do tests/diagnostics on it.
2022-12-20 01:45:38 -05:00
Botond Dénes
bba956c13c sstables: sstable::{load,open_data}(): fix indentation 2022-12-20 01:45:38 -05:00
Botond Dénes
c85ff7945d sstables: coroutinize sstable::open_data()
Used once when sstable is opened on startup, not performance sensitive.
2022-12-20 01:45:38 -05:00
Botond Dénes
15966a0b1b sstables: sstable::open_data(): use clear_gently() to clear token ranges
Instead of an open-coded loop. It also makes the code easier to
coroutinize (next patch).
2022-12-20 01:45:22 -05:00
Nadav Har'El
08c8e0d282 test/alternator: enable tests for long strings of consecutive tombstones
In the past we had issue #7933 where very long strings of consecutive
tombstones caused Alternator's paging to take an unbounded amount of
time and/or memory for a single page. This issue was fixed (by commit
e9cbc9ee85) but the two tests we had
reproducing that issue were left with the "xfail" mark.
They were also marked "veryslow" - each taking about 100 seconds - so
they didn't run by default so nobody noticed they started to pass.

In this patch I make these tests much faster (taking less than a second
together), confirm that they pass - and remove the "xfail" mark and
improve their descriptions.

The trick to making these tests faster is to not create a million
tombstones like we used to: We now know that after a string of just 10,000
tombstones ('query_tombstone_page_limit') the page should end, so
we can check specifically this number. The story is more complicated for
partition tombstones, but there too it should be a multiple of
query_tombstone_page_limit. To make the tests even faster, we change
run.py to lower the query_tombstone_page_limit from the default 10,000
to 1000. The tests work correctly even without this change, but they are
ten times faster with it.
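
The paging rule these tests check can be illustrated with a simplified model. It assumes a page is cut after `query_tombstone_page_limit` consecutive row tombstones. It ignores partition tombstones and is not Alternator's implementation. `read_page()` is a hypothetical helper.

```python
# Hypothetical model: a page ends early once `tombstone_limit` consecutive
# tombstones have been scanned, so a long run of tombstones cannot make a
# single page take unbounded time or memory.
def read_page(rows, start, page_size, tombstone_limit):
    """Return (live_rows, next_position); next_position is None at the end."""
    live, consecutive, pos = [], 0, start
    while pos < len(rows):
        row = rows[pos]
        pos += 1
        if row is None:                 # None stands for a tombstone
            consecutive += 1
            if consecutive == tombstone_limit:
                break                   # cut the page short, maybe empty
        else:
            consecutive = 0
            live.append(row)
            if len(live) == page_size:
                break
    return live, (pos if pos < len(rows) else None)

rows = [None] * 2500 + ["a", "b"]
print(read_page(rows, 0, 100, 1000))  # ([], 1000)
```

Under this model a scan over 2,500 tombstones with a limit of 1000 needs three pages. The first two are empty, and the third returns the two live rows.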

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12350
2022-12-20 07:08:36 +02:00
Botond Dénes
94f3fb341f Merge 'Fix nix devenv' from Michael Livshin
* Update Nixpkgs base

* Clarify some comments

* Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as
  cxx-rs)

* Add missing libraries (libdeflate, libxcrypt)

* Fix expected hash of the gdb patch

* Fix a couple of small build problems

Fixes #12259

Closes #12346

* github.com:scylladb/scylladb:
  build: fix Nix devenv
  cql3: mark several private fields as maybe_unused
  configure.py: link with more abseil libs
2022-12-20 07:01:06 +02:00
Michael Livshin
7c383c6249 build: fix Nix devenv
* Update Nixpkgs base

* Clarify some comments

* Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as
  cxx-rs)

* Add missing libraries (libdeflate, libxcrypt)

* Fix expected hash of the gdb patch

* Bump Python driver to 3.25.20-scylla

Fixes #12259
2022-12-19 20:53:07 +02:00
Michael Livshin
4407828766 cql3: mark several private fields as maybe_unused
Because they are indeed unused -- they are initialized, passed down
through some layers, but not actually used.  No idea why only Clang 12
in debug mode in Nix devenv complains about it, though.
2022-12-19 20:53:07 +02:00
Michael Livshin
c0c8afb79e configure.py: link with more abseil libs
Specifically libabsl_strings{,_internal}.a.

This fixes failure to link tests in the Nix devenv; since presumably
all is good in other setups, it must be something weird having to do
with inlining?

The extra linked libraries shouldn't hurt in any case.
2022-12-19 20:53:07 +02:00
Raphael S. Carvalho
e7380bea65 test: mutation_test: Test multiple compaction groups
Extends mutation_test to run the tests with more than one
compaction group, in addition to a single one (default).

Piggyback on existing tests. Avoids duplication.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:07 -03:00
Raphael S. Carvalho
e3e7c3c7e5 test: database_test: Test multiple compaction groups
Extends database_test to run the tests with more than one
compaction group, in addition to a single one (default).

Piggyback on existing tests. Avoids duplication.

Caught a bug when snapshotting, in implementation of
table::can_flush(), showing its usefulness.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:07 -03:00
Raphael S. Carvalho
e103e41c76 test: database_test: Adapt it to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:05 -03:00
Aleksandra Martyniuk
be529cc209 repair: delete redundant abort subscription from shard_repair_task_impl
data_sync_repair_task_impl subscribes to corresponding node_ops_info
abort source and then, when requested, all its descendants are
aborted recursively. Thus, shard_repair_task_impl does not need
to subscribe to the node_ops_info abort source, since the parent
task will take care of aborting once it is requested.

abort_subscription and connected attributes are deleted from
the shard_repair_task_impl.
2022-12-19 16:07:28 +01:00
Aleksandra Martyniuk
e48ca62390 repair: add abort subscription to data sync task
When a node operation is aborted, the same should happen to
the corresponding task manager's repair task.

Subscribe data_sync_repair_task_impl abort() to node_ops_info
abort_source.
2022-12-19 15:57:35 +01:00
Aleksandra Martyniuk
2b35d7df1b tasks: abort tasks on system shutdown
When the system shuts down, all of the task manager's top-level tasks are aborted.
Responsibility for aborting child tasks lies with their parents.
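
The division of responsibility described here can be sketched as a simple synchronous tree. This is not the real task manager, which is asynchronous (Seastar-based) and whose classes differ. `Task` and `TaskManager` are hypothetical names.

```python
# Hypothetical model: the manager aborts only top-level tasks; each task
# propagates abort() to its own children, recursively.
class Task:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.children, self.aborted = [], False
        if parent is not None:
            parent.children.append(self)

    def abort(self):
        self.aborted = True
        for child in self.children:
            child.abort()


class TaskManager:
    def __init__(self):
        self.tasks = []

    def make_task(self, name, parent=None):
        task = Task(name, parent)
        self.tasks.append(task)
        return task

    def shutdown(self):
        for task in self.tasks:
            if task.parent is None:  # top-level tasks only
                task.abort()
```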
2022-12-19 15:57:35 +01:00
Botond Dénes
827cd0d37b sstables: coroutinize sstable::load()
The method is nicely simplified by this. No regression is expected, as this method is
supposedly only used by tests and tools.
2022-12-19 09:33:52 -05:00
Raphael S. Carvalho
d9ab59043e db: Add config for setting static number of compaction groups
This new option allows the user to control the number of compaction groups
per table per shard. It's 0 by default which implies a single compaction
group, as is today.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:24 -03:00
Raphael S. Carvalho
9cf4dc7b62 replica: Introduce static compaction groups
This is the initial support for multiple groups.

_x_log2_compaction_groups controls the number of compaction groups
and the partitioning strategy within a single table.

The value in _x_log2_compaction_groups refers to log base 2 of the
actual number of groups.

0 means 1 compaction group.
1 means 2 groups, with the most significant bit of the token being
used to pick the target group.

The group partitioner should be later abstracted for making tablet
integration easier in the future.

_x_log2_compaction_groups is still a constant but a config option
will come next.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:23 -03:00
Raphael S. Carvalho
c807e61715 test: sstable_test: Stop referencing single compaction group
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:20 -03:00
Raphael S. Carvalho
254c38c4d2 api: compaction_manager: Stop a compaction type for all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:19 -03:00
Raphael S. Carvalho
4e836cb96c api: Estimate pending tasks on all compaction groups
Estimates # of compaction jobs to be performed on a table.
Adaptation is done by adding estimation from all groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:17 -03:00
Raphael S. Carvalho
640436e72a api: storage_service: Run maintenance compactions on all compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:15 -03:00
Raphael S. Carvalho
e0c5cbee8d replica: table: Adapt assertion to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:13 -03:00
Raphael S. Carvalho
d35cf88f09 replica: database: stop and disable compaction on behalf of all groups
With compaction group model, truncate_table_on_all_shards() needs
to stop and disable compaction for all groups.
replica::table::as_table_state() will be removed once no user
remains, as each table may map to multiple groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:12 -03:00
Raphael S. Carvalho
50b02ee0bd replica: Introduce table::parallel_foreach_table_state()
This will replace table::as_table_state(). The latter will be
killed once its usage drops to zero.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:10 -03:00
Raphael S. Carvalho
fd69bd433e replica: disable auto compaction on behalf of all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:08 -03:00
Raphael S. Carvalho
6fefbe5706 replica: table: Rework compaction triggers for compaction groups
Allow table-wide compaction trigger, as well as fine-grained trigger
like after flushing a memtable on behalf of a single group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:07 -03:00
Raphael S. Carvalho
6a6adea3ab replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:05 -03:00
Raphael S. Carvalho
5919836da8 replica: Adapt table::rebuild_statistics() to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:04 -03:00
Raphael S. Carvalho
70b727db31 replica: table: Perform major compaction on behalf of all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:01 -03:00
Raphael S. Carvalho
e3ccdb17a0 replica: table: Perform off-strategy compaction on behalf of all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:00 -03:00
Raphael S. Carvalho
6efc9fd1f6 replica: table: Perform cleanup compaction on behalf of all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:58 -03:00
Raphael S. Carvalho
36e11eb2a5 replica: Extend table::discard_sstables() to operate on all compaction groups
discard_sstables() runs in the context of truncate, which is a table-wide
operation today, and will remain so with multiple static groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:55 -03:00
Raphael S. Carvalho
24c3687c3f replica: table: Create compound sstable set for all groups
Avoids extra compound set for single-compaction-group table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:52 -03:00
Raphael S. Carvalho
eb620da981 replica: table: Set compaction strategy on behalf of all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:50 -03:00
Raphael S. Carvalho
7a0e4f900f replica: table: Return min memtable timestamp across all groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:49 -03:00
Raphael S. Carvalho
ceaa8a1ef1 replica: Adapt table::stop() to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:47 -03:00
Raphael S. Carvalho
facf923440 replica: Adapt table::clear() to compaction groups
clear() clears memtable content and cache.

Cache is shared by groups, therefore adaptation happens by only
clearing memtables of all groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:45 -03:00
Raphael S. Carvalho
a9c902cd5e replica: Adapt table::can_flush() to compaction groups
can_flush() is used externally to determine if a table has an active
memtable that can be flushed. Therefore, adaptation happens by
returning true if any of the groups can be flushed. A subsequent
flush request will flush memtable of all groups that are ready
for it.
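
The any-group/all-groups semantics of `can_flush()` and `flush()` can be modelled as shown below. This is a synchronous Python sketch with hypothetical classes, not the replica code. Each memtable is reduced to a list of pending writes.

```python
# Hypothetical model of a table whose compaction groups each own a memtable.
class CompactionGroup:
    def __init__(self):
        self.memtable, self.flushed = [], 0

    def can_flush(self):
        return bool(self.memtable)

    def seal_active_memtable(self):
        if not self.memtable:  # bail out on an empty memtable
            return
        self.flushed += 1
        self.memtable = []


class Table:
    def __init__(self, n_groups):
        self.groups = [CompactionGroup() for _ in range(n_groups)]

    def can_flush(self):
        # Flushable if any group has something to flush.
        return any(g.can_flush() for g in self.groups)

    def flush(self):
        # Ask every group to flush; empty ones are skipped by seal_active_memtable().
        for g in self.groups:
            g.seal_active_memtable()
```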

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:44 -03:00
Raphael S. Carvalho
ea42090d47 replica: Adapt table::flush() to compaction groups
Adaptation of flush() happens by triggering a flush on the memtables of all
groups.
table::seal_active_memtable() will bail out if the memtable is empty, so
it's not a problem to call flush on a group whose memtable is empty.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:42 -03:00
Raphael S. Carvalho
7274c83098 replica: Introduce parallel_foreach_compaction_group()
This variant will be useful when iterating through groups
and performing async actions on each. It guarantees that all
groups are alive by the time they're reached in the loop.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:40 -03:00
Raphael S. Carvalho
89ab9d7227 replica: Adapt table::set_schema() to compaction groups
set_schema() is used by the database to apply schema changes to
table components which include memtables.
Adaptation happens by setting schema to memtable(s) of all groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:38 -03:00
Raphael S. Carvalho
0022322ae3 replica: Add memtables from all compaction groups for reads
Let's add memtables of all compaction groups. Point queries are
optimized by picking a single group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:36 -03:00
Raphael S. Carvalho
e044001176 replica: Add memtable_count() method to compaction_group
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:34 -03:00
Raphael S. Carvalho
f2ea79f26c replica: table: Reserve reader list capacity through a callback
add_memtables_to_reader_list() will be adapted to compaction groups.
For point queries, it will add memtables of a single group.
With the callback, add_memtables_to_reader_list() can tell its
caller the exact number of memtable readers to be added, so the caller
can reserve precisely the required reader capacity.
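
The callback idea can be sketched as follows. It assumes, for illustration only, that a point query selects a single group and any other read selects all groups. In C++ the caller would use the reported count to call `reserve()` on its reader vector. Python lists need no reservation, so here the callback just records the count. The function name mirrors the commit, but this signature is hypothetical.

```python
# Hypothetical model: the helper learns how many memtables it will add,
# reports that count through `reserve`, then appends one reader per memtable.
def add_memtables_to_reader_list(groups, readers, reserve, point_query_group=None):
    selected = groups if point_query_group is None else [groups[point_query_group]]
    reserve(sum(len(memtables) for memtables in selected))
    for memtables in selected:
        for memtable in memtables:
            readers.append(f"reader({memtable})")

groups = [["m0"], ["m1a", "m1b"], ["m2"]]
readers, requested = [], []
add_memtables_to_reader_list(groups, readers, requested.append, point_query_group=1)
print(requested, readers)  # [2] ['reader(m1a)', 'reader(m1b)']
```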

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:33 -03:00
Raphael S. Carvalho
e841508685 replica: Extract addition of memtables to reader list into a new function
Will make it easier for adding memtables of all compaction groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:19 -03:00
Raphael S. Carvalho
530956b2de replica: Adapt table::occupancy() to compaction groups
table::occupancy() provides accumulated occupancy stats from
memtables.
Adaptation happens by accumulating stats from memtables of
all groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:17 -03:00
Raphael S. Carvalho
ef8f542d75 replica: Adapt table::active_memtable() to compaction groups
active_memtable() was fine for a single group, but with multiple groups,
there will be one active memtable per group. Let's change the
interface to reflect that.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:14 -03:00
Raphael S. Carvalho
429c5aa2f9 replica: Introduce table::compaction_groups()
Useful for iterating through all groups. This is an intermediary
implementation, which requires an allocation, as only one group
is supported today.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:12 -03:00
Raphael S. Carvalho
514008f136 replica: Preparation for multiple compaction groups
Adjusts scylla_memtables gdb command to multiple groups,
while keeping backward compatibility.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:10 -03:00
Raphael S. Carvalho
52b94b6dd7 scylla-gdb: Fix backward compatibility of scylla_memtables command
Fix it while refactoring the code for arrival of multiple compaction
groups.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:07 -03:00
Anna Stuchlik
bbfb9556fc doc: mark the in-memory tables feature as deprecated
Closes #12286
2022-12-19 15:39:31 +02:00
Avi Kivity
c70a9b0166 test: make test xml filenames more unique
ea99750de7 ("test: give tests less-unique identifiers") made
the disambiguating ids only be unambiguous within a single test
case. This made all tests named "run" have the same name "run.1".

Fix that by adding the suite name everywhere: in test paths, and
in junit test case names.

Fixes #12310.

Closes #12313
2022-12-19 15:03:51 +02:00
Botond Dénes
3e6ddf21bc Merge 'storage_service: unbootstrap: avoid unnecessary copy of ranges_to_stream' from Benny Halevy
`ranges_to_stream` is a map of `std::unordered_multimap<dht::token_range, inet_address>` per keyspace.
On large clusters with a large number of keyspaces, copying it may cause reactor stalls, as seen in #12332.

This series eliminates this copy by using std::move and also
turns `stream_ranges` into a coroutine, adding maybe_yield calls to avoid further stalls down the road.

Fixes #12332

Closes #12343

* github.com:scylladb/scylladb:
  storage_service: stream_ranges: unshare streamer
  storage_service: stream_ranges: maybe_yield
  storage_service: coroutinize stream_ranges
  storage_service: unbootstrap: move ranges_to_stream_by_keyspace to stream_ranges
2022-12-19 12:53:16 +02:00
Benny Halevy
e8aa1182b2 docs: replace-dead-node: get rid of hibernate status
With replace using node operations, the HIBERNATE
gossip status is not used anymore.

This change updates documentation to reflect that.
During replace, the replacing node shows in gossipinfo
with STATUS:NORMAL.

Also, the replaced node shows as DN in `nodetool status`
while being replaced, so remove paragraph showing it's
not listed in `nodetool status`.

Plus, tidy up the text alignment.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 12:19:10 +02:00
Benny Halevy
c9993f020d storage_service: get rid of handle_state_replacing
Since 2ec1f719de nodes no longer
publish HIBERNATE state so we don't need to support handling it.

Replace is now always done using node operations (using
repair or streaming).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 12:19:08 +02:00
Benny Halevy
60de7d28db storage_service: stream_ranges: unshare streamer
Now that stream_ranges is a coroutine
streamer can be an automatic variable on the
coroutine stack frame.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 07:42:07 +02:00
Benny Halevy
9badcd56ca storage_service: stream_ranges: maybe_yield
Prevent stalls with a large number of keyspaces
and token ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 07:42:07 +02:00
Benny Halevy
2cf75319b0 storage_service: coroutinize stream_ranges
Before adding maybe_yield calls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 07:42:01 +02:00
Benny Halevy
82486bb5d2 storage_service: unbootstrap: move ranges_to_stream_by_keyspace to stream_ranges
Avoid a potentially large memory copy causing
a reactor stall with a large number of keyspaces.

Fixes #12332

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-19 07:39:48 +02:00
Avi Kivity
7c7eb81a66 Merge 'Encapsulate filesystem access by sstable into filesystem_storage subclass' from Pavel Emelyanov
This is to define the API sstable needs from underlying storage. When implementing an object-storage backend, it will need to implement those. The API looks like

        future<> snapshot(const sstable& sst, sstring dir, absolute_path abs) const;
        future<> quarantine(const sstable& sst, delayed_commit_changes* delay);
        future<> move(const sstable& sst, sstring new_dir, generation_type generation, delayed_commit_changes* delay);
        void open(sstable& sst, const io_priority_class& pc); // runs in async context
        future<> wipe(const sstable& sst) noexcept;

        future<file> open_component(const sstable& sst, component_type type, open_flags flags, file_open_options options, bool check_integrity);

It doesn't have "list" or the like, because that is not a method of an individual sstable, but rather one of sstables_manager. It will come as a separate PR.

Closes #12217

* github.com:scylladb/scylladb:
  sstable, storage: Mark dir/temp_dir private
  sstable: Remove get_dir() (well, almost)
  sstable: Add quarantine() method to storage
  sstable: Use absolute/relative path marking for snapshot()
  sstable: Remove temp_... stuff from sstable
  sstable: Move open_component() on storage
  sstable: Mark rename_new_sstable_component_file() const
  sstable: Print filename(type) on open-component error
  sstable: Reorganize new_sstable_component_file()
  sstable: Mark filename() private
  sstable: Introduce index_filename()
  tests: Disclosure private filename() calls
  sstable: Move wipe_storage() on storage
  sstable: Remove temp dir in wipe_storage()
  sstable: Move unlink parts into wipe_storage
  sstable: Remove get_temp_dir()
  sstable: Move write_toc() to storage
  sstable: Shuffle open_sstable()
  sstable: Move touch_temp_dir() to storage
  sstable: Move move() to storage
  sstable: Move create_links() to storage
  sstable: Move seal_sstable() to storage
  sstable: Tossing internals of seal_sstable()
  sstable: Move remove_temp_dir() to storage
  sstable: Move create_links_common() to storage
  sstable: Move check_create_links_replay() to storage
  sstable: Remove one of create_links() overloads
  sstable: Remove create_links_and_mark_for_removal()
  sstable: Indentation fix after previous patch
  sstable: Coroutinize create_links_common()
  sstable: Rename create_links_common()'s "dir" argument
  sstable: Make mark_for_removal bool_class
  sstable, table: Add sstable::snapshot() and use in table::take_snapshot
  sstable: Move _dir and _temp_dir on filesystem_storage
  sstable: Use sync_directory() method
  test, sstable: Use component_basename in test
  sstables: Move read_{digest|checksum} on sstable
2022-12-18 17:29:35 +02:00
Anna Stuchlik
6a8eb33284 docs: add the new upgrade guide 2022.1 to 2022.2 to the index and the toctree 2022-12-16 17:13:50 +01:00
Anna Stuchlik
36f4ef2446 docs: add the index file for the new upgrade guide from 2022.1 to 2022.2 2022-12-16 17:11:25 +01:00
Anna Stuchlik
8d8983e029 docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2 2022-12-16 17:09:21 +01:00
Anna Stuchlik
252c2139c2 docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2 2022-12-16 17:07:00 +01:00
Michał Chojnowski
b52bd9ef6a db: commitlog: remove unused max_active_writes()
Dead and misleading code.

Closes #12327
2022-12-16 10:23:03 +02:00
Nadav Har'El
327539b15d Merge 'test.py: fix cql failure handling' from Alecco
Fix a bug in failure handling and log level.

Closes #12336

* github.com:scylladb/scylladb:
  test.py: convert param to str
  test.py: fix error level for CQL tests
2022-12-16 09:29:21 +02:00
Botond Dénes
cc03becf82 Merge 'tasks: get task's type with method' from Aleksandra Martyniuk
The type of an operation is related to a specific implementation
of a task. Therefore, it should be accessed through a virtual
method in tasks::task_manager::task::impl rather than being
stored as an attribute.

Closes #12326

* github.com:scylladb/scylladb:
  api: delete unused type parameter from task_manager_test api
  tasks: repair: api: remove type attribute from task_manager::task::status
  tasks: add type() method to task_manager::task::impl
  repair: add reason attribute to repair_task
2022-12-16 09:20:26 +02:00
Aleksandra Martyniuk
f81ad2d66a repair: make shard tasks internal
Shard tasks should not be visible to users by default, thus they are
made internal.

Closes #12325
2022-12-16 09:05:30 +02:00
Aleksandra Martyniuk
bae887da3b tasks: add virtual destructor to task_manager::module
When an object of a class inheriting from task_manager::module
is destroyed, destructor of the derived class should be called.

Closes #12324
2022-12-16 08:59:26 +02:00
Raphael S. Carvalho
e6fb3b3a75 compaction: Delete atomically off-strategy input sstables
After commit a57724e711, off-strategy no longer races with view
building, therefore the deletion code can be simplified and piggyback
on the mechanism for deleting all sstables atomically, meaning a crash
midway won't result in some of the files coming back to life,
which leads to unnecessary work on restart.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12245
2022-12-16 08:15:49 +02:00
Alejo Sanchez
9b65448d38 test.py: convert param to str
The format_unidiff() function takes str, not pathlib PosixPath, so
convert it to str.

This prevented the diff of an unexpected result from being shown in the log
file.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-12-15 20:46:35 +01:00
Alejo Sanchez
5142d80bb1 test.py: fix error level for CQL tests
If the test fails, use error log level.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-12-15 20:45:44 +01:00
Botond Dénes
64903ba7d5 test/cql-pytest: use pytest site-packages workaround
Recently, the pytest script shipped by Fedora started invoking python
with the `-s` flag, which disables python considering user site
packages. This caused problems for our tests which install the cassandra
driver in the user site packages. This was worked around in e5e7780f32
by providing our own pytest interposer launcher script which does not
pass the above mentioned flag to python. Said patch fixed test.py but
not the run.py in cql-pytest. So if the cql-pytest suite is run via
test.py it works fine, but if it is invoked via the run script, it fails
because it cannot find the cassandra driver. This patch patches run.py
to use our own pytest launcher script, so the suite can be run via the
run script as well.
Since run.py is shared with the alternator pytest suite, this patch also
fixes said test suite too.

Closes #12253
2022-12-15 16:05:31 +02:00
Benny Halevy
639e247734 test: cql-pytest: test_describe: test_table_options_quoting: USE test_keyspace
Without that, I often (but not always) get the following error:
```
__________________________ test_table_options_quoting __________________________

cql = <cassandra.cluster.Session object at 0x7f1aafb10650>
test_keyspace = 'cql_test_1671103335055'

    def test_table_options_quoting(cql, test_keyspace):
        type_name = f"some_udt; DROP KEYSPACE {test_keyspace}"
        column_name = "col''umn -- @quoting test!!"
        comment = "table''s comment test!\"; DESC TABLES --quoting test"
        comment_plain = "table's comment test!\"; DESC TABLES --quoting test" #without doubling "'" inside comment

>       cql.execute(f"CREATE TYPE \"{type_name}\" (a int)")

test/cql-pytest/test_describe.py:623:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cassandra/cluster.py:2699: in cassandra.cluster.Session.execute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"
```

CQL driver in use is the scylla driver version 3.25.10.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12329
2022-12-15 14:35:33 +02:00
Aleksandra Martyniuk
f0b2b00a15 api: delete unused type parameter from task_manager_test api 2022-12-15 10:50:30 +01:00
Aleksandra Martyniuk
5bc09daa7a tasks: repair: api: remove type attribute from task_manager::task::status 2022-12-15 10:49:09 +01:00
Aleksandra Martyniuk
8d5377932d tasks: add type() method to task_manager::task::impl 2022-12-15 10:41:58 +01:00
Aleksandra Martyniuk
329176c7bc repair: add reason attribute to repair_task
As a preparation to creating a type() method in task_manager::task::impl
a streaming::stream_reason is kept in repair_task.
2022-12-15 10:38:38 +01:00
Botond Dénes
9713a5c314 tool/scylla-sstable: move documentation online
The inline-help of operations will only contain a short summary of the
operation and the link to the online documentation.
The move is not a straightforward copy-paste. First and foremost because
we move from simple markdown to RST. Informal references are also
replaced with proper RST links. Some small edits were also done on the
texts.
The intent is the following:
* the inline help serves as a quick reference for what the operation
  does and what flags it has;
* the online documentation serves as the full reference manual,
  explaining all details.
2022-12-15 04:10:21 -05:00
Botond Dénes
3cf7afdf95 docs: scylla-sstable.rst: add sstable content section
Provides a link to the architecture/sstable page for more details on the
sstable format itself. It also describes the mutation-fragment stream,
the parts of it that are relevant to the sstable operations.
The purpose of this section is to provide a target for links that want to
point to a common explanation on the topic. In particular, we will soon
move the detailed documentation of the scylla-sstable operations into
this file and we want to have a common explanation of the mutation
fragment stream that these operations can point to.
2022-12-15 04:10:21 -05:00
Botond Dénes
641fb4c8bb docs: scylla-{sstable,types}.rst: drop Syntax section
In both files, the section hierarchy is as follows:

    Usage
        Syntax
            Sections with actual content

This scheme uses up 3 levels of hierarchy, leaving not much room to
expand the sections with actual content with subsections of their own.
Remove the Syntax level altogether, directly embedding the sections with
content under the Usage section.
2022-12-15 04:03:00 -05:00
Botond Dénes
8f8284783a Merge 'Fix handling of non-full clustering keys in the read path' from Tomasz Grabiec
This PR fixes several bugs related to handling of non-full
clustering keys.

One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse
mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.

Fixes #12180

after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.

This issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.

It probably already causes problems for such keys, as after_key() is used
in various parts of the read path.

Refs #1446

Closes #12234

* github.com:scylladb/scylladb:
  position_in_partition: Make after_key() work with non-full keys
  position_in_partition: Introduce before_key(position_in_partition_view)
  db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
  types: Fix comparison of frozen sets with empty values
2022-12-15 10:47:12 +02:00
Pavel Emelyanov
6d10a3448b sstable, storage: Mark dir/temp_dir private
Now all storage access via sstable happens with the help of storage
class API so its internals can be finally made private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
6296ca3438 sstable: Remove get_dir() (well, almost)
The sstable::get_dir() is now gone, and no callers know that an sstable lives
at some path on a filesystem. Only a few such callers are left.

One is a handful of places in the code that need the sstable datafile, TOC and index
paths to print them in logs. The other is sstable_directory, which is
to be patched separately.

For both there's a storage.prefix() method that prepends component name
with where the sstable is "really" located.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
7402787d16 sstable: Add quarantine() method to storage
Moving an sstable to quarantine has a peculiarity: if the sstable is in the
staging/ directory, it is still moved into the root/quarantine dir, not into
the quarantine subdir of its current location.

Encapsulate this feature in a storage class method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
f507271578 sstable: Use absolute/relative path marking for snapshot()
The snapshotting code uses full paths to files to manipulate snapshotted
sstables. Until this code is patched to use some proper snapshotting API
from sstable/ module, it will continue doing so.

However, to remove the get_dir() method from sstable, the
seal_sstable() needs to pass the relative "backup" directory to the
storage::snapshot() method. This patch adds a temporary bool_class to
make this distinction.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
a46d378bee sstable: Remove temp_... stuff from sstable
There's a bunch of helpers around the XFS-specific temp dir sitting in the
public sstable part. Drop them altogether, no code really needs them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
adba24d8ae sstable: Move open_component() on storage
Obtaining a class file object to read/write sstable from/to is now
storage-specific.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
4c22831d23 sstable: Mark rename_new_sstable_component_file() const
It is in fact const already. The next patch will need it marked const to call
this method via a const sstable reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
6bf3e3a921 sstable: Print filename(type) on open-component error
The file path is going to disappear soon, so print the filename() on
error. For now it's the same, but the meaning of the filename()
returning string is changing to become "random label for the log
reader".

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
dc72bce6d7 sstable: Reorganize new_sstable_component_file()
The helper consists of three stages:

1. open a file (probably in a temp dir)
2. decorate it with extensions and checked_file
3. optionally rename a file from temp dir

The latter is done to make XFS allocate this file in a separate block
group if the file was created in the temp dir in step 1.

This patch swaps steps 2 and 3 to keep the filesystem-specific steps next
to each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
e55c740f49 sstable: Mark filename() private
From now on no callers should use this string to access anything on disk

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
5f579eb405 sstable: Introduce index_filename()
Currently the sstable::filename(Index) is used in several places that
get the filename as a printable or throwable string and don't treat it
as a real location of any file.

For those, add the index_filename() helper symmetrical to toc_filename()
and (in some sense) the get_filename() one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
bbbbd6dbfc tests: Disclosure private filename() calls
The sstable::filename() is going to become private method. Lots of tests
call it, but tests do call a lot of other sstable private methods,
that's OK. Make the sstable::filename() yet another one of that kind in
advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
4a91f3d443 sstable: Move wipe_storage() on storage
Now when the filesystem cleaning code is sitting in one method, it can
finally be made the storage class one.

Exception-safe allocation of toc_name (spoiler: it's copied anyway one
step later, so it's "not that safe" actually) is moved into storage as
well. The caller is left with toc_filename() call in its exception
handler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
c92d45eaa9 sstable: Remove temp dir in wipe_storage()
When unlinking an sstable for whatever reason it's good to check if the
temp dir is hanging around. In some cases it's not (compaction), but
keeping the whole wiping code together makes it easier to move it on
storage class in one go.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
88ede71320 sstable: Move unlink parts into wipe_storage
Just move the code. This is to make the next patch smaller.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
0336cb3bdd sstable: Remove get_temp_dir()
Only one private caller of it is left, so it's better to open-code it there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
3326063b8b sstable: Move write_toc() to storage
This method initiates the sstable creation. Effectively it's the first
step in sstable creation transaction implemented on top of rename()
call. Thus this method is moved onto storage under respective name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
636d49f1c1 sstable: Shuffle open_sstable()
When an sstable is prepared to be written on disk the .write_toc() is
called on it, which creates a temporary TOC file. Prior to this, the writer
code calls generate_toc() to collect components on the sstable.

This patch adds the .open_sstable() API call that does both. This
prepares the write_toc() part to be moved to storage, because it's not
just "write data into TOC file", it's the first step in transaction
implemented on top of rename()s.

The tests need care -- there's rewrite_toc_without_scylla_component()
thing in utils that doesn't want the generate_toc() part to be called.
It's not patched here and continues calling .write_toc().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
d3216b10d6 sstable: Move touch_temp_dir() to storage
The continuation of the previously moved remove_temp_dir() one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00
Pavel Emelyanov
1a34cb98fc sstable: Move move() to storage
The sstable can be "moved" in two cases -- to move from staging or to
move to quarantine. Both operations are sstable API ones, but the
implementation is storage-specific. This patch makes the latter a method
of storage class.

One thing to note is that only quarantine() used to touch the target directory.
Now the move_to_new_dir() happening on load also does it, but
that's harmless.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:47 +03:00
Pavel Emelyanov
18f6165993 sstable: Move create_links() to storage
This method is currently used in two places: sstable::snapshot() and
sstable::seal_sstable(). The latter additionally touches the target
backup/ subdir.

This patch moves the whole thing on storage and adds touch for all the
cases. For snapshots this might be excessive, but harmless.

Tests get their private-disclosure way to access sstable._storage in
few places to call create_links directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
136a8681e0 sstable: Move seal_sstable() to storage
Now the sstable sealing is split into storage part, internal-state part
and the seal-with-backup kick.

This move makes remove_temp_dir() private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
334d231f56 sstable: Tossing internals of seal_sstable()
There are two of them -- one API call and the other one that just
"seals" it. The latter one also changes the _marked_for_deletion bit on
the sstable.

This patch prepares the latter method to be moved onto storage,
because sealing means committing the TOC file on disk with the help of the
rename system call, which is purely a storage thing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
ce3a8a4109 sstable: Move remove_temp_dir() to storage
This one is simple, it just accesses _temp_dir thing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
9027d137d2 sstable: Move create_links_common() to storage
Same as previous patch. This move makes the previously moved
check_create_links_replay() a private method of the storage class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
990032b988 sstable: Move check_create_links_replay() to storage
It needs to get sstable const reference to get the filename(s) from it.
Other than that it's pure filesystem-accessing method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
041a8c80ad sstable: Remove one of create_links() overloads
There are two -- one that accepts a generation and the other one that does
not. The latter is only called by the former, so there is no need to keep both.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
f1558b6988 sstable: Remove create_links_and_mark_for_removal()
There's only one user of it, it can document its "and mark for removal"
intention via dedicated bool_class argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
65f40b28e6 sstable: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
428adda4a9 sstable: Coroutinize create_links_common()
Looks much shorter and easier-to-patch this way.

The dst_dir argument is now taken by value instead of by const reference;
the old code copied it with do_with() anyway.

Indentation is deliberately left broken until next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
ab13a99586 sstable: Rename create_links_common()'s "dir" argument
The whole method is going to move onto newly introduced
filesystem_storage that already has a field of the same name onboard. To
avoid confusion, rename the argument to dst_dir.

No functional changes, _just_ s/dir/dst_dir/g throughout the method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
4977c73163 sstable: Make mark_for_removal bool_class
Its meaning is comment-documented anyway. Also, next patches will remove
the create_links_and_mark_for_removal() so callers need some verbose
meaning of this boolean in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:45 +03:00
Pavel Emelyanov
f53d6804a6 sstable, table: Add sstable::snapshot() and use in table::take_snapshot
The replica/ code now "knows" that snapshotting an sstable means
creating a bunch of hard-links on disk. Abstract that via
sstable::snapshot() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:44 +03:00
Pavel Emelyanov
2803dcda6d sstable: Move _dir and _temp_dir on filesystem_storage
Those two fields define the way sstable is stored as collection of
on-disk files. First step towards making the storage access abstract is
in moving the paths onto filesystem_storage embedded class.

Both are made public for now, the rest of the code is patched to access
them via _storage.<smth>. The rest of the set moves parts of sstable::
methods into the filesystem_storage, then marks the paths private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:44 +03:00
Pavel Emelyanov
17c8ba6034 sstable: Use sync_directory() method
The sstable::write_toc() executes sync_directory() by hand. Better to
use the method directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:44 +03:00
Pavel Emelyanov
e934f42402 test, sstable: Use component_basename in test
One case gets full sstable datafile path to get the basename from it.
There's already the basename helper on the class sstable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:44 +03:00
Pavel Emelyanov
376915d406 sstables: Move read_{digest|checksum} on sstable
These methods access sstables as files on disk, in order to hide the
"path on filesystem" meaning of sstables::filename() the whole method
should be made sstable:: one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:13:44 +03:00
Pavel Emelyanov
d561495f0d Merge 'topology: get rid of pending state' from Benny Halevy
Now, with a44ca06906, is_normal_token_owner that replaced is_member
does not rely anymore on the pending status
of endpoints in topology.

With that we can get rid of this state and just keep all endpoints we know about in the topology.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12294

* github.com:scylladb/scylladb:
  topology: get rid of pending state
  topology: debug log update and remove endpoint
2022-12-14 19:28:35 +03:00
Benny Halevy
bdb6550305 view: row_locker: add latency_stats_tracker
Refactor the existing stats tracking and updating
code into struct latency_stats_tracker and while at it,
count lock_acquisitions only on success.

Decrement operations_currently_waiting_for_lock in the destructor
so it's always balanced with the unconditional increment
in the ctor.

As for updating estimated_waiting_for_lock, it is always
updated in the dtor, both on success and failure since
the wait for the lock happened, whether waiting
timed out or not.
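The balancing argument above is the usual scope-guard pattern. Below is a minimal Python sketch of it, assuming a context manager's __exit__ as a stand-in for the C++ destructor; LockStats, LatencyStatsTracker, acquire_row_lock and waiting_times are hypothetical names, not the actual row_locker code, and waiting_times only approximates estimated_waiting_for_lock.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class LockStats:
    # Hypothetical stand-ins for the row_locker counters named above.
    operations_currently_waiting_for_lock: int = 0
    lock_acquisitions: int = 0
    waiting_times: List[float] = field(default_factory=list)  # ~ estimated_waiting_for_lock


class LatencyStatsTracker:
    """Python analogue of an RAII tracker: __exit__ plays the destructor's role."""

    def __init__(self, stats: LockStats) -> None:
        self.stats = stats
        self.start = time.monotonic()
        stats.operations_currently_waiting_for_lock += 1  # unconditional increment

    def lock_acquired(self) -> None:
        self.stats.lock_acquisitions += 1  # counted only on success

    def __enter__(self) -> "LatencyStatsTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on both success and failure, so the counter stays balanced
        # and the wait is recorded whether or not it timed out.
        self.stats.operations_currently_waiting_for_lock -= 1
        self.stats.waiting_times.append(time.monotonic() - self.start)
        return False  # do not swallow the exception


def acquire_row_lock(stats: LockStats, succeed: bool) -> None:
    with LatencyStatsTracker(stats) as tracker:
        if not succeed:
            raise TimeoutError("timed out waiting for row lock")
        tracker.lock_acquired()
```

After one successful and one timed-out acquisition, the waiting counter is back to zero, one acquisition is counted, and both waits are recorded.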

Fixes #12190

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12225
2022-12-14 17:37:22 +02:00
Avi Kivity
9ee78975b7 Merge 'Fix topology mismatch on read-repair handler creation' from Pavel Emelyanov
The schedule_repair() receives a bunch of endpoint:mutations pairs and tries to create handlers for those. When creating the handlers it re-obtains topology from schema->ks->effective_replication_map chain, but this new topology can be outdated as compared to the list of endpoints at hand.

The fix is to carry the e.r.m. pointer used by read executor reconciliation all the way down to repair handlers creation. This requires some manipulations with mutate_internal() and mutate_prepare() argument lists.

fixes: #12050 (it was the same problem)

Closes #12256

* github.com:scylladb/scylladb:
  proxy: Carry replication map with repair mutation(s)
  proxy: Wrap read repair entries into read_repair_mutation
  proxy: Turn ref to forwardable ref in mutations iterator
2022-12-14 17:33:43 +02:00
Tomasz Grabiec
23e4c83155 position_in_partition: Make after_key() work with non-full keys
This fixes a long standing bug related to handling of non-full
clustering keys, issue #1446.

after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.

This issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.

It probably already causes problems for such keys.
2022-12-14 14:47:33 +01:00
Botond Dénes
16c50bed5e Merge 'sstables: coroutinize update_info_for_opened_data' from Avi Kivity
A complicated function (in continuation style) that benefits
from this simplification.

Closes #12289

* github.com:scylladb/scylladb:
  sstables: update_info_for_opened_data: reindent
  sstables: update_info_for_opened_data: coroutinize
2022-12-14 15:12:22 +02:00
Nadav Har'El
92d03be37b materialized view: fix bug in some large modifications to base partitions
Sometimes a single modification to a base partition requires updates to
a large number of view rows. A common example is deletion of a base
partition containing many rows. A large BATCH is also possible.

To avoid large allocations, we split the large amount of work into
batches of 100 (max_rows_for_view_updates) rows each. The existing code
assumed an empty result from one of these batches meant that we are
done. But this assumption was incorrect: There are several cases when
a base-table update may not need a view update to be generated (see
can_skip_view_updates()) so if all 100 rows in a batch were skipped,
the view update stopped prematurely. This patch includes two tests
showing when this bug can happen - one test using a partition deletion
with a USING TIMESTAMP causing the deletion to not affect the first
100 rows, and a second test using a specially-crafted large BATCH.
These use cases are fairly esoteric, but in fact hit a user in the
wild, which led to the discovery of this bug.

The fix is fairly simple: To detect when build_some() is done it is no
longer enough to check if it returned zero view-update rows; Rather,
it explicitly returns whether or not it is done as an std::optional.
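The difference between the two termination checks can be sketched in Python, assuming None plays the role of an empty std::optional and a simple predicate stands in for can_skip_view_updates(). build_some here is a simplified model that shares only its name with the real function, and the generate_updates_* helpers are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Iterator, List, Optional

MAX_ROWS_FOR_VIEW_UPDATES = 100  # batch size named in the commit message


def build_some(rows: Iterator[int], needs_update: Callable[[int], bool]) -> Optional[List[int]]:
    """Process at most one batch of base rows.

    Returns None when no base rows are left (done); otherwise returns the
    view updates for this batch, which may legitimately be empty.
    """
    batch = list(itertools.islice(rows, MAX_ROWS_FOR_VIEW_UPDATES))
    if not batch:
        return None
    return [row for row in batch if needs_update(row)]


def generate_updates_buggy(rows: Iterable[int], needs_update: Callable[[int], bool]) -> List[int]:
    it, out = iter(rows), []
    while True:
        updates = build_some(it, needs_update) or []
        if not updates:  # wrong: a fully skipped batch is also empty
            return out
        out.extend(updates)


def generate_updates_fixed(rows: Iterable[int], needs_update: Callable[[int], bool]) -> List[int]:
    it, out = iter(rows), []
    while (updates := build_some(it, needs_update)) is not None:
        out.extend(updates)  # an empty batch no longer ends the loop
    return out
```

With 250 base rows where the first 100 need no view update (as with the USING TIMESTAMP deletion test), the buggy loop stops after the first batch and produces nothing, while the fixed loop produces the 150 expected updates.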

The patch includes several tests for this bug, which pass on Cassandra,
failed on Scylla before this patch, and pass with this patch.

Fixes #12297.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12305
2022-12-14 14:50:38 +02:00
Botond Dénes
e7d8855675 Merge 'Revert accidental submodule updates' from Benny Halevy
The abseil and tools/java submodules were accidentally updated in
71bc12eecc
(merged to master in 51f867339e)

This series reverts those changes.

Closes #12311

* github.com:scylladb/scylladb:
  Revert accidental update of tools/java submodule
  Revert accidental update of abseil submodule
2022-12-14 13:20:08 +02:00
Benny Halevy
865193f99a Revert accidental update of tools/java submodule
The tools/java submodule was accidentally updated
in 71bc12eecc
Revert this change.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-14 13:06:30 +02:00
Benny Halevy
9911ba195b Revert accidental update of abseil submodule
The abseil module was accidentally updated
in 71bc12eecc
Revert this change.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-14 13:05:04 +02:00
Pavel Emelyanov
ab8fc0e166 proxy: Carry replication map with repair mutation(s)
The create_write_response_handler() for read repair needs the e.r.m.
from the caller, because it effectively accepts list of endpoints from
it.

So this patch equips all read_repair_mutation-s with the e.r.m. pointer
so that the handler creation can use it. It's the same for all
mutations, so it's a waste of space, but it's not bad -- there are
typically few mutations in this range and the entry passed there is
temporary, so even lots of them won't occupy lots of memory for long.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-14 14:03:39 +03:00
Pavel Emelyanov
140f373e15 proxy: Wrap read repair entries into read_repair_mutation
The schedule_repair() operates on a map of endpoint:mutations pairs.
Next patch will need to extend this entry and it's going to be easier if
the entry is wrapped in a helper structure in advance.

This is where the forwardable reference cursor from the previous patch
gets its user. The schedule_repair() produces a range of rvalue
wrappers, but the create_write_response_handler accepting it is OK, it
copies mutations anyway.

The printing operator is added to facilitate mutations logging from
mutate_internal() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-14 14:01:12 +03:00
Pavel Emelyanov
014b563ef1 proxy: Turn ref to forwardable ref in mutations iterator
The mutate_prepare() is iterating over range of mutation with 'auto&'
cursor thus accepting only lvalues. This is very restrictive, the caller
of mutate_prepare() may as well provide rvalues if the target
create_write_response_handler() or lambda accepts it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-14 14:00:10 +03:00
Avi Kivity
3fa230fee4 Merge 'cql3: expr: make it possible to prepare and evaluate conjunctions' from Jan Ciołek
This PR implements two things:
* Getting the value of a conjunction of elements separated by `AND` using `expr::evaluate`
* Preparing conjunctions using `prepare_expression`

---

`NULL` is treated as an "unknown value" - maybe `true`, maybe `false`.
`TRUE AND NULL` evaluates to `NULL` because it might be `true` but also might be `false`.
`FALSE AND NULL` evaluates to `FALSE` because no matter what value `NULL` acts as, the result will still be `FALSE`.
Unset and empty values are not allowed.

Usually in CQL the rule is that when `NULL` occurs in an operation the whole expression becomes `NULL`, but here we decided to deviate from this behavior.
Treating `NULL` as an "unknown value" is the standard SQL way of handling `NULLs` in conjunctions.
It works this way in MySQL and Postgres so we do it this way as well.

The evaluation short-circuits. Once `FALSE` is encountered the function returns `FALSE` immediately without evaluating any further elements.
It works this way in Postgres as well, for example:
`SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error,
but `SELECT false AND 1/0 = 0` will successfully evaluate to `FALSE`.
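The evaluation rules above can be sketched in Python, assuming None stands for CQL `NULL` and each conjunct is a zero-argument callable so that later elements are only evaluated on demand. evaluate_conjunction is a hypothetical name, not Scylla's `expr::evaluate`, and the rejection of unset and empty values is omitted.

```python
from typing import Callable, Optional, Sequence

# Three-valued logic: True, False, or None (standing in for CQL NULL).
TriBool = Optional[bool]


def evaluate_conjunction(elements: Sequence[Callable[[], TriBool]]) -> TriBool:
    """Evaluate elem_1 AND elem_2 AND ... with SQL-style NULL handling."""
    saw_null = False
    for element in elements:
        value = element()
        if value is False:
            return False  # short-circuit: the result is FALSE whatever follows
        if value is None:
            saw_null = True  # unknown for now; a later FALSE can still decide it
    return None if saw_null else True
```

With this model, `TRUE AND NULL` yields None, `NULL AND FALSE` yields False, `FALSE AND 1/0 = 0` returns False without evaluating the division, and `TRUE AND NULL AND 1/0 = 0` raises ZeroDivisionError, mirroring the Postgres examples.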

Closes #12300

* github.com:scylladb/scylladb:
  expr_test: add unit tests for prepare_expression(conjunction)
  cql3: expr: make it possible to prepare conjunctions
  expr_test: add tests for evaluate(conjunction)
  cql3: expr: make it possible to evaluate conjunctions
2022-12-14 09:48:26 +02:00
Botond Dénes
122b267478 Merge 'repair: coroutinize to_repair_rows_list' from Avi Kivity
Simplify a somewhat complicated function.

Closes #12290

* github.com:scylladb/scylladb:
  repair: to_repair_rows_list: reindent
  repair: to_repair_rows_list: coroutinize
2022-12-14 09:39:47 +02:00
Avi Kivity
c09583bcef storage_proxy: coroutinize send_truncate_blocking
Not particularly important, but a small simplification.

Closes #12288
2022-12-14 09:39:33 +02:00
Tomasz Grabiec
132d5d4fa1 messaging: Shutdown on stop() if it wasn't shut down earlier
All rpc::client objects have to be stopped before they are
destroyed. Currently this is done in
messaging_service::shutdown(). The cql_test_env does not call
shutdown() currently. This can lead to use-after-free on the
rpc::client object, manifesting like this:

Segmentation fault on shard 0.
Backtrace:
column_mapping::~column_mapping() at schema.cc:?
db::cql_table_large_data_handler::internal_record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:180
operator() at ./db/large_data_handler.cc:123
 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>(std::__invoke_other, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long>(db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:114
 (inlined by) std::_Function_handler<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long), db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1>::_M_invoke(std::_Any_data const&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const*&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long)>::operator()(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) db::cql_table_large_data_handler::record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const*, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:175
seastar::rpc::log_exception(seastar::rpc::connection&, seastar::log_level, char const*, std::__exception_ptr::exception_ptr) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:109
operator() at ./build/release/seastar/./seastar/src/rpc/rpc.cc:788
operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1682
 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, 
seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}&&) at ./build/release/seastar/./seastar/include/seastar/core/future.hh:2134
 (inlined by) operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1681
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void*, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:781
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2319
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2756
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2925
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2808
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:156
operator() at ./build/release/seastar/./seastar/src/testing/test_runner.cc:75
 (inlined by) void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73

Fix by making sure that shutdown() is called prior to destruction.
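A minimal Python sketch of the ordering this fix enforces, assuming a simplified model: the real code is C++ (messaging_service and seastar's rpc::client), and `RpcClient`, `MessagingService`, `destroy()` and the `_shut_down` flag below are hypothetical stand-ins. `destroy()` models the invariant that a client must be stopped before it is destroyed.

```python
class RpcClient:
    """Hypothetical stand-in for rpc::client: it must be stopped before it is destroyed."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

    def destroy(self):
        # Models the violated invariant: destroying a client that was never stopped.
        if not self.stopped:
            raise RuntimeError("rpc client destroyed before being stopped")


class MessagingService:
    """Hypothetical stand-in for messaging_service."""

    def __init__(self, n_clients):
        self.clients = [RpcClient() for _ in range(n_clients)]
        self._shut_down = False

    def shutdown(self):
        # Stops every client; normally called explicitly during node shutdown.
        for client in self.clients:
            client.stop()
        self._shut_down = True

    def stop(self):
        # The fix: a caller (e.g. a test environment) may skip shutdown(),
        # so stop() performs it first if it has not happened yet.
        if not self._shut_down:
            self.shutdown()
        for client in self.clients:
            client.destroy()
        self.clients.clear()
```

With this shape, calling `stop()` alone is safe, and calling `shutdown()` followed by `stop()` does not stop the clients twice.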

Fixes #12244

Closes #12276
2022-12-14 10:28:26 +03:00
Tzach Livyatan
7cd613fc08 Docs: Improve wording on the os-supported page v2
Closes #11871
2022-12-14 08:59:26 +02:00
Botond Dénes
31fcfe62e1 Merge 'doc: add the description of AzureSnitch to the documentation' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/11712

Updates added with this PR:
- Added a new section with the description of AzureSnitch (similar to others + examples and language improvements).
- Fixed the headings so that they render properly.
- Replaced "Scylla" with "ScyllaDB".

Closes #12254

* github.com:scylladb/scylladb:
  docs: replace Scylla with ScyllaDB on the Snitches page
  docs: fix the headings on the Snitches page
  doc: add the description of AzureSnitch to the documentation
2022-12-14 08:58:48 +02:00
Lubos Kosco
3f9dca9c60 doc: print out the generated UUID for sending to support
Closes #12176
2022-12-14 08:57:54 +02:00
guy9
a329fcd566 Updated University monitoring lesson link
Closes #11906
2022-12-14 08:50:26 +02:00
Jan Ciolek
9afa9f0e50 expr_test: add unit tests for prepare_expression(conjunction)
Add unit tests which ensure that preparing conjunctions
works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-12-13 20:23:17 +01:00
Jan Ciolek
dde86a2da6 cql3: expr: make it possible to prepare conjunctions
prepare_expression used to throw an error
when encountering a conjunction.

Now it's possible to use prepare_expression
to prepare an expression that contains
conjunctions.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-12-13 20:23:17 +01:00
Jan Ciolek
5f5b1c4701 expr_test: add tests for evaluate(conjunction)
Add unit tests which ensure that evaluating
a conjunction behaves as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-12-13 20:23:17 +01:00
Jan Ciolek
b3c16f6bc8 cql3: expr: make it possible to evaluate conjunctions
Previously it was impossible to use expr::evaluate()
to get the value of a conjunction of elements
separated by ANDs.

Now it has been implemented.

NULL is treated as an "unknown value": maybe true, maybe false.
`TRUE AND NULL` evaluates to NULL because it might be true but also might be false.
`FALSE AND NULL` evaluates to FALSE because no matter what value NULL acts as, the result will still be FALSE.
Unset and empty values are not allowed.

Usually in CQL the rule is that when NULL occurs in an operation the whole expression
becomes NULL, but here we decided to deviate from this behavior.
Treating NULL as an "unknown value" is the standard SQL way of handling NULLs in conjunctions.
It works this way in MySQL and Postgres, so we do it this way as well.

The evaluation short-circuits. Once FALSE is encountered the function returns FALSE
immediately without evaluating any further elements.
It works this way in Postgres as well, for example:
`SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error
but `SELECT false AND 1/0 = 0` will successfully evaluate to FALSE.
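The rules above can be sketched in Python as follows. This is an illustration, not the C++ expr::evaluate() code: elements are modelled as zero-argument callables so that short-circuiting is observable, `None` stands for NULL, and raising `ValueError` for anything other than True/False/None is a hypothetical stand-in for rejecting unset and empty values.

```python
def evaluate_conjunction(elements):
    """Evaluate `e1 AND e2 AND ...` with SQL-style three-valued logic.

    Each element is a zero-argument callable returning True, False or None
    (None stands for NULL). Elements are evaluated lazily, left to right.
    """
    saw_null = False
    for element in elements:
        value = element()
        if value is False:
            # FALSE AND <anything> is FALSE: stop without evaluating the rest.
            return False
        if value is None:
            # NULL might be TRUE or FALSE; remember it and keep looking for a FALSE.
            saw_null = True
        elif value is not True:
            # Hypothetical stand-in for "unset and empty values are not allowed".
            raise ValueError(f"invalid conjunction element: {value!r}")
    return None if saw_null else True
```

For example, `evaluate_conjunction([lambda: True, lambda: None])` returns `None`, while `evaluate_conjunction([lambda: False, lambda: 1 / 0])` returns `False` without evaluating the division, mirroring the Postgres examples.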

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-12-13 20:23:08 +01:00
Benny Halevy
e9e66f3ca7 database: drop_table_on_all_shards: limit truncated_at time
The infinitely high time_point of `db_clock::time_point::max()`
used in ba42852b0e
is too high for some clients that can't represent
that as a date_time string.

Instead, limit it to 9999-12-31T00:00:00+0000,
that is practically sufficient to ensure truncation of
all sstables and should be within the clients' limits.
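A Python sketch of why the cap helps, under stated assumptions: db_clock is modelled as a signed 64-bit count of milliseconds since the epoch, and Python's `datetime` plays the role of a client whose calendar stops at year 9999. All names below are hypothetical; only the cap value comes from this commit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# The cap chosen in this commit: 9999-12-31T00:00:00+0000.
TRUNCATED_AT_LIMIT = datetime(9999, 12, 31, tzinfo=timezone.utc)
LIMIT_MS = (TRUNCATED_AT_LIMIT - EPOCH) // timedelta(milliseconds=1)

# Assumption: a signed 64-bit millisecond count stands in for db_clock::time_point::max().
CLOCK_MAX_MS = 2**63 - 1


def to_date_string(ms):
    """Render milliseconds since the epoch as a client might; raises OverflowError if out of range."""
    return (EPOCH + timedelta(milliseconds=ms)).strftime("%Y-%m-%dT%H:%M:%S%z")


def capped_truncated_at(ms):
    """Clamp a truncation time to a date that clients can still represent."""
    return min(ms, LIMIT_MS)
```

Here `to_date_string(CLOCK_MAX_MS)` raises `OverflowError`, while the capped value renders as `9999-12-31T00:00:00+0000`.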

Fixes #12239

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12273
2022-12-13 16:46:20 +02:00
Avi Kivity
919888fe60 Merge 'docs/dev: Add backport instructions for contributors' from Jan Ciołek
Add instructions on how to backport a feature to an older version of Scylla.

It contains a detailed step-by-step instruction so that people unfamiliar with the intricacies of Scylla's repository organization can easily get the hang of it.

This is the guide I wish I had when I had to do my first backport.

I put it in backport.md because that looks like the file responsible for this sort of information.
For a moment I thought about `CONTRIBUTING.md`, but this is a really short file with general information, so it doesn't really fit there. Maybe in the future there will be some sort of unification (see #12126)

Closes #12138

* github.com:scylladb/scylladb:
  dev/docs: add additional git pull to backport docs
  docs/dev: add a note about cherry-picking individual commits
  docs/dev: use 'is merged into' instead of 'becomes'
  docs/dev: mention that new backport instructions are for the contributor
  docs/dev: Add backport instructions for contributors
2022-12-13 16:27:04 +02:00
Pavel Emelyanov
fe4cf231bc snitch: Check http response codes to be OK
Several snitch drivers make HTTP requests to get
region/dc/zone/rack/whatever from the cloud provider. They blindly rely
on the response being successful and read the response body to parse the
data they need from it.

That's not nice; add checks that the requests finish with HTTP OK status.

refs: #12185

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12287
2022-12-13 14:49:18 +02:00
Benny Halevy
68141d0aac topology: get rid of pending state
Now, with a44ca06906,
is_normal_token_owner, which replaced is_member,
no longer relies on the pending status
of endpoints in topology.

With that we can get rid of this state and just keep
all endpoints we know about in the topology.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-13 14:17:18 +02:00
Benny Halevy
f2753eba30 topology: debug log update and remove endpoint
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-13 14:17:13 +02:00
Avi Kivity
c7cee0da40 Merge 'storage_service: handle_state_normal: always update_topology before update_normal_tokens' from Benny Halevy
update_normal_tokens checks that the endpoint is in topology. Currently we call update_topology on this path only if it's not a normal_token_owner, but there are paths where the endpoint could be a normal token owner but still
be pending in topology, so always update it, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12080

* github.com:scylladb/scylladb:
  storage_service: handle_state_normal: always update_topology before update_normal_tokens
  storage_service: handle_state_normal: delete outdated comment regarding update pending ranges race
2022-12-13 13:41:10 +02:00
Avi Kivity
75e469193b Merge 'Use Host ID as Raft ID' from Kamil Braun
Thanks to #12250, Host IDs uniquely identify nodes. We can use them as Raft IDs which simplifies the code and makes reasoning about it easier, because Host IDs are always guaranteed to be present (while Raft IDs may be missing during upgrade).

Fixes: https://github.com/scylladb/scylladb/issues/12204

Closes #12275

* github.com:scylladb/scylladb:
  service/raft: raft_group0: take `raft::server_id` parameter in `remove_from_group0`
  gms, service: stop gossiping and storing RAFT_SERVER_ID
  Revert "gms/gossiper: fetch RAFT_SERVER_ID during shadow round"
  service: use HOST_ID instead of RAFT_SERVER_ID during replace
  service/raft: use gossiped HOST_ID instead of RAFT_SERVER_ID to update Raft address map
  main: use Host ID as Raft ID
2022-12-13 13:39:41 +02:00
Anna Stuchlik
7bc4385551 doc: specify the versions where Alternator TTL is no longer experimental 2022-12-13 11:25:24 +01:00
Andrii Patsula
cd2e786d72 Report a warning when a server's IP cannot be found in ping.
Fixes #12156
Closes #12206
2022-12-13 11:18:59 +01:00
Botond Dénes
51f867339e Merge 'Docs: cleanup add-node-to-cluster' from Benny Halevy
This series improves the add-node-to-cluster document, in particular around the documentation for the associated cleanup procedure, and the prerequisite steps.

It also removes information about outdated releases.

Closes #12210

* github.com:scylladb/scylladb:
  docs: operating-scylla: add-node-to-cluster: deleted instructions for unsupported releases
  docs: operating-scylla: add-node-to-cluster: cleanup: move tips to a note
  docs: operating-scylla: add-node-to-cluster: improve wording of cleanup instructions
  docs: operating-scylla: prerequisites: system_auth is a keyspace, not a table
  docs: operating-scylla: prerequisites: no Authetication status is gathered
  docs: operating-scylla: prerequisites: simplify grep commands
  docs: operating-scylla: add-node-to-cluster: prerequisites: number sub-sections
  docs: operating-scylla: add-node-to-cluster: describe other nodes in plural
2022-12-13 10:54:05 +02:00
Botond Dénes
4122854ae7 Merge 'repair: coroutinize repair_range' from Avi Kivity
Nicer and simpler, but essentially cosmetic.

Closes #12235

* github.com:scylladb/scylladb:
  repair: reindent repair_range
  repair: coroutinize repair_range
2022-12-13 08:16:05 +02:00
Avi Kivity
96890d4120 repair: to_repair_rows_list: reindent 2022-12-12 22:54:07 +02:00
Avi Kivity
e482cb1764 repair: to_repair_rows_list: coroutinize
Simplifying a complicated function. It will also be a
little faster due to fewer allocations, but not significantly.
2022-12-12 22:52:12 +02:00
Avi Kivity
c728de8533 sstables: update_info_for_opened_data: reindent
Recover much-needed indent levels for future use.
2022-12-12 22:38:07 +02:00
Avi Kivity
eace9a226c sstables: update_info_for_opened_data: coroutinize
Nothing special, just simplifying a complicated function.
2022-12-12 22:35:46 +02:00
Michał Jadwiszczak
5985f22841 version: Reverse version increase
Revert version change made by PR #11106, which increased it to `4.0.0`
to enable server-side describe on latest cqlsh.

It turns out that our tooling depends on it in some way (e.g. `sstableloader`)
and the increase breaks dtests.
Reverting only the version allows us to leave the describe code unchanged
and fixes the dtests.

cqlsh 6.0.0 will return a warning when running `DESC ...` commands.

Closes #12272
2022-12-12 18:45:32 +02:00
Kamil Braun
a26f62b37b service/raft: raft_group0: take raft::server_id parameter in remove_from_group0
We no longer need to translate from IP to Raft ID using the address map,
because Raft ID is now equal to the Host ID - which is always available
at the call site of `remove_from_group0`.
2022-12-12 15:23:05 +01:00
Kamil Braun
bf6679906f gms, service: stop gossiping and storing RAFT_SERVER_ID
It is equal to (if present) HOST_ID and no longer used for anything.

The application state was only gossiped if `experimental-features`
contained `raft`, so we can free this slot.

Similarly, `raft_server_id`s were only persisted in `system.peers` if
the `SUPPORTS_RAFT` cluster feature was enabled, which happened only
when `experimental-features` contained `raft`. The `raft_server_id`
field in the schema was also introduced recently in `master` and didn't
get to be in a release yet. Given either of these reasons, we can remove
this field safely.
2022-12-12 15:20:30 +01:00
Kamil Braun
5dbe236339 Revert "gms/gossiper: fetch RAFT_SERVER_ID during shadow round"
This reverts commit 60217d7f50.
We no longer need RAFT_SERVER_ID.
2022-12-12 15:20:20 +01:00
Kamil Braun
3e58da0719 service: use HOST_ID instead of RAFT_SERVER_ID during replace
Makes the code simpler because we can assume that HOST_ID is always
there.
2022-12-12 15:18:56 +01:00
Kamil Braun
32c56920b4 service/raft: use gossiped HOST_ID instead of RAFT_SERVER_ID to update Raft address map
With the earlier commit, if gossiped RAFT_SERVER_ID is not empty then
it's the same as HOST_ID.
2022-12-12 15:16:56 +01:00
Calle Wilund
e99626dc10 config: Change wording of "none" in encryption options to maybe reduce user confusion
Fixes /scylladb/scylla-enterprise/issues#1262

Changes the somewhat ambiguous "none" into "not set" to clarify that "none" is not an
option to be written out, but the absence of a choice (in which case you have also made
a choice).

Closes #12270
2022-12-12 16:14:53 +02:00
Kamil Braun
f3243ff674 main: use Host ID as Raft ID
The Host ID now uniquely identifies a node (we no longer steal it during
node replace) and Raft is still experimental. We can reuse the Host ID
of a node as its Raft ID. This will allow us to remove and simplify a
lot of code.

With this we can already remove some dead code in this commit.
2022-12-12 15:14:51 +01:00
Botond Dénes
d44c5f5548 scripts: add open-coredump.sh
Script for "one-click" opening of coredumps.
It extracts the build-id from the coredump, retrieves metadata for that
build, downloads the binary package, the source code and finally
launches the dbuild container, with everything ready to load the
coredump.
The script is idempotent: running it after the preparatory steps will
re-use what is already downloaded.

The script is not trying to provide a debugging environment that caters
to all the different ways and preferences of debugging. Instead, it just
sets up a minimalistic environment for debugging, while providing
opportunities for the user to customize it according to their
preferences.

I'm not entirely sure that coredumps from the master branch will work, but we
can address this later if we confirm they don't.

Example:

    $ ~/ScyllaDB/scylla/worktree0/scripts/open-coredump.sh ./core.scylla.113.bac3650b616f4f09a4d1ab160574b6a5.4349.1669185225000000000000
    Build id: 5009658b834aaf68970135bfc84f964b66ea4dee
    Matching build is scylla-5.0.5 0.20221009.5a97a1060 release-x86_64
    Downloading relocatable package from http://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.0/scylla-x86_64-package-5.0.5.0.20221009.5a97a1060.tar.gz
    Extracting package scylla-x86_64-package-5.0.5.0.20221009.5a97a1060.tar.gz
    Cloning scylla.git
    Downloading scylla-gdb.py
    Copying scylla-gdb.py from /home/bdenes/ScyllaDB/storage/11961/open-coredump.sh.dir/scylla.repo
    Launching dbuild container.

    To examine the coredump with gdb:

        $ gdb -x scylla-gdb.py -ex 'set directories /src/scylla' --core ./core.scylla.113.bac3650b616f4f09a4d1ab160574b6a5.4349.1669185225000000000000 /opt/scylladb/libexec/scylla

    See https://github.com/scylladb/scylladb/blob/master/docs/dev/debugging.md for more information on how to debug scylla.

    Good luck!
    [root@fedora workdir]#

Closes #12223
2022-12-12 12:55:28 +02:00
Kamil Braun
dcba652013 Merge 'replacenode: do not inherit host_id' from Benny Halevy
We want to always be able to distinguish between
the replacing node and the replacee by using different,
unique, host identifiers.

This will allow us to use the host_id authoritatively
to identify the node (rather than its endpoint IP address)
for token mapping and node operations.

Also, it will be used in the following patch to never allow the
replaced node to rejoin the cluster, as its host_id should never
be reused.

This change does not affect #5523, the replaced node may still steal back its tokens if restarted.

Refs #9839
Refs #12040

Closes #12250

* github.com:scylladb/scylladb:
  docs: replace-dead-node: update host_id of replacing node
  docs: replace-dead-node: fix alignment
  db: system_keyspace: change set_local_host_id to private set_local_random_host_id
  storage_service: do not inherit the host_id of a replaced a node
2022-12-12 11:00:42 +01:00
Benny Halevy
c6f05b30e1 task_manager: task: impl: add virtual destructor
The generic task holds and destroys a task::impl,
but we want the derived class's destructor to be called
when the task is destroyed. Otherwise, for example,
members like an abort_source subscription will not be destroyed
(and auto-unlinked).

Fixes #12183

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12266
2022-12-11 22:10:59 +02:00
Benny Halevy
36a9f62833 repair: repair_module: use mutable capture for func
It is moved into the async thread, so the encapsulating
lambda should be declared mutable to move the func
rather than copying it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12267
2022-12-11 22:10:28 +02:00
Nadav Har'El
0c26032e70 test/cql-pytest: translate more Cassandra tests
This patch includes a translation of two more test files from
Cassandra's CQL unit test directory cql3/validation/operations.

All tests included here pass on Cassandra. Several tests fail on Scylla
and are marked "xfail". These failures revealed two previously-unknown
bugs:

    #12243: Setting USING TTL of "null" should be allowed
    #12247: Better error reporting for oversized keys during INSERT

And also added reproducers for two previously-known bugs:

    #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
    #6447: TTL unexpected behavior when setting to 0 on a table with
           default_time_to_live

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12248
2022-12-11 21:42:57 +02:00
Nadav Har'El
09a3c63345 cross-tree: allow std::source_location in clang 14
We recently (commit 6a5d9ff261) started
to use std::source_location instead of std::experimental::source_location.
However, this does not work on clang 14, because libstdc++ 12's
<source_location> only works if __builtin_source_location is available,
and that builtin is not available on clang 14.

clang 15 is just three months old, and several relatively-recent
distributions still carry clang 14 so it would be nice to support it
as well.

So this patch adds a trivial compatibility header file which, when
included and compiled with clang 14, makes std::source_location an
alias of the functional std::experimental::source_location.

It turns out it's enough to include the new header file from the three
headers that included <source_location>; I guess all other uses
of source_location depend on those header files directly or indirectly.
We may later need to include the compatibility header file in additional
places, but for now we don't.

Refs #12259

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12265
2022-12-11 20:28:49 +02:00
Avi Kivity
e6ffc22053 Merge 'cql3: Server-side DESC statement' from Michał Jadwiszczak
This PR adds server-side `DESCRIBE` statement, which is required in latest cqlsh version.

The only change from the user perspective is that the `DESC ...` statement can be used with cqlsh version >= 6.0. Previously the statement was executed on the client side, but starting with Cassandra 4.0 and cqlsh 6.0, execution of describe was moved to the server side, so the user was unable to do `DESC ...` with Scylla and cqlsh 6.0.

Implemented describe statements:
- `DESC CLUSTER`
- `DESC [FULL] SCHEMA`
- `DESC [ONLY] KEYSPACE`
- `DESC KEYSPACES/TYPES/FUNCTIONS/AGGREGATES/TABLES`
- `DESC TYPE/FUNCTION/AGGREGATE/MATERIALIZED VIEW/INDEX/TABLE`
- `DESC`

[Cassandra's implementation for reference](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/DescribeStatement.java)

Changes in this patch:
- cql3::util: added `single_quote()` function
- added `data_dictionary::keyspace_element` interface
- implemented `data_dictionary::keyspace_element` for:
    - keyspace_metadata,
    - UDT, UDF, UDA
    - schema
- cql3::functions: added `get_user_functions()` and `get_user_aggregates()` to get all UDFs/UDAs in specified keyspace
- data_dictionary::user_types_metadata: added `has_type()` function
- extracted `describe_ring()` from storage_service to standalone helper function in `locator/util.hh`
- storage_proxy: added `describe_ring()` (implemented using helper function mentioned above)
- extended CQL grammar to handle describe statement
- increased version in `version.hh` to 4.0.0, so cqlsh will use server-side describe statement

Referring: https://github.com/scylladb/scylla/issues/9571, https://github.com/scylladb/scylladb/issues/11475

Closes #11106

* github.com:scylladb/scylladb:
  version: Increasing version
  cql-pytest: Add tests for server-side describe statement
  cql-pytest: creating random elements for describe's tests
  cql3: Extend CQL grammar with server-side describe statement
  cql3:statements: server-side describe statement
  data_dictonary: add `get_all_keyspaces()` and `get_user_keyspaces()`
  storage_proxy: add `describe_ring()` method
  storage_service, locator: extract describe_ring()
  data_dictionary:user_types_metadata: add has_type() function
  cql3:functions: `get_user_functions()` and `get_user_aggregates()`
  implement `keyspace_element` interface
  data_dictionary: add `keyspace_element` interface
  cql3: single_quote() util function
  view: row_lock: lock_ck: reindent
  test/topology: enable replace tests
  service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0`
  service: handle replace correctly with Raft enabled
  gms/gossiper: fetch RAFT_SERVER_ID during shadow round
  service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
2022-12-11 18:29:36 +02:00
Michał Jadwiszczak
8d88c9721e version: Increasing version
The `current()` version in version.hh has to be increased to at
least 4.0.0, so server-side describe will be used. Otherwise,
cqlsh returns warning that client-side describe is not supported.
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
3ddde7c5ad cql-pytest: Add tests for server-side describe statement 2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
f91d05df43 cql-pytest: creating random elements for describe's tests
Add helper functions to create random elements (keyspaces, tables, types)
to increase the coverage of the describe statement's tests.

This commit also adds the `random_seed` fixture. The fixture should
always be used when using random functions. In case of a test failure, the
seed will be present in the test's signature and the case can be easily
recreated.
After the test finishes, the fixture restores state of `random` to
before-test state.
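The fixture's code is not shown here; the save/seed/restore pattern it describes can be sketched as a Python context manager. `seeded_random` is a hypothetical name, and wrapping it in a pytest fixture (see the comment at the end) is an assumption about how it would be wired up, not the actual cql-pytest code.

```python
import contextlib
import random


@contextlib.contextmanager
def seeded_random(seed=None):
    """Seed the global `random` module for one test and restore its state afterwards."""
    saved_state = random.getstate()
    if seed is None:
        # Draw a fresh seed so each run explores different random elements.
        seed = random.randrange(2**32)
    random.seed(seed)
    try:
        yield seed  # report this seed on failure to replay the same case
    finally:
        random.setstate(saved_state)

# Hypothetical pytest wiring:
# @pytest.fixture
# def random_seed():
#     with seeded_random() as seed:
#         yield seed
```

Re-running with the reported seed (`seeded_random(seed)`) reproduces the same sequence of random elements.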
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
c563b2133c cql3: Extend CQL grammar with server-side describe statement 2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
e572d5f111 cql3:statements: server-side describe statement
Starting from cqlsh 6.0.0, execution of the describe statement was moved
from the client to the server.

This patch implements server-side describe statement. It's done by
simply fetching all needed keyspace elements (keyspace/table/index/view/UDT/UDF/UDA)
and generating the desired description or list of names of all elements.
The description of any element has to respect CQL restrictions (like
name quoting) so that the schema can be quickly recreated by simply copy-pasting the description.
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
673393d88a data_dictonary: add get_all_keyspaces() and get_user_keyspaces()
Adds functions to `data_dictionary::database` in order to obtain names
of all keyspaces/all user keyspaces.
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
360dbf98f1 storage_proxy: add describe_ring() method
In order to execute `DESC CLUSTER`, there has to be a way to describe
the ring. `storage_service` is not available at query execution. This patch
adds `describe_ring()` as a method of `storage_proxy()` (using helper
function from `locator/util.hh`).
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
dd46a92e23 storage_service, locator: extract describe_ring()
`describe_ring()` was implemented as a method of `storage_service`. This
patch extracts it from there to a standalone helper function in
`locator/util.hh`.
2022-12-10 12:51:05 +01:00
Michał Jadwiszczak
51a02e3bd7 data_dictionary:user_types_metadata: add has_type() function
Adds `has_type()` function to `user_types_metadata`. The function
determines whether a UDT with the given name exists.
2022-12-10 12:50:52 +01:00
Michał Jadwiszczak
06cd03d3cd cql3:functions: get_user_functions() and get_user_aggregates()
Helper functions to obtain the UDFs/UDAs of a given keyspace.
2022-12-10 12:36:59 +01:00
Michał Jadwiszczak
29ad5a08a8 implement keyspace_element interface
This patch implements the `data_dictionary::keyspace_element` interface
in: `keyspace_metadata`, `user_type_impl`, `user_function`,
`user_aggregate` and `schema`.
2022-12-10 12:34:09 +01:00
Michał Jadwiszczak
f30378819d data_dictionary: add keyspace_element interface
A common interface for all keyspace elements, which are:
keyspace, UDT, UDF, UDA, tables, views and indexes.
The interface provides a unified way to describe those elements.
2022-12-10 12:27:38 +01:00
Michał Jadwiszczak
0589116991 cql3: single_quote() util function
`single_quote()` takes a string and transforms it into a string
which can be safely used in CQL commands.
Single quoting involves wrapping the name in single quotes ('). A single-quote
character inside the name is escaped by doubling it.
Single quoting is necessary for dates, IP addresses and string literals.
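The quoting rule above is a one-line transformation. Below is a Python sketch of the same rule. It is a hypothetical stand-in for illustration only, since the real `single_quote()` is part of Scylla's C++ cql3 code.

```python
def single_quote(text: str) -> str:
    """Wrap text in single quotes, doubling every embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


print(single_quote("it's"))       # 'it''s'
print(single_quote("127.0.0.1"))  # '127.0.0.1'
```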
2022-12-10 12:27:22 +01:00
Benny Halevy
9c2a5a755f view: row_lock: lock_ck: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-10 12:27:22 +01:00
Kamil Braun
c43e64946a test/topology: enable replace tests
Also add some TODOs for enhancing existing tests.
2022-12-10 12:27:22 +01:00
Kamil Braun
b01cba8206 service/raft: report an error when Raft ID can't be found in raft_group0::remove_from_group0
Also simplify the code and improve logging in general.

The previous code did this: search for the ID in the address map. If it
couldn't be found, perform a read barrier and search again. If it again
couldn't be found, return.

This algorithm depended on the fact that IP addresses were stored in
group 0 configuration. The read barrier was used to obtain the most
recent configuration, and if the IP was not a part of address map after
the read barrier, that meant it's simply not a member of group 0.

This logic no longer applies, so we can simplify the code.

Furthermore, when I was fixing the replace operation with Raft enabled,
at some point I had a "working" solution with all tests passing. But I
was suspicious and checked if the replaced node got removed from
group 0. It wasn't. So the replace finished "successfully", but we had
an additional (voting!) member of group 0 which didn't correspond to
a token ring member.

The last version of my fixes ensures that the node gets removed by the
replacing node. But the system is fragile and nothing prevents us from
breaking this again. For now, at least log an error. Regression tests
will be added later.
2022-12-10 12:27:22 +01:00
Kamil Braun
c65f4ae875 service: handle replace correctly with Raft enabled
We must place the Raft ID obtained during the shadow round in the
address map. It won't be placed by the regular gossiping route if we're
replacing using the same IP, because we override the application state
of the replaced node. Even if we replace a node with a different IP, it
is not guaranteed that background gossiping manages to update the
address map before we need it, especially in tests where we set
ring_delay to 0 and disable wait_for_gossip_to_settle. The shadow round,
on the other hand, performs a synchronous request (and if it fails
during bootstrap, bootstrap will fail - because we also won't be able to
obtain the tokens and Host ID of the replaced node).

Fetch the Raft ID of the replaced node in `prepare_replacement_info`,
which runs the shadow round. Return it in `replacement_info`. Then
`join_token_ring` passes it to `setup_group0`, which stores it in the
address map. It does that after `join_group0` so the entry is
non-expiring (the replaced node is a member of group 0). Later in the
replace procedure, we call `remove_from_group0` for the replaced node.
`remove_from_group0` will be able to reverse-translate the IP of the
replaced node to its Raft ID using the address map.
2022-12-10 12:27:22 +01:00
Kamil Braun
60217d7f50 gms/gossiper: fetch RAFT_SERVER_ID during shadow round
During the replace operation we need the Raft ID of the replaced node.
The shadow round is used for fetching all necessary information before
the replace operation starts.
2022-12-10 12:27:22 +01:00
Kamil Braun
b424cc40fa service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
Most of the sleeps related to gossiping are based on `ring_delay`,
which is configurable and can be set to a lower value, e.g. during tests.

But for some reason there was one case where we slept for a hardcoded
value, `service::load_broadcaster::BROADCAST_INTERVAL` (60 seconds).

Use `2 * get_ring_delay()` instead. With the default value of
`ring_delay` (30 seconds) this gives the same behavior.
2022-12-10 12:27:22 +01:00
Anna Stuchlik
8d1050e834 docs: replace Scylla with ScyllaDB on the Snitches page 2022-12-09 13:34:18 +01:00
Anna Stuchlik
5cb191d5b0 docs: fix the headings on the Snitches page 2022-12-09 13:26:36 +01:00
Anna Stuchlik
a699904374 doc: add the description of AzureSnitch to the documentation 2022-12-09 13:22:01 +01:00
Nadav Har'El
e47794ed98 test/cql-pytest: regression test for index scan with start token
When we have a table with partition key p and an indexed regular column
v, the test included in this patch checks the query

     SELECT p FROM table WHERE v = 1 AND TOKEN(p) > 17

This can work and not require ALLOW FILTERING, because the secondary index
posting-list of "v=1" is ordered in p's token order (to allow SELECT with
and without an index to return the same order - this is explained in
issue #7443). So this test should pass, and indeed it does on both current
Scylla and Cassandra.

However, it turns out that this was a bug - issue #7043 - in older
versions of Scylla, and only fixed in Scylla 4.6. In older versions,
the SELECT wasn't accepted, claiming it requires ALLOW FILTERING,
and if ALLOW FILTERING was added, the TOKEN(p) > 17 part was silently
ignored.

The fix for issue #7043 actually included regression tests, C++ tests in
test/boost/secondary_index_test.cc. But in this patch we also add a Python
test in test/cql-pytest.

One of the benefits of cql-pytest is that we can (and I did) run the same
test on Cassandra to verify we're not implementing a wrong feature.
Another benefit is that we can run a new test on an old version, and
not even require re-compilation: You can run this new test on any
existing installation of Scylla to check if it still has issue #7043.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12237
2022-12-09 09:33:16 +02:00
Benny Halevy
018dedcc0c docs: replace-dead-node: update host_id of replacing node
The replacing node no longer assumes the host_id
of the replacee.  It will continue to use a random,
unique host_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-09 08:23:31 +02:00
Benny Halevy
37d75e5a21 docs: replace-dead-node: fix alignment 2022-12-09 08:23:31 +02:00
Benny Halevy
89920d47d6 db: system_keyspace: change set_local_host_id to private set_local_random_host_id
Now that the local host_id is never changed externally
(by the storage_service upon replace-node),
the method can be made private and be used only for initializing the
local host_id to a random one.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-09 08:23:31 +02:00
Benny Halevy
9942c60d93 storage_service: do not inherit the host_id of a replaced a node
We want to always be able to distinguish between
the replacing node and the replacee by using different,
unique, host identifiers.

This will allow us to use the host_id authoritatively
to identify the node (rather than its endpoint IP address)
for token mapping and node operations.

Also, it will be used in the following patch to never allow the
replaced node to rejoin the cluster, as its host_id should never
be reused.

Refs #9839
Refs #12040

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-09 08:23:31 +02:00
Pavel Emelyanov
7197757750 broadcast_tables: Forward-declare storage_proxy in lang.hh
Currently the header includes storage_proxy.hh and spreads this over the
code via raft_group0_client.hh -> group0_state_machine.hh -> lang.hh

Forward-declaring the proxy class eliminates ~100 indirect dependencies on
storage_proxy.hh via this chain.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12241
2022-12-09 01:23:51 +02:00
Pavel Emelyanov
6075e01312 test/lib: Remove sstable_utils.hh from simple_schema.hh
The latter is pretty popular test/lib header that disseminates the
former one over whole lot of unit tests. The former, in turn, naturally
includes sstables.hh thus making tons of unrelated tests depend on
sstables class unused by them.

However, simple removal doesn't work, because of the local_shard_only bool
class definition in sstable_utils.hh, which is used in simple_schema.hh. This
class, in turn, is used in key-making helpers that don't belong to
sstable utils, so these are moved into simple_schema as well.

When done, this affects mutation_source_test.hh, which needs the
local_shard_only bool class (and helps spread sstables.hh
throughout more unrelated tests), and a bunch of .cc test sources that
used sstable_utils.hh to indirectly include various headers they
need.

After patching, sstables.hh affects half as many tests. As a side
effect, half as many tests depend on sstables_manager.hh.

Continuation of 9bdea110a6

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12240
2022-12-08 15:37:33 +02:00
Tomasz Grabiec
4e7ddb6309 position_in_partition: Introduce before_key(position_in_partition_view) 2022-12-08 13:41:28 +01:00
Tomasz Grabiec
536c0ab194 db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
trim_clustering_row_ranges_to() is broken for non-full keys in reverse
mode. It will trim the range to
position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.

Fixes #12180
Refs #1446
2022-12-08 13:41:28 +01:00
Tomasz Grabiec
232ce699ab types: Fix comparison of frozen sets with empty values
A frozen set can be part of the clustering key, and with compact
storage, the corresponding key component can have an empty value.

Comparison was not prepared for this: the iterator attempts to
deserialize the item count and fails if the value is empty.

Fixes #12242
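Below is a Python sketch showing why an empty value breaks a naive comparison, and one way to guard against it. The sketch makes three assumptions:

- The collection format is a 4-byte big-endian element count followed by length-prefixed elements.
- Elements are compared as raw bytes.
- The empty value sorts before any non-empty one. This ordering is an illustrative choice and is not taken from the fix.

An empty set (count 0) is four bytes long, so it is not the same thing as an empty value (zero bytes).

```python
import struct


def serialize_set(elements):
    out = struct.pack(">i", len(elements))
    for e in elements:
        out += struct.pack(">i", len(e)) + e
    return out


def deserialize_set(buf):
    # Raises struct.error on an empty buffer: there is no count to read.
    (count,) = struct.unpack_from(">i", buf, 0)
    offset, elements = 4, []
    for _ in range(count):
        (size,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        elements.append(buf[offset:offset + size])
        offset += size
    return elements


def compare_frozen_sets(a, b):
    # Guard first: never try to deserialize an empty value.
    if not a or not b:
        return (len(a) > 0) - (len(b) > 0)
    ea, eb = deserialize_set(a), deserialize_set(b)
    for x, y in zip(ea, eb):
        if x != y:
            return -1 if x < y else 1
    return (len(ea) > len(eb)) - (len(ea) < len(eb))
```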
2022-12-08 13:41:11 +01:00
Nadav Har'El
4cdaba778d Merge 'Secondary indexes on static columns' from Piotr Dulikowski
This pull request introduces support for global secondary indexes based on static columns.

Local secondary indexes based on static columns are not planned to be supported and are explicitly forbidden. Because there is only one static row per partition and local indexes require the full partition key when querying, such indexes wouldn't be very useful and would only waste resources.

The index table for secondary indexes on static columns, unlike other secondary indexes, does not contain clustering keys from the base table. A static column's value determines a set of full partitions, so the clustering keys would be redundant.

The already existing logic for querying using secondary indexes works after introducing minimal modifications. The view update generation path now works on a common representation of static and clustering rows, and the new representation allowed most of the logic to be kept intact.

New cql-pytests are added. All but one of the existing tests for secondary indexes on static columns - ported from Cassandra - now work and have their `xfail` marks lifted; the remaining test requires support for collection indexing, so it will start working only after #2962 is fixed.

Materialized view with static rows as a key are __not__ implemented in this PR.

Fixes: #2963

Closes #11166

* github.com:scylladb/scylladb:
  test_materialized_view: verify that static columns are not allowed
  test_secondary_index: add (currently failing) test for static index paging
  test_secondary_index: add more tests for secondary indexes on static columns
  cassandra_tests: enable existing tests for static columns
  create_index_statement: lift restriction on secondary indexes on static rows
  db/view: fetch and process static rows when building indexes
  gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature
  create_index_statement: disallow creation of local indexes with static columns
  select_statement: prepare paging for indexes on static columns
  select_statement: do not attempt to fetch clustering columns from secondary index's table
  secondary_index_manager: don't add clustering key columns to index table of static column index
  replica/table: adjust the view read-before-write to return static rows when needed
  db/view: process static rows in view_update_builder::on_results
  db/view: adjust existing view update generation path to use clustering_or_static_row
  column_computation: adjust to use clustering_or_static_row
  db/view: add clustering_or_static_row
  deletable_row: add column_kind parameter to is_live
  view_info: adjust view_column to accept column_kind
  db/view: base_dependent_view_info: split non-pk columns into regular and static
2022-12-08 09:54:05 +02:00
Konstantin Osipov
02c30ab5d6 build: fix link error (abseil) on ubuntu toolchain with clang 15
abseil::hash depends on abseil::city and declares CityHash32
as an external symbol. The city static library, however,
precedes hash in the link list, which apparently makes the linker
simply drop it from the object list, since its symbols are not
used elsewhere.

Fix the linker ordering to help the linker see that CityHash32
is used.

Closes #12231
2022-12-08 09:47:16 +02:00
Avi Kivity
d6457778f1 Merge 'Coroutinize some table functions in preparation to static compaction groups' from Raphael "Raph" Carvalho
Extracted from https://github.com/scylladb/scylladb/pull/12139

Closes #12236

* github.com:scylladb/scylladb:
  replica: table: Fix indentation
  replica: coroutinize table::discard_sstables()
  replica: Coroutinize table::flush()
2022-12-08 09:29:58 +02:00
Piotr Dulikowski
4883e43677 test_materialized_view: verify that static columns are not allowed
Adds a test which verifies that static columns are not allowed in
materialized views. Although we added support for static columns in
secondary indexes, which share a lot of code with materialized views,
static columns in materialized views are not yet ready to use.
2022-12-08 07:41:33 +01:00
Piotr Dulikowski
f864944dcb test_secondary_index: add (currently failing) test for static index paging
Currently, when executing queries accelerated by an index on a static
column, paging is unable to split base table partitions across pages and
is forced to return them whole. This will cause problems if such a
query must return a very large base table partition, because it will have
to be loaded into memory.

Fixing this issue will require a more sophisticated approach than what
was done in this PR. For the time being, an xfailing pytest is added
which should start passing after paging is improved.
2022-12-08 07:41:33 +01:00
Piotr Dulikowski
4f836115fd test_secondary_index: add more tests for secondary indexes on static columns
Adds cql-pytests which test the secondary index on static columns
feature.
2022-12-08 07:41:32 +01:00
Botond Dénes
897b501ba3 Merge 'doc: update the 5.1 upgrade guide with the mode-related information' from Anna Stuchlik
This PR adds the link to the KB article about updating the mode after the upgrade to the 5.1 upgrade guide.
In addition, I have:
- updated the KB article to include the versions affected by that change.
- fixed the broken link to the page about metric updates (it is not related to the KB article, but I fixed it in the same PR to limit the number of PRs that need to be backported).

Related: https://github.com/scylladb/scylladb/pull/11122

Closes #12148

* github.com:scylladb/scylladb:
  doc: update the releases in the KB about updating the mode after upgrade
  doc: fix the broken link in the 5.1 upgrade guide
  doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide
2022-12-08 07:32:10 +02:00
Tomasz Grabiec
992a73a861 row_cache: Destroy coroutine under region's allocator
The reason is alloc-dealloc mismatch of position_in_partition objects
allocated by cursors inside coroutine object stored in the update
variable in row_cache::do_update()

It is allocated under the cache region, but in case of an exception it will
be destroyed under the standard allocator. If the update is successful, it
will be cleared under the region allocator, so there is no problem in the
normal case.

Fixes #12068

Closes #12233
2022-12-07 21:44:21 +02:00
Raphael S. Carvalho
9ae0d8ba28 replica: table: Fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-07 15:53:22 -03:00
Raphael S. Carvalho
b9a33d5a91 replica: coroutinize table::discard_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-07 15:52:36 -03:00
Raphael S. Carvalho
192b64a5ac replica: Coroutinize table::flush()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-07 15:52:27 -03:00
Benny Halevy
a076ceef97 view: row_lock: lock_ck: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 19:27:30 +02:00
Avi Kivity
909fbfdd2f repair: reindent repair_range 2022-12-07 18:17:21 +02:00
Avi Kivity
796ec5996f repair: coroutinize repair_range 2022-12-07 18:13:10 +02:00
Benny Halevy
78c5961114 docs: operating-scylla: add-node-to-cluster: deleted instructions for unsupported releases
2.3 and 2018.1 ended their life and are long gone.
No need to have instructions for them in the master version of this
document.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:07:35 +02:00
Benny Halevy
adeb03e60f docs: operating-scylla: add-node-to-cluster: cleanup: move tips to a note
And be more verbose about why the tips are recommended and their
ramifications.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:07:18 +02:00
Benny Halevy
6e324137bd docs: operating-scylla: add-node-to-cluster: improve wording of cleanup instructions
"use `nodetool cleanup` cleanup command" repeats words, change to
"run the `nodetool cleanup` command".

Also, improve the description of the cleanup action
and how it relates to the bootstrapping process.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:07:08 +02:00
Benny Halevy
eeed330647 docs: operating-scylla: prerequisites: system_auth is a keyspace, not a table
Fix the phrase referring to it as a table accordingly.
Also, do some minor phrasing touch-ups in this area.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:06:54 +02:00
Benny Halevy
5d840d4232 docs: operating-scylla: prerequisites: no Authetication status is gathered
Authentication status isn't gathered from scylla.yaml,
only the authenticator, so change the caption accordingly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:06:48 +02:00
Benny Halevy
9cb7056d3e docs: operating-scylla: prerequisites: simplify grep commands
Writing `cat X | grep Y` is both inefficient and somewhat
unprofessional.  The grep command works very well on a file argument
so `grep Y X` will do the job perfectly without the need for a pipe.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:06:36 +02:00
Benny Halevy
71bc12eecc docs: operating-scylla: add-node-to-cluster: prerequisites: number sub-sections
To improve their readability.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:06:35 +02:00
Benny Halevy
16db7bea82 docs: operating-scylla: add-node-to-cluster: describe other nodes in plural
Typically data will be streamed from multiple existing nodes
to the new node, not from a single one.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-12-07 17:03:23 +02:00
Tomasz Grabiec
a46b2e4e4c Merge 'Make node replace procedure work with Raft' from Kamil Braun
We need to obtain the Raft ID of the replaced node during the shadow round and
place it in the address map. It won't be placed by the regular gossiping route
if we're replacing using the same IP, because we override the application state
of the replaced node. Even if we replace a node with a different IP, it is not
guaranteed that background gossiping manages to update the address map before we
need it, especially in tests where we set ring_delay to 0 and disable
wait_for_gossip_to_settle. The shadow round, on the other hand, performs a
synchronous request (and if it fails during bootstrap, bootstrap will fail -
because we also won't be able to obtain the tokens and Host ID of the replaced
node).

Fetch the Raft ID of the replaced node in `prepare_replacement_info`,
which runs the shadow round. Return it in `replacement_info`. Then
`join_token_ring` passes it to `setup_group0`, which stores it in the
address map. It does that after `join_group0` so the entry is
non-expiring (the replaced node is a member of group 0). Later in the
replace procedure, we call `remove_from_group0` for the replaced node.
`remove_from_group0` will be able to reverse-translate the IP of the
replaced node to its Raft ID using the address map.

Also remove an unconditional 60 seconds sleep from the replace code. Make it
dependent on ring_delay.

Enable the replace tests.

Modify some code related to removing servers from group 0 which depended on
storing IP addresses in the group 0 configuration.

Closes #12172

* github.com:scylladb/scylladb:
  test/topology: enable replace tests
  service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0`
  service: handle replace correctly with Raft enabled
  gms/gossiper: fetch RAFT_SERVER_ID during shadow round
  service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
2022-12-07 15:30:27 +01:00
Pavel Emelyanov
9bdea110a6 code: Reduce fanout of sstables(_manager)?.hh over headers
This change removes sstables.hh from some other headers, replacing it
with version.hh and shared_sstable.hh. It also drops
sstables_manager.hh from some more headers, because this header
propagates sstables.hh via itself. The change is pretty straightforward,
but has a ricochet in database.hh, which needs disk-error-handler.hh.

Without the patch, touching sstables/sstable.hh results in 409 targets
being recompiled; with the patch, 299 targets.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12222
2022-12-07 14:34:19 +02:00
Botond Dénes
57a4971962 Merge 'dirty_memory_manager: tidy up' from Avi Kivity
Tidy up namespaces, move code to the right file, and
move the whole thing to the replica module where it
belongs.

Closes #12219

* github.com:scylladb/scylladb:
  dirty_memory_manager: move implementaton from database.cc
  dirty_memory_manager: move to replica module
  test: dirty_memory_manager_test: disambiguate classes named 'test_region_group'
  dirty_memory_manager: stop using using namespace
2022-12-07 14:25:59 +02:00
Avi Kivity
f7f5700289 dirty_memory_manager: move implementaton from database.cc
A few leftover method implementations were left in database.cc
when dirty_memory_manager.cc was created, move them to their
correct place now.
2022-12-06 22:28:54 +02:00
Avi Kivity
444de2831e dirty_memory_manager: move to replica module
It's a replica-side thing, so move it there. The related
flush_permit and sstable_write_permit are moved alongside.
2022-12-06 22:24:17 +02:00
Avi Kivity
a038a35ad6 test: dirty_memory_manager_test: disambiguate classes named 'test_region_group'
There are two similarly named classes: ::test_region_group and
dirty_memory_manager_logalloc::test_region_group. Rename the
former to ::raii_region_group (that's what it's for) and the
latter to ::test_region_group, to reduce confusion.
2022-12-06 22:20:38 +02:00
Avi Kivity
dfdae5ffa9 dirty_memory_manager: stop using using namespace
`using namespace` is pretty bad, especially in a header, as it
pollutes the namespace for everyone. Stop using it and qualify
names instead.
2022-12-06 21:37:38 +02:00
Avi Kivity
47a8fad2a2 Merge 'scylla-types: add serialize action' from Botond Dénes
Serializes the value that is an instance of a type. The opposite of `deserialize` (previously known as `print`).
All other actions operate on serialized values, yet up to now we were missing a way to go from human readable values to serialized ones. This prevented for example using `scylla types tokenof $pk` if one only had the human readable key value.
Example:

```
$ scylla types serialize -t Int32Type -- -1286905132
b34b62d4
$ scylla types serialize --prefix-compound -t TimeUUIDType -t Int32Type -- d0081989-6f6b-11ea-0000-0000001c571b 16
0010d00819896f6b11ea00000000001c571b000400000010
$ scylla types serialize --prefix-compound -t TimeUUIDType -t Int32Type -- d0081989-6f6b-11ea-0000-0000001c571b
0010d00819896f6b11ea00000000001c571b
```
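The byte layout in the example above can be reproduced with a short Python sketch. The sketch assumes three things, all consistent with the sample output:

- Int32Type is a 4-byte big-endian two's-complement integer.
- TimeUUIDType is the 16 raw UUID bytes.
- `--prefix-compound` prefixes each component with a 2-byte big-endian length.

The helper names are hypothetical, and this is not the tool's implementation.

```python
import struct
import uuid


def serialize_int32(value: int) -> bytes:
    # 4-byte big-endian two's complement
    return struct.pack(">i", value)


def serialize_timeuuid(text: str) -> bytes:
    # the 16 raw UUID bytes, most significant first
    return uuid.UUID(text).bytes


def prefix_compound(components) -> bytes:
    # each component: 2-byte big-endian length, then the component bytes
    return b"".join(struct.pack(">H", len(c)) + c for c in components)


print(serialize_int32(-1286905132).hex())  # b34b62d4
```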

Closes #12029

* github.com:scylladb/scylladb:
  docs: scylla-types.rst: add mention of per-operation --help
  tools/scylla-types: add serialize operation
  tools/scylla-types: prepare for action handlers with string arguments
  tools/scylla-types: s/print/deserialize/ operation
  docs: scylla-types.rst: document tokenof and shardof
  docs: scylla-types.rst: fix typo in compare operation description
2022-12-06 19:27:15 +02:00
Nadav Har'El
f275bfd57b Update CODEOWNERS file
Update the CODEOWNERS file with some people who joined different parts
of the project, and one person that left.

Note that despite its name, CODEOWNERS does not list "ownership" in any
strict sense of the word - it is more about who is willing and/or
knowledgeable enough to participate in reviewing changes to particular
files or directories. Github uses this file to automatically suggest
who should review a pull request.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12216
2022-12-06 19:26:03 +02:00
Benny Halevy
5007ded2c1 view: row_lock: lock_ck: serialize partition and row locking
The problematic scenario this patch fixes might happen due to
unfortunate serialization of locks/unlocks between lock_pk and lock_ck,
as follows:

    1. lock_pk acquires an exclusive lock on the partition.
    2.a lock_ck attempts to acquire a shared lock on the partition
        and any lock on the row. both cases currently use a fiber
        returning a future<rwlock::holder>.
    2.b since the partition is locked, the lock_partition times out
        returning an exceptional future.  lock_row has no such problem
        and succeeds, returning a future holding a rwlock::holder,
        pointing to the row lock.
    3.a the lock_holder previously returned by lock_pk is destroyed,
        calling `row_locker::unlock`
    3.b row_locker::unlock sees that the partition is not locked
        and erases it, including the row locks it contains.
    4.a when_all_succeeds continuation in lock_ck runs.  Since
        the lock_partition future failed, it destroys both futures.
    4.b the lock_row future is destroyed with the rwlock::holder value.
    4.c ~holder attempts to return the semaphore units to the row rwlock,
        but the latter was already destroyed in 3.b above.

Acquiring the partition lock and row lock in parallel
doesn't help anything, but it complicates error handling
as seen above.

This patch serializes acquiring the row lock in lock_ck
after locking the partition to prevent the above race.

This way, erasing the unlocked partition is never expected
to happen while any of its row locks is held.
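The ordering fix can be illustrated with a small asyncio sketch. It is only an analogy. `asyncio.Lock` is exclusive, whereas the real code takes a shared partition lock and a row rwlock in Seastar. The function names and the timeout value are hypothetical. The point is that when the partition lock times out first, no row lock is ever taken, so nothing outlives the partition entry.

```python
import asyncio


async def lock_ck(partition_lock, row_lock, timeout):
    """Hypothetical sketch: take the partition lock, and only then the row lock."""
    # If this times out, no row lock has been acquired yet.
    await asyncio.wait_for(partition_lock.acquire(), timeout)
    try:
        await asyncio.wait_for(row_lock.acquire(), timeout)
    except BaseException:
        partition_lock.release()  # roll back step 1 if step 2 fails
        raise
    return partition_lock, row_lock


async def demo_timeout():
    partition, row = asyncio.Lock(), asyncio.Lock()
    await partition.acquire()  # simulate lock_pk holding the partition exclusively
    try:
        await lock_ck(partition, row, timeout=0.01)
    except asyncio.TimeoutError:
        return "timeout", row.locked()  # row lock was never taken
    return "acquired", row.locked()


async def demo_success():
    partition, row = asyncio.Lock(), asyncio.Lock()
    p, r = await lock_ck(partition, row, timeout=0.01)
    held = (p.locked(), r.locked())
    r.release()
    p.release()
    return held
```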

Fixes #12168

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12208
2022-12-06 16:29:46 +02:00
Botond Dénes
f017e9f1c6 docs: document the reader concurrency semaphore diagnostics dump
The diagnostics dumped by the reader concurrency semaphore are a
common sight in logs as soon as a node becomes problematic. The reason
is that the reader concurrency semaphore acts as the canary in the coal
mine: it is the first that starts screaming when the node or workload is
unhealthy. This patch adds documentation of the content of the
diagnostics and how to diagnose common problems based on it.

Fixes: #10471

Closes #11970
2022-12-06 16:24:44 +02:00
Botond Dénes
c35cee7e2b docs: scylla-types.rst: add mention of per-operation --help 2022-12-06 14:47:28 +02:00
Botond Dénes
4f9799ce4f tools/scylla-types: add serialize operation
Takes human readable values and converts them to serialized hex encoded
format. Only regular atomic types are supported for now, no
collection/UDT/tuple support, not even in frozen form.
2022-12-06 14:46:53 +02:00
Botond Dénes
7c87655b4b tools/scylla-types: prepare for action handlers with string arguments
Currently all action handlers have bytes arguments, parsed from
hexadecimal string representations. We plan on adding a serialize
command which will require raw string arguments. Prepare the
infrastructure for supporting both types of action handlers.
2022-12-06 14:45:30 +02:00
Botond Dénes
15452730fb tools/scylla-types: s/print/deserialize/ operation
Soon we will have a serialize operation. Rename the current print
operation to deserialize in preparation for that. We want the two
operations (serialize and deserialize) to reflect their relation in
their names too.
2022-12-06 14:45:30 +02:00
Botond Dénes
f98e6552b4 docs: scylla-types.rst: document tokenof and shardof
These new actions were added recently but without the accompanying
documentation change. Make up for this now.
2022-12-06 14:45:30 +02:00
Botond Dénes
30c047cae6 docs: scylla-types.rst: fix typo in compare operation description 2022-12-06 14:45:23 +02:00
Piotr Dulikowski
680423ad9d cassandra_tests: enable existing tests for static columns
Removes the "xfail" marker from the now-passing tests related to
secondary indexes on static columns.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
cc3af3190d create_index_statement: lift restriction on secondary indexes on static rows
Secondary indexes on static columns should work now. This commit lifts
the existing restriction after the cluster is fully upgraded to a
version which supports such indexes.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
86dad30b66 db/view: fetch and process static rows when building indexes
This commit modifies the view builder and its consumer so that static
rows are always fetched and properly processed during view build.

Currently, the view builder will always fetch both static and clustering
rows, regardless of the type of indexes being built. For indexes on
static columns this is wasteful and could be improved so that only the
types of rows relevant to indexes being built are fetched - however,
doing this sounds a bit complicated and I would rather start with
something simpler which has a better chance of working.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
25fec0acce gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature
The new feature will prevent secondary indexes on static columns from
being created unless the whole cluster is ready to support them.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
9f14f0ac09 create_index_statement: disallow creation of local indexes with static columns
Local indexes on static columns don't make sense because there is only
one static row per partition. It's always better to just run SELECT
DISTINCT on the base table. Allowing for such an index would only make
such queries slower (due to double lookup), would take unnecessary space
and could pose potential consistency problems, so this commit explicitly
forbids them.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
8c4cdfc2db select_statement: prepare paging for indexes on static columns
When performing a query on a table which is accelerated by a secondary
index, the paging state returned along with the query contains a
partition key and a clustering key of the secondary index table. The
logic wasn't prepared to handle the case of secondary indexes on static
columns - notably, it tried to put base table's clustering key columns
into the paging state which caused problems in other places.

This commit fixes the paging logic so that the PK and CK of a secondary
index table are calculated correctly. However, this solution has a major
drawback: because it is impossible to encode clustering key of the base
table in the paging state, partitions returned by queries accelerated by
secondary indexes on static columns will _not_ be split by paging. This
can be problematic in case there are large partitions in the base table.

The main advantage of this fix is that it is simple. Moreover, the
problem described above is not unique to static column indexes, but also
happens e.g. in case of some indexes on clustering columns (see case 2
of scylladb/scylla#7432). Fixing this issue will require a more
sophisticated solution and may affect more than only secondary indexes
on static columns, so this is left for a followup.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
ba390072c5 select_statement: do not attempt to fetch clustering columns from secondary index's table
The previous commit made sure that the index table for secondary indexes
on static columns doesn't have columns corresponding to the clustering key
columns of the base table - therefore, we must make sure that we don't try
to fetch them when querying the index table.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
983b440a81 secondary_index_manager: don't add clustering key columns to index table of static column index
The implementation of secondary indexes on static columns relies on the
fact that the index table only includes partition key columns of the
base table, but not clustering key columns. A static column's value
determines a set of full partitions, so including the clustering key
would only be redundant. It would also generate more work as a single
static column update would require a large portion of the index to be
updated.

This commit makes sure that clustering columns are not included in the
index table for indexes based on a static column.
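The rule above can be sketched with a simplified schema model. Everything here (`column_kind`, `column_def`, `index_table_key_columns`) is hypothetical and only illustrates which base key columns get copied into the index table. The real index table layout has more to it than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified schema model (not Scylla's schema class).
enum class column_kind { partition_key, clustering_key, static_column, regular_column };

struct column_def {
    std::string name;
    column_kind kind;
};

// Returns the indexed column followed by the base key columns the index
// table carries. Clustering columns are skipped for a static column index,
// because a static value already determines whole partitions.
std::vector<std::string> index_table_key_columns(const std::vector<column_def>& base,
                                                 const column_def& indexed) {
    std::vector<std::string> cols{indexed.name};
    for (const auto& c : base) {
        if (c.kind == column_kind::partition_key) {
            cols.push_back(c.name);
        }
    }
    if (indexed.kind != column_kind::static_column) {
        for (const auto& c : base) {
            if (c.kind == column_kind::clustering_key) {
                cols.push_back(c.name);
            }
        }
    }
    return cols;
}
```

With this shape, one update of a static column touches a single index entry per partition instead of one per clustering row.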
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
6ab41d76e6 replica/table: adjust the view read-before-write to return static rows when needed
Adjusts the read-before-write query issued in
`table::do_push_view_replica_updates` so that, when needed, requests
static columns and makes sure that the static row is present.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
18be90b1e6 db/view: process static rows in view_update_builder::on_results
The `view_update_builder::on_results()` function is changed to react to
static rows when comparing read-before-write results with the base table
mutation.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
2dd95d76f1 db/view: adjust existing view update generation path to use clustering_or_static_row
The view update path is modified to use `clustering_or_static_row`
instead of just `clustering_row`.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
b0a31bb7a7 column_computation: adjust to use clustering_or_static_row
Adjusts the column_computation interface so that it is able to accept
both clustering and static rows through the common
db::view::clustering_or_static_row interface.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
986ab6034c db/view: add clustering_or_static_row
Adds a `clustering_or_static_row`, which is a common, immutable
representation of either a static or clustering row. It will make it
possible to handle view update generation based on static or clustering
rows in a uniform way.
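A minimal sketch of such a wrapper, assuming a non-owning view built on `std::variant`. The row types and member names below are hypothetical stand-ins; the commit does not show the real class's members.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-ins for the real row types.
struct clustering_row {
    std::string key;                     // clustering key
    std::map<std::string, int> cells;    // regular columns
};
struct static_row {
    std::map<std::string, int> cells;    // static columns
};

// Immutable, non-owning view over either kind of row, so view update
// code can be written once for both.
class clustering_or_static_row {
    std::variant<const clustering_row*, const static_row*> _row;
public:
    explicit clustering_or_static_row(const clustering_row& r) : _row(&r) {}
    explicit clustering_or_static_row(const static_row& r) : _row(&r) {}

    bool is_static() const { return std::holds_alternative<const static_row*>(_row); }

    // Cells of the underlying row, whichever kind it is.
    const std::map<std::string, int>& cells() const {
        return std::visit([](auto* r) -> const std::map<std::string, int>& { return r->cells; }, _row);
    }
};
```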
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
05d4328f02 deletable_row: add column_kind parameter to is_live
While deletable_row is used to hold regular columns of a clustering row,
neither its name nor its implementation suggests that this is a
requirement. In fact, some of its methods already take a column_kind
parameter which is used to interpret the kind of columns held in the row.

This commit removes the assumption about the column kind from the
`deletable_row::is_live` method.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
27c81432cd view_info: adjust view_column to accept column_kind
The `view_info::view_column()` and `view_column` in view.cc return the
view's column definition that corresponds to a given base table column.
They currently assume that the given column id corresponds to a
regular column. In preparation for secondary indexes based on static
columns, this commit adjusts those functions so that they accept other
kinds of columns, including static columns.
2022-12-06 11:21:16 +01:00
Piotr Dulikowski
f7b7724eaf db/view: base_dependent_view_info: split non-pk columns into regular and static
Currently, `base_dependent_view_info::_base_non_pk_columns_in_view_pk`
field keeps a list of non-primary-key columns from the base table which
are a part of the view's primary key. Because the current code does not
allow indexes on static columns yet, the columns kept in the
aforementioned field are always assumed to be regular columns of the
base table and are kept as `column_id`s which do not contain information
about the column kind.

This commit splits the `_base_non_pk_columns_in_view_pk` field into two,
one for regular columns and the other for static columns, so that it is
possible to keep both kinds of columns in `base_dependent_view_info` and
the structure can be used for secondary indexes on static columns.
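The split can be pictured with a small hypothetical struct. The member names are invented, and only the idea of keeping bare `column_id`s in two kind-specific lists comes from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using column_id = std::uint32_t;  // a bare id carries no column kind

enum class column_kind { static_column, regular_column };

// Hypothetical sketch: non-PK base columns that are part of the view's
// primary key, stored per kind so each id can be interpreted correctly.
struct base_non_pk_columns_in_view_pk {
    std::vector<column_id> regular;
    std::vector<column_id> statics;

    void add(column_kind kind, column_id id) {
        (kind == column_kind::static_column ? statics : regular).push_back(id);
    }
    bool has_static_columns() const { return !statics.empty(); }
};
```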
2022-12-06 11:21:16 +01:00
Botond Dénes
681bd62424 Update tools/java submodule
* tools/java ecab7cf7d6...1c4e1e7a7d (2):
  > Merge "Cqlsh serverless v2" from Karol Baryla
  > Update Java Driver version to 3.11.2.4
2022-12-06 09:06:09 +02:00
Botond Dénes
6a1dbffaaa Merge 'compaction_manager: coroutinize postponed_compactions_reevaluation' from Avi Kivity
Three lambdas were removed, simplifying the code.

Closes #12207

* github.com:scylladb/scylladb:
  compaction_manager: reindent postponed_compactions_reevaluation()
  compaction_manager: coroutinize postponed_compactions_reevaluation()
  compaction_manager: make postponed_compactions_reevaluation() return a future
2022-12-06 08:08:36 +02:00
Avi Kivity
2339a3fa06 database: remove continuation for updating statistics
update_write_metrics() is a continuation added solely for updating
statistics. Fold it into do_update to save an allocation in the
write path.

```console
$ ./artifacts/before --write --smp 1  2<&1 | grep insn
189930.77 tps ( 57.2 allocs/op,  13.2 tasks/op,   50994 insns/op,        0 errors)
189954.18 tps ( 57.2 allocs/op,  13.2 tasks/op,   51086 insns/op,        0 errors)
188623.86 tps ( 57.2 allocs/op,  13.2 tasks/op,   51083 insns/op,        0 errors)
190115.01 tps ( 57.2 allocs/op,  13.2 tasks/op,   51092 insns/op,        0 errors)
190173.71 tps ( 57.2 allocs/op,  13.2 tasks/op,   51083 insns/op,        0 errors)
median 189954.18 tps ( 57.2 allocs/op,  13.2 tasks/op,   51086 insns/op,        0 errors)
```

vs

```console
$ ./artifacts/after --write --smp 1  2<&1 | grep insn
190358.38 tps ( 56.2 allocs/op,  12.2 tasks/op,   50754 insns/op,        0 errors)
185222.78 tps ( 56.2 allocs/op,  12.2 tasks/op,   50789 insns/op,        0 errors)
184508.09 tps ( 56.2 allocs/op,  12.2 tasks/op,   50842 insns/op,        0 errors)
142099.47 tps ( 56.2 allocs/op,  12.2 tasks/op,   50825 insns/op,        0 errors)
190447.22 tps ( 56.2 allocs/op,  12.2 tasks/op,   50811 insns/op,        0 errors)
```

One allocation and ~300 instructions per operation saved.

update_write_metrics() is still called from other call sites, so it is
not removed.

Closes #12108
2022-12-06 07:04:17 +02:00
Botond Dénes
6daa1e973f Merge 'alternator: fix hangs related to TTL scanning' from Nadav Har'El
The first patch in this small series fixes a hang during shutdown when the expired-item scanning thread can hang in a retry loop instead of quitting.  These hangs were seen in some test runs (issue #12145).

The second patch is a failsafe against additional bugs like those solved by the first patch: if any bug causes the same page fetch to repeatedly time out, let's stop the attempts after 10 retries instead of retrying forever. When we stop the retries, a warning will be printed to the log, and Scylla will wait until the next scan period and start a new scan from scratch - from a random position in the database - instead of hanging, potentially forever, waiting for the same page.
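The failsafe amounts to a bounded retry loop. Here is a minimal synchronous sketch, assuming a hypothetical `fetch_page` callback that returns `false` on timeout; the real scanner is asynchronous and its function names differ.

```cpp
#include <cassert>
#include <functional>

enum class page_result { fetched, abandoned };

// One initial attempt plus up to max_retries retries of the same page.
// On abandonment the caller is expected to wait for the next scan period
// and restart the scan from a random position.
page_result fetch_with_retries(const std::function<bool()>& fetch_page, int max_retries = 10) {
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        if (fetch_page()) {
            return page_result::fetched;
        }
    }
    // Per the description above, a warning is logged at this point.
    return page_result::abandoned;
}
```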

Closes #12152

* github.com:scylladb/scylladb:
  alternator ttl: in scanning thread, don't retry the same page too many times
  alternator: fix hang during shutdown of expiration-scanning thread
2022-12-06 06:44:22 +02:00
Botond Dénes
c5da96e6f7 Merge 'cql3: batch_statement: coroutinize get_mutations()' from Avi Kivity
As it has a do_with(), coroutinizing it is an automatic win.

Closes #12195

* github.com:scylladb/scylladb:
  cql3: batch_statement: reindent get_mutations()
  cql3: batch_statement: coroutinize get_mutations()
2022-12-06 06:41:44 +02:00
Avi Kivity
d2b1d2f695 compaction_manager: reindent postponed_compactions_reevaluation() 2022-12-05 22:02:27 +02:00
Avi Kivity
1669025736 compaction_manager: coroutinize postponed_compactions_reevaluation()
So much nicer.
2022-12-05 22:01:41 +02:00
Avi Kivity
d2c44cba77 compaction_manager: make postponed_compactions_reevaluation() return a future
postponed_compactions_reevaluation() runs until compaction_manager is
stopped, checking if it needs to launch new compactions.

Make it return a future instead of stashing its completion somewhere.
This makes it easier to convert it to a coroutine.
2022-12-05 21:58:48 +02:00
Avi Kivity
fe4d7fbdf2 Update abseil submodule
* abseil 7f3c0d78...4e5ff155 (125):
  > Add a compilation test for recursive hash map types
  > Add AbslStringify support for enum types in Substitute.
  > Use a c++14-style constexpr initialization if c++14 constexpr is available.
  > Move the vtable into a function to delay instantiation until the function is called. When the variable is a global the compiler is allowed to instantiate it more aggresively and it might happen before the types involved are complete. When it is inside a function the compiler can't instantiate it until after the functions are called.
  > Cosmetic reformatting in a test.
  > Reorder base64 unescape methods to be below the escaping methods.
  > Fixes many compilation issues that come from having no external CI coverage of the accelerated CRC implementation and some differences bewteen the internal and external implementation.
  > Remove static initializer from mutex.h.
  > Import of CCTZ from GitHub.
  > Remove unused iostream include from crc32c.h
  > Fix MSVC builds that reject C-style arrays of size 0
  > Remove deprecated use of absl::ToCrc32c()
  > CRC: Make crc32c_t as a class for explicit control of operators
  > Convert the full parser into constexpr now that Abseil requires C++14, and use this parser for the static checker. This fixes some outstanding bugs where the static checker differed from the dynamic one. Also, fix `%v` to be accepted with POSIX syntax.
  > Write (more) directly into the structured buffer from StringifySink, including for (size_t, char) overload.
  > Avoid using the non-portable type __m128i_u.
  > Reduce flat_hash_{set,map} generated code size.
  > Use ABSL_HAVE_BUILTIN to fix -Wundef __has_builtin warning
  > Add a TODO for the deprecation of absl::aligned_storage_t
  > TSAN: Remove report_atomic_races=0 from CI now that it has been fixed
  > absl: fix Mutex TSan annotations
  > CMake: Remove trailing commas in `AbseilDll.cmake`
  > Fix AMD cpu detection.
  > CRC: Get CPU detection and hardware acceleration working on MSVC x86(_64)
  > Removing trailing period that can confuse a url in str_format.h.
  > Refactor btree iterator generation code into a base class rather than using ifdefs inside btree_iterator.
  > container.h: fix incorrect comments about the location of <numeric> algorithms.
  > Zero encoded_remaining when a string field doesn't fit, so that we don't leave partial data in the buffer (all decoders should ignore it anyway) and to be sure that we don't try to put any subsequent operands in either (there shouldn't be enough space).
  > Improve error messages when comparing btree iterators when generations are enabled.
  > Document the WebSafe* and *WithPadding variants more concisely, as deltas from Base64Encode.
  > Drop outdated comment about LogEntry copyability.
  > Release structured logging.
  > Minor formatting changes in preparation for structured logging...
  > Update absl::make_unique to reflect the C++14 minimum
  > Update Condition to allocate 24 bytes for MSVC platform pointers to methods.
  > Add missing include
  > Refactor "RAW: " prefix formatting into FormatLogPrefix.
  > Minor formatting changes due to internal refactoring
  > Fix typos
  > Add a new API for `extract_and_get_next()` in b-tree that returns both the extracted node and an iterator to the next element in the container.
  > Use AnyInvocable in internal thread_pool
  > Remove absl/time/internal/zoneinfo.inc.  It was used to guarantee availability of a few timezones for "time_test" and "time_benchmark", but (file-based) zoneinfo is now secured via existing Bazel data/env attributes, or new CMake environment settings.
  > Updated documentation on use of %v Also updated documentation around FormatSink and PutPaddedString
  > Use the correct Bazel copts in crc targets
  > Run the //absl/time timezone tests with a data dependency on, and a matching ${TZDIR} setting for, //absl/time/internal/cctz:zoneinfo.
  > Stop unnecessary clearing of fields in ~raw_hash_set.
  > Fix throw_delegate_test when using libc++ with shared libraries
  > CRC: Ensure SupportsArmCRC32PMULL() is defined
  > Improve error messages when comparing btree iterators.
  > Refactor the throw_delegate test into separate test cases
  > Replace std::atomic_flag with std::atomic<bool> to avoid the C++20 deprecation of ATOMIC_FLAG_INIT.
  > Add support for enum types with AbslStringify
  > Release the CRC library
  > Improve error messages when comparing swisstable iterators.
  > Auto increase inlined capacity whenever it does not affect class' size.
  > drop an unused dep
  > Factor out the internal helper AppendTruncated, which is used and redefined in a couple places, plus several more that have yet to be released.
  > Fix some invalid iterator bugs in btree_test.cc for multi{set,map} emplace{_hint} tests.
  > Force a conservative allocation for pointers to methods in Condition objects.
  > Fix a few lint findings in flags' usage.cc
  > Narrow some _MSC_VER checks to not catch clang-cl.
  > Small cleanups in logging test helpers
  > Import of CCTZ from GitHub.
  > Merge pull request abseil/abseil-cpp#1287 from GOGOYAO:patch-1
  > Merge pull request abseil/abseil-cpp#1307 from KindDragon:patch-1
  > Stop disabling some test warnings that have been fixed
  > Support logging of user-defined types that implement `AbslStringify()`
  > Eliminate span_internal::Min in favor of std::min, since Min conflicts with a macro in a third-party library.
  > Fix -Wimplicit-int-conversion.
  > Improve error messages when dereferencing invalid swisstable iterators.
  > Cord: Avoid leaking a node if SetExpectedChecksum() is called on an empty cord twice in a row.
  > Add a warning about extract invalidating iterators (not just the iterator of the element being extracted).
  > CMake: installed artifacts reflect the compiled ABI
  > Import of CCTZ from GitHub.
  > Import of CCTZ from GitHub.
  > Support empty Cords with an expected checksum
  > Move internal details from one source file to another more appropriate source file.
  > Removes `PutPaddedString()` function
  > Return uint8_t from CappedDamerauLevenshteinDistance.
  > Remove the unknown CMAKE_SYSTEM_PROCESSOR warning when configuring ABSL_RANDOM_RANDEN_COPTS
  > Enforce Visual Studio 2017 (MSVC++ 15.0) minumum
  > `absl::InlinedVector::swap` supports non-assignable types.
  > Improve b-tree error messages when dereferencing invalid iterators.
  > Mutex: Fix stall on single-core systems
  > Document Base64Unescape() padding
  > Fix sign conversion warnings in memory_test.cc.
  > Fix a sign conversion warning.
  > Fix a truncation warning on Windows 64-bit.
  > Use btree iterator subtraction instead of std::distance in erase_range() and count().
  > Eliminate use of internal interfaces and make the test portable and expose it to OSS.
  > Fix various warnings for _WIN32.
  > Disables StderrKnobsDefault due to order dependency
  > Implement btree_iterator::operator-, which is faster than std::distance for btree iterators.
  > Merge pull request abseil/abseil-cpp#1298 from rpjohnst:mingw-cmake-build
  > Implement function to calculate Damerau-Levenshtein distance between two strings.
  > Change per_thread_sem_test from size medium to size large.
  > Support stringification of user-defined types in AbslStringify in absl::Substitute.
  > Fix "unsafe narrowing" warnings in absl, 12/12.
  > Revert change to internal 'Rep', this causes issues for gdb
  > Reorganize InlineData into an inner Rep structure.
  > Remove internal `VLOG_xxx` macros
  > Import of CCTZ from GitHub.
  > `absl::InlinedVector` supports move assignment with non-assignable types.
  > Change Cord internal layout, which reduces store-load penalties on ARM
  > Detects accidental multiple invocations of AnyInvocable<R(...)&&>::operator()&& by producing an error in debug mode, and clarifies that the behavior is undefined in the general case.
  > Fix a bug in StrFormat. This issue would have been caught by any compile-time checking but can happen for incorrect formats parsed via ParsedFormat::New. Specifically, if a user were to add length modifiers with 'v', for example the incorrect format string "%hv", the ParsedFormat would incorrectly be allowed.
  > Adds documentation for stringification extension
  > CMake: Remove check_target calls which can be problematic in case of dependency cycle
  > Changes mutex unlock profiling
  > Add static_cast<void*> to the sources for trivial relocations to avoid spurious -Wdynamic-class-memaccess errors in the presence of other compilation errors.
  > Configure ABSL_CACHE_ALIGNED for clang-like and MSVC toolchains.
  > Fix "unsafe narrowing" warnings in absl, 11/n.
  > Eliminate use of internal interfaces
  > Merge pull request abseil/abseil-cpp#1289 from keith:ks/fix-more-clang-deprecated-builtins
  > Merge pull request abseil/abseil-cpp#1285 from jun-sheaf:patch-1
  > Delete LogEntry's copy ctor and assignment operator.
  > Make sinks provided to `AbslStringify()` usable with `absl::Format()`.
  > Cast unused variable to void
  > No changes in OSS.
  > No changes in OSS
  > Replace the kPower10ExponentTable array with a formula.
  > CMake: Mark absl::cord_test_helpers and absl::spy_hash_state PUBLIC
  > Use trivial relocation for transfers in swisstable and b-tree.
  > Merge pull request abseil/abseil-cpp#1284 from t0ny-peng:chore/remove-unused-class-in-variant
  > Removes the legacy spellings of the thread annotation macros/functions by default.

Closes #12201
2022-12-05 21:07:16 +02:00
Eliran Sinvani
5a5514d052 cql server: Only parallelize relevant cql requests
The cql server uses an execution stage to process and execute queries.
However, an execution stage is best utilized for a recurrent flow that
is called repeatedly, since it makes better use of the instruction cache.
Up until now, every request was sent through the execution stage, but
most requests are not meant to be executed repeatedly at high volume.
This change processes and executes the data queries asynchronously,
through an execution stage, while all other requests are processed one
by one, each running end to end before processing continues.
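The split can be illustrated with a dispatch on the request opcode. The opcode names come from the CQL native protocol, but which of them count as "data queries" (QUERY, EXECUTE, BATCH below) is an assumption for this sketch and is not taken from the patch. `goes_through_execution_stage` and `dispatch` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Subset of CQL native protocol request opcodes.
enum class cql_opcode { startup, options, query, prepare, execute, batch };

// Assumed classification: only high-volume data requests are batched
// through the execution stage.
bool goes_through_execution_stage(cql_opcode op) {
    switch (op) {
    case cql_opcode::query:
    case cql_opcode::execute:
    case cql_opcode::batch:
        return true;
    default:
        return false;
    }
}

// Returns which path a request would take.
std::string dispatch(cql_opcode op) {
    return goes_through_execution_stage(op) ? "execution_stage" : "inline";
}
```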

Tests:
Unit tests in dev and debug.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes #12202
2022-12-05 21:06:58 +02:00
Takuya ASADA
b7851ab1ec docker: fix locale on SSH shell
4ecc08c broke locale settings in SSH shells, since we dropped "update-locale".
To fix this without installing locales package, we need to manually specify
LANG=C.UTF-8 in /etc/default/locale.

see https://github.com/scylladb/scylla-cluster-tests/pull/5519

Closes #12197
2022-12-05 20:02:18 +02:00
Avi Kivity
6f2d060d12 Merge 'Make sstable_directory call sstable_manager for sstables' components' from Pavel Emelyanov
This PR hits two goals for "object storage" effort

1. The sstables loader "knows" that sstable components are stored in a Linux directory and uses utils/lister to access it. This is not going to work with sstables over object storage; the loader should be abstracted from the underlying storage.

2. Currently class keyspace and class column_family carry "datadir" and "all_datadirs" on board, which are paths on the local filesystem where sstable files are stored (those usually start with /var/lib/scylla/data). The paths include subdirs like "snapshots", "staging", etc. This is not going to look nice for object storage: the /var/lib/ prefix is excessive and meaningless in this case. Instead, ks and cf should know their "location", and some other component should know the directory in which the files are stored.

That said, this PR prepares distributed_loader and sstable_directory to stop using Linux paths explicitly by making both call sstables_manager to list and open sstable objects. After that, it will be possible to teach the manager to list sstables from object storage. This also opens the way to removing paths from the keyspace and column_family classes and replacing them with relative "location"s.

Closes #12128

* github.com:scylladb/scylladb:
  sstable_directory: Get components lister from manager
  sstable_directory: Extract directory lister
  sstable_directory: Remove sstable creation callback
  sstable_directory: Call manager to make sstables
  sstable_directory: Keep error handler generator
  sstable_directory: Keep schema_ptr
  sstable_directory: Use directory semaphore from manager
  sstable_directory: Keep reference on manager
  tests: Use sstables creation helper in some cases
  sstables_manager: Keep directory semaphore reference
  sstables, code: Wrap directory semaphore with concurrency
2022-12-05 18:54:17 +02:00
Gleb Natapov
022a825b33 raft: introduce not_a_member error and return it when non member tries to do add/modify_config
Currently, if a node that is outside of the config tries to add an entry
or modify the config, a transient error is returned, and this causes the
node to retry. But the error is not transient. If a node tries to do one
of the operations above, it means it was part of the cluster at some
point; but since a node with the same id should not be added back to a
cluster, if it is not in the cluster now, it never will be.

Return a new error, not_a_member, to the caller instead.
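The behavioral difference for callers can be sketched as follows: only transient errors are retried, while `not_a_member` propagates immediately. The exception types and helpers here are hypothetical models of the description above, not the Raft library's actual API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical error types modelled on the commit message.
struct transient_error : std::runtime_error { using std::runtime_error::runtime_error; };
struct not_a_member : std::runtime_error { using std::runtime_error::runtime_error; };

// Server-side check: a node outside the configuration gets not_a_member.
void check_member(bool in_config) {
    if (!in_config) {
        throw not_a_member("node is not a member of the configuration");
    }
}

// Caller-side retry loop: retries transient errors up to max_attempts,
// lets every other error (including not_a_member) propagate at once.
// Returns the number of attempts it took to succeed.
template <typename Op>
int call_with_retry(Op op, int max_attempts) {
    for (int attempt = 1; ; ++attempt) {
        try {
            op();
            return attempt;
        } catch (const transient_error&) {
            if (attempt == max_attempts) {
                throw;
            }
        }
    }
}
```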

Message-Id: <Y42mTOx8bNNrHqpd@scylladb.com>
2022-12-05 17:11:04 +01:00
Benny Halevy
c61083852c storage_service: handle_state_normal: calculate candidates_for_removal when replacing tokens
We currently try to detect a replaced node so as to insert it into
endpoints_to_remove when it has no owned tokens left.
However, for each token we first generate a multimap using
get_endpoint_to_token_map_for_reading().

There are 2 problems with that:

1. unless the replaced node owns a single token, this map will not
   be empty after erasing one token out of it, since the
   token metadata has not changed yet (this is done later with
   update_normal_tokens(owned_tokens, endpoint)).
2. generating this map for each token is inefficient, turning this
   algorithm complexity to quadratic in the number of tokens...

This change copies the current token_to_endpoint map
to a temporary map and erases the replaced tokens from it,
while maintaining a set of candidates_for_removal.

After traversing all replaced tokens, we check the
`token_to_endpoint_map` again, erasing from `candidates_for_removal`
any endpoint that still owns tokens.
The leftover candidates are endpoints that own no tokens,
and so they are added to `hosts_to_remove`.
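The single-pass approach can be sketched with plain standard containers in place of Scylla's token metadata. The `token`/`endpoint` aliases and the function name are hypothetical. The map is taken by value on purpose, mirroring the temporary copy described above, and the whole computation is O(T log T) rather than quadratic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using token = std::int64_t;
using endpoint = std::string;

std::set<endpoint> hosts_to_remove(std::map<token, endpoint> token_to_endpoint,  // temporary copy
                                   const std::vector<token>& replaced_tokens) {
    std::set<endpoint> candidates_for_removal;
    for (token t : replaced_tokens) {
        auto it = token_to_endpoint.find(t);
        if (it != token_to_endpoint.end()) {
            candidates_for_removal.insert(it->second);  // previous owner of t
            token_to_endpoint.erase(it);
        }
    }
    // Any candidate that still owns a token must stay.
    for (const auto& entry : token_to_endpoint) {
        candidates_for_removal.erase(entry.second);
    }
    return candidates_for_removal;
}
```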

Fixes #12082

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12141
2022-12-05 16:17:18 +01:00
Botond Dénes
3d620378d4 Merge 'view: coroutinize maybe_mark_view_as_built' from Avi Kivity
Simplifying it a little.

Closes #12171

* github.com:scylladb/scylladb:
  view: reindent maybe_mark_view_as_built
  view: coroutinize maybe_mark_view_as_built
2022-12-05 13:43:34 +02:00
Kamil Braun
3f8aaeeab9 test/topology: enable replace tests
Also add some TODOs for enhancing existing tests.
2022-12-05 11:50:07 +01:00
Kamil Braun
ee19411783 service/raft: report an error when Raft ID can't be found in raft_group0::remove_from_group0
Also simplify the code and improve logging in general.

The previous code did this: search for the ID in the address map. If it
couldn't be found, perform a read barrier and search again. If it again
couldn't be found, return.

This algorithm depended on the fact that IP addresses were stored in
group 0 configuration. The read barrier was used to obtain the most
recent configuration, and if the IP was not a part of address map after
the read barrier, that meant it's simply not a member of group 0.

This logic no longer applies so we can simplify the code.

Furthermore, when I was fixing the replace operation with Raft enabled,
at some point I had a "working" solution with all tests passing. But I
was suspicious and checked if the replaced node got removed from
group 0. It wasn't. So the replace finished "successfully", but we had
an additional (voting!) member of group 0 which didn't correspond to
a token ring member.

The last version of my fixes ensures that the node gets removed by the
replacing node. But the system is fragile and nothing prevents us from
breaking this again. At least log an error for now. Regression tests
will be added later.
2022-12-05 11:50:07 +01:00
Kamil Braun
4429885543 service: handle replace correctly with Raft enabled
We must place the Raft ID obtained during the shadow round in the
address map. It won't be placed by the regular gossiping route if we're
replacing using the same IP, because we override the application state
of the replaced node. Even if we replace a node with a different IP, it
is not guaranteed that background gossiping manages to update the
address map before we need it, especially in tests where we set
ring_delay to 0 and disable wait_for_gossip_to_settle. The shadow round,
on the other hand, performs a synchronous request (and if it fails
during bootstrap, bootstrap will fail - because we also won't be able to
obtain the tokens and Host ID of the replaced node).

Fetch the Raft ID of the replaced node in `prepare_replacement_info`,
which runs the shadow round. Return it in `replacement_info`. Then
`join_token_ring` passes it to `setup_group0`, which stores it in the
address map. It does that after `join_group0` so the entry is
non-expiring (the replaced node is a member of group 0). Later in the
replace procedure, we call `remove_from_group0` for the replaced node.
`remove_from_group0` will be able to reverse-translate the IP of the
replaced node to its Raft ID using the address map.
2022-12-05 11:50:07 +01:00
Kamil Braun
45bb5bfb52 gms/gossiper: fetch RAFT_SERVER_ID during shadow round
During the replace operation we need the Raft ID of the replaced node.
The shadow round is used for fetching all necessary information before
the replace operation starts.
2022-12-05 11:50:07 +01:00
Kamil Braun
7222c2f9a1 service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace
Most of the sleeps related to gossiping are based on `ring_delay`,
which is configurable and can be set to a lower value, e.g. during tests.

But for some reason there was one case where we slept for a hardcoded
value, `service::load_broadcaster::BROADCAST_INTERVAL` - 60 seconds.

Use `2 * get_ring_delay()` instead. With the default value of
`ring_delay` (30 seconds) this will give the same behavior.
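The equivalence with the old hardcoded value is simple arithmetic, checked here at compile time. `replace_sleep` is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper: the pre-replace sleep is now derived from ring_delay.
constexpr std::chrono::milliseconds replace_sleep(std::chrono::milliseconds ring_delay) {
    return 2 * ring_delay;
}

// With the default ring_delay of 30 s this matches the old 60 s interval.
static_assert(replace_sleep(30s) == 60s, "default behavior is unchanged");
```

Tests that set `ring_delay` to 0 now skip this sleep entirely.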
2022-12-05 11:50:07 +01:00
Pavel Emelyanov
b5ede873f2 sstable_directory: Get components lister from manager
For now this is almost a no-op because the manager just calls back into
sstable_directory code to create the lister.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
3f9b8c855d sstable_directory: Extract directory lister
Currently the utils/lister.cc code is used to list regular files in a
directory. This patch wraps the lister into a more abstract components
lister class.
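The point of the abstraction can be sketched with a synchronous interface. The real code is asynchronous (Seastar futures) and its class and method names may differ; `components_lister`, `fake_directory_lister` and `count_data_components` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: callers get component names without knowing
// whether they come from a local directory or some other store.
class components_lister {
public:
    virtual ~components_lister() = default;
    virtual std::vector<std::string> list() = 0;
};

// Stand-in for the filesystem-backed lister; returns a fixed list.
class fake_directory_lister : public components_lister {
    std::vector<std::string> _files;
public:
    explicit fake_directory_lister(std::vector<std::string> files) : _files(std::move(files)) {}
    std::vector<std::string> list() override { return _files; }
};

// Code written against the interface works with any backing store.
std::size_t count_data_components(components_lister& lister) {
    std::size_t n = 0;
    for (const auto& name : lister.list()) {
        if (name.size() >= 7 && name.compare(name.size() - 7, 7, "Data.db") == 0) {
            ++n;
        }
    }
    return n;
}
```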

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
abd3602b10 sstable_directory: Remove sstable creation callback
It's no longer used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
3d559391df sstable_directory: Call manager to make sstables
Now the directory code has everything it needs to create sstable objects
and can stop using the external lambda.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
db657a8d1c sstable_directory: Keep error handler generator
Yet another continuation of the previous patch -- the IO error handler
generator is also needed to create sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4281f4af42 sstable_directory: Keep schema_ptr
Continuation of the patch before the previous one. In order to create
sstables without an external lambda, the directory code needs the schema.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
8df1bcb907 sstable_directory: Use directory semaphore from manager
After the previous patch, the sstable_directory code no longer needs a
semaphore argument, because it can get one from the manager. This makes
the directory API shorter and simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
4da941e159 sstable_directory: Keep reference on manager
The sstable_directory accesses /var/lib/scylla/data in two ways -- it lists
files in it and opens sstables. The latter is abstracted with the help
of lambdas passed around, but the former (listing) is done by using
directory listers from utils.

Listing sstable components with a directory lister won't work for object
storage; the directory code will need to call some abstraction layer
instead. Opening sstables with the help of a lambda is a bit of
overkill; having the sstables manager at hand could make it much simpler.

That said, this patch makes sstable_directory keep a reference to
sstables_manager on start.

This change will also simplify directory semaphore usage (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
784d78810a tests: Use sstables creation helper in some cases
Several test cases push an sstable creation lambda into the
with_sstables_directory helper. There's a ready-to-use helper class that
does the same. The next patch will make additional use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:19 +03:00
Pavel Emelyanov
5e13ce2619 sstables_manager: Keep directory semaphore reference
Preparatory patch. The semaphore will be used by sstable_directory in
the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 12:03:18 +03:00
Pavel Emelyanov
be8512d7cc sstables, code: Wrap directory semaphore with concurrency
Currently this is a sharded<semaphore> started/stopped in main and
referenced by database in order to be fed into sstables code. This
semaphore always comes with the "concurrency" parameter that limits
parallel_for_each parallelism.

This patch wraps both together into the directory_semaphore class. This
makes its usage simpler and will allow extending it in the future.
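The bundling idea can be sketched with standard-library threading primitives. The real class is built on Seastar's semaphore and sharded service, so this `directory_semaphore` is only a hypothetical, thread-based illustration of keeping the units and the loop concurrency limit together.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical sketch: semaphore units and the parallel-loop concurrency
// limit travel together, so callers no longer pass them separately.
class directory_semaphore {
    std::mutex _m;
    std::condition_variable _cv;
    std::size_t _units;
    const std::size_t _concurrency;
public:
    directory_semaphore(std::size_t units, std::size_t concurrency)
        : _units(units), _concurrency(concurrency) {}

    std::size_t concurrency() const { return _concurrency; }

    void acquire() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _units > 0; });
        --_units;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lk(_m);
            ++_units;
        }
        _cv.notify_one();
    }
    std::size_t available() {
        std::lock_guard<std::mutex> lk(_m);
        return _units;
    }
};
```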

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-05 11:59:30 +03:00
Asias He
c6087cf3a0 repair: Reduce repair reader eviction with diff shard count
When repair master and followers have different shard count, the repair
followers need to create multi-shard readers. Each multi-shard reader
will create one local reader on each shard, N (smp::count) local readers
in total.

There is a hard limit on the number of readers that can work in parallel.
When there are more readers than this limit, the readers start to
evict each other, causing buffers already read from disk to be dropped
and readers to be recreated, which is not very efficient.

To optimize and reduce reader eviction overhead, a global reader permit
is introduced which takes the multi-shard reader bloat into account.

With this patch, at any point in time, the number of readers created by
repair will not exceed the reader limit.
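The accounting idea can be sketched as follows. This is a hypothetical illustration, not the actual permit code: `repair_reader_budget`, `try_admit` and `release` are invented names, and the only assumption carried over from the text is that a multi-shard reader on a follower with N shards costs N local readers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical repair-wide reader budget. A multi-shard reader is charged
// `shard_count` units (one per local reader it creates), not 1.
class repair_reader_budget {
    std::size_t _limit;
    std::size_t _in_use = 0;
public:
    explicit repair_reader_budget(std::size_t limit) : _limit(limit) {}

    // Reserves the units and returns true if the reader fits under the limit;
    // otherwise returns false and the caller has to wait.
    // Note: a reader that alone exceeds the limit would never fit; a real
    // implementation must handle that case.
    bool try_admit(std::size_t shard_count) {
        assert(shard_count > 0);
        if (_in_use + shard_count > _limit) {
            return false;
        }
        _in_use += shard_count;
        return true;
    }

    void release(std::size_t shard_count) {
        assert(shard_count <= _in_use);
        _in_use -= shard_count;
    }

    std::size_t in_use() const { return _in_use; }
};
```

With a budget of 10 and an 8-shard follower, only one multi-shard reader is admitted at a time, so the local readers never outnumber the limit and never have to evict each other.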

Test Results:

1) with stream sem 10, repair global sem 10, 5 ranges in parallel, n1=2
shards, n2=8 shards, memory wanted =1

1.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2  (repair on n2)
[2022-11-23 17:45:24,770] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:45:53,869] Repair session 1
[2022-11-23 17:45:53,869] Repair session 1 finished

real    0m30.212s
user    0m1.680s
sys     0m0.222s

1.2)
[asias@hjpc2 mycluster]$ time nodetool  repair ks2  (repair on n1)
[2022-11-23 17:46:07,507] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:46:30,608] Repair session 1
[2022-11-23 17:46:30,608] Repair session 1 finished

real    0m24.241s
user    0m1.731s
sys     0m0.213s

2) with stream sem 10, repair global sem no_limit, 5 ranges in
parallel, n1=2 shards, n2=8 shards, memory wanted =1

2.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2)
[2022-11-23 17:49:49,301] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:01,414] Repair session 1
[2022-11-23 17:52:01,415] Repair session 1 finished

real    2m13.227s
user    0m1.752s
sys     0m0.218s

2.2)
[asias@hjpc2 mycluster]$ time nodetool  repair ks2 (repair on n1)
[2022-11-23 17:52:19,280] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:42,387] Repair session 1
[2022-11-23 17:52:42,387] Repair session 1 finished

real    0m24.196s
user    0m1.689s
sys     0m0.184s

Comparing 1.1) and 2.1) shows that eviction plays a major role here.
The patch gives a 133s / 30s = 4.4X speedup in this setup.

Comparing 1.1) and 1.2) shows that, even when the readers are limited,
starting the repair on the node with fewer shards is faster: 30s / 24s =
1.25X (the total number of multi-shard readers is lower).

Fixes #12157

Closes #12158
2022-12-05 10:47:36 +02:00
Botond Dénes
1e20095547 Update tools/java submodule
* tools/java 1c06006447...ecab7cf7d6 (1):
  > Add VSCode files to gitignore
2022-12-05 09:54:51 +02:00
Botond Dénes
c4d72c8dd0 Merge 'cql3: select_statement: split and coroutinize process_results()' from Avi Kivity
Split the simple (and common) case from the complex case,
and coroutinize the latter. Hopefully this generates better
code for the simple case, and it makes the complex case a
little nicer.

Closes #12194

* github.com:scylladb/scylladb:
  cql3: select_statement: reindent process_results_complex()
  cql3: select_statement: coroutinize process_results_complex()
  cql3: select_statement: split process_results() into fast path and complex path
2022-12-05 08:16:22 +02:00
Avi Kivity
a0a4711b74 snapshot: protect list operations against the lambda coroutine fiasco
run_snapshot_list_operation() takes a continuation, so passing it
a lambda coroutine without protection is dangerous.

Protect the coroutine with coroutine::lambda so it doesn't lose its
contents.

Fixes #12192.

Closes #12193
2022-12-05 08:14:39 +02:00
guy9
cb842b2729 Replacing the Docs top bar message from the LIVE event to the community forum announcement
Closes #12189
2022-12-05 08:05:04 +02:00
Avi Kivity
6326be5796 cql3: batch_statement: reindent get_mutations() 2022-12-04 21:47:22 +02:00
Avi Kivity
2d74360de3 cql3: batch_statement: coroutinize get_mutations()
It has a do_with(), so an automatic win.
2022-12-04 21:45:10 +02:00
Avi Kivity
0834bb0365 cql3: select_statement: reindent process_results_complex() 2022-12-04 21:36:17 +02:00
Avi Kivity
a63f98e3fc cql3: select_statement: coroutinize process_results_complex()
Not a huge gain, since it's just a do_with, but still a little better.

Note the inner lambda is not a coroutine, so it isn't susceptible to
the lambda coroutine fiasco.
2022-12-04 21:34:51 +02:00
Avi Kivity
7f29efa0ad cql3: select_statement: split process_results() into fast path and complex path
This will allow us to coroutinize the complex path without adding an
allocation to the fast path.
2022-12-04 21:30:45 +02:00
Avi Kivity
02b66bb31a Merge 'Mark sstable::<directory accessing methods> private' from Pavel Emelyanov
One of the prerequisites to make sstables reside on object-storage is not to let the rest of the code "know" the filesystem path they are located on (because sometimes they will not be on any filesystem path). This patch makes the methods that can reveal this path back private so that later they can be abstracted out.

Closes #12182

* github.com:scylladb/scylladb:
  sstable: Mark some methods private
  test: Don't get sstable dir when known
  test: Use move_to_quarantine() helper
  test: Use sstable::filename() overload without dir name
  sstables: Reimplement batch directory sync after move
  table, tests: Make use of move_to_new_dir() default arg
  sstables: Remove fsync_directory() helper
  table: Simplify take_snapshot()'s collecting sstables names
2022-12-04 17:45:37 +02:00
Kamil Braun
b551cd254c test: test_raft_upgrade: fix test_recover_stuck_raft_upgrade flakiness
The test enables an error injection inside the Raft upgrade procedure
on one of the nodes which will cause the node to throw an exception
before entering `synchronize` state. Then it restarts other nodes with
Raft enabled, waits until they enter `synchronize` state, puts them in
RECOVERY mode, removes the error-injected node and creates a new Raft
group 0.

As soon as the other nodes entered `synchronize`, the test disabled the
error injection (the rest of the test was outside the `async with
inject_error(...)` block). There was a small chance that we disabled the
error injection before the node reached it. In that case the node also
entered `synchronize` and the cluster managed to finish the upgrade
procedure. We encountered this during a `next` promotion.

Eliminate this possibility by extending the scope of the `async with
inject_error(...)` block, so that the RECOVERY mode steps on the other
nodes are performed within that block.

Closes #12162
2022-12-02 21:26:44 +01:00
Avi Kivity
94f18b5580 test: sstable_conforms_to_mutation_source: use do_with_async() where needed
The test clearly needs a thread (it converts a reader to a mutation
without waiting), so give it one.

Closes #12178
2022-12-02 20:48:37 +01:00
Pavel Emelyanov
084522d9eb sstable: Mark some methods private
There are several class sstable methods that reveal the internal directory
path to the caller. That's not object-storage-friendly. Fortunately, all the
callers of those methods have been patched not to work with full paths,
so these can be marked private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:15:02 +03:00
Pavel Emelyanov
fb63850f2c test: Don't get sstable dir when known
The sstable_move_test creates sstables in its own temp directories and
then requests these dirs' paths back from the sstables. The test can use
the paths it already has at hand; there's no need to ask the sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:13:58 +03:00
Pavel Emelyanov
4c742a658d test: Use move_to_quarantine() helper
Two places in tests move sstables to the quarantine subdir by hand. There's
a class sstable method that does the same, so use it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:13:19 +03:00
Pavel Emelyanov
d6244b7408 test: Use sstable::filename() overload without dir name
The dir this place currently uses is the directory where the sstable was
created, so dropping this argument would just render the same path.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:12:21 +03:00
Pavel Emelyanov
a702affd4d sstables: Reimplement batch directory sync after move
There's a table::move_sstables_from_staging() method that gets a bunch
of sstables and moves them from the staging subdir into the table's root
datadir. To avoid flushing the root dir for every sstable move, it asks
sstable::move_to_new_dir() not to flush, but collects the staging dir names
and flushes them and the root dir all together at the end.

In order to make it more friendly to object storage and to remove one
more caller of sstable::get_dir(), the delayed_commit_changes struct is
introduced. It collects _all_ the affected dir names in an unordered_set,
then allows flushing them. By default move_to_new_dir() doesn't
receive this object and flushes the directories instantly.
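The batching described above can be sketched roughly like this. The struct name comes from the text, but its members and the `move_to_new_dir` signature shown here are hypothetical simplifications: directory syncing is modelled by a caller-supplied callback and the actual file renames are omitted.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

using sync_fn = std::function<void(const std::string&)>;

// Hypothetical sketch: remember every directory touched by a move,
// then sync each distinct one exactly once.
struct delayed_commit_changes {
    std::unordered_set<std::string> dirs;

    void note(const std::string& dir) { dirs.insert(dir); }

    void commit(const sync_fn& sync_directory) {
        for (const auto& d : dirs) {
            sync_directory(d);
        }
        dirs.clear();
    }
};

// Hypothetical move helper: without a batch object it syncs both
// directories immediately; with one it only records them.
void move_to_new_dir(const std::string& from_dir, const std::string& to_dir,
                     const sync_fn& sync_directory,
                     delayed_commit_changes* delay = nullptr) {
    // ... rename the component files from from_dir to to_dir here ...
    if (delay) {
        delay->note(from_dir);
        delay->note(to_dir);
    } else {
        sync_directory(to_dir);
        sync_directory(from_dir);
    }
}
```

Moving three sstables out of two staging dirs into the root dir then costs three directory syncs in total instead of two per move.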

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:08:47 +03:00
Pavel Emelyanov
1b42d5fce3 table, tests: Make use of move_to_new_dir() default arg
The method in question accepts a boolean flag telling whether or not it
should sync directories at the end. It's always true except in one case,
so there's a default value for it. Make use of it.

Anticipating the suggestion to replace bool with bool_class -- next
patch will replace it with something else.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:07:16 +03:00
Pavel Emelyanov
339feb4205 sstables: Remove fsync_directory() helper
This helper effectively wraps the existing seastar sync_directory() helper
in two io_check-s. It's simpler just to call the latter directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:05:43 +03:00
Pavel Emelyanov
80f5d7393f table: Simplify take_snapshot()'s collecting sstables names
The method in question "snapshots" all sstables it can find, then writes
their Data file names into the manifest file. To get the list of file
names it iterates over the sstables list again and does a silly conversion
of the full file path to a file name with the help of the directory path
length.

This can all be made much simpler by collecting component names
directly at the time each sstable is hardlinked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-02 21:02:37 +03:00
Raphael S. Carvalho
d61b4f9dfb compaction_manager: Delete compaction_state's move constructor
compaction_state shouldn't be moved once emplaced. Moving it could
theoretically cause a task's gate holder to have a dangling pointer to
compaction_state's gate, but it turns out that gate's move ctor will
actually fail under this assertion:
assert(!_count && "gate reassigned with outstanding requests");

It cannot happen today, but let's make the code more future-proof.
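A minimal sketch of the pattern, assuming a simplified stand-in for the gate (`gate_stub`) and a hypothetical `compaction_state` with no other members: deleting the move (and copy) operations turns any accidental relocation into a compile error, while node-based containers still work because they construct the value in place and never move it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Simplified stand-in for a seastar gate; holders point into it, so it
// must stay at a fixed address while they exist.
struct gate_stub {
    int count = 0;
};

// Hypothetical sketch: once emplaced, compaction_state must never move.
struct compaction_state {
    gate_stub gate;
    compaction_state() = default;
    compaction_state(const compaction_state&) = delete;
    compaction_state(compaction_state&&) = delete;
    compaction_state& operator=(const compaction_state&) = delete;
    compaction_state& operator=(compaction_state&&) = delete;
};

static_assert(!std::is_move_constructible_v<compaction_state>);

// std::map never relocates its values, so a non-movable value type is fine
// as long as it is constructed in place (e.g. with try_emplace).
using state_map = std::map<std::string, compaction_state>;
```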

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12167
2022-12-02 20:56:57 +03:00
Tomasz Grabiec
1a6bf2e9ca Merge 'service/raft: specialized verb for failure detector pinger' from Kamil Braun
We used GOSSIP_ECHO verb to perform failure detection. Now we use
a special verb DIRECT_FD_PING introduced for this purpose.

There are multiple reasons to do so.

One minor reason: we want to use the same connection as other Raft
verbs: if we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.

Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction which required storing and updating gossiper generation
numbers. This also removes one dependency from Raft service code to
gossiper.

Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors during the replace operation to some of
the nodes. In Raft however, we want to mark servers as alive when they
are alive, including a server running on a node that's replacing
another node.

Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.

To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).

Yet another reason: removes "Not ready to respond gossip echo
message" log spam during replace.

Closes #12107

* github.com:scylladb/scylladb:
  service/raft: specialized verb for failure detector pinger
  db: system_keyspace: de-staticize `{get,set}_raft_server_id`
  service/raft: make this node's Raft ID available early in group registry
2022-12-02 13:54:02 +01:00
Pavel Emelyanov
71179ff5ab distributed_loader: Use coroutine::lambda in sleeping coroutine
According to seastar/doc/lambda-coroutine-fiasco.md, a lambda coroutine
that co_awaits at least once loses its capture frame. In distributed_loader
code there's at least one lambda of that kind.

fixes: #12175

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12170
2022-12-02 13:06:33 +02:00
Pavel Emelyanov
1d91914166 sstables: Drop set_generation() method
The method became unused since 70e5252a (table: no longer accept online
loading of SSTable files in the main directory) and the whole concept of
reshuffling sstables was dropped later by 7351db7c (Reshape upload files
and reshard+reshape at boot).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12165
2022-12-01 22:17:10 +02:00
Avi Kivity
2978052113 view: reindent maybe_mark_view_as_built
Several indentation levels were harmed during the preparation
of this patch.
2022-12-01 22:09:21 +02:00
Avi Kivity
ac2e2f8883 view: coroutinize maybe_mark_view_as_built
Somewhat simplifies complicated logic.
2022-12-01 22:04:51 +02:00
Kamil Braun
cbdcc944b5 service/raft: specialized verb for failure detector pinger
We used GOSSIP_ECHO verb to perform failure detection. Now we use
a special verb DIRECT_FD_PING introduced for this purpose.

There are multiple reasons to do so.

One minor reason: we want to use the same connection as other Raft
verbs: if we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.

Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction which required storing and updating gossiper generation
numbers. This also removes one dependency from Raft service code to
gossiper.

Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors during the replace operation to some of
the nodes. In Raft however, we want to mark servers as alive when they
are alive, including a server running on a node that's replacing
another node.

Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.

To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).
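A minimal sketch of the destination check, assuming string server IDs and simple maps in place of the real address map and RPC layer; `ping_result`, `handle_direct_fd_ping` and `ping` are hypothetical names, not the actual verb handler. It models the replace-with-same-IP case: both A and B resolve to the same IP, only B listens there, so only B is reported alive.

```cpp
#include <cassert>
#include <map>
#include <string>

using server_id = std::string;  // hypothetical; real Raft IDs are UUIDs
enum class ping_result { ok, wrong_destination };

// Receiver side: the ping carries the ID the sender meant to reach.
ping_result handle_direct_fd_ping(const server_id& my_id, const server_id& dst_id) {
    return dst_id == my_id ? ping_result::ok : ping_result::wrong_destination;
}

// Sender side: resolve dst through the address map, deliver the ping to
// whichever server listens on that IP, and count only `ok` as alive.
bool ping(const std::map<server_id, std::string>& ip_of,
          const std::map<std::string, server_id>& listener_by_ip,
          const server_id& dst) {
    const server_id& receiver = listener_by_ip.at(ip_of.at(dst));
    return handle_direct_fd_ping(receiver, dst) == ping_result::ok;
}
```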

Yet another reason: removes "Not ready to respond gossip echo
message" log spam during replace.
2022-12-01 20:54:18 +01:00
Kamil Braun
02c64becdc db: system_keyspace: de-staticize {get,set}_raft_server_id
Part of the anti-globals war.
2022-12-01 20:54:18 +01:00
Kamil Braun
99fe580068 service/raft: make this node's Raft ID available early in group registry
Raft ID was loaded or created late in the boot procedure, in
`storage_service::join_token_ring`.

Create it earlier, as soon as it's possible (when `system_keyspace`
is started), pass it to `raft_group_registry::start` and store it inside
`raft_group_registry`.

We will use this Raft ID stored in group registry in following patches.
Also this reduces the number of disk accesses for this node's Raft ID.
It's now loaded from disk once, stored in `raft_group_registry`, then
obtained from there when needed.

This moves `raft_group_registry::start` a bit later in the startup
procedure - after `system_keyspace` is started - but it doesn't make
a difference.
2022-12-01 20:54:18 +01:00
Nadav Har'El
6fcb5302a6 alternator-test: xfail a flaky test exposing a known bug
In a recent commit 757d2a4, we removed the "xfail" mark from the test
test_manual_requests.py::test_too_large_request_content_length
because it started to pass on more modern versions of Python, with a
urllib3 bug fixed.

Unfortunately, the celebration was premature: It turns out that although
the test now *usually* passes, it sometimes fails. This is caused by
a Seastar bug scylladb/seastar#1325, which I opened #12166 to track
in this project. So unfortunately we need to add the "xfail" mark back
to this test.

Note that although the test will now be marked "xfail", it will actually
pass most of the time, so it will appear as "xpass" to people who run it.
I put a note in the xfail reason string as a reminder why this is
happening.

Fixes #12143
Refs #12166
Refs scylladb/seastar#1325

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12169
2022-12-01 20:00:46 +02:00
Kamil Braun
3cd035d1b9 test/pylib: scylla_cluster: remove ScyllaCluster.decommissioned field
The field was not used for anything. We can keep decommissioned server
in `stopped` field.

In fact it caused us a problem: we recently started using
`ScyllaCluster.uninstall` to clean up servers after the test suite finishes
(previously we were using `ScyllaServer.uninstall` directly). But
`ScyllaCluster.uninstall` didn't look into the `decommissioned` field,
so if a server got decommissioned, we wouldn't uninstall it, and it left
us some unnecessary artifacts even for successful tests. This is now
fixed.

Closes #12163
2022-12-01 19:07:26 +02:00
Avi Kivity
a4b77a5691 Merge 'Cleanup sstables::test_env's manager usage' from Pavel Emelyanov
Mainly this PR removes the global db::config and feature service that are used by sstables::test_env as dependencies for the embedded sstables_manager. Other than that -- drop unused methods, remove nested test_env-s and relax a few cases that use two temp dirs at a time for no gain.

Closes #12155

* github.com:scylladb/scylladb:
  test, utils: Use only one tempdir
  sstable_compaction_test: Dont create nested envs
  mutation_reader_test: Remove unused create_sstable() helper
  tests, lib: Move globals onto sstables::test_env
  tests: Use sstables::test_env.db_config() to access config
  features: Mark feature_config_from_db_config const
  sstable_3_x_test: Use env method to create sst
  sstable_3_x_test: Indentation fix after previous patch
  sstable_3_x_test: Use sstable::test_env
  test: Add config to sstable::test_env creation
  config: Add constexpr value for default murmur ignore bits
2022-12-01 17:47:25 +02:00
Pavel Emelyanov
4c6bfc078d code: Use http::re(quest|ply) instead of httpd:: ones
Recent seastar update deprecated those from httpd namespace.

fixes: #12142

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12161
2022-12-01 17:33:35 +02:00
Pavel Emelyanov
adc6ee7ea8 test, utils: Use only one tempdir
There's a do_with_cloned_tmp_directory that makes two temp dirs to toss
sstables between them. Make it use just one, all the more so because that
resembles the existing manipulations around the staging/ subdir.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:57 +03:00
Pavel Emelyanov
15a7b9cafa sstable_compaction_test: Dont create nested envs
The "compact" test case runs in sstables::test_env and additionally
wraps it with another instance provided by do_with_tmp_directory helper.
It's simpler to create the temp dir by hand and use the outer env.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:56 +03:00
Pavel Emelyanov
69fe5fd054 mutation_reader_test: Remove unused create_sstable() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:54 +03:00
Pavel Emelyanov
400bc2c11d tests, lib: Move globals onto sstables::test_env
There's a bunch of objects that are used by test_env as sstables_manager
dependencies. Now that no other code needs those globals, they had better
sit on the test_env next to the manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:36 +03:00
Pavel Emelyanov
6a294b9ad6 tests: Use sstables::test_env.db_config() to access config
Currently some places use the global test config, but it's going to be
removed soon, so switch to using the config from the environment.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:30 +03:00
Pavel Emelyanov
b4e31ad359 features: Mark feature_config_from_db_config const
It is in fact const. Other than that, the next patch will call it with a
const config at hand and would fail to compile without this fix.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:27 +03:00
Pavel Emelyanov
8178845ef3 sstable_3_x_test: Use env method to create sst
Just to make it shorter and conform to other sst env tests

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:19 +03:00
Pavel Emelyanov
8d5d05012e sstable_3_x_test: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:39:09 +03:00
Pavel Emelyanov
6628d801f2 sstable_3_x_test: Use sstable::test_env
There are several cases there that construct sstables_manager by hand
with the help of a bunch of global dependencies. It's nicer to use
existing wrapper.

(indentation left broken until next patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:38:46 +03:00
Pavel Emelyanov
1d8c76164f test: Add config to sstable::test_env creation
This lets callers (tests) construct it with different options. In
particular, one test will soon want to construct it with a custom large
data handler of its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:38:18 +03:00
Pavel Emelyanov
6d0c8fb6e2 config: Add constexpr value for default murmur ignore bits
... and use it in some places of sstable_compaction_test. This will allow
getting rid of the global test_db_config thing later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-01 13:38:15 +03:00
Botond Dénes
dbd00fd3e9 Merge 'Task manager shard repair tasks' from Aleksandra Martyniuk
The PR introduces shard_repair_task_impl which represents a repair task
that spans over a single shard repair.

repair_info is replaced with shard_repair_task_impl, since both serve
similar purpose.

Closes #12066

* github.com:scylladb/scylladb:
  repair: reindent
  repair: replace repair_info with shard_repair_task_impl
  repair: move repair_info methods to shard_repair_task_impl
  repair: rename methods of repair_module
  repair: change type of repair_module::_repairs
  repair: keep a reference to shard_repair_task_impl in row_level_repair
  repair: move repair_range method to shard_repair_task_impl
  repair: make do_repair_ranges a method of shard_repair_task_impl
  repair: copy repair_info methods to shard_repair_task_impl
  repair: corutinize shard task creation
  repair: define run for shard_repair_task_impl
  repair: add shard_repair_task_impl
2022-12-01 10:04:31 +02:00
Nadav Har'El
5eda8ce4fd alternator ttl: in scanning thread, don't retry the same page too many times
Since fixing issue #11737, when the expiration scanner times out reading
a page of data, it retries asking for the same page instead of giving up
on the scan and starting anew later. This retry was infinite, which can
cause problems if we have a bug in the code or several nodes are down:
the scan can get hung in the same place for a very long (potentially
infinite) time without making any progress.

An example of such a bug was issue #12145, where we forgot to handle
shutdowns, so on shutdown of the cluster we just hung forever repeating
the same request that will never succeed. It's better in this case to
just give up on the current scan, and start it anew (from a random
position) later.
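The bounded retry can be sketched like this. It is a hypothetical illustration: the page type, the fetch callback and the attempt limit are assumptions, and the real scanner also sleeps (abortably) between attempts. Returning `std::nullopt` is the signal to abandon this scan and start a new one later.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical page fetch: returns the page, or std::nullopt on timeout.
using fetch_fn = std::function<std::optional<std::string>()>;

// Retries the same page at most max_attempts times instead of forever.
// std::nullopt tells the caller to give up on the current scan and restart
// later from a random position.
std::optional<std::string> fetch_page_with_retries(const fetch_fn& fetch, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (auto page = fetch()) {
            return page;
        }
        // a real scanner would wait here (abortably) before the next attempt
    }
    return std::nullopt;
}
```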

Refs #12145 (that issue was already fixed, by a different patch which
stops the iteration when shutting down - not waiting for an infinite
number of iterations and not even one more).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-11-30 18:42:37 +02:00
Nadav Har'El
d08eef5a30 alternator: fix hang during shutdown of expiration-scanning thread
The expiration-scanning thread is a long-running thread which can scan
data for hours, but checks for its abort-source before fetching each
page to allow for timely shutdown. Recently, we added the ability to
retry the page fetching in case of timeout, but forgot to check the
abort source in this new retry loop, which led to an infinitely long
shutdown in some tests while the retry loop retried forever.

In this patch we fix this bug by using sleep_abortable() instead of
sleep(). sleep_abortable() will throw an exception if the abort source
was triggered before or during the sleep - and this exception will
stop the scan immediately.
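The difference between `sleep()` and `sleep_abortable()` can be illustrated in standard C++. This is not Seastar's API: `abort_source_stub` is a hypothetical stand-in built on `std::condition_variable`, and it throws `std::runtime_error` instead of Seastar's own exception type. The property it shares with the fix is that an abort requested before or during the sleep makes the sleep throw, which ends the retry loop.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for an abort source with an abortable sleep.
class abort_source_stub {
    std::mutex _m;
    std::condition_variable _cv;
    bool _aborted = false;
public:
    void request_abort() {
        {
            std::lock_guard<std::mutex> lk(_m);
            _aborted = true;
        }
        _cv.notify_all();
    }

    // Sleeps for d, but throws as soon as an abort has been requested,
    // whether that happened before the call or while sleeping.
    template <typename Rep, typename Period>
    void sleep_abortable(std::chrono::duration<Rep, Period> d) {
        std::unique_lock<std::mutex> lk(_m);
        if (_cv.wait_for(lk, d, [this] { return _aborted; })) {
            throw std::runtime_error("sleep aborted");
        }
    }
};
```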

Fixes #12145

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-11-30 18:38:17 +02:00
Jan Ciolek
05ea0c1d60 dev/docs: add additional git pull to backport docs
Botond noted that an additional git pull
might be needed here:
https://github.com/scylladb/scylladb/pull/12138#discussion_r1035857007

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-30 16:14:02 +01:00
Jan Ciolek
e74873408b docs/dev: add a note about cherry-picking individual commits
Some people prefer to cherry-pick individual commits
so that they have fewer conflicts to resolve at once.

Add a comment about this possibility.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-30 16:06:39 +01:00
Kamil Braun
0f9d0dd86e Merge 'raft: support IP address change' from Konstantin Osipov
This is the core of dynamic IP address support in Raft: it moves the
sourcing of IP addresses out of the Raft Group 0 configuration and into
gossip. When Raft starts, the raft id <-> IP address translation map
subscribes to gossiper notifications and learns IP addresses of Raft
hosts from them.
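A minimal sketch of the direction of data flow, assuming string IDs and a hypothetical `on_gossip_update` hook in place of the real gossiper subscription interface: the map is filled from gossip notifications, so a changed IP simply overwrites the old entry, and a lookup for an unknown ID reports absence instead of throwing.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

using raft_server_id = std::string;  // hypothetical; real IDs are UUIDs

// Hypothetical sketch of an address map fed by gossip rather than by the
// Raft group 0 configuration.
class raft_address_map {
    std::unordered_map<raft_server_id, std::string> _ip_by_id;
public:
    // Would be invoked from a gossiper join/change notification handler.
    void on_gossip_update(const raft_server_id& id, const std::string& ip) {
        _ip_by_id[id] = ip;  // a new IP for a known id replaces the old one
    }

    // Clients decide what a missing address means for them.
    std::optional<std::string> find(const raft_server_id& id) const {
        auto it = _ip_by_id.find(id);
        if (it == _ip_by_id.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```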

The series intentionally leaves out the part which speeds up the
initial cluster assembly by persisting the translation cache and using
more sources besides gossip (discovery, RPC), so that the correctness of
the approach can be shown on its own.

Closes #12035

* github.com:scylladb/scylladb:
  raft: (rpc) do not throw in case of a missing IP address in RPC
  raft: (address map) actively maintain ip <-> raft server id map
2022-11-30 15:40:18 +01:00
Aleksandra Martyniuk
78a6193c01 repair: reindent 2022-11-30 13:53:52 +01:00
Aleksandra Martyniuk
b4ad914fe1 repair: replace repair_info with shard_repair_task_impl
repair_info is deleted and all its attributes are moved to
shard_repair_task_impl.
2022-11-30 13:53:52 +01:00
Aleksandra Martyniuk
f6ec2cec92 repair: move repair_info methods to shard_repair_task_impl 2022-11-30 13:53:18 +01:00
Jan Ciolek
32663e6adb docs/dev: use 'is merged into' instead of 'becomes'
The backport instructions said that after passing
the tests next `becomes` master, but it's more
exact to say that next `is merged into` master.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-30 13:25:10 +01:00
Jan Ciolek
28cf8a18de docs/dev: mention that new backport instructions are for the contributor
Previously the section was called:
"How to backport a patch", which could be interpreted
as instructions for the maintainer.

The new title clearly states that these instructions
are for the contributor in case the maintainer couldn't
backport the patch by themselves.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-30 13:23:15 +01:00
Takuya ASADA
4ecc08c4fe docker: switch default locale to C.UTF-8
Since we switched the scylla-machine-image locale to C.UTF-8 because the
ubuntu-minimal image does not have en_US.UTF-8 by default, we should
do the same on our docker image to reduce image size.

Verified #9570 does not occur on new image, since it is still UTF-8
locale.

Closes #12122
2022-11-30 13:58:43 +02:00
Anna Stuchlik
15cc3ecf64 doc: update the releases in the KB about updating the mode after upgrade 2022-11-30 12:53:13 +01:00
Anna Stuchlik
242a3916f0 doc: fix the broken link in the 5.1 upgrade guide 2022-11-30 12:49:20 +01:00
Alejo Sanchez
f7aa08ef25 test.py: don't stop cluster's site if not started
The site member is created in ScyllaCluster.start(), for startup failure
this might not be initialized, so check it's present before stop()ing
it. And delete it as it's not running and proper initialization should
call ScyllaCluster.start().

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11939
2022-11-30 13:47:18 +02:00
Anna Stuchlik
1575d96856 doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide 2022-11-30 12:40:49 +01:00
Nadav Har'El
ce347f4b67 test/cql-pytest: add test for meaning of fetch_size with filtering
A question was raised on what fetch_size (the requested page size
in a paged scan) counts when there is a filter: does it count the
rows before filtering (as scanned from disk) or after filter (as
will be returned to the client)?

This patch adds a test which demonstrates that Cassandra and Scylla
behave differently in this respect: Cassandra counts rows after
filtering, so fetch_size rows are actually returned, while Scylla
currently counts rows before filtering.
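The two interpretations can be contrasted with a simplified, hypothetical sketch: rows are plain integers, the filter is a predicate, and a page is just a vector. It ignores everything else that limits real pages (page byte size, timeouts, paging state) and only shows what fetch_size counts in each model.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using pred_fn = std::function<bool(int)>;

// Pre-filtering count (simplified model of the behaviour described for
// Scylla): scan at most fetch_size rows, return the ones that match.
std::vector<int> page_pre_filter(const std::vector<int>& rows, std::size_t fetch_size,
                                 const pred_fn& pred) {
    std::vector<int> page;
    for (std::size_t i = 0; i < rows.size() && i < fetch_size; ++i) {
        if (pred(rows[i])) {
            page.push_back(rows[i]);
        }
    }
    return page;
}

// Post-filtering count (simplified model of the behaviour described for
// Cassandra): keep scanning until fetch_size matching rows are collected.
std::vector<int> page_post_filter(const std::vector<int>& rows, std::size_t fetch_size,
                                  const pred_fn& pred) {
    std::vector<int> page;
    for (std::size_t i = 0; i < rows.size() && page.size() < fetch_size; ++i) {
        if (pred(rows[i])) {
            page.push_back(rows[i]);
        }
    }
    return page;
}
```

With rows 0..9, a filter keeping even values and fetch_size=3, the first model returns {0, 2} while the second returns {0, 2, 4}.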

It is arguable which behavior is the "correct" one - we discuss this in
issue #12102. But we have already had several users (such as #11340)
who complained about Scylla's behavior and expected Cassandra's behavior,
so if we decide to keep Scylla's behavior we should at least explain and
justify this decision in our documentation. Until then, let's have this
test which reminds us of this incompatibility. This test currently passes
on Cassandra and fails (xfail) on Scylla.

Refs #11340
Refs #12102

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12103
2022-11-30 12:27:06 +02:00
Nadav Har'El
8bd8ef3d03 test/cql-pytest: add regression test for old issue
This patch adds a regression test for the old issue #65, which is about
a multi-column (tuple) clustering-column relation in a SELECT when one of
these columns has reversed order. It turns out that this issue had
already been solved without our noticing, but we didn't have a
regression test for it. So this patch adds just a regression test. The
test confirms that Scylla now behaves as was desired when that issue was
opened. The test also passes on Cassandra, confirming that Scylla and
Cassandra behave the same for such requests.

Fixes #65

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12130
2022-11-30 12:22:21 +02:00
Michał Jadwiszczak
8e64e18b80 forward_service: add debug logs
Adds a few debug logs to see what is happening in https://github.com/scylladb/scylladb/issues/11684

Wrapped `forward_result::printer` in `seastar::value_of` to lazily
evaluate the printer.
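The benefit of lazy evaluation for debug logs can be shown with a standard-C++ analogue. This is not Seastar's implementation: `lazy_value`, `make_lazy` and `logger_stub` are hypothetical names. The wrapped callable runs only when the value is actually formatted, so a disabled debug log costs nothing beyond the level check.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical lazily evaluated log argument: calls func only when printed.
template <typename Func>
struct lazy_value {
    Func func;
    friend std::ostream& operator<<(std::ostream& os, const lazy_value& v) {
        return os << v.func();
    }
};

template <typename Func>
lazy_value<Func> make_lazy(Func f) {
    return {std::move(f)};
}

// Hypothetical logger: formats its argument only if debug logging is on.
struct logger_stub {
    bool debug_enabled = false;
    std::ostringstream out;

    template <typename T>
    void debug(const T& arg) {
        if (debug_enabled) {
            out << arg << '\n';
        }
    }
};
```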

Closes #12113
2022-11-30 12:15:26 +02:00
Yaniv Kaul
b66ca3407a doc: Typo - then -> than
Fix a typo.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes #12140
2022-11-30 12:03:56 +02:00
Botond Dénes
50aea9884b Merge 'Improve the Raft upgrade procedure' from Kamil Braun
Better logging, less code, a minor fix.

Closes #12135

* github.com:scylladb/scylladb:
  service/raft: raft_group0: less repetitive logging calls
  service/raft: raft_group0: fix sleep_with_exponential_backoff
2022-11-30 11:24:20 +02:00
Avi Kivity
6a5d9ff261 treewide: use non-experimental std::source_location
Now that we use libstdc++ 12, we can use the standardized
source_location.

Closes #12137
2022-11-30 11:06:43 +02:00
Jan Ciolek
56a802c979 docs/dev: Add backport instructions for contributors
Add instructions on how to backport a feature
to on older version of Scylla.

It contains a detailed step-by-step instruction
so that people unfamiliar with intricacies
of Scylla's repository organization can
easily get the hang of it.

This is the guide I wish I had when I had
to do my first backport.

I put it in backport.md because that
looks like the file responsible
for this sort of information.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-29 22:10:27 +01:00
Konstantin Osipov
fbe7886cc0 raft: (rpc) do not throw in case of a missing IP address in RPC
Remove raft_address_map::get_inet_address()

While at it, coroutinize some rpc methods.

To propagate up the event of a missing IP address, use coroutine::exception()
with a proper type (raft::transport_error) and a proper error message.

This is a building block towards removing
raft_address_map::get_inet_address(), which is too generic, and shifting
the responsibility of handling missing addresses to the address map
clients. E.g. one-way RPC shouldn't throw if an address is missing, but
just drop the message.

PS An attempt to use a single template function turned out to be too
complex:
- some functions require a gate, some don't
- some return void, some future<> and some future<raft::data_type>
2022-11-29 19:55:48 +03:00
Konstantin Osipov
73e5298273 raft: (address map) actively maintain ip <-> raft server id map
1) make address map API flexible

Before this patch:
- having a mapping without an actual IP address was an
  internal error
- not having a mapping for an IP address was an internal
  error
- re-mapping to a new IP address wasn't allowed

After this patch:

- the address map may contain a mapping
  without an actual IP address, and the caller must be prepared for it:
  find() will return a nullopt. This happens when we first add an entry
  to Raft configuration and only later learn its IP address, e.g. via
  gossip.

- it is allowed to re-map an existing entry to a new address.

2) subscribe to gossip notifications

Learning IP addresses from gossip allows us to adjust
the address map whenever a node IP address changes.
Gossiper is also the only valid source of re-mapping, other sources
(RPC) should not re-map, since otherwise a packet from a removed
server can remap the id to a wrong address and impact liveness of a Raft
cluster.
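A minimal sketch of the map semantics described in 1) and 2), using plain strings for server ids and IP addresses. `address_map_sketch` and its methods are hypothetical, not Scylla's raft_address_map API. The rule that RPC may fill in a still-unknown address of an existing entry is an assumption made for illustration; the text above only says RPC must not re-map.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified id -> IP map (not Scylla's raft_address_map).
class address_map_sketch {
    // An entry may exist before its IP address is known (std::nullopt).
    std::unordered_map<std::string, std::optional<std::string>> _map;
public:
    // E.g. the server was added to the Raft configuration, IP still unknown.
    void add_entry(const std::string& id) { _map.try_emplace(id); }

    // Gossip is the only source allowed to re-map an existing entry.
    void update_from_gossip(const std::string& id, const std::string& ip) { _map[id] = ip; }

    // Assumption for this sketch: RPC may fill in a missing address of a known
    // entry, but never replaces a known address and never creates entries, so a
    // packet from a removed server cannot redirect an id to a wrong address.
    void learn_from_rpc(const std::string& id, const std::string& ip) {
        auto it = _map.find(id);
        if (it != _map.end() && !it->second) {
            it->second = ip;
        }
    }

    // Callers must be prepared for a missing mapping or a missing address.
    std::optional<std::string> find(const std::string& id) const {
        auto it = _map.find(id);
        if (it == _map.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```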

3) initialize address map state from gossip app state

Initialize the raft address map with initial
gossip application state, specifically IPs of members
of the cluster. With this, we no longer need to store
these IPs in Raft configuration (and update them when they change).

The obvious drawback of this approach is that a node
may join Raft config before it propagates its IP address
to the cluster via gossip - so the boot process has to
wait until it happens.

Gossip also doesn't tell us which IPs are members of Raft configuration,
so we subscribe to Group0 configuration changes to mark the
members of Raft config "non-expiring" in the address translation
map.

Thanks to the changes above, Raft configuration no longer
stores IP addresses.

We still keep the 'server_info' column in the raft_config system table,
in case we change our mind or decide to store something else in there.
2022-11-29 19:55:43 +03:00
Kamil Braun
3dbcff435f service/raft: raft_group0: less repetitive logging calls
Some log messages in retry loops in the Raft upgrade procedure included
a sentence like "sleeping before retrying..."; but not all of them.

With the recently added `sleep_with_exponential_backoff` abstraction we
can put this "sleeping..." message in a single place, and it's also easy
to say how long we're going to sleep.

I also enjoy using this `source_location` thing.
2022-11-29 17:42:43 +01:00
Nadav Har'El
c5121cf273 cql: fix column-name aliases in SELECT JSON
The SELECT JSON statement, just like SELECT, allows the user to rename
selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo".
This specification was not honored: We simply forgot to look at the
alias in SELECT JSON's implementation (we did it correctly in regular
SELECT). So this patch fixes this bug.

We had two tests in cassandra_tests/validation/entities/json_test.py
that reproduced this bug. The checks in those tests now pass, but these
two tests still continue to fail after this patch because of two other
unrelated bugs that were discovered by the same tests. So in this patch
I also add a new test just for this specific issue - to serve as a
regression test.

Fixes #8078

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12123
2022-11-29 18:16:19 +02:00
Avi Kivity
faf11587fa Update seastar submodule
* seastar 4f4cc00660...3a5db04197 (16):
  > tls: add missing include <map>
  > Merge 'util/process: use then_unpack to help automatically unpack tuple.' from Jianyong Chen
  > HTTP: define formatter for status_type to fix build.
  > fsnotifier: move it into namespace experimental and add docs.
  > Move fsnotify.hh to the 'include' directory for public use.
  > Merge 'reactor: define make_pipe() and use make_pipe() in reactor::spawn()' from Kefu Chai
  > Merge 'Fix: error when compiling http_client_demo' from Amossss
  > util/process: using `data_sink_impl::put`
  > Merge 'dns: serialize UDP sends.' from Calle Wilund
  > build: use correct version when finding liburing
  > Merge 'Add simple http client' from Pavel Emelyanov
  > future: use invoke_result instead of nested requirements
  > Merge 'reactor: use separate calls in reactor and reactor_backend for read/write/sendmsg/recvmsg' from Kefu Chai
  > util, core: add spawn_process() helper
  > parallel utils: add note about shard-local parallelism
  > shared_mutex: return typed exceptional future in with_* error handlers

Closes #12131
2022-11-29 18:10:06 +02:00
Kamil Braun
580bdec875 service/raft: raft_group0: fix sleep_with_exponential_backoff
It was immediately jumping to _max_retry_period.
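A minimal sketch of the intended retry periods: start small, double after each retry, and cap at the maximum. `backoff_schedule` is a hypothetical helper that only computes the periods; it is not Scylla's sleep_with_exponential_backoff:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper (not Scylla's code): the first retry waits `initial`,
// each later retry waits twice as long, and no retry waits longer than `max`.
class backoff_schedule {
    std::chrono::milliseconds _current;
    std::chrono::milliseconds _max;
public:
    backoff_schedule(std::chrono::milliseconds initial, std::chrono::milliseconds max)
        : _current(initial), _max(max) {}

    // Return the period to sleep before the next retry, then double it.
    std::chrono::milliseconds next() {
        auto delay = std::min(_current, _max);
        _current = std::min(_current * 2, _max);
        return delay;
    }
};
```

The bug described above corresponds to `next()` returning the maximum period from the very first call.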
2022-11-29 16:27:59 +01:00
Nadav Har'El
6bc3075bbd test/alternator: increase timeout on TTL tests
Some of the tests in test/alternator/test_ttl.py need an expiration scan
pass to complete and expire items. In development builds on developer
machines, this usually takes less than a second (our scanning period is
set to half a second). However, in debug builds on Jenkins each scan
often takes up to 100 (!) seconds (this is the record we've seen so far).
This is why we set the tests' timeout to 120.

But recently we saw another test run fail. I think the problem is
that in some cases, we need not one, but *two* scanning passes to
complete before the timeout: It is possible that the test writes an
item right after the current scan passed its position, so it doesn't get
expired, and then the next scan starts at a random position, possibly making
that item one of the last items it considers - so in total
we need to wait for two scanning periods, not one, for the item to
expire.

So this patch increases the timeout from 120 seconds to 240 seconds -
more than twice the highest scanning time we ever saw (100 seconds).

Note that this timeout is just a timeout, it's not the typical test
run time: The test can finish much more quickly, as little as one
second, if items expire quickly on a fast build and machine.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12106
2022-11-29 16:37:54 +03:00
Nadav Har'El
1f8adda4b2 Merge 'treewide: improve compatibility with gcc 12' from Avi Kivity
Fix some issues found with gcc 12. Note we can't fully compile with gcc yet, due to [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056

Closes #12121

* github.com:scylladb/scylladb:
  utils: observer: qualify seastar::noncopyable_function
  sstables: generation_type: forgo constexpr on hash of generation_type
  logalloc: disambiguate types and non-type members
  task_manager: disambiguate types and non-type members
  direct_failure_detector: don't change meaning of endpoint_liveness
  schema: abort on illegal per column computation kind
  database: abort on illegal per partition rate limit operation
  mutation_fragment: abort on illegal fragment type
  per_partition_rate_limit_options: abort on illegal operation type
  schema: drop unused lambda
  mutation_partition: drop unused lambda
  cql3: create_index_statement: remove unused lambda
  transport: prevent signed and unsigned comparison
  database: don't compare signed and unsigned types
  raft: don't compare signed and unsigned types
  compaction: don't compare signed and unsigned compaction counts
  bytes_ostream: don't take reference to packed variable
2022-11-29 13:57:24 +02:00
Avi Kivity
ea99750de7 test: give tests less-unique identifiers
Test identifiers are very unique, but this makes them less
useful in Jenkins Test Result Analyzer view. For example,
counter_test can be counter_test.432 in one run and counter_test.442
in another. Jenkins considers them different and so we don't see
a trend.

Limit the id uniqueness to within a test case, so that we'll have
counter_test.{1, 2, 3} consistently. Those tests will be grouped
together so we can see pass/fail trends.
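A minimal sketch of per-test numbering. test.py itself is Python; `test_id_allocator` is a hypothetical C++ illustration of the idea, not its code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: ids are numbered per test name instead of from one
// global counter, so a test gets name.1, name.2, ... regardless of how many
// other tests were scheduled before it.
class test_id_allocator {
    std::map<std::string, int> _next_id;  // last id handed out, per test name
public:
    std::string allocate(const std::string& name) {
        int id = ++_next_id[name];
        return name + "." + std::to_string(id);
    }
};
```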

Closes #11946
2022-11-29 13:14:14 +02:00
Yaniv Kaul
fef8e43163 doc: cluster management: Replace a misplaced period with a bulleted list of items
Signed-Off-By: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes #12125
2022-11-29 12:42:24 +02:00
Botond Dénes
e9fec761a2 Merge 'doc: document the procedure for updating the mode after upgrade' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-docs/issues/4126

Closes #11122

* github.com:scylladb/scylladb:
  doc: add info about the time-consuming step due to resharding
  doc: add the new KB to the toctree
  doc: doc: add a KB about updating the mode in perftune.yaml after upgrade
2022-11-29 12:41:46 +02:00
Avi Kivity
ea901fdb9d cql3: expr: fold null into untyped_constant/constant
Our `null` expression, after the prepare stage, is redundant with a
`constant` expression containing the value NULL.

Remove it. Its role in the unprepared stage is taken over by
untyped_constant, which gains a new type_class enumeration to
represent it.

Some subtleties:
 - Usually, handling of null and untyped_constant, or null and constant
   was the same, so they are just folded into each other
 - LWT "like" operator now has to discriminate between a literal
   string and a literal NULL
 - prepare and test_assignment were folded into the corresponding
   untyped_constant functions. Some care had to be taken to preserve
   error messages.

Closes #12118
2022-11-29 11:02:18 +02:00
Aleksandra Martyniuk
8bc0af9e34 repair: fix double start of data sync repair task
Currently, each data sync repair task is started (and hence run) twice.
Thus, if the two runs finish far enough apart in time, the following
situation may occur:
- the first run finishes
- after some time (ttl) the task is unregistered from the task manager
- the second run finishes and attempts to finish the task, which does
  not exist anymore
- the resulting memory access causes a segfault.

The second call to start is deleted. A check is added
to the start method to ensure that each task is started at most once.
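A minimal sketch of an at-most-once start guard. `task_sketch` is hypothetical, not the task manager's API, and the real check may report the error through a different mechanism than an exception:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical task type that refuses to be started twice, so its body can
// never run a second time after the task may already have been unregistered.
class task_sketch {
    bool _started = false;
    int _runs = 0;

    void run() { ++_runs; }  // stands in for the actual repair work
public:
    void start() {
        if (_started) {
            throw std::logic_error("task already started");
        }
        _started = true;
        run();
    }

    int runs() const { return _runs; }
};
```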

Fixes: #12089

Closes #12090
2022-11-29 00:00:10 +02:00
Avi Kivity
9765b2e3bc cql3: expr: drop remnants of bool component from expression
In ad3d2ee47d, we replaced `bool` as an expression element
(representing a boolean constant) with `constant`. But a comment
and a concept continue to mention it.

Remove the comment and the concept fragment.

Closes #12119
2022-11-28 23:18:26 +02:00
Pavel Emelyanov
ae79669fd2 topology: Be less restrictive about missing endpoints
Recent changes in topology restricted the get_dc/get_rack calls. Older
code was trying to locate the endpoint in gossiper, then in system
keyspace cache and if the endpoint was not found in both -- returned
"default" location.

New code generates an internal error in this case. This approach has already
helped spot several bugs that were eventually fixed, but
echoes of that change still pop up.

This patch relaxes the "missing endpoint" case by printing a warning in
the logs and returning the "default" location like the old code did.
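A minimal sketch of the relaxed lookup: warn and fall back instead of raising an internal error. `topology_sketch` is hypothetical, not Scylla's topology class, and the literal string "default" is a placeholder for whatever default location the real code returns:

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical, simplified endpoint -> datacenter lookup. A missing endpoint
// is no longer treated as an internal error: a warning is logged and a
// placeholder location is returned.
class topology_sketch {
    std::unordered_map<std::string, std::string> _dc_of;
public:
    void add_endpoint(const std::string& endpoint, const std::string& dc) {
        _dc_of[endpoint] = dc;
    }

    std::string get_dc(const std::string& endpoint) const {
        auto it = _dc_of.find(endpoint);
        if (it == _dc_of.end()) {
            std::cerr << "WARN: endpoint " << endpoint
                      << " not found in topology, using the default location\n";
            return "default";  // placeholder value for this sketch
        }
        return it->second;
    }
};
```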

tests: update_cluster_layout_tests.py::*
       hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance
       bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join
       bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join
       materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes

refs: #11900
refs: #12054
fixes: #11870

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12067
2022-11-28 22:01:09 +02:00
Avi Kivity
3a6eafa8c6 utils: observer: qualify seastar::noncopyable_function
gcc checks name resolution eagerly, and can't find noncopyable_function
as this header doesn't include "seastarx.hh". Qualify the name
so it finds it.
2022-11-28 21:58:30 +02:00
Avi Kivity
5ae98ab3de sstables: generation_type: forgo constexpr on hash of generation_type
std::hash isn't constexpr, so gcc refuses to make hash of generation_type
constexpr. It's pointless anyway since we never have a compile-time
sstable generation.
2022-11-28 21:58:30 +02:00
Avi Kivity
a2d43bb851 logalloc: disambiguate types and non-type members
logalloc::tracker has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
2022-11-28 21:58:30 +02:00
Avi Kivity
ed5da87930 task_manager: disambiguate types and non-type members
task_manager has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
2022-11-28 21:58:30 +02:00
Avi Kivity
27be1670d1 direct_failure_detector: don't change meaning of endpoint_liveness
It's used both as a type and as a member. Qualify the type so they
have different names.
2022-11-28 21:58:30 +02:00
Avi Kivity
735c46cb63 schema: abort on illegal per column computation kind
Without memory corruption it's not possible for the switch to
fall through, and the compiler will error if we forget to add
a case. The compiler however is obliged to consider that we might
store some other value in the variable.
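A minimal sketch of the pattern, using a hypothetical two-value enum rather than any of the real types: every enumerator is handled inside the switch, and `std::abort()` after it covers the out-of-range case the compiler must still consider:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enum standing in for, e.g., a column computation kind.
enum class kind { a, b };

int handle(kind k) {
    // No default label: with -Wswitch, gcc flags any enumerator added
    // later without a matching case.
    switch (k) {
    case kind::a: return 1;
    case kind::b: return 2;
    }
    // Reachable only if k holds a value outside the enumerators (e.g. after
    // memory corruption); aborting also avoids the "control reaches end of
    // non-void function" warning without inventing a dummy return value.
    std::abort();
}
```

A `default:` label would also silence the end-of-function warning, but it would suppress the missing-case diagnostic as well, which is why aborting after the switch is preferable.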
2022-11-28 21:58:30 +02:00
Avi Kivity
f73a51250c database: abort on illegal per partition rate limit operation
Without memory corruption it's not possible for the switch to
fall through, and the compiler will error if we forget to add
a case. The compiler however is obliged to consider that we might
store some other value in the variable.
2022-11-28 21:58:30 +02:00
Avi Kivity
f469885b41 mutation_fragment: abort on illegal fragment type
Without memory corruption it's not possible for the switch to
fall through, and the compiler will error if we forget to add
a case. The compiler however is obliged to consider that we might
store some other value in the variable.
2022-11-28 21:58:30 +02:00
Avi Kivity
a3c89cedbd per_partition_rate_limit_options: abort on illegal operation type
Without memory corruption it's not possible for the switch to
fall through, and the compiler will error if we forget to add
a case. The compiler however is obliged to consider that we might
store some other value in the variable.
2022-11-28 21:58:30 +02:00
Avi Kivity
7ec28a81bf schema: drop unused lambda
get_cell is defined but not used.
2022-11-28 21:58:30 +02:00
Avi Kivity
c493a2379a mutation_partition: drop unused lambda
should_purge_row_tombstone is defined but not used.
2022-11-28 21:58:30 +02:00
Avi Kivity
e25bf62871 cql3: create_index_statement: remove unused lambda
throw_exception is defined but not used.
2022-11-28 21:58:30 +02:00
Avi Kivity
5dedf85288 transport: prevent signed and unsigned comparison
This can lead to undefined behavior. Cast to unsigned, after
we've verified the value is indeed positive.
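A minimal sketch of the check-then-cast pattern, assuming a hypothetical length field decoded as a signed 32-bit integer and compared with an unsigned remaining-buffer size (the names are illustrative, not from the transport code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: reject negative lengths first, then compare in the
// unsigned domain, so no signed/unsigned comparison takes place.
bool fits_in_buffer(int32_t len, std::size_t remaining) {
    if (len < 0) {
        throw std::invalid_argument("negative length: " + std::to_string(len));
    }
    return static_cast<std::size_t>(len) <= remaining;
}
```

Comparing `len` with `remaining` directly would convert a negative `len` to a huge unsigned value, so `-1 <= remaining` would come out false.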
2022-11-28 21:58:30 +02:00
Avi Kivity
77be69b600 database: don't compare signed and unsigned types
gcc warns that this can lead to undefined behavior, though 2G entries
in a list of mutations are unlikely. Use the correct type for iteration.
2022-11-28 21:58:30 +02:00
Avi Kivity
fb6804e7a4 raft: don't compare signed and unsigned types
gcc warns that this can lead to undefined behavior, though 2G entries
in a list of mutations are unlikely. Use the correct type for iteration.
2022-11-28 21:58:30 +02:00
Avi Kivity
f565db75ce compaction: don't compare signed and unsigned compaction counts
gcc warns because this can lead to incorrect results. Cast the threshold
to an unsigned type (we know it's positive at this point) to avoid
the warning.
2022-11-28 21:41:56 +02:00
Avi Kivity
23b94ac391 bytes_ostream: don't take reference to packed variable
bytes_ostream is packed, so its _begin member is packed as well.
gcc (correctly) disallows taking a reference to an unaligned variable
in an aligned reference, and complains.

Make it happy by open-coding the exchange operation.
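A minimal sketch of the problem and the open-coded fix. `packed_holder` and `take_begin` are hypothetical (not the real bytes_ostream), and `__attribute__((packed))` is a GCC/Clang extension:

```cpp
#include <cassert>

// Hypothetical packed struct: `_begin` may be unaligned, so gcc refuses to
// bind a `char*&` to it, which is what std::exchange(h._begin, nullptr)
// would need to do.
struct __attribute__((packed)) packed_holder {
    char tag;
    char* _begin;
};

// Open-coded exchange: read the old value, store the new one, return the old.
// Only by-value reads and writes of the packed member are performed.
char* take_begin(packed_holder& h) {
    char* old = h._begin;
    h._begin = nullptr;
    return old;
}
```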
2022-11-28 21:40:18 +02:00
Nadav Har'El
5480211061 Merge 'test.py: support node replace operation' from Kamil Braun
The `add_server` function now takes an optional `ReplaceConfig` struct
(implemented using `NamedTuple`), which specifies the ID of the replaced
server and whether to reuse the IP address.

If we want to reuse the IP address, we don't allocate one using the host
registry. This required certain refactors: moving the code responsible
for allocation of IPs outside `ScyllaServer`, into `ScyllaCluster`.

Add two tests, but they are currently skipped: one of them is failing (inability
of the new node to join group 0) and both suffer from a hardcoded 60-second sleep
in Scylla.

Closes #12032

* github.com:scylladb/scylladb:
  test/topology: simple node replace tests (currently disabled)
  test/pylib: scylla_cluster: support node replace operation
  test/pylib: scylla_cluster: move members initialization to constructor
  test/pylib: scylla_cluster: (re)lease IP addr outside ScyllaServer
  test/pylib: scylla_cluster: refactor create_server parameters to a struct
  test.py: stop/uninstall clusters instead of servers when cleaning up
  test/pylib: artifact_registry: replace `Awaitable` type with `Coroutine`
  test.py: prepare for adding extra config from test when creating servers
  test/pylib: manager_client: convert `add_server` to use `put_json`
  test/pylib: rest_client: allow returning JSON data from `put_json`
  test/pylib: scylla_cluster: don't import from manager_client
2022-11-28 16:06:39 +02:00
Takuya ASADA
4d8fb569a1 install.sh: drop locale workaround from python3 thunk
Since #7408 does not occur on the current python3 version (3.11.0), let's drop
the workaround.

Closes #12097
2022-11-28 13:07:03 +02:00
Anna Stuchlik
452915cef6 doc: set the documentation version 5.1 as default (latest)
Closes #12105
2022-11-28 12:02:13 +01:00
Avi Kivity
380da0586c Update tools/python3 submodule (drop locale workaround)
* tools/python3 773070e...548e860 (1):
  > install.sh: drop locale workaround from python3 thunk
2022-11-28 12:24:13 +02:00
Avi Kivity
0da66371a5 storage_proxy: coroutinize inner continuation of create_hint_sync_point()
It is part of a coroutine::parallel_for_each(), which is safe for lambda coroutines.

Closes #12057
2022-11-28 11:30:00 +02:00
Avi Kivity
d12d42d1a6 Revert "configure: temporarily disable wasm support for aarch64"
This reverts commit e2fe8559ca. I
ran all the release mode tests on aarch64 with it reverted, and
it passes. So it looks like whatever problems we had with it
were fixed.

Closes #12072
2022-11-28 11:30:00 +02:00
Nadav Har'El
99a72a9676 Merge 'cql3: expr: make it possible to evaluate expr::binary_operator' from Jan Ciołek
As a part of CQL rewrite we want to be able to perform filtering by calling `evaluate()` on an expression and checking if it evaluates to `true`. Currently trying to do that for a binary operator would result in an error.

Right now checking if a binary operation like `col1 = 123` is true is done using `is_satisfied_by`, which is able to check if a binary operation evaluates to true for a small set of predefined cases.

Eventually once the grammar is relaxed we will be able to write expressions like: `(col1 < col2) = (1 > ?)`, which doesn't fit with what `is_satisfied_by` is supposed to do.
Additionally, expressions like `1 = NULL` should evaluate to `NULL`, not `true` or `false`. `is_satisfied_by` is not able to express that properly.

The proper way to go is implementing `evaluate(binary_operator)`, which takes a binary operation and returns what the result of it would be.

Implementing `prepare_expression` for `binary_operator` requires us to be able to evaluate it first. In the next PR I will add support for `prepare_expression`.
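A minimal sketch of the NULL semantics described above, using `std::optional` to model NULL. `evaluate_equal` and `is_satisfied` are hypothetical names, not the cql3::expr API, and treating a NULL result as "row rejected" is an assumption of this sketch:

```cpp
#include <cassert>
#include <optional>

// Hypothetical model: std::nullopt plays the role of NULL.
using value = std::optional<int>;
using tristate = std::optional<bool>;

// A NULL operand makes the comparison NULL rather than true or false,
// e.g. 1 = NULL evaluates to NULL.
tristate evaluate_equal(const value& lhs, const value& rhs) {
    if (!lhs || !rhs) {
        return std::nullopt;
    }
    return *lhs == *rhs;
}

// A filter keeps a row only when the expression evaluates to true;
// both false and NULL reject it.
bool is_satisfied(const tristate& t) { return t.value_or(false); }
```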

Closes #12052

* github.com:scylladb/scylladb:
  cql-pytest: enable two unset value tests that pass now
  cql-pytest: reduce unset value error message
  cql3: expr: change unset value error messages to lowercase
  cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected
  cql3: expr: remove needless braces around switch cases
  cql3: move evaluation IS_NOT NULL to a separate function
  expr_test: test evaluating LIKE binary_operator
  expr_test: test evaluating IS_NOT binary_operator
  expr_test: test evaluating CONTAINS_KEY binary_operator
  expr_test: test evaluating CONTAINS binary_operator
  expr_test: test evaluating IN binary_operator
  expr_test: test evaluating GTE binary_operator
  expr_test: test evaluating GT binary_operator
  expr_test: test evaluating LTE binary_operator
  expr_test: test evaluating LT binary_operator
  expr_test: test evaluating NEQ binary_operator
  expr_test: test evaluating EQ binary_operator
  cql3: expr properly handle null in is_one_of()
  cql3: expr properly handle null in like()
  cql3: expr properly handle null in contains_key()
  cql3: expr properly handle null in contains()
  cql3: expr: properly handle null in limits()
  cql3: expr: remove unneeded overload of limits()
  cql3: expr: properly handle null in equality operators
  cql3: expr: remove unneeded overload of equal()
  cql3: expr: use evaluate(binary_operator) in is_satisfied_by
  cql3: expr: handle IS NOT NULL when evaluating binary_operator
  cql3: expr: make it possible to evaluate binary_operator
  cql3: expr: accept expression as lhs argument to like()
  cql3: expr: accept expression as lhs in contains_key
  cql3: expr: accept expression as lhs argument to contains()
2022-11-28 11:30:00 +02:00
Nadav Har'El
1e59c3f9ef alternator: if TTL scan times out, continue immediately
The Alternator TTL expiration scanner scans an entire table using many
small pages. If any of those pages time out for some reason (e.g., an
overload situation), we currently consider the entire scan to have failed
and wait for the next scan period (which by default is 24 hours) when
we start the scan from scratch (at a random position). There is a risk
that if these timeouts are common enough to occur once or more per
scan, the effective expiration lag doubles or worse.

A better solution, done in this patch, is to retry from the same position
if a single page timed out - immediately (or almost immediately, we add
a one-second sleep).
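A minimal, synchronous sketch of the retry policy, assuming a hypothetical `fetch_page` callback that processes one page starting at a position and returns the next position, and a hypothetical `timeout_error` exception. The real scanner is asynchronous and its code differs:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical exception standing in for a page read timeout.
struct timeout_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Scan [0, end) page by page. On a page timeout, sleep briefly and retry the
// same page instead of abandoning the whole scan until the next scan period.
// Returns the number of retries performed.
std::size_t scan_all(std::size_t end,
                     const std::function<std::size_t(std::size_t)>& fetch_page,
                     std::chrono::milliseconds retry_delay) {
    std::size_t pos = 0;
    std::size_t retries = 0;
    while (pos < end) {
        try {
            pos = fetch_page(pos);
        } catch (const timeout_error&) {
            ++retries;
            std::this_thread::sleep_for(retry_delay);  // the patch uses about one second
            // pos is unchanged, so the same page is fetched again
        }
    }
    return retries;
}
```

As written, a page that times out on every attempt would be retried indefinitely; a real implementation would presumably also stop on shutdown.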

Fixes #11737

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12092
2022-11-28 11:30:00 +02:00
Avi Kivity
45a57bf22d Update tools/java submodule (revert scylla-driver)
scylla-driver causes dtests to fail randomly (likely
due to incorrect handling of the USE statement). Revert
it.

* tools/java 73422ee114...1c06006447 (2):
  > Revert "Add Scylla Cloud serverless support"
  > Revert "Switch cqlsh to use scylla-driver"
2022-11-28 11:29:08 +02:00
Benny Halevy
8f584a9a80 storage_service: handle_state_normal: always update_topology before update_normal_tokens
update_normal_tokens checks that the endpoint is in topology.
Currently we call update_topology on this path only if it's
not a normal_token_owner, but there are paths where the
endpoint could be a normal token owner but still
be pending in topology, so always update it, just in case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-28 11:25:36 +02:00
Benny Halevy
6b13fd108a storage_service: handle_state_normal: delete outdated comment regarding update pending ranges race
asias@scylladb.com said:
> This comments was moved up to the wrong place when tmptr->update_topology was added.
> There is no race now since we use the copy-update-replace method to update token_metadada.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-28 11:25:36 +02:00
Kefu Chai
af011aaba1 utils/variant_element: simplify is_variant_element with right fold
for better readability than the recursive approach.
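A minimal sketch of the fold-expression form, assuming the trait takes the variant type first and the candidate type second (the actual parameter order in utils/variant_element may differ):

```cpp
#include <type_traits>
#include <variant>

template <typename Variant, typename T>
struct is_variant_element;

// (std::is_same_v<T, Ts> || ...) is a unary right fold over ||:
// true if T matches any alternative of the variant.
template <typename T, typename... Ts>
struct is_variant_element<std::variant<Ts...>, T>
    : std::bool_constant<(std::is_same_v<T, Ts> || ...)> {};

template <typename Variant, typename T>
constexpr bool is_variant_element_v = is_variant_element<Variant, T>::value;
```

For an empty pack the fold yields `false`, so no separate base-case specialization is needed, unlike in a recursive version.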

Signed-off-by: Kefu Chai <tchaikov@gmail.com>

Closes #12091
2022-11-27 16:34:34 +02:00
Avi Kivity
78222ea171 Update tools/java submodule (cqlsh system_distributed_everywhere is a system keyspace)
* tools/java 874e2d529b...73422ee114 (1):
  > Mark "system_distributed_everywhere" as system ks
2022-11-27 15:37:57 +02:00
Aleksandra Martyniuk
9a3d114349 tasks: move methods from task_manager to source file
Methods from tasks::task_manager and nested classes are moved
to source file.

Closes #12064
2022-11-27 15:09:28 +02:00
Piotr Dulikowski
22fbf2567c utils/abi: don't use the deprecated std::unexpected_handler
Recently, clang started complaining about std::unexpected_handler being
deprecated:

```
In file included from utils/exceptions.cc:18:
./utils/abi/eh_ia64.hh:26:10: warning: 'unexpected_handler' is deprecated [-Wdeprecated-declarations]
    std::unexpected_handler unexpectedHandler;
         ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/exception:84:18: note: 'unexpected_handler' has been explicitly marked deprecated here
  typedef void (*_GLIBCXX11_DEPRECATED unexpected_handler) ();
                 ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/x86_64-redhat-linux/bits/c++config.h:2343:32: note: expanded from macro '_GLIBCXX11_DEPRECATED'
                               ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/x86_64-redhat-linux/bits/c++config.h:2334:46: note: expanded from macro '_GLIBCXX_DEPRECATED'
                                             ^
1 warning generated.
```

According to cppreference.com, it was deprecated in C++11 and removed in
C++17 (!).

This commit gets rid of the warning by inlining the
std::unexpected_handler typedef, which is defined as a pointer to a
function with no arguments, returning void.
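A minimal sketch of the inlined declaration. `exception_header_sketch` is a hypothetical, trimmed stand-in for the struct in utils/abi/eh_ia64.hh (the real one has more fields), and `example_handler` exists only for illustration:

```cpp
#include <cassert>
#include <type_traits>

// The field type is spelled out directly as "pointer to a function taking
// no arguments and returning void" instead of using std::unexpected_handler.
struct exception_header_sketch {
    void (*unexpectedHandler)();
};

static_assert(std::is_same_v<decltype(exception_header_sketch::unexpectedHandler), void (*)()>,
              "field is a pointer to a void() function");

// Hypothetical handler, used only to show that such a function fits the field.
void example_handler() {}
```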

Fixes: #12022

Closes #12074
2022-11-27 12:25:20 +02:00
Alejo Sanchez
5ff4b8b5f8 pytest: catch rare exception for random tables test
On rare occasions a SELECT on a DROPped table throws
cassandra.ReadFailure instead of cassandra.InvalidRequest. This could
not be reproduced locally.

Catch both exceptions as the table is not present anyway and it's
correctly marked as a failure.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12027
2022-11-27 10:26:55 +02:00
Michał Chojnowski
a75e4e1b23 db: config: disable global index page caching by default
Global index page caching, as introduced in 4.6
(078a6e422b and 9f957f1cf9) has proven to be misdesigned,
because it poses a risk of catastrophic performance regressions in
common workloads by flooding the cache with useless index entries.
Because of that risk, it should be disabled by default.

Refs #11202
Fixes #11889

Closes #11890
2022-11-26 14:27:26 +02:00
Aleksandra Martyniuk
c2ea3f49e6 repair: rename methods of repair_module
Methods of repair_module connected with repair_module::_repairs
are renamed to match repair_module::_repairs type.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
13dbd75ba8 repair: change type of repair_module::_repairs
As a preparation to replacing repair_info with shard_repair_task_impl,
type of _repairs in repair module is changed from
std::unordered_map<int, lw_shared_ptr<repair_info>> to
std::unordered_map<int, tasks::task_id>.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
55c01a1beb repair: keep a reference to shard_repair_task_impl in row_level_repair
As a part of replacing repair_info with shard_repair_task_impl,
instead of a reference to repair_info, row_level_repair keeps
a reference to shard_repair_task_impl.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
9b664570f0 repair: move repair_range method to shard_repair_task_impl
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
3ac5ba7b28 repair: make do_repair_ranges a method of shard_repair_task_impl
Function do_repair_ranges is directly connected to shard repair tasks.
Turning it into shard_repair_task_impl method enables an access to tasks'
members with no additional intermediate layers.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
a09dfcdacd repair: copy repair_info methods to shard_repair_task_impl
Methods of repair_info are copied to shard_repair_task_impl. They are
not used yet, it's a preparation for replacing repair_info with
shard_repair_task_impl.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
a4b1bdb56c repair: coroutinize shard task creation
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
996c0f3476 repair: define run for shard_repair_task_impl
Operations performed as a part of shard repair are moved
to shard_repair_task_impl run method.
2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk
ba9770ea02 repair: add shard_repair_task_impl
Create a task spanning over a repair performed on a given shard.
2022-11-25 16:40:49 +01:00
Anna Stuchlik
d5f676106e doc: remove the LWT page from the index of Enterprise features
Closes #12076
2022-11-24 21:59:05 +02:00
Aleksandra Martyniuk
dcc17037c7 repair: fix bad cast in tasks::task_id parsing
In system_keyspace::get_repair_history, the value of repair_uuid
is read from the row as tasks::task_id.
tasks::task_id is represented by an abstract_type specific
to utils::UUID. Thus, since their typeids differ, bad_cast
is thrown.

repair_uuid is now read from the row as utils::UUID and then converted.
Since it is no longer needed, data_type_for<tasks::task_id> is deleted.

Fixes: #11966

Closes #12062
2022-11-24 19:37:44 +02:00
Jan Ciolek
77c7d8b8f6 cql-pytest: enable two unset value tests that pass now
While implementing evaluate(binary_operator)
missing checks for unset value were added
for comparisons in filtering code.

Because of that some tests for unset value
started passing.

There are still other tests for unset value
that are failing because Scylla doesn't
have all the checks that it should.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-24 17:07:17 +01:00
Jan Ciolek
5bc0bc6531 cql-pytest: reduce unset value error message
When unset value appears in an invalid place
both Cassandra and Scylla throw an error.

The tests were written with Cassandra
and thus the expected error messages were
exactly the same as produced by Cassandra.

Scylla produces different error messages,
but both databases return messages with
the text 'unset value'.

Reduce the expected message text
from the whole message to something
that contains 'unset value'.

It would be hard to mimic Cassandra's
error messages in Scylla. There is no
point in spending time on that.
Instead it's better to modify the tests
so that they are able to work with
both Cassandra and Scylla.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-24 17:04:07 +01:00
Jan Ciolek
08f40a116d cql3: expr: change unset value error messages to lowercase
The messages used to contain UNSET_VALUE
in capital letters, but the tests
expect messages with 'unset value'.

Change the message so that it can
match the expected error text in tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-24 17:02:44 +01:00
Kamil Braun
fda6403b29 test/topology: simple node replace tests (currently disabled)
Add two node replace tests using the freshly added infrastructure.

One test replaces a node while using a different IP. It is disabled
because the replace operation has an unconditional 60-second sleep
(it doesn't depend on the ring_delay setting for some reason). The sleep
needs to be fixed before we can enable this test.

The other test replaces while reusing the replaced node's IP.
In addition to the sleep, the test fails because the node cannot join
group 0; it's stuck in an infinite loop of trying to join:
```
INFO  2022-11-18 15:56:19,933 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found no local group 0. Discovering...
INFO  2022-11-18 15:56:19,933 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader b7047f7e-03e6-4797-a723-24054201f91d
INFO  2022-11-18 15:56:19,934 [shard 0] raft_group0 - Server 8de951fd-a528-4a82-ac54-592ea269537f is starting group 0 with id 25d2b050-6751-11ed-b534-c3c40c275dd3
WARN  2022-11-18 15:56:20,935 [shard 0] raft_group0 - failed to modify config at peer b7047f7e-03e6-4797-a723-24054201f91d: seastar::rpc::timeout_error (rpc call timed out). Retrying.
INFO  2022-11-18 15:56:21,937 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader ee0175ea-6159-4d4c-9d7c-95c934f8a408
WARN  2022-11-18 15:56:22,937 [shard 0] raft_group0 - failed to modify config at peer ee0175ea-6159-4d4c-9d7c-95c934f8a408: seastar::rpc::timeout_error (rpc call timed out). Retrying.
INFO  2022-11-18 15:56:23,938 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader ee0175ea-6159-4d4c-9d7c-95c934f8a408
WARN  2022-11-18 15:56:24,939 [shard 0] raft_group0 - failed to modify config at peer ee0175ea-6159-4d4c-9d7c-95c934f8a408: seastar::rpc::timeout_error (rpc call timed out). Retrying.
```
and so on.
2022-11-24 16:26:23 +01:00
Kamil Braun
2f60550ff3 test/pylib: scylla_cluster: support node replace operation
The `add_server` function now takes an optional `ReplaceConfig` struct
(implemented using `NamedTuple`), which specifies the ID of the replaced
server and whether to reuse the IP address.

If we want to reuse the IP address, we don't allocate one using the host
registry.

Since now multiple servers can have the same IP, introduce a
`leased_ips` set to `ScyllaCluster` which is used when `uninstall`ing
the cluster - to make sure we don't `release_host` the same host twice.
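A minimal sketch of the idea, assuming hypothetical field names (`replaced_id`, `reuse_ip_addr`) and toy `HostRegistry`/`Cluster` classes rather than the real test/pylib code: reusing the replaced server's IP skips leasing, and the `leased_ips` set ensures each address is released exactly once.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class ReplaceConfig(NamedTuple):
    """Hypothetical stand-in for the struct described above."""
    replaced_id: int     # ID of the server being replaced
    reuse_ip_addr: bool  # take over the replaced server's IP?


class HostRegistry:
    """Toy registry handing out loopback addresses."""

    def __init__(self) -> None:
        self._next = 1
        self.released: List[str] = []

    def lease_host(self) -> str:
        ip = f"127.0.0.{self._next}"
        self._next += 1
        return ip

    def release_host(self, ip: str) -> None:
        self.released.append(ip)


class Cluster:
    def __init__(self, registry: HostRegistry) -> None:
        self.registry = registry
        self.servers: Dict[int, str] = {}  # server ID -> IP address
        self.leased_ips: Set[str] = set()  # only IPs leased by this cluster
        self._next_id = 1

    def add_server(self, replace_cfg: Optional[ReplaceConfig] = None) -> int:
        if replace_cfg is not None and replace_cfg.reuse_ip_addr:
            # Reuse the replaced server's address; nothing new is leased.
            ip = self.servers[replace_cfg.replaced_id]
        else:
            ip = self.registry.lease_host()
            self.leased_ips.add(ip)
        server_id = self._next_id
        self._next_id += 1
        self.servers[server_id] = ip
        return server_id

    def uninstall(self) -> None:
        # Two servers may share an IP, but the set holds it only once,
        # so release_host() is never called twice for the same host.
        for ip in self.leased_ips:
            self.registry.release_host(ip)
        self.leased_ips.clear()
```

For example, adding a server and then replacing it with `reuse_ip_addr=True` leaves both servers on `127.0.0.1`, and `uninstall()` releases that address once.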
2022-11-24 16:26:23 +01:00
Kamil Braun
d80247f912 test/pylib: scylla_cluster: move members initialization to constructor
Previously some members had to be initialized in `install` because
that's when we first knew the IP address.

Now we know the IP address during construction, which allows us to make
the code a bit shorter and simpler, and establish invariants: some
members (such as `self.config`) are now valid for the entire lifetime of
the server object.

`install()` is reduced to performing only side effects (creating
directories, writing config files), all calculation is done inside the
constructor.
2022-11-24 16:26:23 +01:00
Kamil Braun
3934eefd20 test/pylib: scylla_cluster: (re)lease IP addr outside ScyllaServer
`ScyllaServer`s were constructed without IP addresses. Each leased an IP
address from `HostRegistry` and released it in `uninstall`.

This responsibility is now moved into `ScyllaCluster`, which leases an
IP address for a server before constructing it, and passes it to the
constructor. It releases the addresses of its servers when uninstalling
itself.

This will allow the cluster to reuse the IP address of an existing
server in that cluster when adding a new server which wants to replace
the existing one. Instead of leasing a new address, it will pass
the existing IP address to the new server's constructor.

The refactor is also nice in that it establishes an invariant for
`ScyllaServer`, simplifying reasoning about the class: now it has
an `ip_addr` field at all times.

`host_registry` was moved from `ScyllaServer` to `ScyllaCluster`.
2022-11-24 16:26:23 +01:00
Kamil Braun
9d5e1191da test/pylib: scylla_cluster: refactor create_server parameters to a struct
`ScyllaCluster` constructor takes a function `create_server` which
itself takes 3 parameters now. Soon it will take a 4th. The list of
parameters is repeated at the constructor definition and the call site
of the constructor; with many parameters this becomes tiresome.
Refactor the list of parameters to a `NamedTuple`.
2022-11-24 16:26:23 +01:00
Kamil Braun
d582666293 test.py: stop/uninstall clusters instead of servers when cleaning up
`self.artifacts` was calling `ScyllaServer.stop` and
`ScyllaServer.uninstall`. Now it calls `ScyllaCluster.stop` and
`ScyllaCluster.uninstall`, which underneath stops/uninstalls
servers in this cluster.

We must be a bit more careful now in case installing/starting a
server inside a cluster fails: there are no server cleanup artifacts,
and a server is added to cluster's `running` map only after
`install_and_start` finishes (until that happens,
`ScyllaCluster.stop/uninstall` won't catch this server).
So handle failures explicitly in `install_and_start`.
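A sketch of the explicit failure handling, assuming a hypothetical `FakeServer` with async `install`/`start`/`stop`/`uninstall` methods and a plain dict standing in for the cluster's `running` map (not the actual test.py code): a server that fails to start is cleaned up locally, because cluster-wide cleanup cannot see it yet.

```python
import asyncio
from typing import Dict, List


class FakeServer:
    """Hypothetical server whose start() can be made to fail."""

    def __init__(self, server_id: int, fail_start: bool = False) -> None:
        self.server_id = server_id
        self.fail_start = fail_start
        self.events: List[str] = []

    async def install(self) -> None:
        self.events.append("install")

    async def start(self) -> None:
        self.events.append("start")
        if self.fail_start:
            raise RuntimeError("start failed")

    async def stop(self) -> None:
        self.events.append("stop")

    async def uninstall(self) -> None:
        self.events.append("uninstall")


async def install_and_start(running: Dict[int, FakeServer], server: FakeServer) -> None:
    try:
        await server.install()
        await server.start()
    except Exception:
        # Not in `running` yet, so cluster-wide stop/uninstall would miss
        # this server: clean it up here, then propagate the error.
        await server.stop()
        await server.uninstall()
        raise
    # Only a fully started server becomes visible to cluster cleanup.
    running[server.server_id] = server
```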

This commit does not logically change how the tests are running - every
started server belongs to some cluster, so it will be cleaned up
- but it's an important refactor.

It will allow us to move IP address (de)allocation code outside
`ScyllaServer`, into `ScyllaCluster`, which in turn will allow us to
implement node replace operation for the case where we want to reuse
the replaced node's IP.

Also, `ScyllaCluster.uninstall` was unused before this change, now it's
used.
2022-11-24 16:26:17 +01:00
Avi Kivity
29a4b662f8 Merge 'doc: document the Alternator TTL feature as GA' from Anna Stuchlik
Currently, TTL is listed as one of the experimental features: https://docs.scylladb.com/stable/alternator/compatibility.html#experimental-api-features

This PR moves the feature description from the Experimental Features section to a separate section.
I've also added some links and improved the formatting.

@tzach I've relied on your release notes for RC1.

Refs: https://github.com/scylladb/scylladb/issues/5060

Closes #11997

* github.com:scylladb/scylladb:
  Update docs/alternator/compatibility.md
  doc: update the link to Enabling Experimental Features
  doc: remove the note referring to the previous ScyllaDB versions and add the relevant limitation to the paragraph
  doc: update the links to the Enabling Experimental Features section
  doc: add the link to the Enabling Experimental Features section
  doc: move the TTL Alternator feature from the Experimental Features section to the production-ready section
2022-11-24 17:22:05 +02:00
Nadav Har'El
2dedb5ea75 alternator: make Alternator TTL feature no longer "experimental"
Until now, the Alternator TTL feature was considered "experimental",
and had to be manually enabled on all nodes of the cluster to be usable.

This patch removes this requirement and in essence GAs this feature.

Even after this patch, Alternator TTL is still a "cluster feature",
i.e., for this feature to be usable every node in the cluster needs
to support it. If any of the nodes is old and does not yet support this
feature, the UpdateTimeToLive request will not be accepted, so although
the expiration-scanning threads may exist on the newer nodes, they will
not do anything because none of the tables can be marked as having
expiration enabled.

This patch does not contain documentation fixes - the documentation
still suggests that the Alternator TTL feature is experimental.
The documentation patch will come separately.

Fixes #12037

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12049
2022-11-24 17:21:39 +02:00
Tzach Livyatan
e96d31d654 docs: Add Authentication and Authorization as a prerequisite for Auditing.
Closes #12058
2022-11-24 17:21:23 +02:00
Kamil Braun
df731a5b0c test/pylib: artifact_registry: replace Awaitable type with Coroutine
The `cleanup_before_exit` method of `ArtifactRegistry` calls `close()`
on artifacts. mypy complains that `Awaitable` has no such method. In
fact, the `artifact` objects that we pass to `ArtifactRegistry`
(obtained by calling `async def` functions) do have a `close()` method,
and they are a particular case of `Awaitable`s, but in general not
all `Awaitable`s have `close()`.

Replace `Awaitable` with one of its subtypes: `Coroutine`. `Coroutine`s
have a `close()` method, and `async def` functions return objects of
this type. mypy no longer complains.
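A simplified sketch of the typing point (a toy `ArtifactRegistry`, not the real test/pylib class): calling an `async def` function yields a coroutine object that has `close()`, while an arbitrary object implementing only `__await__` is a valid `Awaitable` without one.

```python
import asyncio
import collections.abc
import inspect
from typing import Any, Coroutine, List


async def stop_server() -> None:
    """Stand-in for an `async def` cleanup artifact."""
    await asyncio.sleep(0)


class OnlyAwaitable:
    """An Awaitable that is not a coroutine and has no close()."""

    def __await__(self):
        return iter(())


class ArtifactRegistry:
    def __init__(self) -> None:
        # Typed as Coroutine rather than Awaitable: Coroutine has close().
        self.artifacts: List[Coroutine[Any, Any, None]] = []

    def add(self, artifact: Coroutine[Any, Any, None]) -> None:
        self.artifacts.append(artifact)

    def cleanup_before_exit(self) -> None:
        for artifact in self.artifacts:
            artifact.close()  # type-checks with Coroutine, not with Awaitable
```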
2022-11-24 16:17:05 +01:00
Nadav Har'El
c6bb64ab0e Merge 'Fix LWT insert crash if clustering key is null' from Gusev Petr
[PR](https://github.com/scylladb/scylladb/pull/9314) fixed a similar issue with regular insert statements
but missed the LWT code path.

It's expected behaviour for
`modification_statement::create_clustering_ranges` to return an
empty range in this case, since the `possible_lhs_values` function it
uses explicitly returns `empty_value_set` if it evaluates `rhs`
to null, and it has a comment about it ("All NULL
comparisons fail; no column values match."). On the other hand,
all components of the primary key are required to be set, and
this is checked at the prepare phase, in
`modification_statement::process_where_clause`. So the only
problem was that `modification_statement::execute_with_condition`
did not expect an empty `clustering_range` in case of
a null clustering key.

Also this patch contains a fix for the problem with wrong
column name in Scylla error messages. If `INSERT` or `DELETE`
statement is missing a non-last element of
the primary key, the error message generated contains
an invalid column name.

The problem occurs if the query contains a column with the list type,
otherwise
`statement_restrictions::process_clustering_columns_restrictions`
checks that all the components of the key are specified.

Closes #12047

* github.com:scylladb/scylladb:
  cql: refactor, inline modification_statement::validate_primary_key_restrictions
  cql: DELETE with null value for IN parameter should be forbidden
  cql: add column name to the error message in case of null primary key component
  cql: batch statement, inserting a row with a null key column should be forbidden
  cql: wrong column name in error messages
  modification_statement: fix LWT insert crash if clustering key is null
2022-11-24 16:15:27 +02:00
Nadav Har'El
6e9f739f19 Merge 'doc: add the links to the per-partition rate limit extension ' from Anna Stuchlik
Release 5.1 introduced a new CQL extension that applies to the CREATE TABLE and ALTER TABLE statements. The ScyllaDB-specific extensions are described on a separate page, so the CREATE TABLE and ALTER TABLE sections should include links to that page and section.

Note: CQL extensions are described with Markdown, while the Data Definition page is RST. Currently, there's no way to link from an RST page to an MD subsection (using a section heading or anchor), so a URL is used as a temporary solution.

Related: https://github.com/scylladb/scylladb/pull/9810

Closes #12070

* github.com:scylladb/scylladb:
  doc: move the info about per-partition rate limit for the ALTER TABLE statemet from the paragraph to the list
  doc: add the links to the per-partition rate limit extention to the CREATE TABLE and ALTER TABLE sections
2022-11-24 16:03:30 +02:00
Anna Stuchlik
8049670772 doc: move the info about per-partition rate limit for the ALTER TABLE statemet from the paragraph to the list 2022-11-24 14:42:11 +01:00
Anna Stuchlik
57a58b17a8 doc: enable publishing the documentation for version 5.1
Closes #12059
2022-11-24 13:55:25 +02:00
Kamil Braun
2f99f27c14 docs/dev: building.md: mention node-exporter packages 2022-11-24 12:49:34 +01:00
Kamil Braun
b12f331fe6 docs/dev: building.md: replace dev with <mode> in list of debs 2022-11-24 12:47:09 +01:00
Benny Halevy
243dc2efce hints: host_filter: check topology::has_endpoint if enabled_selectively
Don't call get_datacenter(ep) without first checking
has_endpoint(ep), since the former may abort
on internal error if the endpoint is not listed
in topology.
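A toy sketch of the guard, assuming a hypothetical `Topology` whose `get_datacenter()` fails for unknown endpoints (a `KeyError` here, loosely standing in for the internal-error abort) and a hypothetical `can_hint` filter: short-circuit evaluation means `get_datacenter()` is only reached for endpoints that are known to exist.

```python
from typing import Dict, Set


class Topology:
    """Toy topology mapping endpoints to datacenter names."""

    def __init__(self, dc_of: Dict[str, str]) -> None:
        self._dc_of = dict(dc_of)

    def has_endpoint(self, ep: str) -> bool:
        return ep in self._dc_of

    def get_datacenter(self, ep: str) -> str:
        return self._dc_of[ep]  # KeyError if ep is not in the topology


def can_hint(topology: Topology, ep: str, enabled_dcs: Set[str]) -> bool:
    """Hypothetical selective host filter: check membership first."""
    return topology.has_endpoint(ep) and topology.get_datacenter(ep) in enabled_dcs
```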

Refs #11870

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12054
2022-11-24 14:33:06 +03:00
Anna Stuchlik
f158d31e24 doc: add the links to the per-partition rate limit extention to the CREATE TABLE and ALTER TABLE sections 2022-11-24 11:26:33 +01:00
Petr Gusev
b95305ae2b cql: refactor, inline modification_statement::validate_primary_key_restrictions
The function didn't add much value, just forwarded to _restrictions.
Removed it and called _restrictions->validate_primary_key directly.
2022-11-23 21:56:12 +04:00
Petr Gusev
f9936bb0cb cql: DELETE with null value for IN parameter should be forbidden
If a DELETE statement contains an IN operator and the
parameter value for it is NULL, this should also trigger
an error. This is in line with how Cassandra
behaves in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
c123f94110 cql: add column name to the error message in case of null primary key component
It's more user-friendly and the error message
corresponds to what Cassandra provides in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
7730c4718e cql: batch statement, inserting a row with a null key column should be forbidden
Regular INSERT statements with null values for primary key
components are rejected by Scylla since #9286 and #9314.
Batch statements missed a similar check; this patch
fixes it.

Fixes: #12060
2022-11-23 21:39:23 +04:00
Petr Gusev
89a5397d7c cql: wrong column name in error messages
If an INSERT or DELETE statement is missing a non-last element of
the primary key, the error message generated contains
an invalid column name.

The problem occurs if the query contains a column with the list type,
otherwise
statement_restrictions::process_clustering_columns_restrictions
checks that all the components of the key are specified.

Fixes: #12046
2022-11-23 21:39:16 +04:00
Benny Halevy
996eac9569 topology: add get_datacenters
Returns an unordered set of datacenter names
to be used by network_topology_replication_strategy
and for ks_prop_defs.

The set is kept in sync with _dc_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12023
2022-11-23 18:39:36 +02:00
Takuya ASADA
9acdd3af23 dist: drop deprecated AMI parameters on setup scripts
Since we moved all IaaS code to scylla-machine-image, we no longer need
the AMI variable in the sysconfig file or the --ami parameter on setup scripts,
and /etc/scylla/ami_disabled was never used.
So let's drop all of them from Scylla core.

Related to scylladb/scylla-machine-image#61

Closes #12043
2022-11-23 17:56:13 +02:00
Avi Kivity
7c66fdcad1 Merge 'Simplify sstable_directory configuration' from Pavel Emelyanov
When started, the sstable_directory is constructed with a bunch of booleans that control the way its process_sstable_dir method works. It's shorter and simpler to pass these booleans into the method directly, especially since there's another flag that's already passed this way.

Closes #12005

* github.com:scylladb/scylladb:
  sstable_directory: Move all RAII booleans onto flags
  sstable_directory: Convert sort-sstables argument to flags struct
  sstable_directory: Drop default filter
2022-11-23 16:16:04 +02:00
Avi Kivity
70bfa708f5 storage_proxy: coroutinize change_hints_host_filter()
Trivial straight-line code, no performance implications.

Closes #12056
2022-11-23 15:34:24 +02:00
Jan Ciolek
84501851eb cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected
Scylla doesn't support combining restrictions
on token with other restrictions on partition key columns.

Some pieces of code depend on the assumption
that such combinations are allowed.
If they were allowed in the future,
these functions would silently start
returning wrong results, and we would
return invalid rows.

Add a test that will start failing once
this restriction is removed. It will
warn the developer to change the
functions that used to depend
on the assumption.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 13:09:22 +01:00
Botond Dénes
602dfdaf98 Merge 'Task manager top level repair tasks' from Aleksandra Martyniuk
The PR introduces top level repair tasks representing repair and node operations
performed with repair. The actions performed as a part of these operations are
moved to corresponding tasks' run methods.

Also, a small change to the repair module is added.

Closes #11869

* github.com:scylladb/scylladb:
  repair: define run for data_sync_repair_task_impl
  repair: add data_sync_repair_task_impl
  tasks: repair: add noexcept to task impl constructor
  repair: define run for user_requested_repair_task_impl
  repair: add user_requested_repair_task_impl
  repair: allow direct access to max_repair_memory_per_range
2022-11-23 14:02:30 +02:00
Jan Ciolek
338af848a8 cql3: expr: remove needless braces around switch cases
I originally put braces around the cases because
there were local variables that I didn't want
to be shadowed.

Now there are no variables so the braces
can be removed without any problems.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:30 +01:00
Jan Ciolek
e8a46d34c2 cql3: move evaluation IS_NOT NULL to a separate function
When evaluating a binary operation with
operations like EQUAL, LESS_THAN, IN
the logic of the operation is put
in a separate function to keep things clean.

IS_NOT NULL is the only exception:
its evaluation is implemented
right in the evaluate(binary_operator)
function.

It would be cleaner to have it in
a separate dedicated function,
so it's moved to one.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:30 +01:00
Jan Ciolek
b6cf6e6777 expr_test: test evaluating LIKE binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:29 +01:00
Jan Ciolek
6774272fd6 expr_test: test evaluating IS_NOT binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:29 +01:00
Jan Ciolek
e6c78bb6c2 expr_test: test evaluating CONTAINS_KEY binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:29 +01:00
Jan Ciolek
4f250609ab expr_test: test evaluating CONTAINS binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:29 +01:00
Jan Ciolek
3ca04cfcc2 expr_test: test evaluating IN binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:28 +01:00
Jan Ciolek
41f452b73f expr_test: test evaluating GTE binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:28 +01:00
Jan Ciolek
1fe9a9ce2a expr_test: test evaluating GT binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:28 +01:00
Jan Ciolek
ef2a77a3e0 expr_test: test evaluating LTE binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:28 +01:00
Jan Ciolek
3cbb2d44e8 expr_test: test evaluating LT binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:27 +01:00
Jan Ciolek
9feee70710 expr_test: test evaluating NEQ binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:27 +01:00
Jan Ciolek
e77dba0b0b expr_test: test evaluating EQ binary_operator
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:27 +01:00
Jan Ciolek
63a89776a1 cql3: expr properly handle null in is_one_of()
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:27 +01:00
Jan Ciolek
214dab9c77 cql3: expr properly handle null in like()
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:26 +01:00
Jan Ciolek
2ce9c95a9d cql3: expr properly handle null in contains_key()
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:26 +01:00
Jan Ciolek
336ad61aa3 cql3: expr properly handle null in contains()
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:26 +01:00
Jan Ciolek
e2223be1ec cql3: expr: properly handle null in limits()
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:26 +01:00
Jan Ciolek
d1abf2e168 cql3: expr: remove unneeded overload of limits()
There is a more general version of limits()
which takes expressions as both the lhs and rhs
arguments.

There is no need for a specialized overload.
This specialized overload takes a tuple_constructor
as lhs, but we call evaluate() on both sides
of a binary operator before checking equality,
so this won't be useful at all.

Having multiple functions increases the risk
that one of them has a bug, while giving
dubious benefit.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:25 +01:00
Jan Ciolek
0609a425e6 cql3: expr: properly handle null in equality operators
Expressions like:
123 = NULL
NULL = 123
NULL = NULL
NULL != 123

should be tolerated, but evaluate to NULL.
The current code assumes that a binary operator
can only evaluate to a boolean - true or false.

Now a binary operator can also evaluate to NULL.
This should happen in cases when one of the
operator's sides is NULL.

A special class is introduced to represent a value
that can be one of three things: true, false or null.
It's better than using std::optional<bool>,
because optional has implicit conversions to bool
that could cause confusion and bugs.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 12:44:22 +01:00
Aleksandra Martyniuk
a3016e652f repair: define run for data_sync_repair_task_impl
Operations performed as a part of data sync repair are moved
to data_sync_repair_task_impl run method.
2022-11-23 10:44:19 +01:00
Aleksandra Martyniuk
42239c8fed repair: add data_sync_repair_task_impl
Create a task spanning the whole node operation. Tasks of that type
are stored on shard 0.
2022-11-23 10:19:53 +01:00
Aleksandra Martyniuk
9e108a2490 tasks: repair: add noexcept to task impl constructor
Add noexcept to the constructor of tasks::task_manager::task::impl
and inheriting classes.
2022-11-23 10:19:53 +01:00
Aleksandra Martyniuk
4a4e9c12df repair: define run for user_requested_repair_task_impl
Operations performed as a part of user requested repair are
moved to user_requested_repair_task_impl run method.
2022-11-23 10:19:51 +01:00
Aleksandra Martyniuk
3800b771fc repair: add user_requested_repair_task_impl
Create a task spanning the whole user-requested repair.
Tasks of that type are stored on shard 0.
2022-11-23 10:11:09 +01:00
Aleksandra Martyniuk
0256ede089 repair: allow direct access to max_repair_memory_per_range
The access specifier of the constexpr value max_repair_memory_per_range
in repair_module is changed to public and its getter is deleted.
2022-11-23 10:11:09 +01:00
Anna Stuchlik
16e2b9acd4 Update docs/alternator/compatibility.md
Co-authored-by: Daniel Lohse <info@asapdesign.de>
2022-11-23 09:51:04 +01:00
Avi Kivity
d7310fd083 gdb: messaging: print tls servers too
Many systems have most traffic on tls servers, so print them.

Closes #12053
2022-11-23 07:59:02 +02:00
Avi Kivity
aec9faddb1 Merge 'storage_proxy: use erm topology' from Benny Halevy
When processing a query, we keep a pointer to an effective_replication_map.
In a couple of places we used the latest topology instead of the one held by the effective_replication_map
that the query uses, which might lead to inconsistencies if, for example, a node is removed from topology after a decommission that happens concurrently with the query.

This change gets the topology& from the e_r_m in those cases.

Fixes #12050

Closes #12051

* github.com:scylladb/scylladb:
  storage_proxy: pass topology& to sort_endpoints_by_proximity
  storage_proxy: pass topology& to is_worth_merging_for_range_query
2022-11-22 20:04:41 +02:00
Botond Dénes
49ec7caf27 mutation_fragment_stream_validator: avoid allocation when stream is correct
Currently, the ctor of said class always allocates, as it copies the
provided name string and creates a new name via format().
We want to avoid this, now that the validator is used on the read path.
So defer creating the formatted name to when we actually want to log
something, which is either when log level is debug or when an error is
found. We don't care about performance in either case, but we do care
about it on the happy path.
Further to the above, provide a constructor for string literal names and
when this is used, don't copy the name string, just save a view to it.

Refs: #11174

Closes #12042
2022-11-22 19:19:18 +02:00
Nadav Har'El
ce7c1a6c52 Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz
Contains fixes requested in the issue (and some tiny extras), together with an analysis of why they don't affect the users (see commit messages).

Fixes [ #11800](https://github.com/scylladb/scylladb/issues/11800)

Closes #11926

* github.com:scylladb/scylladb:
  alternator: add maybe_quote to secondary indexes 'where' condition
  test/alternator: correct xfail reason for test_gsi_backfill_empty_string
  test/alternator: correct indentation in test_lsi_describe
  alternator: fix wrong 'where' condition for GSI range key
2022-11-22 17:46:52 +02:00
Pavel Emelyanov
22133a3949 sstable_directory: Move all RAII booleans onto flags
There's a bunch of booleans that control the behavior of sstable
directory scanning. Currently they are described as verbose
bool_class<>-es and are passed in at sstable_directory construction time.

However, these are not used outside of the .process_sstable_dir() method, and
moving them onto the recently added flags struct makes the code much
shorter (29 insertions(+), 121 deletions(-))

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:30:00 +03:00
Pavel Emelyanov
7ca5e143d7 sstable_directory: Convert sort-sstables argument to flags struct
The sstable_directory::process_sstable_dir() accepts a boolean to
control its behavior when collecting sstables. Turn this boolean into a
structure of flags. The intention is to extend this flags set in the
future (next patch).

This boolean is true all the time, but one place sets it to false in a
"verbose" manner, like this:

        bool sort_sstables_according_to_owner = false;
        process_sstable_dir(directory, sort_sstables_according_to_owner).get();

The local variable is not used anymore. Using designated initializers
solves the verbosity in a nicer manner.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:19:23 +03:00
Pavel Emelyanov
7c7017d726 sstable_directory: Drop default filter
It's used as the default argument for the .reshape() method, but callers specify
it explicitly. At the same time, the filter is simple enough and is only
used in one place, so the caller can just use an explicit lambda.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:19:23 +03:00
Jan Ciolek
6be142e3a0 cql3: expr: remove unneeded overload of equal()
There is a more general version of equal()
which takes expressions as both the lhs and rhs
arguments.

There is no need for a specialized overload.
This specialized overload takes a tuple_constructor
as lhs, but we call evaluate() on both sides
of a binary operator before checking equality,
so this won't be useful at all.

Having multiple functions increases the risk
that one of them has a bug, while giving
dubious benefit.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-22 14:28:10 +01:00
Benny Halevy
731a74c71f storage_proxy: pass topology& to sort_endpoints_by_proximity
It mustn't use the latest topology that may differ from the
one used by the query as it may be missing nodes
(e.g. after concurrent decommission).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-22 15:02:40 +02:00
Benny Halevy
ab3fc1e069 storage_proxy: pass topology& to is_worth_merging_for_range_query
It mustn't use the latest topology that may differ from the
one used by the query as it may be missing nodes
(e.g. after concurrent decommission).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-22 15:01:58 +02:00
Petr Gusev
0d443dfd16 modification_statement: fix LWT insert crash if clustering key is null
PR #9314 fixed a similar issue with regular insert statements
but missed the LWT code path.

It's expected behaviour for
modification_statement::create_clustering_ranges to return an
empty range in this case, since the possible_lhs_values function it
uses explicitly returns empty_value_set if it evaluates rhs
to null, and it has a comment about it ("All NULL
comparisons fail; no column values match."). On the other hand,
all components of the primary key are required to be set, and
this is checked at the prepare phase, in
modification_statement::process_where_clause. So the only
problem was that modification_statement::execute_with_condition
did not expect an empty clustering_range in case of
a null clustering key.

Fixes: #11954
2022-11-22 16:45:16 +04:00
Marcin Maliszkiewicz
2bf2ffd3ed alternator: add maybe_quote to secondary indexes 'where' condition
This bug doesn't affect anything; the reason is described in the commit
'alternator: fix wrong 'where' condition for GSI range key'.

But it's theoretically correct to escape those key names, and
the difference can be observed via CQL's DESCRIBE TABLE. Before
the patch, the 'where' condition is missing one double quote in the variable
name, making it mismatched with the corresponding column name.
2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz
4389baf0d9 test/alternator: correct xfail reason for test_gsi_backfill_empty_string
Previously cited issue is closed already.
2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz
59eca20af1 test/alternator: correct indentation in test_lsi_describe
Otherwise, I think the assert is not executed in the loop. I am also not sure why the lsi variable
is bound to anything after the loop; when I tested it, it pointed to the last element in lsis.
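The observed binding follows from Python scoping: a `for` loop does not create its own scope, so after the loop its target variable stays bound to the last element. The hypothetical functions below (not the actual test) contrast a statement indented inside the loop with one accidentally dedented after it.

```python
from typing import List


def checked_inside_loop(lsis: List[str]) -> List[str]:
    checked = []
    for lsi in lsis:
        checked.append(lsi)  # indented: runs once per element
    return checked


def checked_after_loop(lsis: List[str]) -> List[str]:
    checked = []
    for lsi in lsis:
        pass
    checked.append(lsi)  # dedented: runs once; lsi is still the last element
    return checked
```

With an empty list, `checked_after_loop` would raise `UnboundLocalError`, since the loop body never binds `lsi`.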
2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz
d6d20134de alternator: fix wrong 'where' condition for GSI range key
This bug doesn't manifest in a way visible to the user.

Adding the index to an existing table via GlobalSecondaryIndexUpdates is not supported,
so we don't need to consider what could happen for empty values of the index range key.
After the index is added, the only interesting thing a user can do is omit
the value (null or empty values are not allowed, see test_gsi_empty_value and
test_gsi_null_value).

In practice, regardless of the 'where' condition, the underlying materialized
view code skips row updates with missing keys, as per this comment:
'If one of the key columns is missing, set has_new_row = false
meaning that after the update there will be no view row'.

That's why the added test passes both before and after the patch.
But it's still useful to include it to exercise those code paths.

Fixes #11800
2022-11-22 11:08:23 +01:00
Nadav Har'El
ff617c6950 cql-pytest: translate a few small Cassandra tests
This patch includes a translation of several additional small test files
from Cassandra's CQL unit test directory cql3/validation/operations.

All tests included here pass on both Cassandra and Scylla, so they did
not discover any new Scylla bugs, but can be useful in the future as
regression tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12045
2022-11-22 07:54:13 +02:00
Botond Dénes
f3eecb47f6 Merge 'Optimize cleanup compaction get ranges for invalidation' from Benny Halevy
Take advantage of the facts that both the owned ranges
and the initial non_owned_ranges (derived from the set of sstables)
are deoverlapped and sorted by start token to turn
the calculation of the final non_owned_ranges from
quadratic to linear.
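A simplified sketch of the linear-time idea, assuming half-open integer intervals `[start, end)` instead of Scylla's token ranges (which have inclusive/exclusive bounds and can wrap around); the name mirrors `dht::subtract_ranges`, but this is not its implementation. Because both inputs are sorted by start and deoverlapped, the cursor into the subtracted list only moves forward, giving O(n + m) instead of O(n * m).

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def subtract_ranges(ranges: List[Range], to_subtract: List[Range]) -> List[Range]:
    """Return the parts of `ranges` not covered by `to_subtract`.

    Both inputs must be sorted by start and non-overlapping.
    """
    result: List[Range] = []
    j = 0
    for start, end in ranges:
        cur = start
        # Subtrahends ending at or before `cur` cannot affect this or later ranges.
        while j < len(to_subtract) and to_subtract[j][1] <= cur:
            j += 1
        while j < len(to_subtract) and to_subtract[j][0] < end:
            s_start, s_end = to_subtract[j]
            if s_start > cur:
                result.append((cur, s_start))  # uncovered gap before it
            cur = max(cur, s_end)
            if s_end > end:
                break  # keep j: it may also cover the next range
            j += 1
        if cur < end:
            result.append((cur, end))
    return result
```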

Fixes #11922

Closes #11903

* github.com:scylladb/scylladb:
  dht: optimize subtract_ranges
  compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation
  compaction_manager: needs_cleanup: get first/last tokens from sstable decorated keys
2022-11-22 06:45:01 +02:00
Jan Ciolek
a1407ef576 cql3: expr: use evaluate(binary_operator) in is_satisfied_by
is_satisfied_by has to check if a binary_operator is satisfied
by some values. It used to be impossible to evaluate
a binary_operator, so is_satisfied_by had code to check
if it's satisfied for a limited number of cases
occurring when filtering queries.

Now evaluate(binary_operator) has been implemented
and is_satisfied_by can use it to check if a binary_operator
evaluates to true.
This is cleaner and reduces code duplication.
Additionally, the CQL tests will now exercise the new evaluate() implementation.

There is one special case with token().
When is_satisfied_by sees a restriction on token
it assumes that it's satisfied because it's
sure that these token restrictions were used
to generate partition ranges.

I had to leave this special case in because it's impossible
to evaluate(token). Once this is implemented I will remove
the special case because it's risky and prone to cause
bugs.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 20:40:06 +01:00
Jan Ciolek
9c4889ecc3 cql3: expr: handle IS NOT NULL when evaluating binary_operator
The code to evaluate binary operators
was copied from is_satisfied_by.
is_satisfied_by wasn't able to evaluate
IS NOT NULL restrictions, so when such a restriction
is encountered, it throws an exception.

Implement proper handling for IS NOT NULL binary operators.

The switch ensures that all variants of oper_t are handled;
otherwise there would be a compilation error.
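A minimal sketch of that pattern, assuming a hypothetical, trimmed-down
`oper_t` with only four members and `std::optional<int>` standing in for
a CQL value (the real evaluate() works on cql3 values and many more
operators). Leaving out a `default:` label is what lets the compiler flag
an unhandled member:

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical, trimmed-down operator enum; the real cql3::expr::oper_t
// has many more members.
enum class oper_t { EQ, NEQ, LT, IS_NOT };

// Evaluates `lhs op rhs`, with std::nullopt standing in for CQL null.
// Simplification: a comparison involving null is treated as false here.
bool evaluate_binop(oper_t op, const std::optional<int>& lhs, const std::optional<int>& rhs) {
    // No `default:` label: if a member is added to oper_t and not handled
    // here, -Wswitch reports it.
    switch (op) {
    case oper_t::EQ:
        return lhs && rhs && *lhs == *rhs;
    case oper_t::NEQ:
        return lhs && rhs && *lhs != *rhs;
    case oper_t::LT:
        return lhs && rhs && *lhs < *rhs;
    case oper_t::IS_NOT:
        // CQL only allows `IS NOT NULL`, so the right-hand side must be null.
        if (rhs) {
            throw std::invalid_argument("IS NOT is only supported with NULL");
        }
        return lhs.has_value();
    }
    assert(false && "unreachable: all oper_t members are handled above");
    return false;
}
```

GCC and Clang report a missing enumerator under -Wswitch (enabled by
-Wall); building with -Werror turns that warning into a compilation error.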

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 20:40:00 +01:00
Avi Kivity
bf2e54ff85 Merge 'Move deletion log code to sstable_directory.cc' from Pavel Emelyanov
In order to support different storage kinds for sstable files (e.g. S3), all the places that manipulate files on a POSIX filesystem need to be localized, so that custom storage can implement them in its own way. This set moves the deletion log manipulations to sstable_directory.cc, which already "knows" that it works over a directory.

Closes #12020

* github.com:scylladb/scylladb:
  sstables: Delete log file in replay_pending_delete_log()
  sstables: Move deletion log manipulations to sstable_directory.cc
  sstables: Open-code delete_sstables() call
  sstables: Use fs::path in replay_pending_delete_log()
  sstables: Indentation fix after previous patch
  sstables: Coroutinize replay_pending_delete_log
  sstables: Read pending delete log with one line helper
  sstables: Dont write pending log with file_writer
2022-11-21 21:22:59 +02:00
Jan Ciolek
b4cc92216b cql3: expr: make it possible to evaluate binary_operator
evaluate() takes an expression and evaluates it
to a constant value. It wasn't possible to evaluate
binary operators before, so support for them is added.

The code is based on is_satisfied_by,
which is currently used to check
whether a binary operator evaluates
to true or false.

It looks like is_satisfied_by and evaluate()
do pretty much the same thing; one could be
implemented using the other.
In the future they might get merged
into a single function.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 17:48:23 +01:00
Jan Ciolek
8d81eaa68f cql3: expr: accept expression as lhs argument to like()
like() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 16:33:18 +01:00
Jan Ciolek
b1a12686dc cql3: expr: accept expression as lhs in contains_key
contains_key() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 16:33:02 +01:00
Jan Ciolek
79cd9cd956 cql3: expr: accept expression as lhs argument to contains()
contains() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-21 16:32:44 +01:00
Benny Halevy
57ff3f240f dht: optimize subtract_ranges
Take advantage of the fact that both ranges and
ranges_to_subtract are deoverlapped and sorted by
start token to reduce the calculation complexity from
quadratic to linear.
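The linear version is essentially a two-pointer merge. A sketch under
simplifying assumptions: `token_range` below is a hypothetical half-open
interval of int64 tokens, not the real `dht::token_range` (which has
optional, inclusive or exclusive bounds and may wrap around), and this
`subtract_ranges` only borrows its name from the commit:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: half-open [start, end).
struct token_range {
    int64_t start;
    int64_t end;
    bool operator==(const token_range& o) const { return start == o.start && end == o.end; }
};

// True if the ranges are sorted by start and do not overlap.
inline bool sorted_and_deoverlapped(const std::vector<token_range>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i - 1].end > v[i].start) {
            return false;
        }
    }
    return true;
}

// Returns `ranges` minus `to_subtract` in O(n + m): a subtrahend is only
// visited while it overlaps the current range, and the skip cursor `j`
// never moves backwards.
std::vector<token_range> subtract_ranges(const std::vector<token_range>& ranges,
                                         const std::vector<token_range>& to_subtract) {
    assert(sorted_and_deoverlapped(ranges) && sorted_and_deoverlapped(to_subtract));
    std::vector<token_range> result;
    std::size_t j = 0;
    for (const auto& r : ranges) {
        int64_t cur = r.start;  // start of the not-yet-emitted part of r
        // Drop subtrahends that end at or before this range begins.
        while (j < to_subtract.size() && to_subtract[j].end <= cur) {
            ++j;
        }
        // Carve out every subtrahend that starts before r ends. A subtrahend
        // reaching into the next range is not skipped, so it is seen again.
        for (std::size_t k = j; k < to_subtract.size() && to_subtract[k].start < r.end; ++k) {
            const auto& s = to_subtract[k];
            if (s.start > cur) {
                result.push_back({cur, s.start});
            }
            cur = std::max(cur, s.end);
            if (cur >= r.end) {
                break;
            }
        }
        if (cur < r.end) {
            result.push_back({cur, r.end});
        }
    }
    return result;
}
```

Because neither cursor moves backwards, each subtrahend is examined only
for the ranges it actually overlaps, which keeps the total work
proportional to the combined input size.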

Fixes #11922

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:28 +02:00
Benny Halevy
8b81635d95 compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation
The algorithm is generic and can be used elsewhere.

Add a unit test for the function before it gets
optimized in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:26 +02:00
Benny Halevy
7c6f60ae72 compaction_manager: needs_cleanup: get first/last tokens from sstable decorated keys
Currently, the function is inefficient in two ways:
1. unnecessary copy of first/last keys to automatic variables
2. redecorating the partition keys with the schema passed to
   needs_cleanup.

We can just use the tokens from the sstable first/last decorated keys.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:44:32 +02:00
Pavel Emelyanov
2f9b7931af sstables: Delete log file in replay_pending_delete_log()
It's natural that the replayer cleans up after itself

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:16:22 +03:00
Pavel Emelyanov
bdc47b7717 sstables: Move deletion log manipulations to sstable_directory.cc
The deletion log concept uses the fact that files are on a POSIX
filesystem. Support for another storage type will have to reimplement
this place, so keep the FS-specific code in _directory.cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:16:21 +03:00
Pavel Emelyanov
865c51c6cf sstables: Open-code delete_sstables() call
It's not used by any other code, and using it requires the caller to
transform TOC file names by prepending the sstable directory to them. Things
get shorter and simpler when the helper code is merged into the caller.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
a61c96a627 sstables: Use fs::path in replay_pending_delete_log()
It's called by code that has fs::path at hand and internally uses
helpers that need fs::path too, so there is no need to convert it back and forth.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
f5684bcaf0 sstables: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
85a73ca9c6 sstables: Coroutinize replay_pending_delete_log
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
6f3fd94162 sstables: Read pending delete log with one line helper
Seastar has recently gained one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:25 +03:00
Pavel Emelyanov
2dedf4d03a sstables: Dont write pending log with file_writer
It's a wrapper over output_stream with offset tracking, and the tracking
is not needed to generate a log file. As a bonus of switching back, we
get the stream.write(sstring) sugar.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-21 13:15:24 +03:00
Botond Dénes
2d4439a739 Merge 'doc: add a troubleshooting article about the missing configuration files' from Anna Stuchlik
Fix https://github.com/scylladb/scylladb/issues/11598

This PR adds the troubleshooting article submitted by @syuu1228 in the deprecated _scylla-docs_ repo, with https://github.com/scylladb/scylla-docs/pull/4152.
I copied and reorganized the content and rewrote it a little according to the RST guidelines so that the page renders correctly.

@syuu1228 Could you review this PR to make sure that my changes didn't distort the original meaning?

Closes #11626

* github.com:scylladb/scylladb:
  doc: apply the feedback to improve clarity
  doc: add the link to the new Troubleshooting section and replace Scylla with ScyllaDB
  doc: add the new page to the toctree
  doc: add a troubleshooting article about the missing configuration files
2022-11-21 12:02:31 +02:00
Kamil Braun
135eb4a041 test.py: prepare for adding extra config from test when creating servers
We will use this for replace operations to pass the IP of replaced node.
2022-11-21 10:57:03 +01:00
Kamil Braun
ac91e9d8be test/pylib: manager_client: convert add_server to use put_json
We shall soon pass some JSON data into these requests.
2022-11-21 10:57:03 +01:00
Kamil Braun
82eb9af80d test/pylib: rest_client: allow returning JSON data from put_json
We'll use `put_json` for requests which want to pass JSON data into the
call and also return JSON.
2022-11-21 10:57:03 +01:00
Kamil Braun
4fef2d099b test/pylib: scylla_cluster: don't import from manager_client
There's a logical dependency from `manager_client` to `scylla_cluster`
(`ManagerClient` defined in `manager_client` talks to
`ScyllaClusterManager` defined in `scylla_cluster` over RPC). There is
no such dependency in the other direction. Do not introduce it accidentally.

We can import these types from the `internal_types` module.
2022-11-21 10:57:03 +01:00
Nadav Har'El
757d2a4c02 test/alternator: un-xfail a test which passes on modern Python
We had an xfailing test that reproduced a case where Alternator tried
to report an error when the request was too long, but the boto library
didn't see this error and threw a "Broken Pipe" error instead. It turns
out that this wasn't a Scylla bug but rather a bug in urllib3, which
overzealously reported a "Broken Pipe" instead of trying to read the
server's response. This issue has already been fixed in
   https://github.com/urllib3/urllib3/pull/1524

and now, on modern installations, the test that used to fail now passes
and reports "XPASS".

So in this patch we remove the "xfail" tag, and skip the test if
running an old version of urllib3.

Fixes #8195

Closes #12038
2022-11-21 08:10:10 +02:00
Botond Dénes
ffc3697f2f Merge 'storage_service api: handle dropped tables' from Benny Halevy
Gracefully skip tables that were removed in the background.

Fixes #12007

Closes #12013

* github.com:scylladb/scylladb:
  api: storage_service: fixup indentation
  api: storage_service: add run_on_existing_tables
  api: storage_service: add parse_table_infos
  api: storage_service: log errors from compaction related handlers
  api: storage_service: coroutinize compaction related handlers
2022-11-21 07:56:27 +02:00
Avi Kivity
994603171b Merge 'Add validator to the mutation compactor' from Botond Dénes
Fragment reordering and fragment dropping bugs have been plaguing us since forever. To fight them we added a validator to the sstable write path to prevent really messed up sstables from being written.
This series adds validation to the mutation compactor. This will cover reads and compaction among others, hopefully ridding us of such bugs on the read path too.
This series fixes some benign-looking issues found by unit tests after the validator was added -- although how benign a producer emitting two partition-ends is depends entirely on how the consumer reacts to it, so no such bug is actually benign.

Fixes: https://github.com/scylladb/scylladb/issues/11174

Closes #11532

* github.com:scylladb/scylladb:
  mutation_compactor: add validator
  mutation_fragment_stream_validator: add a 'none' validation level
  test/boost/mutation_query_test: test_partition_limit: sort input data
  querier: consume_page(): use partition_start as the sentinel value
  treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{}
  treewide: use ::for_partition_start() instead of ::partition_start_tag_t{}
  position_in_partition: add for_partition_{start,end}()
2022-11-20 20:33:26 +02:00
Avi Kivity
779b01106d Merge 'cql3: expr: add unit tests for prepare_expression' from Jan Ciołek
Adds unit tests for the function `expr::prepare_expression`.

Three minor bugs were found by these tests, all fixed in this PR.
1. When preparing a map, the type for the tuple constructor was taken from an unprepared tuple, which has `nullptr` as its type.
2. Preparing an empty nonfrozen list or set resulted in `null`, but preparing a map didn't. Fixed this inconsistency.
3. Preparing a `bind_variable` with a `nullptr` receiver was allowed. The `bind_variable` ended up with a `nullptr` type, which is incorrect. Changed it to throw an exception.

Closes #11941

* github.com:scylladb/scylladb:
  test preparing expr::usertype_constructor
  expr_test: test that prepare_expression checks style_type of collection_constructor
  expr_test: test preparing expr::collection_constructor for map
  prepare_expr: make preparing nonfrozen empty maps return null
  prepare_expr: fix a bug in map_prepare_expression
  expr_test: test preparing expr::collection_constructor for set
  expr_test: test preparing expr::collection_constructor for list
  expr_test: test preparing expr::tuple_constructor
  expr_test: test preparing expr::untyped_constant
  expr_test_utils: add make_bigint_raw/const
  expr_test_utils: add make_tinyint_raw/const
  expr_test: test preparing expr::bind_variable
  cql3: prepare_expr: forbid preparing bind_variable without a receiver
  expr_test: test preparing expr::null
  expr_test: test preparing expr::cast
  expr_test_utils: add make_receiver
  expr_test_utils: add make_smallint_raw/const
  expr_test: test preparing expr::token
  expr_test: test preparing expr::subscript
  expr_test: test preparing expr::column_value
  expr_test: test preparing expr::unresolved_identifier
  expr_test_utils: mock data_dictionary::database
2022-11-20 20:03:54 +02:00
Nadav Har'El
2ba8b8d625 test/cql-pytest: remove "xfail" from passing test testIndexOnFrozenCollectionOfUDT
We had a test that used to fail because of issue #8745. But this issue
was already fixed, and we forgot to remove the "xfail" marker. The test
now passes, so let's remove the xfail marker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12039
2022-11-20 19:54:59 +02:00
Avi Kivity
40f61db120 Merge 'docs: describe the Raft upgrade and recovery procedures' from Kamil Braun
Add new guide for upgrading 5.1 to 5.2.

In this new upgrade doc, include additional steps for enabling
Raft using the `consistent_cluster_management` flag. Note that we don't
have this flag yet but it's planned to replace the experimental flag in
5.2.

In the "Raft in ScyllaDB" document, add sections about:
- enabling Raft in existing clusters in Scylla 5.2,
- verifying that the internal Raft upgrade procedure finishes
  successfully,
- recovering from a stuck Raft upgrade procedure or from a majority loss
  situation.

Fix some problems in the documentation, e.g. it is not possible to
enable Raft in an existing cluster in 5.0, but the documentation claimed
that it is.

Follow-up items:
- if we decide for a different name for `consistent_cluster_management`,
  use that name in the docs instead
- update the warnings in Scylla to link to the Raft doc
- mention Enterprise versions once we know the numbers
- update the appropriate upgrade docs for Enterprise versions
  once they exist

Closes #11910

* github.com:scylladb/scylladb:
  docs: describe the Raft upgrade and recovery procedures
  docs: add upgrade guide 5.1 -> 5.2
2022-11-20 19:00:23 +02:00
Avi Kivity
15ee8cfc05 Merge 'reader_concurrency_semaphore: fix waiter/inactive race' from Botond Dénes
We recently (in 7fbad8de87) made sure all admission paths can trigger the eviction of inactive reads. As reader eviction happens in the background, a mechanism was added to make sure only a single eviction fiber was running at any given time. This mechanism, however, had a preemption point between stopping the fiber and releasing the evict lock. This gave an opportunity for either new waiters or inactive readers to be added without the fiber acting on them. Since it still held onto the lock, it also prevented other eviction fibers from starting. This could create a situation where the semaphore could admit new reads by evicting inactive ones, but still had waiters. Since an empty waitlist is also an admission criterion, once one waiter is wrongly added, many more can accumulate.
This series fixes this by ensuring the lock is released the instant the fiber decides there is no more work to do.
It also fixes the assert failure on recursive eviction and adds detection of the inactive/waiter contradiction.

Fixes: #11923
Refs: #11770

Closes #12026

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly
  reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot
  reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach
2022-11-20 18:51:34 +02:00
Avi Kivity
895d721d5e Merge 'scylla-sstable: data-dump improvements' from Botond Dénes
This series contains a mixed bag of improvements to `scylla sstable dump-data`. These improvements are mostly aimed at making the JSON output clearer and getting rid of any ambiguities.

Closes #12030

* github.com:scylladb/scylladb:
  tools/scylla-sstable: traverse sstables in argument order
  tools/scylla-sstable: dump-data docs: s/clustering_fragments/clustering_elements
  tools/scylla-sstable: dump-data/json: use Null instead of "<unknown>"
  tools/scylla-sstable: dump-data/json: use more uniform format for collections
  tools/scylla-sstable: dump-data/json: make cells easier to parse
2022-11-20 17:02:27 +02:00
Avi Kivity
2f9c53fbe4 Merge 'test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address' from Kamil Braun
Since recently, the framework uses a separate set of unique IDs to
identify servers, but the log file and workdir are still named using the
last part of the IP address.

This is confusing: the test logs sometimes don't provide the IP addr
(only the ID), and even if they do, the reader of the test log may not
know that they need to look at the last part of the IP to find the
node's log/workdir.

Also using ID will be necessary if we want to reuse IP addresses (e.g.
during node replace, or simply not to run out of IP addresses during
testing).

So use the ID instead to name the workdir and log file.

Also, when starting a test case, print the used cluster. This will make
it easier to map server IDs to their IP addresses when browsing through
the test logs.

Closes #12018

* github.com:scylladb/scylladb:
  test/pylib: manager_client: print used cluster when starting test case
  test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address
2022-11-20 16:56:19 +02:00
Avi Kivity
14218d82d6 Update tools/java submodule (serverless)
* tools/java caf754f243...874e2d529b (2):
  > Add Scylla Cloud serverless support
  > Switch cqlsh to use scylla-driver
2022-11-20 16:41:36 +02:00
Tomasz Grabiec
c8e983b4aa test: flat_mutation_reader_assertions: Use fatal BOOST_REQUIRE_EQUAL instead of BOOST_CHECK_EQUAL
BOOST_CHECK_EQUAL is a weaker form of assertion, it reports an error
and will cause the test case to fail but continues. This makes the
test harder to debug because there's no obvious way to catch the
failure in GDB and the test output is also flooded with things which
happen after the failed assertion.

Message-Id: <20221119171855.2240225-1-tgrabiec@scylladb.com>
2022-11-20 16:14:26 +02:00
Nadav Har'El
2d2034ea28 Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek
When filtering with a multi-column restriction present, all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi column restrictions were detected, the code checked if they are satisfied and returned immediately.
This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied.

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct.

Fixes: #6200
Fixes: #12014

Closes #12031

* github.com:scylladb/scylladb:
  cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
  boost/restrictions-test: uncomment part of the test that passes now
  cql-pytest: enable test for filtering combined multi column and regular column restrictions
  cql3: don't ignore other restrictions when a multi column restriction is present during filtering
2022-11-20 11:50:38 +02:00
Benny Halevy
ec5707a4a8 api: storage_service: fixup indentation 2022-11-20 09:14:45 +02:00
Benny Halevy
cc63719782 api: storage_service: add run_on_existing_tables
Gracefully skip tables that were removed
in the background.
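A sketch of the idea, assuming hypothetical types: string ids stand in
for the table UUIDs, and a plain map stands in for the per-shard table
registry. Looking tables up by id rather than by name is what turns a
dropped (or dropped-and-recreated) table into a miss instead of a wrong
hit:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; ids are plain strings here.
struct table_info {
    std::string name;
    std::string id;  // resolved once, e.g. on shard 0
};

struct table {
    std::string name;
    int compactions = 0;
};

// Runs `func` on every requested table that still exists, looking tables
// up by id rather than by name. A table dropped in the background is
// skipped, and a table re-created under the same name (with a new id)
// is not touched.
void run_on_existing_tables(std::unordered_map<std::string, table>& tables_by_id,
                            const std::vector<table_info>& infos,
                            const std::function<void(table&)>& func) {
    for (const auto& ti : infos) {
        auto it = tables_by_id.find(ti.id);
        if (it == tables_by_id.end()) {
            continue;  // corresponds to the no_such_column_family case
        }
        func(it->second);
    }
}
```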

Fixes #12007

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:14:29 +02:00
Benny Halevy
9ef9b9d1d9 api: storage_service: add parse_table_infos
The table UUIDs are the same on all shards
so we might as well get them on shard 0
(as we already do) and reuse them on other shards.

It is more efficient and accurate to look up the table
eventually on the shard using its uuid rather than
its name. If the table was dropped and recreated
using the same name in the background, the new
table will have a new uuid, so the api function
no longer applies to it.

A following change will handle the no_such_column_family
cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:14:21 +02:00
Benny Halevy
9b4a9b2772 api: storage_service: log errors from compaction related handlers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:03:25 +02:00
Benny Halevy
a47f96bc05 api: storage_service: coroutinize compaction related handlers
In preparation for improving the parsing of table lists
and the handling of no_such_column_family
errors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-20 09:03:25 +02:00
Jan Ciolek
286f182a8c cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
In issue #12014 a user has encountered an instance of #6200.
When filtering a WHERE clause which contained
both multi-column and regular restrictions,
the regular restrictions were ignored.

Add a test which reproduces the issue
using a reproducer provided by the user.

This problem is tested in another similar test,
but this one reproduces the issue in the exact
way it was found by the user.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:27:42 +01:00
Jan Ciolek
63fb2612c3 boost/restrictions-test: uncomment part of the test that passes now
A part of the test was commented out due to #6200.
Now #6200 has been fixed and it can be uncommented.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:14:32 +01:00
Jan Ciolek
99e1032e34 cql-pytest: enable test for filtering combined multi column and regular column restrictions
The test test_multi_column_restrictions_and_filtering was marked as xfail,
because issue #6200 wasn't fixed. Now that filtering
multi column and other restrictions together has been fixed
the test passes.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:14:32 +01:00
Jan Ciolek
b974d4adfb cql3: don't ignore other restrictions when a multi column restriction is present during filtering
When filtering with a multi-column restriction present, all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`

would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi column restrictions were detected,
the code checked if they are satisfied and returned immediately.
This is fixed by returning only when these restrictions
are not satisfied. When they are satisfied the other
restrictions are checked as well to ensure all
of them are satisfied.
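The shape of the bug and of the fix, as a sketch with a hypothetical
`row` type and restrictions modeled as predicates (the real filtering
code works on CQL expressions, not std::function):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical row of a table with clustering columns ck1, ck2 and a
// regular column regular_col.
struct row {
    int ck1;
    int ck2;
    int regular_col;
};

// Hypothetical: a restriction is modeled as a predicate over a row.
using restriction = std::function<bool(const row&)>;

inline bool all_satisfied(const std::vector<restriction>& rs, const row& r) {
    for (const auto& pred : rs) {
        if (!pred(r)) {
            return false;
        }
    }
    return true;
}

// Shape of the bug: when multi-column restrictions exist, their result is
// returned directly, so the regular restrictions are never consulted.
bool matches_buggy(const std::vector<restriction>& multi_column,
                   const std::vector<restriction>& regular, const row& r) {
    if (!multi_column.empty()) {
        return all_satisfied(multi_column, r);
    }
    return all_satisfied(regular, r);
}

// Shape of the fix: return early only when a multi-column restriction
// fails; otherwise go on to check the remaining restrictions too.
bool matches_fixed(const std::vector<restriction>& multi_column,
                   const std::vector<restriction>& regular, const row& r) {
    if (!all_satisfied(multi_column, r)) {
        return false;
    }
    return all_satisfied(regular, r);
}
```

With `(ck1, ck2) < (0, 0)` as the multi-column restriction and
`regular_col = 0` as the regular one, a row with ck1 = -1, ck2 = 5 and
regular_col = 1 is accepted by matches_buggy() and rejected by
matches_fixed().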

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi column
and regular columns and this approach was correct.

Fixes: #6200
Fixes: #12014

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:14:16 +01:00
Botond Dénes
30597f17ed tools/scylla-sstable: traverse sstables in argument order
In the order the user passed them on the command-line.
2022-11-18 15:58:37 +02:00
Botond Dénes
e337b25aa9 tools/scylla-sstable: dump-data docs: s/clustering_fragments/clustering_elements
The usage of clustering_fragments is a typo, the output contains clustering_elements.
2022-11-18 15:58:36 +02:00
Botond Dénes
c39408b394 tools/scylla-sstable: dump-data/json: use Null instead of "<unknown>"
The currently used "<unknown>" marker for invalid values/types is
indistinguishable from a normal value in some cases. Use the much more
distinct and unique json Null instead.
2022-11-18 15:58:36 +02:00
Botond Dénes
1dfceb5716 tools/scylla-sstable: dump-data/json: use more uniform format for collections
Instead of trying to be clever and switching the output on the type of
collection, use the same format always: a list of objects, where the
object has a key and value attribute, containing the respective
collection item's key and value. This makes processing much easier for
machines (and humans too since the previous system wasn't working well).
2022-11-18 15:58:36 +02:00
Botond Dénes
f89acc8df7 tools/scylla-sstable: dump-data/json: make cells easier to parse
There are several slightly different cell types in scylla: regular
cells, collection cells (frozen and non-frozen) and counter cells
(update and shards). In C++ code the type of the cell is always
available for code wishing to make out exactly what kind of cell a cell
is. In the JSON output of the dump-data this is currently really hard to
do as there is not enough information to disambiguate all the different
cell types. We wish to make the JSON output self-sufficient so in this
patch we introduce a "type" field which contains one of:
* regular
* counter-update
* counter-shards
* frozen-collection
* collection

Furthermore, we bring the different types closer by also printing the
counter shards under the 'value' key, not under the 'shards' key as
before. The separate 'shards' is no longer needed to disambiguate.
The documentation and the write operation are also updated to reflect the
changes.
2022-11-18 15:58:36 +02:00
Petr Gusev
41629e97de test.py: handle --markers parameter
Some tests may take longer than a few seconds to run. We want to
mark such tests in some way, so that we can run them selectively.
This patch proposes to use pytest markers for this. The markers
from the test.py command line are passed to pytest
as is via the -m parameter.

By default, the marker filter is not applied and all tests
will be run without exception. To exclude e.g. slow tests
you can write --markers 'not slow'.

The --markers parameter is currently only supported
by Python tests, other tests ignore it. We intend to
support this parameter for other types of tests in the future.

Another possible improvement is not to run suites for which
all tests have been filtered out by markers. The markers are
currently handled by pytest, which means that the logic in
test.py (e.g., running a scylla test cluster) will be run
for such suites.

Closes #11713
2022-11-18 12:36:20 +01:00
Avi Kivity
7da12c64bc Revert "Revert "Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity""
This reverts commit 22f13e7ca3, and reinstates
commit df8e1da8b2 ("Merge 'cql3: select_statement:
coroutinize indexed_table_select_statement::do_execute_base_query()' from
Avi Kivity"). The original commit was reverted due to failures in debug
mode on aarch64, but after commit 224a2877b9
("build: disable -Og in debug mode to avoid coroutine asan breakage"), it
works again.

Closes #12021
2022-11-18 12:44:00 +02:00
Kamil Braun
d7649a86c4 Merge 'Build up to support of dynamic IP address changes in Raft' from Konstantin Osipov
We plan to stop storing IP addresses in Raft configuration, and instead
use the information disseminated through gossip to locate Raft peers.

Implement patches that are building up to that:
* improve Raft API of configuration change notifications
* disseminate raft host id in Gossip
* avoid using Raft addresses from Raft configuration, and instead
  consistently use the translation layer between raft server id <-> IP
  address

Closes #11953

* github.com:scylladb/scylladb:
  raft: persist the initial raft address map
  raft: (upgrade) do not use IP addresses from Raft config
  raft: (and gossip) begin gossiping raft server ids
  raft: change the API of conf change notifications
2022-11-18 11:38:19 +01:00
Botond Dénes
437fcdeeda Merge 'Make use of enum_set in directory lister' from Pavel Emelyanov
The lister accepts sort of a filter -- what kind of entries to list, regular, directories or both. It currently uses unordered_set, but enum_set is shorter and better describes the intent.
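For illustration only, a hypothetical minimal enum set built on
std::bitset. It is not Scylla's own `enum_set` (which has a different,
richer interface), and `entry_type` below is made up:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical minimal enum set: one bit per enumerator, indexed by the
// enumerator value.
template <typename Enum, std::size_t N>
class enum_set {
    std::bitset<N> _bits;

    static std::size_t index(Enum e) {
        auto i = static_cast<std::size_t>(e);
        assert(i < N && "enumerator out of range for this set");
        return i;
    }

public:
    enum_set() = default;
    enum_set(std::initializer_list<Enum> items) {
        for (Enum e : items) {
            _bits.set(index(e));
        }
    }
    void set(Enum e) { _bits.set(index(e)); }
    bool contains(Enum e) const { return _bits.test(index(e)); }
};

// Made-up entry kinds standing in for directory entry types.
enum class entry_type { regular, directory, other };
using dir_entry_types = enum_set<entry_type, 3>;
```

Membership is a single bit test, and the type itself states that the set
can only hold enumerators of one enum, which is the intent the commit
describes.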

Closes #12017

* github.com:scylladb/scylladb:
  lister: Make lister::dir_entry_types an enum_set
  database: Avoid useless local variable
2022-11-18 12:15:26 +02:00
Botond Dénes
b39ca29b3c reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly
The semaphore should admit readers as soon as it can. So at any point in
time there should be either no waiters, or the semaphore shouldn't be
able to admit new reads. Otherwise something went wrong. Detect this
when queuing up reads and dump the diagnostics if detected.
Even though tests should ensure this never happens, recently we've
seen a race between eviction and enqueuing producing such situations.
This is very hard to write tests for, so add built-in detection and
protection instead. Detecting this is very cheap anyway.
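The invariant boils down to a cheap check. A sketch with a hypothetical
`semaphore_state` that only tracks a count, a memory budget and the
number of waiters (the real semaphore has more admission criteria and
dumps far richer diagnostics):

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the admission invariant.
struct semaphore_state {
    int available_count;
    long available_memory;
    int waiters;

    bool can_admit() const { return available_count > 0 && available_memory > 0; }

    // Either nobody waits, or there are not enough resources to admit
    // anyone. Returns true (and reports) if the invariant is broken.
    bool detect_admission_waiter_anomaly() const {
        const bool anomaly = waiters > 0 && can_admit();
        if (anomaly) {
            std::fprintf(stderr, "semaphore anomaly: %d waiter(s) while a read could be admitted\n", waiters);
        }
        return anomaly;
    }
};
```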
2022-11-18 11:35:47 +02:00
Botond Dénes
ca7014ddb8 reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot
Said method has a protection against concurrent (more like recursive)
calls to itself, by setting a flag `_evicting` and returning early if
this flag is set. The evicting loop, however, has at least one preemption
point between deciding there is nothing more to evict and resetting said
flag. This window provides an opportunity for new inactive reads or waiters
to be queued without this loop noticing, while also preventing any other
concurrent invocations from reacting at that time.
Eliminate this by using repeat() instead of do_until() and setting
`_evicting = false` the moment the loop's run condition becomes false.
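A single-threaded model of the fixed loop shape, for illustration only:
`scheduler` and `semaphore_model` are hypothetical and do not use
Seastar. Every posted task is one continuation, so other work can run
between iterations, but never between the last check and the reset of
the flag:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical cooperative scheduler: each posted task is one continuation,
// and other tasks may run between them (the analogue of a preemption point).
struct scheduler {
    std::deque<std::function<void()>> tasks;
    void post(std::function<void()> f) { tasks.push_back(std::move(f)); }
    void run() {
        while (!tasks.empty()) {
            auto f = std::move(tasks.front());
            tasks.pop_front();
            f();
        }
    }
};

// Hypothetical model of the background eviction loop. The iteration that
// finds nothing left to do clears `evicting` itself, instead of leaving
// that to a later continuation.
struct semaphore_model {
    scheduler& sched;
    int inactive_reads = 0;
    int waiters = 0;
    bool evicting = false;

    void evict_in_background() {
        if (evicting) {
            return;  // the running loop rechecks before it stops
        }
        evicting = true;
        sched.post([this] { step(); });
    }

    void step() {
        assert(evicting);
        if (inactive_reads == 0 || waiters == 0) {
            evicting = false;  // no preemption point between decision and reset
            return;
        }
        --inactive_reads;  // evict one inactive read...
        --waiters;         // ...and admit one waiter with the freed resources
        sched.post([this] { step(); });  // next iteration: a new continuation
    }

    void enqueue_waiter() {
        ++waiters;
        evict_in_background();
    }
};
```

If the reset were left to a later task instead, a waiter enqueued in
between would see `evicting == true`, return early, and stay queued with
nobody acting on it.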
2022-11-18 11:35:47 +02:00
Botond Dénes
892f52c683 reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach
Currently this method detaches the inactive read from the handle and
notifies the permit, calls the notify handler if any and does some stat
bookkeeping. Extend it to do a complete detach: unlink the entry from
the inactive reads list and also cancel the ttl timer.
After this, all that is left to the caller is to destroy the entry.
This will prevent any recursive eviction from causing assertion failure.
Although recursive eviction shouldn't happen, it shouldn't trigger an
assert.
2022-11-18 11:35:43 +02:00
Pavel Emelyanov
a44ca06906 Merge 'token_metadata: Do not use topology info for is_member check' from Asias He
Since commit a980f94 (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring.

We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _topology indirectly.

Fixes #11935

Closes #11936

* github.com:scylladb/scylladb:
  token_metadata: Rename is_member to is_normal_token_owner
  token_metadata: Add docs for is_member
  token_metadata: Do not use topology info for is_member check
  token_metadata: Check node is part of the topology instead of the ring
2022-11-18 11:54:07 +03:00
Asias He
4571fcf9e7 token_metadata: Rename is_member to is_normal_token_owner
The name is_normal_token_owner is clearer than is_member:
it reflects what the function really checks.
2022-11-18 09:29:20 +08:00
Asias He
965097cde5 token_metadata: Add docs for is_member
Make it clear that is_member checks whether a node is part of the token
ring and nothing else.
2022-11-18 09:28:56 +08:00
Asias He
a495b71858 token_metadata: Do not use topology info for is_member check
Since commit a980f94 (token_metadata: impl: keep the set of normal token
owners as a member), we have a set, _normal_token_owners, which contains
all the nodes in the ring.

We can use _normal_token_owners to check if a node is part of the ring
directly instead of going through the _topology indirectly.

Fixes #11935
2022-11-18 09:28:56 +08:00
Asias He
f2ca790883 token_metadata: Check node is part of the topology instead of the ring
update_normal_tokens is the way to add a new node into the ring. We
should not require a new node to already be in the ring to be able to
add it to the ring. The current code works only by accident, because
is_member checks whether a node is in the topology.

We should use _topology.has_endpoint to check if a node is part of the
topology explicitly.
2022-11-18 09:28:56 +08:00
Jan Ciolek
77d68153f1 test preparing expr::usertype_constructor
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:41:10 +01:00
Jan Ciolek
eb92fb4289 expr_test: test that prepare_expression checks style_type of collection_constructor
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:41:10 +01:00
Jan Ciolek
77c63a6b92 expr_test: test preparing expr::collection_constructor for map
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:41:09 +01:00
Jan Ciolek
db67ade778 prepare_expr: make preparing nonfrozen empty maps return null
In Scylla and Cassandra, inserting an empty collection
that is not frozen is interpreted as inserting a null value.

list_prepare_expression and set_prepare_expression
have a check that handles this behavior, but there
wasn't one in map_prepare_expression.

As a result, preparing an empty list or set would result in null,
but preparing an empty map wouldn't. This is inconsistent;
it's better to return null in all cases of empty nonfrozen
collections.
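
A hedged C++ sketch of the intended rule, assuming a hypothetical
`prepare_map_literal` helper that works on a plain std::map rather than on
cql3 expressions: an empty map literal prepares to null unless the target type
is frozen, which is what the list and set paths already did.

```cpp
#include <map>
#include <optional>

// Hypothetical model of preparing a map literal (not the cql3 API):
// an empty, non-frozen collection is treated as null.
template <typename K, typename V>
std::optional<std::map<K, V>> prepare_map_literal(const std::map<K, V>& elements, bool frozen) {
    if (elements.empty() && !frozen) {
        return std::nullopt; // null, not an empty map
    }
    return elements; // frozen or non-empty: keep the value as-is
}
```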

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:41:09 +01:00
Jan Ciolek
da71f9b50b prepare_expr: fix a bug in map_prepare_expression
map_prepare_expression takes a collection_constructor
of unprepared items and prepares it.

Elements of a map collection_constructor are tuples (key and value).

map_prepare_expression creates a prepared collection_constructor
by preparing each tuple and adding it to the result.

During this preparation it needs to set the type of the tuple.
There was a bug here: it took the type from the unprepared
tuple_constructor and assigned it to the prepared one.
An unprepared tuple_constructor doesn't have a type
so it ended up assigning nullptr.

Instead of that it should create a tuple_type_impl instance
by looking at the types of map key and values,
and use this tuple_type_impl as the type of the prepared tuples.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:35:04 +01:00
Jan Ciolek
a656fdfe9a expr_test: test preparing expr::collection_constructor for set
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:37 +01:00
Jan Ciolek
76f587cfe7 expr_test: test preparing expr::collection_constructor for list
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:37 +01:00
Jan Ciolek
44b55e6caf expr_test: test preparing expr::tuple_constructor
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:37 +01:00
Jan Ciolek
265100a638 expr_test: test preparing expr::untyped_constant
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:37 +01:00
Jan Ciolek
f6b9100cd2 expr_test_utils: add make_bigint_raw/const
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:37 +01:00
Jan Ciolek
f9ff131f86 expr_test_utils: add make_tinyint_raw/const
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:36 +01:00
Jan Ciolek
76b6161386 expr_test: test preparing expr::bind_variable
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:36 +01:00
Jan Ciolek
4882724066 cql3: prepare_expr: forbid preparing bind_variable without a receiver
prepare_expression treats the receiver as an optional argument:
it can be set to nullptr, and the preparation should
still succeed when it's possible to infer the type of an expression.

Preparing a bind_variable requires the receiver to be present,
because it doesn't contain any information about the type
of the bound value.

Added a check that the receiver is present.
Allowing a bind_variable to be prepared without
a receiver was a bug.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 20:22:36 +01:00
Avi Kivity
2779a171fc Merge 'Do not run aborted tasks' from Aleksandra Martyniuk
task_manager::task::impl contains an abort source which can
be used to check whether it is aborted and an abort method
which aborts the task (request_abort on abort_source) and all
its descendants recursively.

When the start method is called after the task was aborted,
then its state is set to failed and the task does not run.

Fixes: #11995

Closes #11996

* github.com:scylladb/scylladb:
  tasks: do not run tasks that are aborted
  tasks: delete unused variable
  tasks: add abort_source to task_manager::task::impl
2022-11-17 19:42:46 +02:00
Pavel Emelyanov
a396c27efc Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun
`get_rpc_client` calculates a `topology_ignored` field when creating a
client which says whether the client's endpoint had topology information
when this client was created. This is later used to check if that client
needs to be dropped and replaced with a new client which uses the
correct topology information.

The `topology_ignored` field was incorrectly calculated as `true` for
pending endpoints even though we had topology information for them. This
would lead to unnecessary drops of RPC clients later. Fix this.

Remove the default parameter for `with_pending` from
`topology::has_endpoint` to avoid similar bugs in the future.

Apparently this fixes #11780. The verbs used by decommission operation
use RPC client index 1 (see `do_get_rpc_client_idx` in
message/messaging_service.cc). From local testing with additional
logging I found that by the time this client is created (i.e. the first
verb in this group is used), we already know the topology. The node is
pending at that point - hence the bug would cause us to assume we don't
know the topology, leading us to drop the RPC client later, possibly
in the middle of a decommission operation.

Fixes: #11780

Closes #11942

* github.com:scylladb/scylladb:
  message: messaging_service: check for known topology before calling is_same_dc/rack
  test: reenable test_topology::test_decommission_node_add_column
  test/pylib: util: configurable period in wait_for
  message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client
  message: messaging_service: topology independent connection settings for GOSSIP verbs
2022-11-17 20:14:32 +03:00
Jan Ciolek
42e01cc67f expr_test: test preparing expr::null
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:05 +01:00
Jan Ciolek
45b3fca71c expr_test: test preparing expr::cast
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:05 +01:00
Jan Ciolek
498c9bfa0d expr_test_utils: add make_receiver
Add a convenience function which creates receivers.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
6873a21fbd expr_test_utils: add make_smallint_raw/const
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
488056acb7 expr_test: test preparing expr::token
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
7958f77a40 expr_test: test preparing expr::subscript
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
569bd61c6c expr_test: test preparing expr::column_value
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
26174e29c6 expr_test: test preparing expr::unresolved_identifier
It's interesting that prepare_expression
for column identifiers doesn't require a receiver.
I hope this won't break validation in the future.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:04 +01:00
Jan Ciolek
c719a923bb expr_test_utils: mock data_dictionary::database
Add a function which creates a mock instance
of data_dictionary::database.

prepare_expression requires a data_dictionary::database
as an argument, so unit tests for it need something
to pass there. make_data_dictionary_database can
be used to create an instance that is sufficient for tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-17 17:30:00 +01:00
Kamil Braun
8e8c32befe test/pylib: manager_client: print used cluster when starting test case
It will be easier to map server IDs to their IP addresses when browsing
through the test logs.
2022-11-17 17:14:23 +01:00
Pavel Emelyanov
bc62ca46d4 lister: Make lister::dir_entry_types an enum_set
This type is currently an unordered_set, but it holds at most
two elements. Making it an enum_set reduces it to a single size_t variable
and better describes the intention.
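
To illustrate why an enum set fits in a single word, here is a minimal
hypothetical bitmask set in C++. The `entry_type` enumerators and the
`entry_type_set` class are assumptions for illustration only; the real
enum_set is more general.

```cpp
#include <cstddef>
#include <initializer_list>

// Illustrative enumerators, not the real directory_entry_type definition.
enum class entry_type : unsigned { regular, directory, link, other };

// Minimal hypothetical enum set: one bit per enumerator packed into a size_t,
// instead of a heap-allocating std::unordered_set<entry_type>.
class entry_type_set {
    std::size_t mask_ = 0;

    static std::size_t bit(entry_type t) {
        return std::size_t(1) << static_cast<unsigned>(t);
    }

public:
    entry_type_set(std::initializer_list<entry_type> types) {
        for (auto t : types) {
            mask_ |= bit(t); // set the bit for each requested type
        }
    }
    bool contains(entry_type t) const { return (mask_ & bit(t)) != 0; }
    std::size_t mask() const { return mask_; }
};
```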

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:01:45 +03:00
Pavel Emelyanov
c6021b57a1 database: Avoid useless local variable
It's used to run lister::scan_dir() with directory_entry_type::directory
only, but for that it is copied around in lambda captures. It's simpler
just to use the value directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-17 19:00:49 +03:00
Kamil Braun
b83234d8aa test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address
Since recently the framework uses a separate set of unique IDs to
identify servers, but the log file and workdir are still named using the
last part of the IP address.

This is confusing: the test logs sometimes don't provide the IP addr
(only the ID), and even if they do, the reader of the test log may not
know that they need to look at the last part of the IP to find the
node's log/workdir.

Using IDs will also be necessary if we want to reuse IP addresses (e.g.
during node replace, or simply not to run out of IP addresses during
testing).
2022-11-17 16:55:12 +01:00
Anna Stuchlik
f7f03e38ee doc: update the link to Enabling Experimental Features 2022-11-17 15:44:46 +01:00
Anna Stuchlik
02cea98f55 doc: remove the note referring to the previous ScyllaDB versions and add the relevant limitation to the paragraph 2022-11-17 15:05:00 +01:00
Anna Stuchlik
ce88c61785 doc: update the links to the Enabling Experimental Features section 2022-11-17 14:59:34 +01:00
Avi Kivity
76be6402ed Merge 'repair: harden effective replication map' from Benny Halevy
As described in #11993, per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization.

This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation.

While at it, the series includes other cleanups in this area in repair and view; these are not fixes, since the calls happen in synchronous functions that do not yield.

Fixes #11993

Closes #11994

* github.com:scylladb/scylladb:
  repair: pass erm down to get_hosts_participating_in_repair and get_neighbors
  repair: pass effective_replication_map down to repair_info
  repair: coroutinize sync_data_using_repair
  repair: futurize do_repair_start
  effective_replication_map: add global_effective_replication_map
  shared_token_metadata: get_lock is const
  repair: sync_data_using_repair: require to run on shard 0
  repair: require all node operations to be called on shard 0
  repair: repair_info: keep effective_replication_map
  repair: do_repair_start: use keyspace erm to get keyspace local ranges
  repair: do_repair_start: use keyspace erm for get_primary_ranges
  repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc
  repair: do_repair_start: check_in_shutdown first
  repair: get_db().local() where needed
  repair: get topology from erm/token_metdata_ptr
  view: get_view_natural_endpoint: get topology from erm
2022-11-17 13:29:02 +02:00
Konstantin Osipov
262566216b raft: persist the initial raft address map 2022-11-17 14:26:36 +03:00
Konstantin Osipov
b35af73fdf raft: (upgrade) do not use IP addresses from Raft config
Always use raft address map to obtain the IP addresses
of upgrade peers. Right now the map is populated
from Raft configuration, so it's an equivalent transformation,
but in the future raft address map will be populated from other sources:
discovery and gossip, hence the logic of upgrade will change as well.

Do not proceed with the upgrade if an address is
missing from the map, since it means we failed to contact a raft member.
2022-11-17 14:26:31 +03:00
Pavel Emelyanov
2add9ba292 Merge 'Refactor topology out of token_metadata' from Benny Halevy
This series moves the topology code from locator/token_metadata.{cc,hh} out to locator/topology.{cc,hh}
and introduces a shared header file: locator/types.hh contains shared, low level definitions, in anticipation of https://github.com/scylladb/scylladb/pull/11987

While at it, the token_metadata functions are turned into coroutines
and the topology copy constructor is deleted. The copy functionality is moved into an async `clone_gently` function that allows yielding while copying the topology.

Closes #12001

* github.com:scylladb/scylladb:
  locator: refactor topology out of token_metadata
  locator: add types.hh
  topology: delete copy constructor
  token_metadata: coroutinize clone functions
2022-11-17 13:55:34 +03:00
Aleksandra Martyniuk
7ead1a7857 compaction: request abort only once in compaction_data::stop
compaction_manager::task (and thus compaction_data) can be stopped
because of many different reasons. Thus, abort can be requested more
than once on compaction_data abort source causing a crash.

To prevent this, before each request_abort() we check whether an abort
was requested before.
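
The guard can be sketched in C++ as follows. Both types are hypothetical
stand-ins: `abort_source_sketch` throws on a second request to make the
double-abort failure visible, and `compaction_data_sketch::stop()` forwards
only the first request.

```cpp
#include <stdexcept>

// Hypothetical stand-in for an abort source that must not be triggered twice.
class abort_source_sketch {
    bool requested_ = false;

public:
    bool abort_requested() const noexcept { return requested_; }
    void request_abort() {
        if (requested_) {
            throw std::logic_error("abort requested twice"); // models the crash
        }
        requested_ = true;
    }
};

// Hypothetical compaction_data whose stop() may be called for many reasons.
struct compaction_data_sketch {
    abort_source_sketch abort_src;

    void stop() {
        // Only the first stop request actually triggers the abort.
        if (!abort_src.abort_requested()) {
            abort_src.request_abort();
        }
    }
};
```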

Closes #12004
2022-11-17 12:44:59 +02:00
Benny Halevy
1e2741d2fe abstract_replication_strategy: recognized_options: return unordered_set
An unordered_set is more efficient and there is no need
to return an ordered set for this purpose.

This change facilitates a follow-up change of adding
topology::get_datacenters(), returning an unordered_set
of datacenter names.

Refs #11987

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12003
2022-11-17 11:27:05 +02:00
Botond Dénes
e925c41f02 utils/gs/barrett.hh: aarch64: s/brarett/barrett/
Fix a typo introduced by the recent patch fixing the spelling of
Barrett. The patch introduced a typo in the aarch64 version of the code,
which wasn't found by promotion, as that only builds on x86_64.

Closes #12006
2022-11-17 11:09:59 +02:00
Konstantin Osipov
051dceeaff raft: (and gossip) begin gossiping raft server ids
We plan to use gossip data to educate Raft RPC about IP addresses
of raft peers. Add raft server ids to application state, so
that when we get a notification about a gossip peer we can
identify which raft server id this notification is for,
specifically, we can find what IP address stands for this server
id, and, whenever the IP address changes, we can update Raft
address map with the new address.

By the same token, at boot time, we now have to start Gossip
before Raft, since Raft won't be able to send any messages
without gossip data about IP addresses.
2022-11-17 12:07:31 +03:00
Konstantin Osipov
990c7a209f raft: change the API of conf change notifications
Pass a change diff into the notification callback,
rather than add or remove servers one by one, so that
if we need to persist the state, we can do it once per
configuration change, not for every added or removed server.

For now still pass added and removed entries in two separate calls
per configuration change. This is done mainly to fulfill the
library contract that it never sends messages to servers
outside the current configuration. The group0 RPC
implementation doesn't need the two calls, since it simply
marks the removed servers as expired: they are not removed immediately
anyway, and messages can still be delivered to them.
However, there may be test/mock implementations of RPC which
could benefit from this contract, so we decided to keep it.
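
A rough C++ sketch of computing such a diff once per configuration change,
assuming server ids are plain ints and a hypothetical listener that persists
its state in one step. The actual patch still delivers the added and removed
entries in two calls; the sketch only shows the diff computation and the
once-per-change handling.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

using server_id = int; // hypothetical stand-in for raft::server_id

struct config_diff {
    std::vector<server_id> joining; // in the new configuration only
    std::vector<server_id> leaving; // in the old configuration only
};

config_diff make_diff(const std::set<server_id>& prev, const std::set<server_id>& next) {
    config_diff d;
    std::set_difference(next.begin(), next.end(), prev.begin(), prev.end(),
                        std::back_inserter(d.joining));
    std::set_difference(prev.begin(), prev.end(), next.begin(), next.end(),
                        std::back_inserter(d.leaving));
    return d;
}

// Hypothetical listener: one persistence step per change, not one per server.
struct address_map_listener_sketch {
    int persist_calls = 0;
    std::size_t last_added = 0;
    std::size_t last_removed = 0;

    void on_configuration_change(const config_diff& d) {
        last_added = d.joining.size();
        last_removed = d.leaving.size();
        ++persist_calls;
    }
};
```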
2022-11-17 12:07:31 +03:00
Benny Halevy
53fdf75cf9 repair: pass erm down to get_hosts_participating_in_repair and get_neighbors
Now that it is available in repair_info.

Fixes #11993

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:30 +02:00
Benny Halevy
b69be61f41 repair: pass effective_replication_map down to repair_info
And make sure the token_metadata ring version is same as the
reference one (from the erm on shard 0), when starting the
repair on each shard.

Refs #11993

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:29 +02:00
Benny Halevy
c47d36b53d repair: coroutinize sync_data_using_repair
Prepare for the next patch that will co_await
make_global_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:04 +02:00
Benny Halevy
58b1c17f5d repair: futurize do_repair_start
Turn it into a coroutine to prepare for the next patch
that will co_await make_global_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:04 +02:00
Benny Halevy
4b9269b7e2 effective_replication_map: add global_effective_replication_map
Class to hold a coherent view of a keyspace
effective replication map on all shards.

To be used in a following patch to pass the sharded
keyspace e_r_m:s to repair.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:01 +02:00
Avi Kivity
b8b78959fb build: switch to packaged libdeflate rather than a submodule
Now that our toolchain is based on Fedora 37, we can rely on its
libdeflate rather than have to carry our own in a submodule.

Frozen toolchain is regenerated. As a side effect clang is updated
from 15.0.0 to 15.0.4.

Closes #12000
2022-11-17 08:01:00 +02:00
Benny Halevy
2c677e294b shared_token_metadata: get_lock is const
The lock is acquired using a function that
doesn't modify the shared_token_metadata object.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
d6b2124903 repair: sync_data_using_repair: require to run on shard 0
And with that do_sync_data_using_repair can be folded into
sync_data_using_repair.

This will simplify using the effective_replication_map
throughout the operation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
0c56c75cf8 repair: require all node operations to be called on shard 0
To simplify using the effective_replication_map / token_metadata_ptr
throughout the operation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
64b0756adc repair: repair_info: keep effective_replication_map
Sampled when repair info is constructed.
To be used throughout the repair process.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
c7d753cd44 repair: do_repair_start: use keyspace erm to get keyspace local ranges
Rather than calling db.get_keyspace_local_ranges that
looks up the keyspace and its erm again.

We want all the information derived from the erm to
be based on the same source.

The function is synchronous so this change doesn't
fix anything, just cleans up the code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
aaf74776c2 repair: do_repair_start: use keyspace erm for get_primary_ranges
Ensure that the primary ranges are in sync with the
keyspace erm.

The function is synchronous so this change doesn't fix anything,
it just cleans up the code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
9200e6b005 repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc
Ensure the erm and topology are in sync.

The function is synchronous so this change doesn't fix
anything, just cleans up the code.

Fix mistake in comment while at it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:57:56 +02:00
Benny Halevy
59dc2567fd repair: do_repair_start: check_in_shutdown first
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Benny Halevy
881eb0df83 repair: get_db().local() where needed
In several places we get the sharded database using get_db()
and then we only use db.local().  Simplify the code by keeping
reference only to the local database upfront.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Benny Halevy
c22c4c8527 repair: get topology from erm/token_metdata_ptr
We want the topology to be synchronized with the respective
effective_replication_map / token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Benny Halevy
94f2e95a2f view: get_view_natural_endpoint: get topology from erm
Get the topology from the effective replication map rather
than from the storage_proxy to ensure it's synchronized
with the natural endpoints.

Since there's no preemption between the two calls
currently there is no issue, so this is merely a clean up
of the code and not supposed to fix anything.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Nadav Har'El
e393639114 test/cql-pytest: reproducer for crash in LWT with null key
This patch adds a reproducer for issue #11954: Attempting an
"IF NOT EXISTS" (LWT) write with a null key crashes Scylla,
instead of producing a simple error message (like happens
without the "IF NOT EXISTS" after #7852 was fixed).

The test passes on Cassandra but crashes Scylla. Because of this
crash, we can't just mark the test "xfail" and it's temporarily
marked "skip" instead.

Refs #11954.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11982
2022-11-17 07:31:13 +02:00
Benny Halevy
d0bd305d16 locator: refactor topology out of token_metadata
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 21:55:54 +02:00
Benny Halevy
297a4de4e4 locator: add types.hh
To export low-level types that are used by other modules
for the locator interfaces.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 21:53:05 +02:00
Kamil Braun
0c9cb5c5bf Merge 'raft: wait for the next tick before retrying' from Gusev Petr
When `modify_config` or `add_entry` is forwarded to the leader, it may
reach the node at an "inappropriate" time and result in an exception. There
are two reasons for it: the leader is changing or, in case of
`modify_config`, another `modify_config` is currently in progress. In both
cases the command is retried, but before this patch there was no delay
before retrying, which could lead to a tight loop.

The patch adds a new exception type `transient_error`. When the client
receives it, it is obliged to retry the request after some delay.
Previously leader-side exceptions were converted to `not_a_leader`,
which is strange, especially for `conf_change_in_progress`.

Fixes: #11564

Closes #11769

* github.com:scylladb/scylladb:
  raft: rafactor: remove duplicate code on retries delays
  raft: use wait_for_next_tick in read_barrier
  raft: wait for the next tick before retrying
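
A simplified, blocking C++ sketch of the client-side retry policy follows.
`transient_error` here is a local stand-in type and `retry_on_transient_error`
is hypothetical; `std::this_thread::sleep_for` replaces the asynchronous wait
for the next Raft tick used in the real code.

```cpp
#include <chrono>
#include <stdexcept>
#include <thread>

// Local stand-in: the server tells the client "try again later".
struct transient_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Retries `op` on transient_error, sleeping `delay` between attempts instead of
// spinning in a tight loop. Rethrows after `max_attempts` failed attempts.
template <typename Op>
auto retry_on_transient_error(Op op, std::chrono::milliseconds delay, int max_attempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const transient_error&) {
            if (attempt >= max_attempts) {
                throw;
            }
            std::this_thread::sleep_for(delay); // stands in for waiting for the next tick
        }
    }
}
```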
2022-11-16 18:20:54 +01:00
Aleksandra Martyniuk
4250bd9458 tasks: do not run tasks that are aborted
Currently, the start() method runs a task even if it was already
aborted.

When start() is called on an aborted task, its state is set to
task_manager::task_state::failed and it doesn't run.
2022-11-16 18:09:41 +01:00
Aleksandra Martyniuk
ebffca7ea5 tasks: delete unused variable 2022-11-16 18:07:57 +01:00
Aleksandra Martyniuk
752edc2205 tasks: add abort_source to task_manager::task::impl
task_manager::task can be aborted with impl's abort_source.
By default, the abort request is propagated to all of the task's descendants.
2022-11-16 18:07:11 +01:00
Avi Kivity
c4f069c6fc Update seastar submodule
* seastar 153223a188...4f4cc00660 (10):
  > Merge 'Avoid using namespace internal' from Pavel Emelyanov
  > Merge 'De-futurize IO class update calls' from Pavel Emelyanov
  > abort_source: subscribe(): remove noexcept qualifier
  > Merge 'Add Prometheus filtering capabilities by label' from Amnon Heiman
  > fsqual: stop causing memory leak error on LeakSanitizer
  > metrics.cc: Do not merge empty histogram
  > Update tutorial.md
  > README-DPDK.md: document --cflags option
  > build: install liburing.pc using stow
  > core/polymorphic_temporary_buffer: include <seastar/core/memory.hh>

Closes #11991
2022-11-16 17:59:33 +02:00
Avi Kivity
3497891cf9 utils: spell "barrett" correctly
As P. T. Barnoom famously said, "write what you like but spell my name
correctly". Following that, we correct the spelling of Barrett's name
in the source tree.

Closes #11989
2022-11-16 16:30:38 +02:00
Benny Halevy
0c94ffcc85 topology: delete copy constructor
Topology is copied only from token_metadata_impl::clone_only_token_map
which copies the token_metadata_impl while yielding to prevent reactor
stalls. This should apply to topology as well, so
add a clone_gently function for cloning the topology
from token_metadata_impl::clone_only_token_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 15:27:28 +02:00
Benny Halevy
4f4fc7fe22 token_metadata: coroutinize clone functions
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-16 15:27:28 +02:00
Kamil Braun
a83789160d message: messaging_service: check for known topology before calling is_same_dc/rack
`is_same_dc` and `is_same_rack` assume that the peer's topology is
known. If it's unknown, `on_internal_error` will be called inside
topology.

When these functions are used in `get_rpc_client`, they are already
protected by an earlier check for knowing the peer's topology
(the `has_topology()` lambda).

Another use is in `do_start_listen()`, where we create a filter for RPC
module to check if it should accept incoming connections. If cross-dc or
cross-rack encryption is enabled, we will reject connection attempts to
the regular (non-ssl) port from other dcs/rack using `is_same_dc/rack`.
However, it might happen that something (other Scylla node or otherwise)
tries to contact us on the regular port and we don't know that thing's
topology, which would result in `on_internal_error`. But this is not a
fatal error; we simply want to reject that connection. So protect these
calls as well.

Finally, there's `get_preferred_ip` with an unprotected `is_same_dc`
call which, for a given peer, may return a different IP from preferred IP
cache if the endpoint resides in the same DC. If there is no entry in
the preferred IP cache, we return the original (external) IP of the
peer. We can do the same if we don't know the peer's topology. It's
interesting that we didn't see this particular place blowing up. Perhaps
the preferred IP cache is always populated after we know the topology.
2022-11-16 14:01:50 +01:00
Kamil Braun
9b2449d3ea test: reenable test_topology::test_decommission_node_add_column
Also improve the test to increase the probability of reproducing #11780
by injecting sleeps in appropriate places.

Without the fix for #11780 from the earlier commit, the test reproduces
the issue in roughly half of all runs in dev build on my laptop.
2022-11-16 14:01:50 +01:00
Kamil Braun
0f49813312 test/pylib: util: configurable period in wait_for 2022-11-16 14:01:50 +01:00
Kamil Braun
1bd2471c19 message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client
`get_rpc_client` calculates a `topology_ignored` field when creating a
client which says whether the client's endpoint had topology information
when this client was created. This is later used to check if that client
needs to be dropped and replaced with a new client which uses the
correct topology information.

The `topology_ignored` field was incorrectly calculated as `true` for
pending endpoints even though we had topology information for them. This
would lead to unnecessary drops of RPC clients later. Fix this.

Remove the default parameter for `with_pending` from
`topology::has_endpoint` to avoid similar bugs in the future.

Apparently this fixes #11780. The verbs used by decommission operation
use RPC client index 1 (see `do_get_rpc_client_idx` in
message/messaging_service.cc). From local testing with additional
logging I found that by the time this client is created (i.e. the first
verb in this group is used), we already know the topology. The node is
pending at that point - hence the bug would cause us to assume we don't
know the topology, leading us to drop the RPC client later, possibly
in the middle of a decommission operation.

Fixes: #11780
2022-11-16 14:01:50 +01:00
Kamil Braun
840be34b5f message: messaging_service: topology independent connection settings for GOSSIP verbs
The gossip verbs are used to learn about topology of other nodes.
If inter-dc/rack encryption is enabled, the knowledge of topology is
necessary to decide whether it's safe to send unencrypted messages to
nodes (i.e., whether the destination lies in the same dc/rack).

The logic in `messaging_service::get_rpc_client`, which decided whether
a connection must be encrypted, was this (given that encryption is
enabled): if the topology of the peer is known, and the peer is in the
same dc/rack, don't encrypt. Otherwise encrypt.

However, it may happen that node A knows node B's topology, but B
doesn't know A's topology. A deduces that B is in the same DC and rack
and tries sending B an unencrypted message. As the code currently
stands, this would cause B to call `on_internal_error`. This is what I
encountered when attempting to fix #11780.

To guarantee that it's always possible to deliver gossiper verbs (even
if one or both sides don't know each other's topology), and to simplify
reasoning about the system in general, choose connection settings that
are independent of the topology - for the connection used by gossiper
verbs (other connections are still topology-dependent and use complex
logic to handle the situation of unknown-and-later-known topology).

This connection only contains 'rare' and 'cheap' verbs, so it's not a
performance problem to always encrypt it (given that encryption is
configured). And this is what already was happening in the past; it was
at some point removed during topology knowledge management refactors. We
just bring this logic back.
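
The resulting decision can be sketched as a small C++ function. The encryption
modes loosely mirror the none/all/dc/rack internode encryption settings;
`must_encrypt` and `peer_view` are hypothetical and not the messaging_service
API. The gossip connection is decided without consulting topology, while other
connections fall back to encrypting when the peer's topology is unknown.

```cpp
// Loosely modeled on the internode encryption modes; illustrative only.
enum class internode_encryption { none, all, dc, rack };

// Hypothetical view of what we know about a peer.
struct peer_view {
    bool topology_known; // do we know the peer's dc/rack at all?
    bool same_dc;
    bool same_rack;
};

bool must_encrypt(internode_encryption mode, bool gossip_connection, const peer_view& peer) {
    if (mode == internode_encryption::none) {
        return false;
    }
    if (mode == internode_encryption::all || gossip_connection) {
        return true; // gossip connection: decided without looking at topology
    }
    if (!peer.topology_known) {
        return true; // unknown topology: be conservative
    }
    if (mode == internode_encryption::dc) {
        return !peer.same_dc;
    }
    return !(peer.same_dc && peer.same_rack); // rack mode
}
```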

Fixes #11992.

Inspired by xemul/scylla@45d48f3d02.
2022-11-16 13:58:07 +01:00
Anna Stuchlik
01c9846bb6 doc: add the link to the Enabling Experimental Features section 2022-11-16 13:24:45 +01:00
Anna Stuchlik
f1b2f44aad doc: move the TTL Alternator feature from the Experimental Features section to the production-ready section 2022-11-16 13:23:07 +01:00
Nadav Har'El
2f2f01b045 materialized views: fix view writes after base table schema change
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.

This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.

This patch also includes a reproducing test that failed before this
patch, and passes afterwards. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.

Fixes #10026
Fixes #11542
2022-11-16 13:58:21 +02:00
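The bug pattern is a lazily computed cache held in a `mutable` member that outlives a change to its inputs. A simplified sketch, with hypothetical `base_schema` and `view_info` stand-ins rather than ScyllaDB's real classes:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct base_schema {
    std::vector<std::string> columns;
};

class view_info {
    base_schema _base;
    // Lazily computed and cached; must be invalidated whenever _base changes.
    mutable std::optional<std::string> _select_statement;
public:
    explicit view_info(base_schema base) : _base(std::move(base)) {}

    const std::string& select_statement() const {
        if (!_select_statement) {
            std::string s = "SELECT ";
            for (std::size_t i = 0; i < _base.columns.size(); ++i) {
                if (i != 0) {
                    s += ", ";
                }
                s += _base.columns[i];
            }
            _select_statement = s + " FROM base";
        }
        return *_select_statement;
    }

    void set_base_info(base_schema base) {
        _base = std::move(base);
        _select_statement.reset();  // forget the statement built for the old schema
    }
};
```

Without the `reset()` call, the second `select_statement()` call after a schema change would keep returning the statement built for the old schema.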
Nadav Har'El
7cbb0b98bb Merge 'doc: document user defined functions (UDFs)' from Anna Stuchlik
This PR is V2 of the [PR created by @psarna](https://github.com/scylladb/scylladb/pull/11560).
I have:
- copied the content.
- applied the suggestions left by @nyh.
- made minor improvements, such as replacing "Scylla" with "ScyllaDB", fixing punctuation, and fixing the RST syntax.

Fixes https://github.com/scylladb/scylladb/issues/11378

Closes #11984

* github.com:scylladb/scylladb:
  doc: label user-defined functions as Experimental
  doc: restore the note for the Count function (removed by mistake)
  doc: document user defined functions (UDFs)
2022-11-16 13:09:47 +02:00
Botond Dénes
cbf9be9715 Merge 'Avoid 0.0.0.0 (and :0) as preferred IP' from Pavel Emelyanov
Although the docs discourage using INADDR_ANY as the listen address, the code does not forbid it. Worse, some snitch drivers may gossip it around as the INTERNAL_IP state. This series prevents that from happening and also adds a sanity check so that this value is not used if it somehow sneaks in.

Closes #11846

* github.com:scylladb/scylladb:
  messaging_service: Deny putting INADDR_ANY as preferred ip
  messaging_service: Toss preferred ip cache management
  gossiping_property_file_snitch: Don't gossip INADDR_ANY preferred IP
  gossiping_property_file_snitch: Make _listen_address optional
2022-11-16 08:30:42 +02:00
Avi Kivity
43d3e91e56 tools: toolchain: prepare: use real bash associative array
When we translate from docker/go arch names to the kernel arch
names, we use an associative array hack using computed variable
names "${!variable_name}". But it turns out bash has real
associative arrays, introduced with "declare -A". Use them to make
the code a little clearer.

Closes #11985
2022-11-16 08:17:47 +02:00
Botond Dénes
e90d0811d0 Merge 'doc: update ScyllaDB requirements - supported CPUs and AWS i4g instances' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-docs/issues/4144

Closes #11226

* github.com:scylladb/scylladb:
  Update docs/getting-started/system-requirements.rst
  doc: specify the recommended AWS instance types
  doc: replace the tables with a generic description of support for Im4gn and Is4gen instances
  doc: add support for AWS i4g instances
  doc: extend the list of supported CPUs
2022-11-16 08:15:00 +02:00
Botond Dénes
bd1fcbc38f Merge 'Introduce reverse vector_deserializer.' from Michał Radwański
As indicated in #11816, we'd like to enable deserializing vectors in reverse.
The forward deserialization is achieved by reading from an input_stream. The
input stream internally is a singly linked list with complicated logic. To
allow going through it in reverse, the reverse vector deserializer instead
scans the stream when it is created and stores substreams for all the places
that are the starting point of a next element. The iterator itself just
deserializes elements from the remembered substreams, this time in reverse.

Fixes #11816

Closes #11956

* github.com:scylladb/scylladb:
  test/boost/serialization_test.cc: add test for reverse vector deserializer
  serializer_impl.hh: add reverse vector serializer
  serializer_impl: remove unneeded generic parameter
2022-11-16 07:37:24 +02:00
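A simplified sketch of the two-pass idea, assuming a hypothetical flat wire format (a `u32` count followed by length-prefixed strings in host byte order) instead of ScyllaDB's `input_stream`-based serializer:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wire format for illustration: [u32 count]([u32 len][bytes])*
std::vector<char> serialize(const std::vector<std::string>& items) {
    std::vector<char> out;
    auto put_u32 = [&out](std::uint32_t v) {
        char buf[sizeof(std::uint32_t)];
        std::memcpy(buf, &v, sizeof(v));
        out.insert(out.end(), buf, buf + sizeof(v));
    };
    put_u32(static_cast<std::uint32_t>(items.size()));
    for (const auto& s : items) {
        put_u32(static_cast<std::uint32_t>(s.size()));
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}

class reverse_vector_deserializer {
    const std::vector<char>& _buf;
    std::vector<std::size_t> _starts;  // offset at which each element begins

    std::uint32_t read_u32(std::size_t pos) const {
        std::uint32_t v;
        std::memcpy(&v, _buf.data() + pos, sizeof(v));
        return v;
    }
public:
    explicit reverse_vector_deserializer(const std::vector<char>& buf) : _buf(buf) {
        // Forward scan: skip over each element, remembering where it starts.
        std::uint32_t count = read_u32(0);
        std::size_t pos = sizeof(std::uint32_t);
        for (std::uint32_t i = 0; i < count; ++i) {
            _starts.push_back(pos);
            pos += sizeof(std::uint32_t) + read_u32(pos);
        }
    }

    // Deserialize elements from the remembered positions, back to front.
    std::vector<std::string> deserialize_reversed() const {
        std::vector<std::string> out;
        for (auto it = _starts.rbegin(); it != _starts.rend(); ++it) {
            std::uint32_t len = read_u32(*it);
            const char* data = _buf.data() + *it + sizeof(std::uint32_t);
            out.emplace_back(data, len);
        }
        return out;
    }
};
```

The forward scan only reads length prefixes; each element's payload is decoded exactly once, in reverse order. The deserializer holds a reference, so the buffer must outlive it.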
Anna Stuchlik
cdb6557f23 doc: label user-defined functions as Experimental 2022-11-15 21:22:01 +01:00
Avi Kivity
d85f731478 build: update toolchain to Fedora 37 with clang 15
The 'cargo' instantiation now overrides the internal git client with
the CLI client, due to unbounded memory usage [1].

[1] https://github.com/rust-lang/cargo/issues/10583#issuecomment-1129997984
2022-11-15 16:48:09 +00:00
Anna Stuchlik
1f1d88d04e doc: restore the note for the Count function (removed by mistake) 2022-11-15 17:41:22 +01:00
Anna Stuchlik
dbb19f55fb doc: document user defined functions (UDFs) 2022-11-15 17:33:05 +01:00
Nadav Har'El
e4dba6a830 test/cql-pytest: add test for when MV requires IS NOT NULL
As noted in issue #11979, Scylla inconsistently (and unlike Cassandra)
requires "IS NOT NULL" on some but not all materialized-view key
columns. Specifically, Scylla does not require "IS NOT NULL" on the
base's partition key, while Cassandra does.

This patch is a test which demonstrates this inconsistency. It currently
passes on Cassandra and fails on Scylla, so it is marked xfail.

Refs #11979

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11980
2022-11-15 14:21:48 +01:00
Asias He
16bd9ec8b1 gossip: Improve get_live_token_owners and get_unreachable_token_owners
The get_live_token_owners returns the nodes that are part of the ring
and live.

The get_unreachable_token_owners returns the nodes that are part of the ring
and are not alive.

The token_metadata::get_all_endpoints returns nodes that are part of the
ring.

The patch changes both functions to use this more authoritative source to
get the nodes that are part of the ring, and calls is_alive to check whether
each node is up or down, so that correctness does not depend on
any derived information.

This patch fixes a truncate issue in storage_proxy::truncate_blocking
where it calls get_live_token_owners and get_unreachable_token_owners to
decide the nodes to talk with for truncate operation. The truncate
failed because incorrect nodes were returned.

Fixes #10296
Fixes #11928

Closes #11952
2022-11-15 14:21:48 +01:00
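The approach can be sketched as a single pass over the authoritative ring membership, classifying each node with a liveness predicate. `endpoint` and `split_token_owners` are hypothetical names used for illustration, and `is_alive` stands in for the gossiper's check:

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

using endpoint = std::string;  // hypothetical stand-in for an inet address

// Splits the ring members (e.g. the result of token_metadata::get_all_endpoints())
// into live and unreachable sets by asking the liveness check directly,
// instead of relying on any derived state.
std::pair<std::set<endpoint>, std::set<endpoint>>
split_token_owners(const std::set<endpoint>& ring_members,
                   const std::function<bool(const endpoint&)>& is_alive) {
    std::set<endpoint> live, unreachable;
    for (const auto& ep : ring_members) {
        (is_alive(ep) ? live : unreachable).insert(ep);
    }
    return {live, unreachable};
}
```

Since both sets are derived from the same membership list, every ring member lands in exactly one of them.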
Botond Dénes
21489c9f9c Merge 'doc: add the "ScyllaDB Enterprise" label to the Enterprise-only features' from Anna Stuchlik
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/11918.

With this PR:
- The "ScyllaDB Enterprise" label is added to all the features that are only available in ScyllaDB Enterprise.
- The previous Enterprise-only note is removed (it was included in multiple files as _/rst_include/enterprise-only-note.rst_ - this file is removed as it is no longer used anywhere in the docs).
- "Scylla Enterprise" was removed from `versionadded` because it's now clear that the feature was added for Enterprise.

Closes #11975

* github.com:scylladb/scylladb:
  doc: remove the enterprise-only-note.rst file, which was replaced by the ScyllaDB Enterprise label and is not used anymore
  doc: add the ScyllaDB Enterprise label to the descriptions of Enterprise-only features
2022-11-15 14:21:48 +01:00
Botond Dénes
34f29c8d67 Merge 'Use with_sstable_directory() helper in tests' from Pavel Emelyanov
The helper is already widely used; one (the last) remaining test case can benefit from using it too.

Closes #11978

* github.com:scylladb/scylladb:
  test: Indentation fix after previous patch
  test: Use with_sstable_directory() helper
2022-11-15 14:21:48 +01:00
Nadav Har'El
8a4ab87e44 Merge 'utils: crc: generate crc barrett fold tables at compile time' from Avi Kivity
We use Barrett tables (misspelled in the code unfortunately) to fold
crc computations of multiple buffers into a single crc. This is important
because it turns out to be faster to compute crc of three different buffers
in parallel rather than compute the crc of one large buffer, since the crc
instruction has latency 3.

Currently, we have a separate code generation step to compute the
fold tables. The step generates a new C++ source file with the tables.
But modern C++ allows us to do this computation at compile time, avoiding
the code generation step. This simplifies the build.

This series does that. There is some complication in that the code uses
compiler intrinsics for the computation, and these are not constexpr friendly.
So we first introduce constexpr-friendly alternatives and use them.

To prove the transformation is correct, I compared the generated code from
before the series and from just before the last step (where we use constexpr
evaluation but still retain the generated file) and saw no difference in the values.

Note that constexpr is not strictly needed - we could have run the code in the
global variables' initializer. But that would cause a crash if we run on a pre-clmul
machine, and is not as fun.

Closes #11957

* github.com:scylladb/scylladb:
  test: crc: add unit tests for constexpr clmul and barrett fold
  utils: crc combine table: generate at compile time
  utils: barrett: inline functions in header
  utils: crc combine table: generate tables at compile time
  utils: crc combine table: extract table generation into a constexpr function
  utils: crc combine table: extract "pow table" code into constexpr function
  utils: crc combine table: store tables in std::array rather than C arrays
  utils: barrett: make the barrett reduction constexpr friendly
  utils: clmul: add 64-bit constexpr clmul
  utils: barrett: extract barrett reduction constants
  utils: barrett: reorder functions
  utils: make clmul() constexpr
2022-11-15 14:21:48 +01:00
Petr Gusev
ae3e0e3627 raft: refactor: remove duplicate code on retries delays
Introduce a templated function do_on_leader_with_retries,
use it in add_entries/modify_config/read_barrier. The
function implements the basic logic of retries with aborts
and leader changes handling, adds a delay between
iterations to protect against tight loops.
2022-11-15 13:18:53 +04:00
Petr Gusev
15cc1667d0 raft: use wait_for_next_tick in read_barrier
Replaced the yield on transport_error
with wait_for_next_tick. Added delays for retries, similar
to add_entry/modify_config: we postpone the next
call attempt if we haven't received new information
about the current leader.
2022-11-15 12:31:49 +04:00
Petr Gusev
5e15c3c9bd raft: wait for the next tick before retrying
When modify_config or add_entry is forwarded
to the leader, it may reach the node at
"inappropriate" time and result in an exception.
There are two reasons for it - the leader is
changing and, in case of modify_config, other
modify_config is currently in progress. In
both cases the command is retried, but before
this patch there was no delay before retrying,
which could lead to a tight loop.

The patch adds a new exception type transient_error.
When the client node receives it, it is obliged to retry
the request, possibly after some delay. Previously, leader-side
exceptions were converted to not_a_leader exception,
which is strange, especially for conf_change_in_progress.

We add a delay before retrying in modify_config
and add_entry if the client hasn't received any new
information about the leader since the last attempt.
This can happen if the server
responds with a transient_error with an empty leader
and the current node has not yet learned the new leader.
We accept an unnecessary delay if the newly elected leader
is the same as the previous one; this is supposed to be rare.

Fixes: #11564
2022-11-15 11:49:26 +04:00
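A synchronous sketch of the retry policy described above, assuming a hypothetical `transient_error` that carries an optional leader hint and a caller-supplied `wait_for_next_tick` callback. The real code is asynchronous (Seastar futures) and its signatures differ:

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not the actual raft::server API.
using server_id = int;

struct transient_error : std::runtime_error {
    std::optional<server_id> leader;  // leader hint; may be empty
    explicit transient_error(std::optional<server_id> l)
        : std::runtime_error("transient_error"), leader(l) {}
};

// Calls op(leader) until it succeeds. A transient_error carrying a *new*
// leader is retried immediately; otherwise we wait for the next tick first,
// so a stream of errors cannot turn into a tight loop.
template <typename Op>
auto do_on_leader_with_retries(server_id& leader, Op op,
                               const std::function<void()>& wait_for_next_tick) {
    for (;;) {
        try {
            return op(leader);
        } catch (const transient_error& e) {
            if (e.leader && *e.leader != leader) {
                leader = *e.leader;  // new information: retry right away
                continue;
            }
            wait_for_next_tick();    // no new information: back off first
        }
    }
}
```

A hint naming the same leader as before still causes a wait, matching the accepted "unnecessary delay" case.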
Pavel Emelyanov
8dcd9d98d6 test: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-14 20:11:01 +03:00
Pavel Emelyanov
c9128e9791 test: Use with_sstable_directory() helper
It's already used everywhere, but one test case wires up the
sstable_directory by hand. Fix it too, keeping in mind that the caller
fn stops the directory early.

(indentation is deliberately left broken)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-14 20:11:01 +03:00
Michał Radwański
32c60b44c5 test/boost/serialization_test.cc: add test for reverse vector deserializer
This test is just a copy-pasted version of the forward serializer test.
2022-11-14 16:06:24 +01:00
Michał Radwański
dce67f42f8 serializer_impl.hh: add reverse vector serializer
Currently, when we want to deserialize a mutation in reverse, we unfreeze
it and consume from the end. This new reverse vector deserializer
goes through the input stream, remembering the substream that contains
each member of the output range, and then deserializes each substream
while traversing from the back.
2022-11-14 16:06:24 +01:00
Anna Stuchlik
e36bd208cc doc: remove the enterprise-only-note.rst file, which was replaced by the ScyllaDB Enterprise label and is not used anymore 2022-11-14 15:20:51 +01:00
Anna Stuchlik
36324fe748 doc: add the ScyllaDB Enterprise label to the descriptions of Enterprise-only features 2022-11-14 15:16:51 +01:00
Takuya ASADA
da6c472db9 install.sh: Skip systemd existence check when --without-systemd
When --without-systemd is specified, install.sh should skip the systemd
existence check.

Fixes #11898

Closes #11934
2022-11-14 14:07:46 +02:00
Benny Halevy
ff5527deb1 topology: copy _sort_by_proximity in copy constructor
Fixes #11962

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11965
2022-11-14 13:59:56 +03:00
Pavel Emelyanov
bd48fdaad5 Merge 'handle_state_normal: do not update topology of removed endpoint' from Benny Halevy
Currently, when replacing a node ip, keeping the old host,
we might end up with the old endpoint in system.peers
if it is inserted back into the topology by `handle_state_normal`
when on_join is called with the old endpoint.

Then, later on, on_change sees that:
```
    if (get_token_metadata().is_member(endpoint)) {
        co_await do_update_system_peers_table(endpoint, state, value);
```

As described in #11925.

Fixes #11925

Closes #11930

* github.com:scylladb/scylladb:
  storage_service, system_keyspace: add debugging around system.peers update
  storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed
2022-11-14 13:58:28 +03:00
Botond Dénes
8e38551d93 Merge 'Allow each compaction group to have its own compaction backlog tracker' from Raphael "Raph" Carvalho
Today, compaction_backlog_tracker is managed in each compaction_strategy
implementation. So every compaction strategy is managing its own
tracker and providing a reference to it through get_backlog_tracker().

But this prevents each group from having its own tracker, because
there's only a single compaction_strategy instance per table.
To remove this limitation, compaction_strategy impl will no longer
manage trackers but will instead provide an interface for trackers
to be created, such that each compaction_group will be allowed to
create its own tracker and manage it by itself.

Now table's backlog will be the sum of all compaction_group backlogs.
The normalization factor is applied on the sum, so we don't have
to adjust each individual backlog to any factor.

Closes #11762

* github.com:scylladb/scylladb:
  replica: Allow one compaction_backlog_tracker for each compaction_group
  compaction: Make compaction_state available for compaction tasks being stopped
  compaction: Implement move assignment for compaction_backlog_tracker
  compaction: Fix compaction_backlog_tracker move ctor
  compaction: Use table_state's backlog tracker in compaction_read_monitor_generator
  compaction: kill undefined get_unimplemented_backlog_tracker()
  replica: Refactor table::set_compaction_strategy for multiple groups
  Fix exception safety when transferring ongoing charges to new backlog tracker
  replica: move_sstables_from_staging: Use tracker from group owning the SSTable
  replica: Move table::backlog_tracker_adjust_charges() to compaction_group
  replica: table::discard_sstables: Use compaction_group's backlog tracker
  replica: Disable backlog tracker in compaction_group::stop()
  replica: database_sstable_write_monitor: use compaction_group's backlog tracker
  replica: Move table::do_add_sstable() to compaction_group
  test/sstable_compaction_test: Switch to table_state::get_backlog_tracker()
  compaction/table_state: Introduce get_backlog_tracker()
2022-11-14 07:05:28 +02:00
Avi Kivity
b8cb34b928 test: crc: add unit tests for constexpr clmul and barrett fold
Check that the constexpr variants indeed match the runtime variants.

I verified manually that exactly one computation in each test is
executed at run time (and is compared against a constant).
2022-11-13 16:22:29 +02:00
Avi Kivity
70217b5109 utils: crc combine table: generate at compile time
By now the crc combine tables are generated at compile time,
but still in a separate code generation step. We now eliminate
the code generation step and instead link the global variables
directly into the main executable. The global variables have
been conveniently named exactly as the code generation step
names them, so we don't need to touch any users.
2022-11-12 17:26:45 +02:00
Avi Kivity
164e991181 utils: barrett: inline functions in header
Avoid duplicate definitions if the same header is used from more than
one place, as it will soon be.
2022-11-12 17:26:08 +02:00
Avi Kivity
a4f06773da utils: crc combine table: generate tables at compile time
Move the tables into global constinit variables that are
generated at compile time. Note the code that creates
the generated crc32_combine_table.cc is still called; it
transforms compile-time generated tables into a C++ source
that contains the same values, as literals.

If we generate a diff between gen/utils/gz/crc_combine_table.cc
before this series and after this patch, we see the only change
in the file is the type of the variable (which changed to
std::array), proving our constexpr code is correct.
2022-11-12 17:16:59 +02:00
Avi Kivity
a229fdc41e utils: crc combine table: extract table generation into a constexpr function
Move the code to a constexpr function, so we can later generate the tables at
compile time. Note that although the function is constexpr, it is still
evaluated at runtime, since the calling function (main()) isn't constexpr
itself.
2022-11-12 17:13:52 +02:00
Avi Kivity
d42bec59bb utils: crc combine table: extract "pow table" code into constexpr function
A "pow table" is used to generate the Barrett fold tables. Extract its
code into a constexpr function so we can later generate the fold tables
at compile time.
2022-11-12 17:11:44 +02:00
Avi Kivity
6e34014b64 utils: crc combine table: store tables in std::array rather than C arrays
C arrays cannot be returned from functions and therefore aren't suitable
for constexpr processing. std::array<> is a regular value and so is
constexpr friendly.
2022-11-12 17:09:02 +02:00
Avi Kivity
1e9252f79a utils: barrett: make the barrett reduction constexpr friendly
Dispatch to intrinsics or constexpr based on evaluation context.
2022-11-12 17:04:44 +02:00
Avi Kivity
0bd90b5465 utils: clmul: add 64-bit constexpr clmul
This is used when generating the Barrett reduction tables, and also when
applying the Barrett reduction at runtime, so we need it to be constexpr
friendly.
2022-11-12 17:04:05 +02:00
Avi Kivity
c376c539b8 utils: barrett: extract barrett reduction constants
The constants are repeated across x86_64 and aarch64, so extract
them into a common definition.
2022-11-12 17:00:17 +02:00
Avi Kivity
2fdf81af7b utils: barrett: reorder functions
Reorder functions in dependency order rather than forward
declaring them. This makes them more constexpr-friendly.
2022-11-12 16:52:41 +02:00
Avi Kivity
8aa59a897e utils: make clmul() constexpr
clmul() is a pure function and so should already be constexpr,
but it uses intrinsics that aren't defined as constexpr and
so the compiler can't really compute it at compile time.

Fix by defining a constexpr variant and dispatching based
on whether we're being constant-evaluated or not.

The implementation is simple, but in any case proof that it
is correct will be provided later on.
2022-11-12 16:49:43 +02:00
Raphael S. Carvalho
b88acffd66 replica: Allow one compaction_backlog_tracker for each compaction_group
Today, compaction_backlog_tracker is managed in each compaction_strategy
implementation. So every compaction strategy is managing its own
tracker and providing a reference to it through get_backlog_tracker().

But this prevents each group from having its own tracker, because
there's only a single compaction_strategy instance per table.
To remove this limitation, compaction_strategy impl will no longer
manage trackers but will instead provide an interface for trackers
to be created, such that each compaction group will be allowed to
have its own tracker, which will be managed by compaction manager.

On compaction strategy change, table will update each group with
the new tracker, which is created using the previously introduced
compaction_group_sstable_set_updater.

Now table's backlog will be the sum of all compaction_group backlogs.
The normalization factor is applied on the sum, so we don't have
to adjust each individual backlog to any factor.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:22:51 -03:00
Raphael S. Carvalho
d862dd815c compaction: Make compaction_state available for compaction tasks being stopped
compaction_backlog_tracker will be managed by compaction_manager, in the
per table state. As compaction tasks can access the tracker throughout
their lifetime, remove() can only deregister the state once we're done
stopping all tasks which map to that state.
remove() extracted the state upfront, then performed the stop, to
prevent new tasks from being registered and left behind. But we can
avoid the leak of new tasks by only closing the gate, which waits
for all tasks (which are stopped a step earlier) and once closed,
prevents new tasks from being registered.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:22:51 -03:00
Raphael S. Carvalho
0a152a2670 compaction: Implement move assignment for compaction_backlog_tracker
It's needed for std::optional to work with compaction_backlog_tracker.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:22:49 -03:00
Raphael S. Carvalho
fe305cefd0 compaction: Fix compaction_backlog_tracker move ctor
Luckily it's not used anywhere. The default move ctor was picked, but
it doesn't clear the _manager of the moved-from object, meaning that its
destructor would incorrectly deregister the tracker from
compaction_backlog_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
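A minimal sketch of the fix, using hypothetical `backlog_manager` and `backlog_tracker` stand-ins that only count registrations:

```cpp
#include <utility>

// Hypothetical, simplified types for illustration only.
struct backlog_manager {
    int registered = 0;
};

class backlog_tracker {
    backlog_manager* _manager = nullptr;

    void deregister() noexcept {
        if (_manager) {
            --_manager->registered;
            _manager = nullptr;
        }
    }
public:
    explicit backlog_tracker(backlog_manager& m) : _manager(&m) { ++_manager->registered; }

    // A defaulted move ctor would copy _manager, so both the new and the
    // moved-from object would deregister. Clear it in the moved-from object.
    backlog_tracker(backlog_tracker&& other) noexcept
        : _manager(std::exchange(other._manager, nullptr)) {}

    // Move assignment: drop our own registration, then take over the other's.
    backlog_tracker& operator=(backlog_tracker&& other) noexcept {
        if (this != &other) {
            deregister();
            _manager = std::exchange(other._manager, nullptr);
        }
        return *this;
    }

    backlog_tracker(const backlog_tracker&) = delete;
    backlog_tracker& operator=(const backlog_tracker&) = delete;

    ~backlog_tracker() { deregister(); }
};
```

Declaring a move constructor suppresses the implicit move assignment operator, which is why it also has to be written out before `std::optional` can assign such objects.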
Raphael S. Carvalho
8e1e30842d compaction: Use table_state's backlog tracker in compaction_read_monitor_generator
A step closer towards a separate backlog tracker for each compaction group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
fedafd76eb compaction: kill undefined get_unimplemented_backlog_tracker()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
90991bda69 replica: Refactor table::set_compaction_strategy for multiple groups
Refactor the function so that it can accommodate multiple compaction
groups.

To still provide strong exception guarantees, preparation and
execution of changes will be separated.

Once multiple groups are supported, each group will be prepared
first, and the noexcept execution will be done as a last step.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
244efddb22 Fix exception safety when transferring ongoing charges to new backlog tracker
When setting a new strategy, the charges of the old tracker are transferred
to the new one.

The problem is that we don't revert the changes if an exception is
triggered before the new strategy is successfully set.

To fix this exception safety issue, let's copy the charges instead
of moving them. If an exception is triggered, the old tracker is still
the one in use and remains intact.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
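A sketch of the prepare/commit split with a strong exception guarantee, assuming hypothetical `tracker` and `table` types; only the idea (copy the charges, then commit with non-throwing swaps) comes from the commit message:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified types for illustration only.
struct tracker {
    std::map<std::string, long> charges;  // ongoing charges, e.g. per sstable
};

struct table {
    std::string strategy = "size_tiered";
    std::unique_ptr<tracker> current = std::make_unique<tracker>();

    template <typename MakeTracker>
    void set_compaction_strategy(std::string new_strategy, MakeTracker make_tracker) {
        // Prepare: everything that may throw operates on new objects only.
        // Charges are copied, so the old tracker is untouched on failure.
        std::unique_ptr<tracker> fresh = make_tracker();
        fresh->charges = current->charges;
        // Commit: swaps of unique_ptr and std::string do not throw.
        std::swap(current, fresh);
        std::swap(strategy, new_strategy);
    }
};
```

If `make_tracker()` or the copy throws, the table still holds its old strategy and tracker with all charges in place.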
Raphael S. Carvalho
d1e2dbc592 replica: move_sstables_from_staging: Use tracker from group owning the SSTable
When moving SSTables from staging directory, we'll conditionally add
them to backlog tracker. As each group has its own tracker, a given
sstable will be added to the tracker of the group that owns it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Raphael S. Carvalho
9031dc3199 replica: Move table::backlog_tracker_adjust_charges() to compaction_group
Procedures that call this function happen to be in compaction_group,
so let's move it to group. Simplifies the change where the procedure
retrieves tracker from the group itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
116459b69e replica: table::discard_sstables: Use compaction_group's backlog tracker
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
b2d8545b15 replica: Disable backlog tracker in compaction_group::stop()
As we're moving backlog tracker to compaction group, we need to
stop the tracker there too. We're moving it a step earlier in
table::stop(), before sstables are cleared, but that's okay
because it's still done after the group was deregistered
from compaction manager, meaning no compactions are running.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
91b0d772e2 replica: database_sstable_write_monitor: use compaction_group's backlog tracker
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
f37a05b559 replica: Move table::do_add_sstable() to compaction_group
All callers of do_add_sstable() live in compaction_group, so it
should be moved into compaction_group too. It also makes it easier
for the function to retrieve the backlog tracker from the group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
835927a2ad test/sstable_compaction_test: Switch to table_state::get_backlog_tracker()
Important for decoupling backlog tracker from table's compaction
strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Raphael S. Carvalho
1ec0ef18a5 compaction/table_state: Introduce get_backlog_tracker()
This interface will be helpful for allowing replica::table, unit
tests and sstables::compaction to access the compaction group's tracker
which will be managed by the compaction manager, once we complete
the decoupling work.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:36 -03:00
Nadav Har'El
ff87624fb4 test/cql-pytest: add another regression test for reversed-type bug
In commit 544ef2caf3 we fixed a bug where
a reversed clustering-key order caused problems using a secondary index
because of incorrect type comparison. That commit also included a
regression test for this fix.

However, that fix was incomplete, and improved later in commit
c8653d1321. That later fix was labeled
"better safe than sorry", and did not include a test demonstrating
any actual bug, so unsurprisingly we never backported that second
fix to any older branches.

Recently we discovered that missing the second patch does cause real
problems, and this patch includes a test which fails when the first
patch is in, but the second patch isn't (and passes when both patches
are in, and also passes on Cassandra).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11943
2022-11-11 11:01:22 +02:00
Botond Dénes
302917f63d mutation_compactor: add validator
The mutation compactor is used on most read-paths we have, so adding a
validator to it gives us a good coverage, in particular it gives us full
coverage of queries and compaction.
The validator validates mutation token (and mutation fragment kind)
monotonicity as that is quite cheap, while it is enough to catch the
most common problems. As we already have a validator on the compaction
path (in the sstable writer), the validator is disabled when the
mutation compactor is instantiated for compaction.
We should probably make this configurable at some point. The addition
of this validator should prevent the worst of the fragment reordering
bugs from affecting reads.
2022-11-11 10:26:05 +02:00
Botond Dénes
5c245b4a5e mutation_fragment_stream_validator: add a 'none' validation level
Which, as its name suggests, makes the validating filter not validate
anything at all. This validation level effectively makes it as if the
validator were not there at all.
2022-11-11 09:58:44 +02:00
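A simplified sketch of a token-monotonicity check with a `none` level, using a hypothetical `token_monotonicity_validator` and plain `int64_t` tokens; the real `mutation_fragment_stream_validator` also checks fragment kinds:

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical, simplified types for illustration only.
enum class validation_level { none, token };

class token_monotonicity_validator {
    validation_level _level;
    std::optional<std::int64_t> _last_token;
public:
    explicit token_monotonicity_validator(validation_level level) : _level(level) {}

    // Called for each partition start, in stream order. Tokens may repeat
    // (distinct keys can share a token) but must never decrease.
    void on_partition_start(std::int64_t token) {
        if (_level == validation_level::none) {
            return;  // behaves as if no validator were present
        }
        if (_last_token && token < *_last_token) {
            throw std::runtime_error("partition token went backwards");
        }
        _last_token = token;
    }
};
```

Keeping only the last token makes the check O(1) per partition, which is why token monotonicity is cheap enough to leave enabled on read paths.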
Botond Dénes
a4b58f5261 test/boost/mutation_query_test: test_partition_limit: sort input data
The test's input data is currently out-of-order, violating a fundamental
invariant of data always being sorted. This doesn't cause any problems
right now, but soon it will. Sort it to avoid it.
2022-11-11 09:58:44 +02:00
Botond Dénes
2c551bb7ce querier: consume_page(): use partition_start as the sentinel value
Said method calls `compact_mutation_state::start_new_page()`, which
requires the kind of the next fragment in the reader. When there is no
fragment (the reader is at EOS), we use partition-end. This was a poor
choice: if the reader is at EOS, partition-end was the last fragment
kind; if the stream were to continue, the next fragment would be a
partition-start.
2022-11-11 09:58:18 +02:00
Botond Dénes
0bcfc9d522 treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{}
We just added a convenience static factory method for partition end,
change the present users of the clunky constructor+tag to use it
instead.
2022-11-11 09:58:18 +02:00
Botond Dénes
f1a039fc2b treewide: use ::for_partition_start() instead of ::partition_start_tag_t{}
We just added a convenience static factory method for partition start,
change the present users of the clunky constructor+tag to use it
instead.
2022-11-11 09:58:18 +02:00
Botond Dénes
6a002953e9 position_in_partition: add for_partition_{start,end}() 2022-11-11 09:58:18 +02:00
Kamil Braun
4a2ec888d5 Merge 'test.py: use internal id to manage servers' from Alecco
Instead of using assigned IP addresses, use a local integer ID for
managing servers. IP address can be reused by a different server.

While there, get host ID (UUID). This can also be reused with `node
replace` so it's not good enough for tracking.

Closes #11747

* github.com:scylladb/scylladb:
  test.py: use internal id to manage servers
  test.py: rename hostname to ip_addr
  test.py: get host id
  test.py: use REST api client in ScyllaCluster
  test.py: remove unnecessary reference to web app
  test.py: requests without aiohttp ClientSession
2022-11-10 17:12:16 +01:00
Kamil Braun
1cc68b262e docs: describe the Raft upgrade and recovery procedures
In the 5.1 -> 5.2 upgrade doc, include additional steps for enabling
Raft using the `consistent_cluster_management` flag. Note that we don't
have this flag yet but it's planned to replace the experimental flag in
5.2.

In the "Raft in ScyllaDB" document, add sections about:
- enabling Raft in existing clusters in Scylla 5.2,
- verifying that the internal Raft upgrade procedure finishes
  successfully,
- recovering from a stuck Raft upgrade procedure or from a majority loss
  situation.

Fix some problems in the documentation, e.g. it is not possible to
enable Raft in an existing cluster in 5.0, but the documentation claimed
that it is.

Follow-up items:
- if we decide for a different name for `consistent_cluster_management`,
  use that name in the docs instead
- update the warnings in Scylla to link to the Raft doc
- mention Enterprise versions once we know the numbers
- update the appropriate upgrade docs for Enterprise versions
  once they exist
2022-11-10 17:08:57 +01:00
Kamil Braun
3dab07ec11 docs: add upgrade guide 5.1 -> 5.2
It's a copy-paste from the 5.0 -> 5.1 guide with substitutions:
s/5.1/5.2,
s/5.0/5.1

The metric update guide is not written, I left a TODO.

Also I didn't include the guide in
docs/upgrade/upgrade-opensource/index.rst, since 5.2 is not released
yet.

The guide can be accessed by manually following the link:
/upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/
2022-11-10 16:49:14 +01:00
Alejo Sanchez
700054abee test.py: use internal id to manage servers
Instead of using assigned IP addresses, use an internal server id.

Define types to distinguish local server id, host ID (UUID), and IP
address.

This is needed to test servers changing IP address and for node replace
(host UUID).
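
A minimal Python sketch of the idea follows. It assumes `typing.NewType` aliases. The names `ServerNum`, `HostID`, `IPAddress` and `ServerRegistry` are hypothetical and not taken from test.py. Keying servers by a local counter means a reused IP address cannot be confused with an earlier server:

```python
import itertools
import uuid
from typing import Dict, NewType

# Hypothetical distinct types for the three kinds of server identity.
ServerNum = NewType("ServerNum", int)   # local id, never reused within a run
HostID = NewType("HostID", str)         # host UUID; can be reused by node replace
IPAddress = NewType("IPAddress", str)   # can be reassigned to a different server


class ServerRegistry:
    """Tracks servers by a local integer id instead of by IP address."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.ip_of: Dict[ServerNum, IPAddress] = {}
        self.host_id_of: Dict[ServerNum, HostID] = {}

    def add(self, ip: IPAddress, host_id: HostID) -> ServerNum:
        num = ServerNum(next(self._ids))
        self.ip_of[num] = ip
        self.host_id_of[num] = host_id
        return num

    def remove(self, num: ServerNum) -> None:
        del self.ip_of[num]
        del self.host_id_of[num]


registry = ServerRegistry()
first = registry.add(IPAddress("127.0.0.1"), HostID(str(uuid.uuid4())))
registry.remove(first)
# The same IP goes to a new server, which still gets a fresh local id.
second = registry.add(IPAddress("127.0.0.1"), HostID(str(uuid.uuid4())))
```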

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
1e38f5478c test.py: rename hostname to ip_addr
The code explicitly manages an IP as a string; make this explicit in the
variable name.

Define its type, and check whether it is set on the instance instead of
using an empty string as a placeholder.
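
A sketch of the "typed and explicitly unset" pattern follows. It assumes an `Optional` attribute. `Server` and `is_ip_set()` are hypothetical names, not the actual ScyllaServer API:

```python
from typing import NewType, Optional

IPAddress = NewType("IPAddress", str)


class Server:
    """Hypothetical server handle: ip_addr stays None until it is assigned."""

    def __init__(self) -> None:
        self.ip_addr: Optional[IPAddress] = None

    def is_ip_set(self) -> bool:
        # Test for "set" explicitly instead of treating "" as a placeholder.
        return self.ip_addr is not None


srv = Server()
assert not srv.is_ip_set()
srv.ip_addr = IPAddress("127.0.0.2")
assert srv.is_ip_set()
```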

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
f478eb52a3 test.py: get host id
When initializing a ScyllaServer, try to get the host id instead of only
checking the REST API is up.

Use the existing aiohttp session from ScyllaCluster.

In case of an HTTP error, check that the status was not an internal error (500+).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
78663dda72 test.py: use REST api client in ScyllaCluster
Move the REST API client to ScyllaCluster. This will allow the cluster
to query its own servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
75ea345611 test.py: remove unnecessary reference to web app
The aiohttp.web.Application only needs to be passed, so don't store a
reference in ScyllaCluster object.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Alejo Sanchez
a5316b0c6b test.py: requests without aiohttp ClientSession
Simplify REST helper by doing requests without a session.

Reusing an aiohttp.ClientSession causes knock-on effects on
`rest_api/test_task_manager` due to handling exceptions outside of an
async with block.

Requests for cluster management and the Scylla REST API don't need a
session anyway.

Raise HTTPError with status code, text reason, params, and json.
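
A sketch of such an exception follows. It assumes the class subclasses the built-in `Exception`. The class layout, the attribute names and the URI below are hypothetical placeholders, not the actual test.py code:

```python
from typing import Any, Dict, Optional


class HTTPError(Exception):
    """Hypothetical error raised by a session-less REST helper on a bad status."""

    def __init__(self, uri: str, status: int, reason: str,
                 params: Optional[Dict[str, str]] = None,
                 json: Optional[Any] = None) -> None:
        super().__init__(f"HTTP {status} for {uri}: {reason} "
                         f"(params={params}, json={json})")
        self.uri = uri
        self.status = status
        self.reason = reason
        self.params = params
        self.json = json


try:
    raise HTTPError("/example/endpoint", 500, "Internal Server Error",
                    params={"verbose": "true"})
except HTTPError as exc:
    caught = exc  # keep a reference; `exc` is unbound after the except block
```

Callers can inspect `status` to tell internal server errors (500+) apart from other failures.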

In ScyllaCluster.install_and_start(), instead of adding one more custom
exception, just catch all exceptions, as they will be re-raised later.

While there, avoid code duplication and improve sanity checking, type
checking, and the lint score.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-11-10 09:14:37 +01:00
Botond Dénes
21bc37603a Merge 'utils: config_src: add set_value_on_all_shards functions' from Benny Halevy
Currently when we set a single value we need
to call broadcast_to_all_shards to let observers on all
shards get notified of the new value.

However, the latter broadcasts all values to all shards,
so it is terribly inefficient.

Instead, add async set_value_on_all_shards functions
to broadcast a value to all shards.

Use those in system_keyspace for db_config_table virtual table
and in task_manager_test to update the task_manager ttl.

Refs #7316

Closes #11893

* github.com:scylladb/scylladb:
  tests: check ttl on different shards
  utils: config_src: add set_value_on_all_shards functions
  utils: config_file: add config_source::API
2022-11-10 07:16:39 +02:00
Botond Dénes
3aff59f189 Merge 'staging sstables: filter tokens for view update generation' from Benny Halevy
This mini-series introduces dht::tokens_filter and uses it for consuming staging sstable in the view_update_generator.

The tokens_filter uses the token ranges owned by the current node, as retrieved by get_keyspace_local_ranges.

Refs #9559

Closes #11932

* github.com:scylladb/scylladb:
  db: view_update_generator: always clean up staging sstables
  compaction: extract incremental_owned_ranges_checker out to dht
2022-11-10 07:00:51 +02:00
Avi Kivity
9b6ab5db4a Update seastar submodule
* seastar e0dabb361f...153223a188 (8):
  > build: compile dpdk with -fpie (position independent executable)
  > Merge 'io_request: remove ctor overloads of io_request and s/io_request/const io_request/' from Kefu Chai
  > iostream: remove unused function
  > smp: destroy_smp_service_group: verify smp_service_group id
  > core/circular_buffer: refactor loop in circular_buffer::erase()
  > Merge 'Outline reactor::add_task() and sanitize reactor::shuffle() methods' from Pavel Emelyanov
  > Add NOLINT for cert-err58-cpp
  > tests: Fix false-positive use-after-free detection

Closes #11940
2022-11-09 23:36:50 +02:00
Aleksandra Martyniuk
b0ed4d1f0f tests: check ttl on different shards
The test checking whether the TTL is properly set is extended to also
check whether the TTL value is changed on a non-zero shard.
2022-11-09 16:58:46 +02:00
Botond Dénes
725e5b119d Revert "replica: Pick new generation for SSTables being moved from staging dir"
This reverts commit ba6186a47f.

Said commit violates the widely held assumption that sstable
generations can be used as sstable identity. One known problem caused
by this is potential out-of-order (OOO) partitions emitted when reading
from sstables (#11843). We now also have a better fix for #11789 (the
bug this commit was meant to fix): 4aa0b16852. So we can revert without
regressions.

Fixes: #11843

Closes #11886
2022-11-09 16:35:31 +02:00
Eliran Sinvani
ab7429b77d cql: Fix crash upon use of the word empty for service level name
Wrong access to an uninitialized token, instead of the actual
generated string, caused the parser to crash. This wasn't
detected by the ANTLR3 compiler because all the temporary
variables defined in the ANTLR3 statements are global in the
generated code. This essentially caused a null dereference.

Tests: 1. The fixed issue scenario from github.
       2. Unit tests in release mode.

Fixes #11774

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>

Closes #11777
2022-11-09 15:58:57 +02:00
Anna Stuchlik
d2e54f7097 Merge branch 'master' into anna-requirements-arm-aws 2022-11-09 14:39:00 +01:00
Anna Stuchlik
8375304d9b Update docs/getting-started/system-requirements.rst
Co-authored-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2022-11-09 14:37:34 +01:00
Benny Halevy
38d8777d42 storage_service, system_keyspace: add debugging around system.peers update
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 14:45:47 +02:00
Benny Halevy
5401b6055c storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed
Currently, when replacing a node ip, keeping the old host,
we might end up with the old endpoint in system.peers
if it is inserted back into the topology by `handle_state_normal`
when on_join is called with the old endpoint.

Then, later on, on_change sees that:
```
        if (get_token_metadata().is_member(endpoint)) {
            co_await do_update_system_peers_table(endpoint, state, value);
```

As described in #11925.

Fixes #11925

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 14:45:22 +02:00
Benny Halevy
1a183047c0 utils: config_src: add set_value_on_all_shards functions
Currently when we set a single value we need
to call broadcast_to_all_shards to let observers on all
shards get notified of the new value.

However, the latter broadcasts all values to all shards,
so it is terribly inefficient.

Instead, add async set_value_on_all_shards functions
to broadcast a value to all shards.

Use those in system_keyspace for db_config_table virtual table
and in task_manager_test to update the task_manager ttl.
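
A toy asyncio model of the difference follows. It assumes each shard holds its own copy of the config and notifies its observers on every update. `ShardConfig` and both coroutines are hypothetical stand-ins for the C++ API, and `asyncio.sleep(0)` only marks where a cross-shard hop would be. Changing one value out of 101 costs 303 updates with a full broadcast to three other shards, versus 4 with a per-value update:

```python
import asyncio
from typing import Any, Callable, Dict, List


class ShardConfig:
    """Toy per-shard copy of the config with change observers."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.observers: List[Callable[[str, Any], None]] = []

    def set(self, name: str, value: Any) -> None:
        self.values[name] = value
        for notify in self.observers:
            notify(name, value)


async def broadcast_to_all_shards(shards: List[ShardConfig]) -> int:
    """Old approach: re-send *every* value from shard 0 to all other shards."""
    updates = 0
    for shard in shards[1:]:
        for name, value in shards[0].values.items():
            shard.set(name, value)
            updates += 1
        await asyncio.sleep(0)  # stand-in for a cross-shard hop
    return updates


async def set_value_on_all_shards(shards: List[ShardConfig], name: str, value: Any) -> int:
    """New approach: send only the one changed value to each shard."""
    for shard in shards:
        shard.set(name, value)
        await asyncio.sleep(0)  # stand-in for a cross-shard hop
    return len(shards)


shards = [ShardConfig() for _ in range(4)]
for i in range(100):
    shards[0].values[f"option_{i}"] = i

shards[0].set("task_ttl", 10)
old_updates = asyncio.run(broadcast_to_all_shards(shards))
new_updates = asyncio.run(set_value_on_all_shards(shards, "task_ttl", 20))
```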

Refs #7316

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 11:55:14 +02:00
Benny Halevy
e83f42ec70 utils: config_file: add config_source::API
For task_manager test api.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 11:53:20 +02:00
Botond Dénes
94db2123b9 Update tools/java submodule
* tools/java 583261fc0e...caf754f243 (1):
  > build: remove JavaScript snippets in ant build file
2022-11-09 07:59:04 +02:00
Benny Halevy
10f8f13b90 db: view_update_generator: always clean up staging sstables
Since they are currently not cleaned up by cleanup compaction,
filter their tokens, processing only tokens owned by the
current node (based on the keyspace replication strategy).

Refs #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:38:22 +02:00
Benny Halevy
fd3e66b0cc compaction: extract incremental_owned_ranges_checker out to dht
It is currently used by cleanup_compaction partition filter.
Factor it out so it can be used to filter staging sstables in
the next patch.
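
The idea behind an incremental owned-ranges checker can be sketched in Python. The sketch assumes sorted, non-overlapping owned ranges with inclusive bounds, and tokens that arrive in non-decreasing order, as they do when reading an sstable. Wrap-around ranges are ignored, and the class name only mirrors the C++ one:

```python
from typing import Iterable, Iterator, List, Tuple

Range = Tuple[int, int]  # inclusive [start, end]; assumed sorted and non-overlapping


class IncrementalOwnedRangesChecker:
    """Checks tokens arriving in non-decreasing order against owned ranges.

    Because tokens are monotonic, the cursor into the range list only moves
    forward, so checking n tokens against r ranges costs O(n + r).
    """

    def __init__(self, owned: List[Range]) -> None:
        self._ranges = owned
        self._idx = 0

    def belongs_to_current_node(self, token: int) -> bool:
        # Skip ranges that end before this token; later tokens can't need them.
        while self._idx < len(self._ranges) and self._ranges[self._idx][1] < token:
            self._idx += 1
        if self._idx == len(self._ranges):
            return False
        start, _ = self._ranges[self._idx]
        return start <= token


def filter_owned(tokens: Iterable[int], owned: List[Range]) -> Iterator[int]:
    checker = IncrementalOwnedRangesChecker(owned)
    return (t for t in tokens if checker.belongs_to_current_node(t))


print(list(filter_owned([1, 5, 10, 15, 20, 30], [(4, 10), (18, 25)])))
# → [5, 10, 20]
```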

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:32:56 +02:00
Gleb Natapov' via ScyllaDB development
2100a8f4ca service: raft: demote configuration change error to warning since it is retried anyway
Message-Id: <Y2ohbFtljmd5MNw0@scylladb.com>
2022-11-09 00:09:39 +01:00
Avi Kivity
04ecf4ee18 Update tools/java submodule (cassandra-stress fails with node down)
* tools/java 87672be28e...583261fc0e (1):
  > cassandra-stress: pass all hosts stright to the driver
2022-11-08 14:58:14 +02:00
Botond Dénes
7f69cccbdf scylla-gdb.py: $downcast_vptr(): add multiple inheritance support
When a class inherits from multiple virtual base classes, a pointer to
an instance of this class obtained via one of its base classes might
point somewhere into the middle of the object, not at its beginning.
Therefore, the simple method currently employed by $downcast_vptr(),
casting the provided pointer to the type extracted from the vtable
name, fails. Instead, when this situation is detected (detectable by
observing that the symbol name of the partial vtable is not at an
offset of +16 but larger), $downcast_vptr() will iterate over the base
classes, adjusting the pointer by their offsets, hoping to find the
true start of the object.
In the one instance I tested this with, this method worked well.
At the very least, the method will now yield a null pointer when it
fails, instead of a badly cast object with corrupt content (which the
developer might or might not attribute to the bad cast).
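
The detection step can be sketched in plain Python. The sketch assumes gdb's `info symbol` output format. The function names are hypothetical, and this is not the scylla-gdb.py code. The +16 threshold is the heuristic described above (two pointer-sized slots on a 64-bit target):

```python
import re
from typing import Optional, Tuple

# Matches gdb `info symbol` output such as
#   "vtable for ns::foo + 16 in section .data.rel.ro of /path/to/binary"
# gdb omits the "+ N" part when the offset is zero.
_VTABLE_SYMBOL = re.compile(r"^vtable for (?P<cls>.+?)(?: \+ (?P<off>\d+))? in section ")


def parse_vtable_symbol(info_symbol: str) -> Optional[Tuple[str, int]]:
    """Return (class name, offset into the vtable), or None if not a vtable."""
    m = _VTABLE_SYMBOL.match(info_symbol)
    if m is None:
        return None
    return m.group("cls"), int(m.group("off") or 0)


def needs_pointer_adjustment(info_symbol: str, ptr_size: int = 8) -> bool:
    """An offset larger than two pointer-sized slots (+16 on 64-bit) suggests
    the vptr belongs to a secondary vtable, so the pointer is not at the
    start of the object and must be adjusted before casting."""
    parsed = parse_vtable_symbol(info_symbol)
    return parsed is not None and parsed[1] > 2 * ptr_size
```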

Closes #11892
2022-11-08 14:51:26 +02:00
Michał Chojnowski
3e0c7a6e9f test: sstable_datafile_test: eliminate a use of std::regex to prevent stack overflow
This usage of std::regex overflows the seastar::thread stack size (128 KiB),
causing memory corruption. Fix that.

Closes #11911
2022-11-08 14:41:34 +02:00
Botond Dénes
2037d7f9cd Merge 'doc: add the "ScyllaDB Enterprise" label to highlight the Enterprise-only features' from Anna Stuchlik
This PR adds the "ScyllaDB Enterprise" label to highlight the Enterprise-only features on the following pages:
- Encryption at Rest - the label indicates that the entire page is about an Enterprise-only feature.
- Compaction - the labels indicate the sections that are Enterprise-only.

There are more occurrences across the docs that require a similar update. I'll update them in another PR if this PR is approved.

Closes #11918

* github.com:scylladb/scylladb:
  doc: fix the links to resolve the warnings
  doc: add the Enterprise label on the Compaction page (to a subheading and on a list of strategies) to replace the info box
  doc: add the Enterprise label to the Encryption at Rest page (the entire page) to replace the info box
2022-11-08 09:53:48 +02:00
Raphael S. Carvalho
a57724e711 Make off-strategy compaction wait for view building completion
Prior to off-strategy compaction, streaming / repair would place
staging files into the main sstable set and wait for view building
to complete before they could be selected for regular compaction.

The reason is that view building relies on the table providing
a mutation source without data in staging files. Had regular compaction
mixed staging data with non-staging data, the table would have a hard
time providing the required mutation source.

After off-strategy compaction, staging files can be compacted
in parallel to view building. If off-strategy completes first, it
will place the output into the main sstable set. So a parallel view
building (on sstables used for off-strategy) may potentially get a
mutation source containing staging data from the off-strategy output.
That will mislead the view builder, as it won't be able to detect
changes to data in the main directory.

To fix it, we'll do what we did before: filter out staging files
from compaction, and trigger the operation only after we're done
with view building. We piggyback on the off-strategy timer so that
off-strategy compaction still runs only at the end of the node
operation, reducing the number of compaction rounds on the data
introduced by repair / streaming.
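
A toy model of the rule follows. It assumes staging status is a per-sstable flag that is cleared once view building finishes. `SSTable`, `compaction_candidates()` and `on_view_building_done()` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTable:
    name: str
    staging: bool  # True while the sstable still awaits view building


def compaction_candidates(sstables: List[SSTable]) -> List[str]:
    """Staging sstables are never handed to compaction, so view building
    keeps seeing a main set that contains no staging data."""
    return [s.name for s in sstables if not s.staging]


def on_view_building_done(sstables: List[SSTable]) -> None:
    """Once views are built, the data is no longer 'staging'; only now
    would (off-strategy) compaction be triggered on it."""
    for s in sstables:
        s.staging = False


table = [SSTable("repaired-1", staging=True), SSTable("main-1", staging=False)]
before = compaction_candidates(table)
on_view_building_done(table)
after = compaction_candidates(table)
```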

Fixes #11882.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11919
2022-11-08 08:53:58 +02:00
Botond Dénes
243fcb96f0 Update tools/python3 submodule
* tools/python3 bf6e892...773070e (1):
  > create-relocatable-package: harden against missing files
2022-11-08 08:43:30 +02:00
Avi Kivity
46690bcb32 build: harden create-relocatable-package.py against changes in libthread-db.so name
create-relocatable-package.py collects shared libraries used by
executables for packaging. It also adds libthread-db.so to make
debugging possible. However, the name it uses has changed in glibc,
so packaging fails in Fedora 37.

Switch to the version-agnostic name, libthread-db.so. This happens
to be a symlink, so resolve it.
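
Resolving such a symlink in Python can be sketched with `pathlib.Path.resolve()`. The library file names below are placeholders, not the real glibc names, and `resolve_library()` is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def resolve_library(libdir: Path, name: str) -> Path:
    """Return the real file behind a (version-agnostic) library symlink."""
    link = libdir / name
    if not link.exists():  # exists() follows symlinks, so dangling links fail too
        raise FileNotFoundError(link)
    return link.resolve()  # follows the whole symlink chain


with tempfile.TemporaryDirectory() as tmp:
    libdir = Path(tmp)
    (libdir / "libexample.so.1").write_bytes(b"")
    (libdir / "libexample.so").symlink_to("libexample.so.1")
    resolved_name = resolve_library(libdir, "libexample.so").name
```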

Closes #11917
2022-11-08 08:41:22 +02:00
Takuya ASADA
acc408c976 scylla_setup: fix incorrect type definition on --online-discard option
The --online-discard option is defined as a string parameter, since it
doesn't specify "action=", but it has a boolean default value
(default=True). This breaks "provisioning in a similar environment",
since that code assumes a boolean option uses "action='store_true'",
but this one doesn't.

We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
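
A sketch of the argparse behaviour before and after the change follows. It assumes the new option keeps an integer default of 1 to match the old `default=True`; the commit does not state the new default:

```python
import argparse


def old_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    # No type/action: a value from the command line stays a string,
    # while the default is the boolean True.
    p.add_argument("--online-discard", default=True)
    return p


def new_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    # Explicit 0/1 integer, the convention the commit cites for --io-setup.
    p.add_argument("--online-discard", default=1, type=int, choices=[0, 1])
    return p


old_value = old_parser().parse_args(["--online-discard", "False"]).online_discard
new_value = new_parser().parse_args(["--online-discard", "0"]).online_discard
default_value = new_parser().parse_args([]).online_discard
```

With the old definition, `old_value` is the non-empty string "False", which is truthy. With the new one, any value other than 0 or 1 is rejected by argparse.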

Fixes #11700

Closes #11831
2022-11-08 08:40:44 +02:00
Avi Kivity
3d345609d8 config: disable "mc" format sstables for new data
"md" format was introduced in 4.3, in 3530e80ce1, two years ago.
Disable the option to create new sstables with the "mc" format.

Closes #11265
2022-11-08 08:36:27 +02:00
Anna Stuchlik
0eaafced9d doc: fix the links to resolve the warnings 2022-11-07 19:15:21 +01:00
Anna Stuchlik
b57e0cfb7c doc: add the Enterprise label on the Compaction page (to a subheading and on a list of strategies) to replace the info box 2022-11-07 18:54:35 +01:00
Anna Stuchlik
9f3fcb3fa0 doc: add the Enterprise label to the Encryption at Rest page (the entire page) to replace the info box 2022-11-07 18:48:37 +01:00
Tomasz Grabiec
a9063f9582 Merge 'service/raft: failure detector: ping raft::server_ids, not gms::inet_addresses' from Kamil Braun
Whenever a Raft configuration change is performed, `raft::server` calls
`raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc`
implementation has a function, `_on_server_update`, passed in the
constructor, which it called in `add_server`/`remove_server`;
that function would update the set of endpoints detected by the
direct failure detector. `_on_server_update` was passed an IP address
and that address was added to / removed from the failure detector set
(there's another translation layer between the IP addresses and internal
failure detector 'endpoint ID's; but we can ignore it for the purposes
of this commit).

Therefore: the failure detector was pinging a certain set of IP
addresses. These IP addresses were updated during Raft configuration
changes.

To implement the `is_alive(raft::server_id)` function (required by
`raft::failure_detector` interface), we would translate the ID using
the Raft address map, which is currently also updated during
configuration changes, to an IP address, and check if that IP address is
alive according to the direct failure detector (which maintained an
`_alive_set` of type `unordered_set<gms::inet_address>`).

This all works well but it assumes that servers can be identified using
IP addresses - it doesn't play well with the fact that servers may
change their IP addresses. The only immutable identifier we have for a
server is `raft::server_id`. In the future, Raft configurations will not
associate IP addresses with Raft servers; instead we will assume that IP
addresses can change at any time, and there will be a different
mechanism that eventually updates the Raft address map with the latest
IP address for each `raft::server_id`.

To prepare us for that future, in this commit we no longer operate in
terms of IP addresses in the failure detector, but in terms of
`raft::server_id`s. Most of the commit is boilerplate, changing
`gms::inet_address` to `raft::server_id` and function/variable names.
The interesting changes are:
- in `is_alive`, we no longer need to translate the `raft::server_id` to
  an IP address, because now the stored `_alive_set` already contains
  `raft::server_id`s instead of `gms::inet_address`es.
- the `ping` function now takes a `raft::server_id` instead of
  `gms::inet_address`. To send the ping message, we need to translate
  this to IP address; we do it by the `raft_address_map` pointer
  introduced in an earlier commit.

Thus, there is still a point where we have to translate between
`raft::server_id` and `gms::inet_address`; but observe we now do it at
the last possible moment - just before sending the message. If we
have no translation, we consider the `ping` to have failed - it's
equivalent to a network failure where no route to a given address was
found.

Closes #11759

* github.com:scylladb/scylladb:
  direct_failure_detector: get rid of complex `endpoint_id` translations
  service/raft: ping `raft::server_id`s, not `gms::inet_address`es
  service/raft: store `raft_address_map` reference in `direct_fd_pinger`
  gms: gossiper: move `direct_fd_pinger` out to a separate service
  gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class
2022-11-07 16:42:35 +01:00
Botond Dénes
2b572d94f5 Merge 'doc: improve the documentation landing page ' from Anna Stuchlik
This PR introduces the following changes to the documentation landing page:

- The "New to ScyllaDB? Start here!" box is added.
- The "Connect your application to Scylla" box is removed.
- Some wording has been improved.
- "Scylla" has been replaced with "ScyllaDB".

Closes #11896

* github.com:scylladb/scylladb:
  Update docs/index.rst
  doc: replace Scylla with ScyllaDB on the landing page
  doc: improve the wording on the landing page
  doc: add the link to the ScyllaDB Basics page to the documentation landing page
2022-11-07 16:18:59 +02:00
Avi Kivity
91f2cd5ac4 test: lib: exception_predicate: use boost::regex instead of std::regex
std::regex was observed to overflow stack on aarch64 in debug mode. Use
boost::regex until the libstdc++ bug[1] is fixed.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

Closes #11888
2022-11-07 14:03:25 +02:00
Kamil Braun
0c7ff0d2cb docs: a single 5.0 -> 5.1 upgrade guide
There were 4 different pages for upgrading Scylla 5.0 to 5.1 (and the
same is true for other version pairs, but I digress) for different
environments:
- "ScyllaDB Image for EC2, GCP, and Azure"
- Ubuntu
- Debian
- RHEL/CentOS

The Ubuntu and Debian pages used a common template:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```
with different variable substitutions.

The "Image" page used a similar template, with some extra content in the
middle:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-image-opensource.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```

The RHEL/CentOS page used a different template:
```
.. include:: /upgrade/_common/upgrade-guide-v4-rpm.rst
```

This was an unmaintainable mess. Most of the content was "the same" for
each of these options. The only content that must actually be different
is the part with package installation instructions (e.g. calls to `yum`
vs `apt-get`). The rest of the content was logically the same - the
differences were mistakes, typos, and updates/fixes to the text that
were made in some of these docs but not others.

In this commit I prepare a single page that covers the upgrade and
rollback procedures for each of these options. The section dependent on
the system was implemented using Sphinx Tabs.

I also fixed and changed some parts:

- In the "Gracefully stop the node" section:
Ubuntu/Debian/Images pages had:

```rst
.. code:: sh

   sudo service scylla-server stop
```

RHEL/CentOS pages had:
```rst
.. code:: sh

.. include:: /rst_include/scylla-commands-stop-index.rst
```

the stop-index file contained this:
```rst
.. tabs::

   .. group-tab:: Supported OS

      .. code-block:: shell

         sudo systemctl stop scylla-server

   .. group-tab:: Docker

      .. code-block:: shell

         docker exec -it some-scylla supervisorctl stop scylla

      (without stopping *some-scylla* container)
```

So the RHEL/CentOS version had two tabs: one for Scylla installed
directly on the system, one for Scylla running in Docker - which is
interesting, because nothing anywhere else in the upgrade documents
mentions Docker.  Furthermore, the RHEL/CentOS version used `systemctl`
while the ubuntu/debian/images version used `service` to stop/start
scylla-server.  Both work on modern systems.

The Docker option is completely out of place - the rest of the upgrade
procedure does not mention Docker. So I decided it doesn't make sense to
include it. Docker documentation could be added later if we actually
decide to write upgrade documentation when using Docker...  Between
`systemctl` and `service` I went with `service` as it's a bit
higher-level.

- Similar change for "Start the node" section, and corresponding
  stop/start sections in the Rollback procedure.

- To reuse text for Ubuntu and Debian, when referencing "ScyllaDB deb
  repo" in the Debian/Ubuntu tabs, I provide two separate links: to
  Debian and Ubuntu repos.

- the link to rollback procedure in the RPM guide (in 'Download and
  install the new release' section) pointed to rollback procedure from
  3.0 to 3.1 guide... Fixed to point to the current page's rollback
  procedure.

- in the rollback procedure steps summary, the RPM version missed the
  "Restore system tables" step.

- in the rollback procedure, the repository links were pointing to the
  new versions, while they should point to the old versions.

There are some other pre-existing problems I noticed that need fixing:

- EC2/GCP/Azure option has no corresponding coverage in the rollback
  section (Download and install the old release) as it has in the
  upgrade section. There is no guide for rolling back 3rd party and OS
  packages, only Scylla. I left a TODO in a comment.
- the repository links assume certain Debian and Ubuntu versions (Debian
  10 and Ubuntu 20), but there are more available options (e.g. Ubuntu
  22). Not sure how to deal with this problem. Maybe a separate section
  with links? Or just a generic link without choice of platform/version?

Closes #11891
2022-11-07 14:02:08 +02:00
Avi Kivity
9fa1783892 Merge 'cleanup compaction: flush memtable' from Benny Halevy
Flush the memtable before cleaning up the table so as not to leave any disowned tokens in the memtable,
as they might be resurrected if left there.

Fixes #1239

Closes #11902

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: flush memtable
  table: add perform_cleanup_compaction
  api: storage_service: add logging for compaction operations et al
2022-11-07 13:18:12 +02:00
Anna Stuchlik
c8455abb71 Update docs/index.rst
Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>
2022-11-07 10:25:24 +01:00
AdamStawarz
6bc455ebea Update tombstones-flush.rst
change syntax:

nodetool compact <keyspace>.<mytable>;
to
nodetool compact <keyspace> <mytable>;

Closes #11904
2022-11-07 11:19:26 +02:00
Avi Kivity
224a2877b9 build: disable -Og in debug mode to avoid coroutine asan breakage
Coroutines and asan don't mix well on aarch64. This was seen in
22f13e7ca3 (" Revert "Merge 'cql3: select_statement: coroutinize
indexed_table_select_statement::do_execute_base_query()' from Avi
Kivity"") where a routine coroutinization was reverted due to failures
on aarch64 debug mode.

In clang 15 this is even worse; the existing code starts failing.
However, if we disable optimization (-O0 rather than -Og), things
begin to work again. In fact we can reinstate the patch reverted
above even with clang 12.

Fix (or rather workaround) the problem by avoiding -Og on aarch64
debug mode. There's the lingering fear that release mode is
miscompiled too, but all the tests pass on clang 15 in release mode
so it appears related to asan.

Closes #11894
2022-11-07 10:55:13 +02:00
Benny Halevy
eb3a94e2bc table: perform_cleanup_compaction: flush memtable
We don't explicitly clean up the memtable, even though
it might hold tokens disowned by the current node.

Flush the memtable before performing cleanup compaction
to make sure all tokens in the memtable are cleaned up.

Note that non-owned ranges are invalidated in the cache
in compaction_group::update_main_sstable_list_on_compaction_completion
using desc.ranges_for_cache_invalidation.
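
A toy model of why the flush matters follows. It assumes cleanup only rewrites sstables and never touches the memtable. `Table`, `owned()` and the token values are hypothetical:

```python
from typing import Callable, Dict, List, Set


class Table:
    """Toy table: a memtable plus immutable sstables, keyed by token."""

    def __init__(self) -> None:
        self.memtable: Dict[int, str] = {}
        self.sstables: List[Dict[int, str]] = []

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable.clear()

    def perform_cleanup_compaction(self, owned: Callable[[int], bool],
                                   flush_first: bool = True) -> None:
        if flush_first:
            self.flush()  # move memtable data where cleanup can see it
        # Cleanup rewrites sstables, keeping only tokens this node owns.
        self.sstables = [{t: v for t, v in sst.items() if owned(t)}
                         for sst in self.sstables]

    def tokens(self) -> Set[int]:
        result = set(self.memtable)
        for sst in self.sstables:
            result.update(sst)
        return result


def owned(token: int) -> bool:
    return token < 10  # pretend this node now owns only tokens below 10


without_flush, with_flush = Table(), Table()
for tbl in (without_flush, with_flush):
    tbl.memtable.update({5: "kept", 50: "disowned"})
without_flush.perform_cleanup_compaction(owned, flush_first=False)
with_flush.perform_cleanup_compaction(owned)
```

Without the flush, token 50 survives in the memtable and would reach disk on a later flush. With the flush, it is dropped by cleanup.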

Fixes #1239

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:40 +02:00
Benny Halevy
fc278be6c4 table: add perform_cleanup_compaction
Move the integration with compaction_manager
from the api layer to the table class so
it can also make sure the memtable is cleaned up in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:33 +02:00
Benny Halevy
85523c45c0 api: storage_service: add logging for compaction operations et al
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-06 19:41:31 +02:00
Petr Gusev
44f48bea0f raft: test_remove_node_with_concurrent_ddl
The test runs remove_node command with background ddl workload.
It was written in an attempt to reproduce scylladb#11228 but seems to have
value on its own.

The if_exists parameter has been added to the add_table
and drop_table functions, since the driver could retry
the request sent to a removed node, but that request
might have already been completed.
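
A sketch of how an `if_exists` flag can make retried DDL idempotent follows. The helper names, and the mapping of `if_exists` to `IF NOT EXISTS` on creation, are assumptions rather than the test's actual code:

```python
def add_table_cql(keyspace: str, table: str, columns: str, if_exists: bool = False) -> str:
    """CREATE TABLE; with if_exists=True, a retried request that already
    succeeded becomes a no-op instead of an "already exists" error."""
    guard = " IF NOT EXISTS" if if_exists else ""
    return f"CREATE TABLE{guard} {keyspace}.{table} ({columns})"


def drop_table_cql(keyspace: str, table: str, if_exists: bool = False) -> str:
    """DROP TABLE; IF EXISTS makes a retried drop harmless."""
    guard = " IF EXISTS" if if_exists else ""
    return f"DROP TABLE{guard} {keyspace}.{table}"
```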

Function wait_for_host_known waits until the information
about the node reaches the destination node. Since we add
new nodes at each iteration in main, this can take some time.

A number of abort-related options were added to
SCYLLA_CMDLINE_OPTIONS, as they simplify
nailing down problems.

Closes #11734
2022-11-04 17:16:35 +01:00
David Garcia
26bc53771c docs: automatic previews configuration
Closes #11591
2022-11-04 15:44:22 +02:00
Kamil Braun
e086521c1a direct_failure_detector: get rid of complex endpoint_id translations
The direct failure detector operates on abstract `endpoint_id`s for
pinging. The `pinger` interface is responsible for translating these IDs
to 'real' addresses.

Earlier we used two types of addresses: IP addresses in 'production'
code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test
code (in `randomized_nemesis_test`). For each of these use cases we
would maintain mappings between `endpoint_id`s and the address type.

In recent commits we switched the 'production' code to also operate on
Raft server IDs, which are UUIDs underneath.

In this commit we switch `endpoint_id`s from `unsigned` type to
`utils::UUID`. Because each use case operates on Raft server IDs, we can
perform a simple translation: `raft_id.uuid()` to get an `endpoint_id`
from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from
an `endpoint_id`. We no longer have to maintain complex sharded data
structures to store the mappings.
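
The translation becomes a lossless round trip. This can be sketched in Python with a hypothetical `ServerId` wrapper standing in for `raft::server_id`; it is not the C++ type:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

EndpointId = UUID  # after the change: endpoint_id is a UUID, not an unsigned


@dataclass(frozen=True)
class ServerId:
    """Hypothetical stand-in for raft::server_id, a thin wrapper over a UUID."""
    value: UUID

    def uuid(self) -> UUID:
        return self.value


def to_endpoint_id(raft_id: ServerId) -> EndpointId:
    return raft_id.uuid()


def to_server_id(ep_id: EndpointId) -> ServerId:
    return ServerId(ep_id)


rid = ServerId(uuid4())
# No sharded lookup tables: both directions are plain wrapping/unwrapping.
assert to_server_id(to_endpoint_id(rid)) == rid
```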
2022-11-04 09:38:08 +01:00
Kamil Braun
bdeef77f20 service/raft: ping raft::server_ids, not gms::inet_addresses
Whenever a Raft configuration change is performed, `raft::server` calls
`raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc`
implementation has a function, `_on_server_update`, passed in the
constructor, which it called in `add_server`/`remove_server`;
that function would update the set of endpoints detected by the
direct failure detector. `_on_server_update` was passed an IP address
and that address was added to / removed from the failure detector set
(there's another translation layer between the IP addresses and internal
failure detector 'endpoint ID's; but we can ignore it for the purposes
of this commit).

Therefore: the failure detector was pinging a certain set of IP
addresses. These IP addresses were updated during Raft configuration
changes.

To implement the `is_alive(raft::server_id)` function (required by
`raft::failure_detector` interface), we would translate the ID using
the Raft address map, which is currently also updated during
configuration changes, to an IP address, and check if that IP address is
alive according to the direct failure detector (which maintained an
`_alive_set` of type `unordered_set<gms::inet_address>`).

This all works well but it assumes that servers can be identified using
IP addresses - it doesn't play well with the fact that servers may
change their IP addresses. The only immutable identifier we have for a
server is `raft::server_id`. In the future, Raft configurations will not
associate IP addresses with Raft servers; instead we will assume that IP
addresses can change at any time, and there will be a different
mechanism that eventually updates the Raft address map with the latest
IP address for each `raft::server_id`.

To prepare us for that future, in this commit we no longer operate in
terms of IP addresses in the failure detector, but in terms of
`raft::server_id`s. Most of the commit is boilerplate, changing
`gms::inet_address` to `raft::server_id` and function/variable names.
The interesting changes are:
- in `is_alive`, we no longer need to translate the `raft::server_id` to
  an IP address, because now the stored `_alive_set` already contains
  `raft::server_id`s instead of `gms::inet_address`es.
- the `ping` function now takes a `raft::server_id` instead of
  `gms::inet_address`. To send the ping message, we need to translate
  this to IP address; we do it by the `raft_address_map` pointer
  introduced in an earlier commit.

Thus, there is still a point where we have to translate between
`raft::server_id` and `gms::inet_address`; but observe we now do it at
the last possible moment - just before sending the message. If we
have no translation, we consider the `ping` to have failed - it's
equivalent to a network failure where no route to a given address was
found.
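
A synchronous Python sketch of the new flow follows. It assumes a hypothetical `AddressMap` in place of `raft_address_map`, and an injected `send_echo` callback in place of the real RPC; the real code is asynchronous C++. The server keeps its id while its IP changes, and liveness follows the id:

```python
from typing import Callable, Dict, Optional, Set
from uuid import UUID, uuid4

ServerId = UUID  # hypothetical stand-in for raft::server_id


class AddressMap:
    """Toy raft_address_map: server id -> current IP, updated at any time."""

    def __init__(self) -> None:
        self._ips: Dict[ServerId, str] = {}

    def set(self, sid: ServerId, ip: str) -> None:
        self._ips[sid] = ip

    def find(self, sid: ServerId) -> Optional[str]:
        return self._ips.get(sid)


class FailureDetector:
    def __init__(self, address_map: AddressMap, send_echo: Callable[[str], bool]) -> None:
        self._address_map = address_map
        self._send_echo = send_echo
        self._alive_set: Set[ServerId] = set()

    def ping(self, sid: ServerId) -> bool:
        ip = self._address_map.find(sid)  # translate at the last possible moment
        if ip is None:
            return False  # no translation: treated like "no route to host"
        return self._send_echo(ip)

    def check(self, sid: ServerId) -> None:
        if self.ping(sid):
            self._alive_set.add(sid)
        else:
            self._alive_set.discard(sid)

    def is_alive(self, sid: ServerId) -> bool:
        return sid in self._alive_set  # no IP translation needed here


reachable = {"10.0.0.2"}
amap = AddressMap()
fd = FailureDetector(amap, lambda ip: ip in reachable)
node = uuid4()
fd.check(node)                # no address known yet
unknown = fd.is_alive(node)
amap.set(node, "10.0.0.1")    # stale, unreachable address
fd.check(node)
stale = fd.is_alive(node)
amap.set(node, "10.0.0.2")    # the node changed its IP; same server id
fd.check(node)
moved = fd.is_alive(node)
```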
2022-11-04 09:38:08 +01:00
Kamil Braun
ac70a05c7e service/raft: store raft_address_map reference in direct_fd_pinger
The pinger will use the map to translate `raft::server_id`s to
`gms::inet_address`es when pinging.
2022-11-04 09:38:08 +01:00
Kamil Braun
2c20f2ab9d gms: gossiper: move direct_fd_pinger out to a separate service
In a later commit, `direct_fd_pinger` will operate in terms of
`raft::server_id`s. Decouple it from `gossiper` since we don't want to
entangle `gossiper` with Raft-specific stuff.
2022-11-04 09:38:08 +01:00
Kamil Braun
e9a4263e14 gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class
`gms::gossiper::direct_fd_pinger` serves multiple purposes: one of them
is to maintain a mapping between `gms::inet_address`es and
`direct_failure_detector::pinger::endpoint_id`s, another is to cache the
last known gossiper's generation number to use it for sending gossip
echo messages. The latter is the only gossiper-specific thing in this
class.

We want to move `direct_fd_pinger` outside `gossiper`. To do that, split the
gossiper-specific thing -- the generation number management -- to a
smaller class, `echo_pinger`.

`echo_pinger` is a top-level class (not a nested one like
`direct_fd_pinger` was) so we can forward-declare it and pass references
to it without including gms/gossiper.hh header.
2022-11-04 09:38:08 +01:00
Avi Kivity
768d77d31b Update seastar submodule
* seastar f32ed00954...e0dabb361f (12):
  > sstring: define formatter
  > file: Dont violate API layering
  > Add compile_commands.json to gitignore
  > Merge 'Add an allocation failure metric' from Travis Downs
  > Use const test objects
  > Ragel chunk parser: compilation err, unused var
  > build: do not expose Valgrind in SeastarTargets.cmake
  > defer: mark deferred_* with [[nodiscard]]
  > Log selected reactor backend during startup
  > http: mark str with [[maybe_unused]]
  > Merge 'reactor: open fd without O_NONBLOCK when using io_uring backend' from Kefu Chai
  > reactor: add accept and connect to io_uring backend

Closes #11895
2022-11-04 09:27:56 +04:00
Anna Stuchlik
fb01565a15 doc: replace Scylla with ScyllaDB on the landing page 2022-11-03 17:42:49 +01:00
Anna Stuchlik
7410ab0132 doc: improve the wording on the landing page 2022-11-03 17:38:14 +01:00
Anna Stuchlik
ab5e48261b doc: add the link to the ScyllaDB Basics page to the documentation landing page 2022-11-03 17:31:03 +01:00
Pavel Emelyanov
efbfcdb97e Merge 'Replicate raft_address_map non-expiring entries to other shards' from Kamil Braun
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
  `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
  group 0 configuration changes. To handle dynamic mappings we need to
  modify the failure detector so it pings `raft::server_id`s and obtains
  the `gms::inet_address` from `raft_address_map` just before sending the
  message. The failure detector is sharded, so we need the
  mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
  shards. To send messages they'll need `raft_address_map`.

Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.

Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
  0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
  entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
  ID for all Raft servers running on a single node belonging to
  different groups; they are 'namespaced' by the group IDs. Furthermore,
  every node has a server that belongs to group 0. Thus for every Raft
  server in every group, it has a corresponding server in group 0 with
  the same ID, which has a non-expiring entry in `raft_address_map`,
  which is replicated to all shards; so every group will be able to
  deliver its messages.

With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.
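The replication rule described above can be sketched as follows. This is a single-threaded model under stated assumptions: shards are simulated as vector indices, and `sharded_address_map`, `set_nonexpiring` and `set_expiring` are hypothetical names, not the real `raft_address_map` interface (which runs on seastar shards):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using server_id = std::string;     // hypothetical stand-in
using inet_address = std::string;  // hypothetical stand-in

struct entry {
    inet_address addr;
    bool expiring;
};

// One map per simulated shard.
class sharded_address_map {
    std::vector<std::unordered_map<server_id, entry>> _shards;
public:
    explicit sharded_address_map(std::size_t n) : _shards(n) {}

    // Non-expiring (group 0 member) entries may only be added on shard 0;
    // they are then copied to every shard.
    void set_nonexpiring(std::size_t shard, const server_id& id, const inet_address& addr) {
        if (shard != 0) {
            throw std::logic_error("non-expiring entries are added on shard 0 only");
        }
        for (auto& m : _shards) {
            m[id] = entry{addr, false};
        }
    }

    // Expiring entries stay local to the shard that learned them.
    void set_expiring(std::size_t shard, const server_id& id, const inet_address& addr) {
        auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it != m.end() && !it->second.expiring) {
            return;  // assumption: never downgrade a replicated non-expiring entry
        }
        m[id] = entry{addr, true};
    }

    std::optional<inet_address> find(std::size_t shard, const server_id& id) const {
        const auto& m = _shards.at(shard);
        auto it = m.find(id);
        if (it == m.end()) {
            return std::nullopt;
        }
        return it->second.addr;
    }
};
```

Because only shard 0 writes non-expiring entries, no cross-shard coordination of expiry timestamps is needed, which is the simplification the message argues for.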

Closes #11791

* github.com:scylladb/scylladb:
  test/raft: raft_address_map_test: add replication test
  service/raft: raft_address_map: replicate non-expiring entries to other shards
  service/raft: raft_address_map: assert when entry is missing in drop_expired_entries
  service/raft: turn raft_address_map into a service
2022-11-03 18:34:42 +03:00
Avi Kivity
ca2010144e test: loading_cache_test: fix use-after-free in test_loading_cache_remove_leaves_no_old_entries_behind
We capture `key` by reference, but `key` lives in another continuation, so the reference may dangle.

Capture it by value, and avoid the default capture specification.

Found by clang 15 + asan + aarch64.
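A minimal illustration of this bug class and the fix, assuming a plain vector of `std::function` stands in for a seastar continuation chain; `pending` and `schedule_lookup` are hypothetical names, not code from the test:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a chain of continuations: tasks are queued
// now and run later, after the scope that created them has exited.
std::vector<std::function<std::string()>> pending;

void schedule_lookup() {
    std::string key = "k1";  // local: destroyed when this function returns
    // Buggy version: [&] { return key; } would hold a dangling reference.
    // Fixed version: capture `key` explicitly by value (no default
    // capture), so the queued task owns its own copy.
    pending.push_back([key] { return key; });
}
```

Avoiding `[&]`/`[=]` forces each captured name to be spelled out, which makes it easier to spot a by-reference capture that outlives its owner.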

Closes #11884
2022-11-03 17:23:40 +02:00
Avi Kivity
0c3967cf5e Merge 'scylla-gdb.py: improve scylla-fiber' from Botond Dénes
The main theme of this patchset is improving `scylla-fiber`, with some assorted unrelated improvement tagging along.
In lieu of explicit support from seastar for mapping out continuation chains in memory (there is one, but it uses function calls), scylla fiber uses a rather crude method: it scans task objects for outbound references to other task objects to find waiter tasks, and scans inbound references from other tasks to find waited-on tasks. This works well for most objects, but there are some problematic ones:
* `seastar::thread_context`: the waited-on task (`seastar::(anonymous namespace)::thread_wake_task`) is allocated on the thread's stack which is not in the object itself. Scylla fiber now scans the stack bottom-up to find this task.
* `seastar::smp_message_queue::async_work_item`: the waited on task lives on another shard. Scylla fiber now digs out the remote shard from the work item and continues the search on the remote shard.
* `seastar::when_all_state`: the waited-on task is a member of the same object, tripping loop detection and terminating the search. Scylla fiber now uses the `_continuation` member explicitly to look for the next links.

Other minor improvements were also done, like including the shard of the task in the printout.
Example demonstrating all the new additions:
```
(gdb) scylla fiber 0x000060002d650200
Stopping because loop is detected: task 0x000061c00385fb60 was seen before.
[shard 28] #-13 (task*) 0x000061c00385fba0 0x00000000003b5b00 vtable for seastar::internal::when_all_state_component<seastar::future<void> > + 16
[shard 28] #-12 (task*) 0x000061c00385fb60 0x0000000000417010 vtable for seastar::internal::when_all_state<seastar::internal::identity_futures_tuple<seastar::future<void>, seastar::future<void> >, seastar::future<void>, seastar::future<void> > + 16
[shard 28] #-11 (task*) 0x000061c009f16420 0x0000000000419830 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureISt5tupleIJNS4_IvEES6_EEE14discard_resultEvEUlDpOT_E_ZNS8_14then_impl_nrvoISC_S6_EET0_OT_EUlOS3_RSC_ONS_12future_stateIS7_EEE_S7_EE + 16
[shard 28] #-10 (task*) 0x000061c0098e9e00 0x0000000000447440 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>::run_and_dispose()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> >(seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-9 (task*) 0x000060000858dcd0 0x0000000000449d68 vtable for seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> + 16
[shard  0] #-8 (task*) 0x0000600050c39f60 0x00000000007abe98 vtable for seastar::parallel_for_each_state + 16
[shard  0] #-7 (task*) 0x000060000a59c1c0 0x0000000000449f60 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::sharded<cql_transport::cql_server>::stop()::{lambda(seastar::future<void>)#2}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#2}>({lambda(seastar::future<void>)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-6 (task*) 0x000060000a59c400 0x0000000000449ea0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, cql_transport::controller::do_stop_server()::{lambda(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&)#1}::operator()(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&) const::{lambda()#1}::operator()() const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda()#1}, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda()#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #-5 (task*) 0x0000600009d86cc0 0x0000000000449c00 vtable for seastar::internal::do_with_state<std::tuple<std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > > >, seastar::future<void> > + 16
[shard  0] #-4 (task*) 0x00006000019ffe20 0x00000000007ab368 vtable for seastar::(anonymous namespace)::thread_wake_task + 16
[shard  0] #-3 (task*) 0x00006000085ad080 0x0000000000809e18 vtable for seastar::thread_context + 16
[shard  0] #-2 (task*) 0x0000600009c04100 0x00000000006067f8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS6_E_clES7_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE0_ZNS_6futureIvE14then_impl_nrvoIST_SV_EET0_SQ_EUlOS3_RST_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #-1 (task*) 0x000060000a59c080 0x0000000000606ae8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS9_E_clESA_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSF_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSG_DpOSJ_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SX_EENSD_ISG_E4typeEOT0_EUlOS3_RSX_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #0  (task*) 0x000060002d650200 0x0000000000606378 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<service::storage_service::run_with_api_lock<service::storage_service::drain()::{lambda(service::storage_service&)#1}>(seastar::basic_sstring<char, unsigned int, 15u, true>, service::storage_service::drain()::{lambda(service::storage_service&)#1}&&)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&)::{lambda()#1}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(service::storage_service&)#1}>({lambda(service::storage_service&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(service::storage_service&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #1  (task*) 0x000060000bc40540 0x0000000000606d48 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_3smp9submit_toIZNS_7shardedIN7service15storage_serviceEE9invoke_onIZNSB_17run_with_api_lockIZNSB_5drainEvEUlRSB_E_EEDaNS_13basic_sstringIcjLj15ELb1EEEOT_EUlSF_E_JES5_EET1_jNS_21smp_submit_to_optionsESK_DpOT0_EUlvE_EENS_8futurizeINSt9result_ofIFSJ_vEE4typeEE4typeEjSN_SK_EUlvE_Lb0EEEZNS5_17then_wrapped_nrvoIS5_S10_EENSS_ISJ_E4typeEOT0_EUlOS3_RS10_ONS_12future_stateINS1_9monostateEEEE_vEE + 16
[shard  0] #2  (task*) 0x000060000332afc0 0x00000000006cb1c8 vtable for seastar::continuation<seastar::internal::promise_base_with_type<seastar::json::json_return_type>, api::set_storage_service(api::http_context&, seastar::httpd::routes&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >) const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}, {lambda()#1}<seastar::json::json_return_type> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::json::json_return_type>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #3  (task*) 0x000060000a1af700 0x0000000000812208 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::function_handler(std::function<seastar::future<seastar::json::json_return_type> (std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)> const&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >) const::{lambda(seastar::json::json_return_type&&)#1}, seastar::future<seastar::json::json_return_type>::then_impl_nrvo<seastar::json::json_return_type&&, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > >(seastar::json::json_return_type&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, seastar::json::json_return_type&, seastar::future_state<seastar::json::json_return_type>&&)#1}, seastar::json::json_return_type> + 16
[shard  0] #4  (task*) 0x0000600009d86440 0x0000000000812228 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::handle(seastar::basic_sstring<char, unsigned int, 15u, true> const&, std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future>({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #5  (task*) 0x0000600009dba0c0 0x0000000000812f48 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::handle_exception<std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&>(std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&)::{lambda(auto:1&&)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_wrapped_nrvo<seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, {lambda(auto:1&&)#1}>({lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(auto:1&&)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #6  (task*) 0x0000600026783ae0 0x00000000008118b0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<bool>, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}<bool> >({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<bool>&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16
[shard  0] #7  (task*) 0x000060000a4089c0 0x0000000000811790 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read_one()::{lambda()#1}::operator()()::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(std::default_delete<std::unique_ptr>)#1}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(bool)#2}, seastar::future<bool>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}, {lambda(std::default_delete<std::unique_ptr>)#1}<void> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&, seastar::future_state<bool>&&)#1}, bool> + 16
[shard  0] #8  (task*) 0x000060000a5b16e0 0x0000000000811430 vtable for seastar::internal::do_until_state<seastar::httpd::connection::read()::{lambda()#1}, seastar::httpd::connection::read()::{lambda()#2}> + 16
[shard  0] #9  (task*) 0x000060000aec1080 0x00000000008116d0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read()::{lambda(seastar::future<void>)#3}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#3}>({lambda(seastar::future<void>)#3}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#3}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16
[shard  0] #10 (task*) 0x000060000b7d2900 0x0000000000811950 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::httpd::connection::read()::{lambda()#4}, true>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::httpd::connection::read()::{lambda()#4}>(seastar::httpd::connection::read()::{lambda()#4}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::httpd::connection::read()::{lambda()#4}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16

Found no further pointers to task objects.
If you think there should be more, run `scylla fiber 0x000060002d650200 --verbose` to learn more.
Note that continuation across user-created seastar::promise<> objects are not detected by scylla-fiber.
```

Closes #11822

* github.com:scylladb/scylladb:
  scylla-gdb.py: collection_element: add support for boost::intrusive::list
  scylla-gdb.py: optional_printer: eliminate infinite loop
  scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects
  scylla-gdb.py: scylla-fiber: reject self-references when probing pointers
  scylla-gdb.py: scylla-fiber: add starting task to known tasks
  scylla-gdb.py: scylla-fiber: add support for walking over when_all
  scylla-gdb.py: add when_all_state to task type whitelist
  scylla-gdb.py: scylla-fiber: also print shard of tasks
  scylla-gdb.py: scylla-fiber: unify task printing
  scylla-gdb.py: scylla fiber: add support for walking over shards
  scylla-gdb.py: scylla fiber: add support for walking over seastar threads
  scylla-gdb.py: scylla-ptr: keep current thread context
  scylla-gdb.py: improve scylla column_families
  scylla-gdb.py: scylla_sstables.filename(): fix generation formatting
  scylla-gdb.py: improve schema_ptr
  scylla-gdb.py: scylla memory: restore compatibility with <= 5.1
2022-11-03 13:52:31 +02:00
Kamil Braun
2049962e11 Fix version numbers in upgrade page title
Closes #11878
2022-11-03 10:06:25 +02:00
Takuya ASADA
45789004a3 install-dependencies.sh: update node_exporter to 1.4.0
To fix CVE-2022-24675, we need a binary compiled with golang >= 1.18.1.
The only released version of node_exporter compiled with golang >= 1.18.1
is 1.4.0, so we need to update to it.

See scylladb/scylla-enterprise#2317

Closes #11400

[avi: regenerated frozen toolchain]

Closes #11879
2022-11-03 10:15:22 +04:00
Yaron Kaikov
20110bdab4 configure.py: remove un-used tar files creation
Starting from https://github.com/scylladb/scylla-pkg/pull/3035 we
no longer upload the old tar.gz prefixes to S3, and downstream jobs
no longer use them.

Hence, there is no point building those tar.gz files anymore.

Closes #11865
2022-11-02 17:44:09 +02:00
Anna Stuchlik
d1f7cc99bc doc: fix the external links to the ScyllaDB University lesson about TTL
Closes #11876
2022-11-02 15:05:43 +02:00
Nadav Har'El
59fa8fe903 Merge 'doc: add the information about AArch64 support to Requirements' from Anna Stuchlik
Fix https://github.com/scylladb/scylla-doc-issues/issues/864

This PR:
- updates the introduction to add information about AArch64 and rewrite the content.
- replaces "Scylla" with "ScyllaDB".

Closes #11778

* github.com:scylladb/scylladb:
  Update docs/getting-started/system-requirements.rst
  doc: fix the link to the OS Support page
  doc: replace Scylla with ScyllaDB
  doc: update the info about supported architecture and rewrite the introduction
2022-11-02 11:18:20 +02:00
Anna Stuchlik
ea799ad8fd Update docs/getting-started/system-requirements.rst
Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>
2022-11-02 09:56:56 +01:00
guy9
097a65df9f adding top banner to the Docs website with a link to the ScyllaDB University fall LIVE event
Closes #11873
2022-11-02 10:20:40 +02:00
Nadav Har'El
b9d88a3601 cql/pytest: add reproducer for timestamp column validation issue
This patch adds a reproducing test for issue #11588, which is still open,
so the test is expected to fail on Scylla ("xfail") and passes on Cassandra.

The test shows that Scylla allows an out-of-range value to be written to
a timestamp column, but then it can't be read back.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11864
2022-11-01 08:11:01 +02:00
Botond Dénes
dc46bfa783 Merge 'Prepare repair for task manager integration' from Aleksandra Martyniuk
The PR prepares repair for task manager integration:
- Creates repair_module
- Keeps repair_module in repair_service
- Moves tracker methods to repair_module
- Changes UUID to task_id in repair module

Closes #11851

* github.com:scylladb/scylladb:
  repair: check shutdown with abort source in repair module
  repair: use generic module gate for repair module operations
  repair: move tracker to repair module
  repair: move next_repair_command to repair_module
  repair: generate repair id in repair module
  repair: keep shard number in repair_uniq_id
  repair: change UUID to task_id
  repair: add task_manager::module to repair_service
  repair: create repair module and task
2022-11-01 08:05:14 +02:00
Aleksandra Martyniuk
f2fe586f03 repair: check shutdown with abort source in repair module
In the repair module, shutdown can be checked using the abort_source.
Thus, we can get rid of the shutdown flag.
2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk
2d878cc9b5 repair: use generic module gate for repair module operations
The repair module uses a gate to prevent starting new tasks on shutdown.
The generic module's gate serves the same purpose, so we can
use it in the repair-specific context as well.
2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk
4aae7e9026 repair: move tracker to repair module
Since tracker and repair_module serve a similar purpose,
it is confusing where to look for methods connected to them.
Thus, to make it more transparent, the tracker class is deleted and all
its attributes and methods are moved to repair_module.
2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk
a5c05dcb60 repair: move next_repair_command to repair_module
The number of the repair operation was counted both with
next_repair_command from the tracker and with the sequence number
from task_manager::module.

To get rid of the redundancy, next_repair_command was deleted and all
methods using its value were moved to repair_module.
2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk
c81260fb8b repair: generate repair id in repair module
repair_uniq_id for repair task can be generated in repair module
and accessed from the task.
2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk
6432a26ccf repair: keep shard number in repair_uniq_id
The execution shard is one of the traits specific to repair tasks.
A child task should be able to freely access the shard id of its parent.
Thus, the shard number is kept in the repair_uniq_id struct.
2022-10-31 10:41:17 +01:00
guy9
276ec377c0 removed broken roadmap link
Closes #11854
2022-10-31 11:33:03 +02:00
Aleksandra Martyniuk
e2c7c1495d repair: change UUID to task_id
Change type of repair id from utils::UUID to task_id to distinguish
them from ids of other entities.
2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk
dc80af33bc repair: add task_manager::module to repair_service
repair_service keeps a shared pointer to repair_module.
2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk
576277384a repair: create repair module and task
Create repair_task_impl and repair_module, inheriting respectively from
the task manager's task_impl and module, to integrate repair operations
with the task manager.
2022-10-31 10:04:48 +01:00
Takuya ASADA
159bc7c7ea install-dependencies.sh: use binary distributions of PIP package
We currently avoid compiling C code in "pip3 install scylla-driver", but
we actually provide portable binary distributions of the package,
so we should use them via "pip3 install --only-binary=:all: scylla-driver".
The binary distribution contains its dependency libraries, so we won't have
problems loading it on relocatable python3.

Closes #11852
2022-10-31 10:38:36 +02:00
Kamil Braun
db6cc035ed test/raft: raft_address_map_test: add replication test 2022-10-31 09:17:12 +01:00
Kamil Braun
7d84007fd5 service/raft: raft_address_map: replicate non-expiring entries to other shards
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
  `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
  group 0 configuration changes. To handle dynamic mappings we need to
  modify the failure detector so it pings `raft::server_id`s and obtains
  the `gms::inet_address` from `raft_address_map` just before sending the
  message. The failure detector is sharded, so we need the
  mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
  shards. To send messages they'll need `raft_address_map`.

Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.

Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
  0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
  entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
  ID for all Raft servers running on a single node belonging to
  different groups; they are 'namespaced' by the group IDs. Furthermore,
  every node has a server that belongs to group 0. Thus for every Raft
  server in every group, it has a corresponding server in group 0 with
  the same ID, which has a non-expiring entry in `raft_address_map`,
  which is replicated to all shards; so every group will be able to
  deliver its messages.

With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.
2022-10-31 09:17:12 +01:00
Kamil Braun
acacbad465 service/raft: raft_address_map: assert when entry is missing in drop_expired_entries 2022-10-31 09:17:12 +01:00
Kamil Braun
159bb32309 service/raft: turn raft_address_map into a service 2022-10-31 09:17:10 +01:00
Botond Dénes
139fbb466e Merge 'Task manager extension' from Aleksandra Martyniuk
The PR adds changes to task manager that allow more convenient integration with modules.

Introduced changes:
- adds an internal flag to task::impl that allows users to filter out overly specific tasks
- renames `parent_data` to more appropriate name `task_info`
- creates `tasks/types.hh` which allows using some types connected with task manager without the necessity to include whole task manager
- adds more flexible version of `make_task` method

Closes #11821

* github.com:scylladb/scylladb:
  tasks: add alternative make_task method
  tasks: rename parent_data to task_info and move it
  tasks: move task_id to tasks/types.hh
  tasks: add internal flag for task_manager::task::impl
2022-10-31 09:57:10 +02:00
Botond Dénes
2c021affd1 Merge 'storage_service, repair: use per-shard abort_source' from Benny Halevy
Prevent copying shared_ptr across shards
in do_sync_data_using_repair by allocating
a shared_ptr<abort_source> per shard in
node_ops_meta_data and respectively in node_ops_info.
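A minimal, single-threaded sketch of the per-shard layout, assuming shards are simulated by vector indices; `abort_flag` and `node_ops_aborter` are hypothetical stand-ins for seastar's `abort_source` and the real `node_ops_meta_data`/`node_ops_info` plumbing. It models only the shape of the fix: each shard gets its own instance instead of a copy of a pointer allocated on another shard (seastar's `shared_ptr` uses a non-atomic reference count, so copying it across shards is unsafe):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an abort_source.
struct abort_flag {
    bool requested = false;
    void request_abort() { requested = true; }
};

class node_ops_aborter {
    // One instance per simulated shard; never shared between shards.
    std::vector<std::shared_ptr<abort_flag>> _per_shard;
public:
    explicit node_ops_aborter(std::size_t shards) {
        for (std::size_t i = 0; i < shards; ++i) {
            _per_shard.push_back(std::make_shared<abort_flag>());
        }
    }

    // Code running on shard `s` only ever touches its own instance.
    std::shared_ptr<abort_flag> local(std::size_t s) const { return _per_shard.at(s); }

    // Aborting the operation fans the request out to every shard's instance.
    void abort_all() {
        for (auto& f : _per_shard) {
            f->request_abort();
        }
    }
};
```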

Fixes #11826

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11827

* github.com:scylladb/scylladb:
  repair: use sharded abort_source to abort repair_info
  repair: node_ops_info: add start and stop methods
  storage_service: node_ops_abort_thread: abort all node ops on shutdown
  storage_service: node_ops_abort_thread: co_return only after printing log message
  storage_service: node_ops_meta_data: add start and stop methods
  repair: node_ops_info: prevent accidental copy
2022-10-31 09:43:34 +02:00
Botond Dénes
63a90cfb6c scylla-gdb.py: collection_element: add support for boost::intrusive::list 2022-10-31 08:18:20 +02:00
Botond Dénes
2fa1864174 scylla-gdb.py: optional_printer: eliminate infinite loop
Currently, to_string() recursively calls itself for engaged optionals.
Eliminate it. Also, use the std_optional wrapper instead of accessing
std::optional internals directly.
2022-10-31 08:18:20 +02:00
Botond Dénes
77b2555a04 scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects
Scylla fiber uses a crude method of scanning inbound and outbound
references to/from other task objects of recognized type. This method
cannot detect user-instantiated promise<> objects. Add a note about this
to the printout, so users are aware of this.
2022-10-31 08:18:20 +02:00
Botond Dénes
2276565a2e scylla-gdb.py: scylla-fiber: reject self-references when probing pointers
A self-reference is never the pointer we are looking for when looking
for other tasks referencing us. Reject such references when scanning
outright.
2022-10-31 08:18:20 +02:00
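The rejection rule above can be sketched in Python as follows. This is a minimal illustration, not scylla-gdb.py's code. It assumes the pointer scan yields hypothetical `(location, target)` pairs and that the task's object size is known.

```python
def find_referencing_tasks(task_addr, task_size, references):
    """Return locations holding a pointer to task_addr, excluding self-references.

    references: iterable of (location, target) pairs, meaning the pointer
    value `target` was found stored at address `location`.
    """
    result = []
    for location, target in references:
        if target != task_addr:
            continue
        # A pointer stored inside the task object itself points back at us
        # from our own memory; it is never another task waiting on us.
        if task_addr <= location < task_addr + task_size:
            continue
        result.append(location)
    return result
```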
Botond Dénes
f4365dd7f5 scylla-gdb.py: scylla-fiber: add starting task to known tasks
We collect already seen tasks in a set to be able to detect perceived
task loops and stop when one is seen. Initialize this set with the
starting task, so if it forms a loop, we won't repeat it in the trace
before cutting the loop.
2022-10-31 08:18:20 +02:00
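Here is a minimal sketch of the loop detection. It assumes a hypothetical `next_task` callable that returns the task a given task waits on, or `None` when there is none. Seeding `seen` with the starting task is the substance of the change: a chain that loops back to the start stops there instead of printing the start twice.

```python
def walk_fiber(start, next_task, max_depth=100):
    """Follow a chain of tasks from `start`, cutting it at the first loop."""
    chain = [start]
    seen = {start}  # include the starting task so a loop back to it is caught
    current = start
    for _ in range(max_depth):
        current = next_task(current)
        if current is None or current in seen:
            break
        seen.add(current)
        chain.append(current)
    return chain
```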
Botond Dénes
48bbf2e467 scylla-gdb.py: scylla-fiber: add support for walking over when_all 2022-10-31 08:18:20 +02:00
Botond Dénes
cb8f02e24b scylla-gdb.py: add when_all_state to task type whitelist 2022-10-31 08:18:20 +02:00
Botond Dénes
62621abc44 scylla-gdb.py: scylla-fiber: also print shard of tasks
Now that scylla-fiber can cross shards, it is important to display the
shard each task in the chain lives on.
2022-10-31 08:18:19 +02:00
Botond Dénes
c21c80f711 scylla-gdb.py: scylla-fiber: unify task printing
Currently there are two loops and a separate line printing the starting
task, all duplicating the formatting logic. Define a method for it and
use it in all 3 places instead.
2022-10-31 08:18:19 +02:00
Botond Dénes
c103280bfd scylla-gdb.py: scylla fiber: add support for walking over shards
Shard boundaries can be crossed in one direction currently: when looking
for waiters on a task, but not in the other direction (looking for
waited-on tasks). This patch fixes that.
2022-10-31 08:18:19 +02:00
Botond Dénes
437f888ba0 scylla-gdb.py: scylla fiber: add support for walking over seastar threads
Currently seastar threads end any attempt to follow waited-on-futures.
Seastar threads need special handling because they allocate the wake-up
task on their stack. This patch adds this special handling.
2022-10-31 08:18:19 +02:00
Botond Dénes
fcc63965ed scylla-gdb.py: scylla-ptr: keep current thread context
scylla_ptr.analyze() switches to the thread the analyzed object lives
on, but forgets to switch back. This was very annoying as any commands
using it (and there are a bunch of them) were prone to suddenly and
unexpectedly switching threads.
This patch makes sure that the original thread context is switched back
to after analyzing the pointer.
2022-10-31 08:18:19 +02:00
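Here is a minimal sketch of the save-and-restore pattern as a context manager. The getter and setter parameters are assumptions that let the sketch run outside gdb. Inside gdb they could be `gdb.selected_thread` and `lambda t: t.switch()`.

```python
from contextlib import contextmanager

@contextmanager
def preserved_thread(get_current, switch_to):
    """Switch back to the originally selected thread when the block exits."""
    original = get_current()
    try:
        yield original
    finally:
        # Runs even if the analysis raises, so callers never end up on a
        # different thread than the one they started on.
        switch_to(original)
```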
Botond Dénes
91516c1d68 scylla-gdb.py: improve scylla column_families
Rename to scylla tables. Less typing and more up-to-date.
By default it now only lists tables from local shard. Added flag -a
which brings back old behaviour (lists on all shards).
Added -u (only list user tables) and -k (list tables of provided
keyspace only) filtering options.
2022-10-31 08:18:19 +02:00
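Here is a sketch of how the `-a`, `-u` and `-k` filters could combine, using argparse. Only the three short flags come from the commit message. The long option names, the tuple layout of `tables` and both function names are hypothetical.

```python
import argparse
import shlex

def parse_tables_args(arg_string):
    """Parse options for a `scylla tables`-style command (hypothetical)."""
    parser = argparse.ArgumentParser(prog="scylla tables")
    parser.add_argument("-a", "--all", action="store_true",
                        help="list tables on all shards, not just the local one")
    parser.add_argument("-u", "--user", action="store_true",
                        help="list user tables only")
    parser.add_argument("-k", "--keyspace", default=None,
                        help="list tables of the given keyspace only")
    return parser.parse_args(shlex.split(arg_string))

def filter_tables(tables, args, local_shard):
    """tables: iterable of (shard, keyspace, name, is_system) tuples."""
    for shard, ks, name, is_system in tables:
        if not args.all and shard != local_shard:
            continue  # default: local shard only
        if args.user and is_system:
            continue
        if args.keyspace is not None and ks != args.keyspace:
            continue
        yield shard, ks, name
```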
Botond Dénes
1d3d613b76 scylla-gdb.py: scylla_sstables.filename(): fix generation formatting
Generation was recently converted from an integer to an object. Update
the filename formatting, while keeping backward compatibility.
2022-10-31 08:18:19 +02:00
Botond Dénes
c869f54742 scylla-gdb.py: improve schema_ptr
Add __getitem__(), so members can be accessed.
Strip double quotes (") from ks_name and cf_name.
Add is_system().
2022-10-31 08:18:19 +02:00
Botond Dénes
66832af233 scylla-gdb.py: scylla memory: restore compatibility with <= 5.1
Recent reworks around dirty memory manager broke backward compatibility
of the scylla memory command (and possibly others). This patch restores
it.
2022-10-31 08:18:19 +02:00
Tenghuan He
e0948ba199 Add directory change instruction
Add directory change instruction while building scylla

Closes #11717
2022-10-30 23:53:02 +02:00
Pavel Emelyanov
477e0c967a scylla-gdb: Evaluate LSA object sizes dynamically
The lsa-segment command tries to walk LSA segment objects by decoding
their descriptors and (!) object sizes as well. Some objects in LSA have
dynamic sizes, i.e. those depending on the object contents. The script
tries to drill down into the object internals to get this size, but the bad news
is that nowadays there are many dynamic objects that are not covered.
Once it steps on an unsupported object, scylla-gdb likely stops because
the "next" descriptor happens to be in the middle of the object and its
parsing throws.

This patch fixes this by taking advantage of the virtual size() call of
the migrate_fn_type all LSA objects are linked with (indirectly). It
gets the migrator object, the LSA object itself and calls

  ((migrate_fn_type*)<migrator_ptr>)->size((const void*)<object_ptr>)

with gdb. The evaluated value is the live dynamic size of the object.

fixes: #11792
refs: #2455

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11847
2022-10-28 14:11:30 +03:00
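The evaluation step can be wrapped as below. `lsa_object_size` and its `evaluate` parameter are hypothetical. Under gdb, `evaluate` would be `gdb.parse_and_eval`, whose result converts to a Python int.

```python
def lsa_size_expression(migrator_ptr, object_ptr):
    """Build the gdb expression from the commit message for one object."""
    return ("((migrate_fn_type*)0x{:x})->size((const void*)0x{:x})"
            .format(migrator_ptr, object_ptr))

def lsa_object_size(migrator_ptr, object_ptr, evaluate):
    """Ask the inferior for the object's live dynamic size via its migrator."""
    return int(evaluate(lsa_size_expression(migrator_ptr, object_ptr)))
```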
Botond Dénes
74c9aa3a3f Merge 'removenode: allow specifying nodes to ignore using host_id' from Benny Halevy
Currently, when specifying nodes to ignore for replace or removenode,
we support specifying them only using their ip address.

As discussed in https://github.com/scylladb/scylladb/issues/11839 for removenode,
we intentionally require the host uuid for specifying the node to remove,
so the nodes to ignore (which are also down, otherwise we would not need to ignore them),
should be consistent with that and be specified using their host_id.

The series extends the apis and allows either the nodes' ip addresses or their host_ids
to be specified, for backward compatibility.

We should deprecate the ip address method over time and convert the tests and management
software to use the ignored nodes' host_id:s instead.

Closes #11841

* github.com:scylladb/scylladb:
  api: doc: remove_node: improve summary
  api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s
  storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint
  locator: token_metadata: add parse_host_id_and_endpoint
  api: storage_service: remove_node: validate host_id
2022-10-28 13:35:04 +03:00
Benny Halevy
335a8cc362 api: doc: remove_node: improve summary
The current summary of the operation is obscure.
It refers to a token in the ring and the endpoint associated with it,
while the operation uses a host_id to identify a whole node.

Instead, clarify the summary to refer to a node in the cluster,
consistent with the description for the host_id parameter.
Also, describe the effect the call has on the data the removed node
logically owned.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:52:37 +03:00
Benny Halevy
9ef2631ec2 api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s
Currently the api is inconsistent: requiring a uuid for the
host_id of the node to be removed, while the ignored nodes list
is given as comma-separated ip addresses.

Instead, support identifying the ignored_nodes either
by their host_id (uuid) or ip address.

Also, require all ignore_nodes to be of the same kind:
either UUIDs or ip addresses, as a mix of the two likely
indicates a user error.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:49:03 +03:00
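Here is a sketch of the "all UUIDs or all IP addresses" rule using Python's `uuid` and `ipaddress` modules. `parse_ignore_nodes` is a hypothetical helper. Scylla's own implementation is C++ and is not reproduced here.

```python
import ipaddress
import uuid

def parse_ignore_nodes(csv):
    """Split a comma-separated ignore_nodes value into host IDs or IP addresses.

    Returns ("host_id", [UUID, ...]) or ("ip", [address, ...]); mixing the two
    kinds is rejected as a likely user error.
    """
    host_ids, addresses = [], []
    for item in (s.strip() for s in csv.split(",")):
        if not item:
            continue
        try:
            host_ids.append(uuid.UUID(item))
            continue
        except ValueError:
            pass
        try:
            addresses.append(ipaddress.ip_address(item))
        except ValueError:
            raise ValueError(f"neither a host_id nor an ip address: {item!r}") from None
    if host_ids and addresses:
        raise ValueError("ignore_nodes must be all host_ids or all ip addresses")
    return ("host_id", host_ids) if host_ids else ("ip", addresses)
```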
Benny Halevy
40cd685371 storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint
Allow specifying the dead node to ignore either as host_id
or ip address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Benny Halevy
b74807cb8a locator: token_metadata: add parse_host_id_and_endpoint
To be used for specifying nodes either by their
host_id or ip address and using the token_metadata
to resolve the mapping.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Benny Halevy
340a5a0c94 api: storage_service: remove_node: validate host_id
The node to be removed must be identified by its host_id.
Validate that at the api layer and pass the parsed host_id
down to storage_service::removenode.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-28 07:38:13 +03:00
Takuya ASADA
464b5de99b scylla_setup: allow symlink to --disks option
Currently, the --disks option does not allow symlinks such as
/dev/disk/by-uuid/* or /dev/disk/azure/*.

To allow using them, is_unused_disk() should resolve symlinks to their
real paths before evaluating the disk path.

Fixes #11634

Closes #11646
2022-10-28 07:24:11 +03:00
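Here is a simplified sketch of resolving symlinks before comparing paths. The real `is_unused_disk()` in scylla_setup performs other checks, and the `used_devices` parameter is a hypothetical stand-in for them.

```python
import os

def is_unused_disk(path, used_devices):
    """Simplified stand-in: report whether `path` is not among used_devices.

    Both sides are resolved with os.path.realpath() first, so a symlink such as
    /dev/disk/by-uuid/<id> compares equal to the /dev/sdX node it points at.
    """
    real = os.path.realpath(path)
    return real not in {os.path.realpath(d) for d in used_devices}
```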
Botond Dénes
b744036840 Merge 'scylla_util.py: on sysconfig_parser, don't use double quote when it's possible' from Takuya ASADA
The distributions' original sysconfig files do not seem to use double
quotes to set a parameter when the value contains no spaces.
Add a function to detect spaces in the value, and don't use double
quotes when none are detected.

Fixes #9149

Closes #9153

* github.com:scylladb/scylladb:
  scylla_util.py: adding unescape for sysconfig_parser
  scylla_util.py: on sysconfig_parser, don't use double quote when it's possible
2022-10-28 07:19:13 +03:00
Benny Halevy
44e1058f63 docs: nodetool/removenode: fix host_id in examples
removenode host_id must specify the host ID as a UUID,
not an ip address.

Fixes #11839

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11840
2022-10-27 14:29:36 +03:00
Pavel Emelyanov
7b193ab0a5 messaging_service: Deny putting INADD_ANY as preferred ip
Even though the previous patch makes scylla not gossip this as internal_ip,
an extra sanity check may still be useful. E.g. older versions of scylla
may still do it, or this address can be loaded from system_keyspace.

refs: #11502

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-27 14:25:43 +03:00
Pavel Emelyanov
aa7a759ac9 messaging_service: Toss preferred ip cache management
Make it call cache_preferred_ip() even when the cache is loaded from
system_keyspace and move the connection reset there. This is mainly to
prepare for the next patch, but it also makes the code a bit shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-27 14:25:43 +03:00
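Here is a sketch of routing both code paths, gossip updates and the startup load from system_keyspace, through one `cache_preferred_ip()` that also performs the sanity check from the previous commit. The class, the method signatures and the `reset_connection` callback are illustrative assumptions.

```python
import ipaddress

class PreferredIpCache:
    """Illustrative preferred-IP cache; names are not Scylla's."""

    def __init__(self, reset_connection):
        self._cache = {}
        self._reset_connection = reset_connection

    def cache_preferred_ip(self, endpoint, preferred_ip):
        # Sanity check: an unspecified address (0.0.0.0 or ::) is never a
        # usable address for a remote peer, whatever its source.
        if ipaddress.ip_address(preferred_ip).is_unspecified:
            return False
        self._cache[endpoint] = preferred_ip
        # Single place for the connection reset, for both code paths.
        self._reset_connection(endpoint)
        return True

    def load(self, saved):
        """Load entries as if read from system_keyspace at startup."""
        for endpoint, ip in saved.items():
            self.cache_preferred_ip(endpoint, ip)
```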
Pavel Emelyanov
91b460f1c4 gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP
Gossiping 0.0.0.0 as preferred IP may break the peer as it will
"interpret" this address as <myself> which is not what peer expects.
However, g.p.f.s. uses --listen-address argument as the internal IP
and it's not prohibited to configure it to be 0.0.0.0

It's better not to gossip the INTERNAL_IP property at all if the listen
address is INADDR_ANY.

fixes: #11502

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-27 14:25:43 +03:00
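Here is a sketch of the snitch-side rule. It assumes application states are modeled as a plain dict, and the helper name and keys are illustrative. `ipaddress`'s `is_unspecified` is true for both 0.0.0.0 and ::. A `None` listen address mirrors the now-optional `_listen_address`.

```python
import ipaddress

def snitch_application_states(dc, rack, listen_address):
    """Build gossip application states, omitting an unusable INTERNAL_IP."""
    states = {"DC": dc, "RACK": rack}
    if listen_address is not None and not ipaddress.ip_address(listen_address).is_unspecified:
        states["INTERNAL_IP"] = listen_address
    return states
```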
Pavel Emelyanov
99579bd186 gossiping_property_file_snitch: Make _listen_address optional
In preparation for the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-27 14:15:26 +03:00
Benny Halevy
0ea8250e83 repair: use sharded abort_source to abort repair_info
Currently we use a single shared_ptr<abort_source>
that can't be copied across shards.

Instead, use a sharded<abort_source> in node_ops_info so that each
repair_info instance will use an (optional) abort_source*
on its own shard.

Added the respective start and stop methods, plus a local_abort_source
getter to get the shard-local abort_source (if available).

Fixes #11826

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:18:30 +03:00
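The per-shard layout can be illustrated in Python, with `threading.Event` standing in for seastar's `abort_source` and plain integers standing in for shard ids. This sketch only shows the structure. Python has no shard-local memory, so it cannot reproduce the cross-shard `shared_ptr` hazard itself.

```python
import threading

class NodeOpsInfo:
    """Illustrative: one abort source per shard instead of one shared object."""

    def __init__(self):
        self._sources = None  # not started yet

    def start(self, shard_count):
        self._sources = [threading.Event() for _ in range(shard_count)]

    def stop(self):
        self._sources = None

    def local_abort_source(self, shard):
        # Optional, like the abort_source* in the commit: None if not started.
        if self._sources is None:
            return None
        return self._sources[shard]

    def abort(self):
        # Signal each shard's own instance; no object is shared across shards.
        for source in self._sources or []:
            source.set()
```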
Benny Halevy
88f993e5ed repair: node_ops_info: add start and stop methods
Prepare for adding a sharded<abort_source> member.

Wire start/stop in storage_service::node_ops_meta_data.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:18:30 +03:00
Benny Halevy
c2f384093d storage_service: node_ops_abort_thread: abort all node ops on shutdown
A later patch adds a sharded<abort_source> to node_ops_info.
On shutdown, we must stop it in an orderly way, so use the node_ops_abort_thread
shutdown path (where node_ops_signal_abort is called with a nullopt)
to abort (and stop) all outstanding node_ops by passing
a null_uuid to node_ops_abort, and let it iterate over all
node ops to abort and stop them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:14:06 +03:00
Benny Halevy
0efd290378 storage_service: node_ops_abort_thread: co_return only after printing log message
Currently the function co_returns if (!uuid_opt)
so the log info message indicating it's stopped
is not printed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:14:03 +03:00
Benny Halevy
47e4761b4e storage_service: node_ops_meta_data: add start and stop methods
Prepare for starting and stopping repair node_ops_info

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:14:03 +03:00
Benny Halevy
5c25066ea7 repair: node_ops_info: prevent accidental copy
Delete node_ops_info copy and move constructors before
we add a sharded<abort_source> member for the per-shard repairs
in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:14:03 +03:00
Takuya ASADA
cd6030d5df scylla_util.py: adding unescape for sysconfig_parser
Although we have __escape() to escape a " in the middle of a value when
writing the sysconfig file, we never unescaped it when reading the sysconfig file.
So add __unescape() and call it in get().
2022-10-27 16:39:47 +09:00
Takuya ASADA
de57433bcf scylla_util.py: on sysconfig_parser, don't use double quote when it's possible
The distributions' original sysconfig files do not seem to use double
quotes to set a parameter when the value contains no spaces.
Add a function to detect spaces in the value, and don't use double
quotes when none are detected.

Fixes #9149
2022-10-27 16:36:27 +09:00
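Here is a round-trip sketch of the quoting rule together with escaping and unescaping. The function names echo `__escape()` and `__unescape()` but are not scylla_util.py's code. This sketch also quotes values that contain `"` or `\`, an addition that guarantees every value parses back unchanged.

```python
import re

def _escape(value):
    # Backslash-escape backslashes and double quotes.
    return re.sub(r'(["\\])', r'\\\1', value)

def _unescape(value):
    # Drop the backslash in front of every escaped character.
    return re.sub(r'\\(.)', r'\1', value)

def format_assignment(key, value):
    """Write KEY=value, quoting only when the value needs it."""
    if any(c.isspace() or c in '"\\' for c in value):
        return f'{key}="{_escape(value)}"'
    return f"{key}={value}"

def parse_assignment(line):
    """Read KEY=value back, unescaping quoted values."""
    key, _, raw = line.partition("=")
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        raw = _unescape(raw[1:-1])
    return key, raw
```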
Aleksandra Martyniuk
6494de9bb0 tasks: add alternative make_task method
Task manager tasks should be created with the make_task method, since
it properly sets information about the child-parent relationship
between tasks. However, sometimes we may want to keep additional
task data in classes inheriting from task_manager::task::impl.
The existing make_task method makes this impossible, since
implementation objects are created internally.

The commit adds a new make_task that allows the caller to provide a task
implementation pointer it created. All the fields except
those related to children and parent should be set beforehand.
2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk
10d11a7baf tasks: rename parent_data to task_info and move it
The parent_data struct contains info that is common to all tasks,
not only in parent-child relationship context. To use it this way
without confusion, its name is changed to task_info.

In order to be able to widely and comfortably use task_info,
it is moved from tasks/task_manager.hh to tasks/types.hh
and slightly extended.
2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk
9ecc2047ac tasks: move task_id to tasks/types.hh 2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk
e2e8a286cc tasks: add internal flag for task_manager::task::impl
It is convenient to create many different tasks implementations
representing more and more specific parts of the operation in
a module. Presenting all of them through the api, though, makes it cumbersome
for the user to navigate and track.

Flag internal is added to task_manager::task::impl so that the tasks
could be filtered before they are sent to user.
2022-10-26 14:01:05 +02:00
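Here is a minimal sketch of filtering on such a flag before tasks reach the user. The `TaskStatus` fields and the `tasks_for_user` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TaskStatus:
    id: str
    parent_id: Optional[str] = None
    internal: bool = False

def tasks_for_user(tasks: List[TaskStatus], include_internal: bool = False) -> List[TaskStatus]:
    """Drop internal tasks unless the caller explicitly asks for them."""
    return [t for t in tasks if include_internal or not t.internal]
```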
Pavel Emelyanov
e245780d56 gossiper: Request topology states in shadow round
When doing shadow round for replacement the bootstrapping node needs to
know the dc/rack info about the node it replaces to configure it on
topology. This topology info is later used by e.g. repair service.

fixes: #11829

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11838
2022-10-25 13:21:20 +03:00
Pavel Emelyanov
64c9359443 storage_proxy: Don't use default-initialized endpoint in get_read_executor()
After calling filter_for_query() the extra_replica to speculate to may
be left default-initialized, which is the :0 ipv6 address. Later on, this
address is used as-is to check if it belongs to the same DC or not which
is not nice, as :0 is not an address of any existing endpoint.

Recent move of dc/rack data onto topology made this place reveal itself
by emitting the internal error due to :0 not being present on the
topology's collection of endpoints. Prior to this move the dc filter
would count :0 as belonging to "default_dc" datacenter which may or may
not match with the dc of the local node.

The fix is to explicitly distinguish a set extra_replica from an unset one.

fixes: #11825

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11833
2022-10-25 09:16:50 +03:00
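Distinguishing a set replica from an unset one maps naturally onto `Optional`. Here is a sketch, assuming a hypothetical `dc_of` lookup that raises for unknown endpoints, as the topology now does.

```python
from typing import Callable, Optional

def speculation_target_in_dc(extra_replica: Optional[str],
                             dc_of: Callable[[str], str],
                             local_dc: str) -> bool:
    """Check whether the speculative replica, if any, is in local_dc."""
    if extra_replica is None:
        # Nothing to speculate to: never look up a placeholder address.
        return False
    return dc_of(extra_replica) == local_dc
```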
Takuya ASADA
1a11a38add unified: move unified package contents to sub-directory
Most software distribution tar.gz archives have a sub-directory containing
everything, to prevent extracting the contents into the current directory.
We should follow this style for our unified package too.

To do this we need to increment relocatable package version to '3.0'.

Fixes #8349

Closes #8867
2022-10-25 08:58:15 +03:00
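Here is a sketch with Python's `tarfile`, in which passing `arcname` places every member under one top-level directory. The function name and the directory name in the check are hypothetical, and the real relocatable package layout is not shown in this log.

```python
import os
import tarfile

def make_unified_tarball(src_dir, out_path, top_dir):
    """Pack src_dir so every archive member lives under top_dir/."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top_dir)
```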
Takuya ASADA
a938b009ca scylla_raid_setup: run uuidpath existance check only after mount failed
We added a UUID device file existence check in #11399. We expect the UUID
device file to be created before the check, and we wait for its creation with
"udevadm settle" after "mkfs.xfs".

However, we are actually getting an error which says the UUID device file is missing,
which probably means "udevadm settle" doesn't guarantee the device file is created
under some conditions.

To avoid the error, use var-lib-scylla.mount to wait for the UUID device
file to be ready, and run the file existence check only when the service has
failed.

Fixes #11617

Closes #11666
2022-10-25 08:54:21 +03:00
Yaniv Kaul
cec21d10ed docs: Fix typo (patch -> batch)
See subject.

Closes #11837
2022-10-25 08:50:44 +03:00
Michał Radwański
36508bf5e9 serializer_impl: remove unneeded generic parameter
Input stream used in vector_deserializer doesn't need to be generic, as
there is only one implementation used.
2022-10-24 17:21:38 +02:00
Tomasz Grabiec
687df05e28 db: make_forwardable::reader: Do not emit range_tombstone_change with position past the range
Since the end bound is exclusive, the end position should be
before_key(), not after_key().

This affects only tests: as far as I know, only there can we get an end
bound which is a clustering row position.

Would cause failures once row cache is switched to v2 representation
because of violated assumptions about positions.

Introduced in 76ee3f029c

Closes #11823
2022-10-24 17:06:52 +03:00
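Here is a simplified model of the ordering argument. It assumes positions are `(key, weight)` tuples in which a weight of -1, 0 or +1 means before, at or after the rows with that key, loosely modeled on position_in_partition's bound weight.

```python
BEFORE, AT, AFTER = -1, 0, 1

def before_key(key):
    return (key, BEFORE)

def after_key(key):
    return (key, AFTER)

def row_position(key):
    return (key, AT)

def exclusive_end_position(key):
    """Position closing a range whose end bound `key` is exclusive.

    The row at `key` is outside the range, so the closing position must
    sort before it; after_key(key) would land past the range.
    """
    return before_key(key)
```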
Anna Stuchlik
9f7536d549 doc: fix the link to the OS Support page 2022-10-13 15:36:51 +02:00
Anna Stuchlik
1fd1ce042a doc: replace Scylla with ScyllaDB 2022-10-13 15:21:46 +02:00
Anna Stuchlik
81ce7a88de doc: update the info about supported architecture and rewrite the introduction 2022-10-13 15:18:29 +02:00
Anna Stuchlik
3950a1cac8 doc: apply the feedback to improve clarity 2022-10-03 11:14:51 +02:00
Anna Stuchlik
46f0e99884 doc: add the link to the new Troubleshooting section and replace Scylla with ScyllaDB 2022-09-23 11:46:15 +02:00
Anna Stuchlik
af2a85b191 doc: add the new page to the toctree 2022-09-23 11:37:38 +02:00
Anna Stuchlik
b034e2856e doc: add a troubleshooting article about the missing configuration files 2022-09-23 11:17:18 +02:00
Anna Stuchlik
260f85643d doc: specify the recommended AWS instance types 2022-08-08 14:35:54 +02:00
Anna Stuchlik
2c69a8f458 doc: replace the tables with a generic description of support for Im4gn and Is4gen instances 2022-08-08 14:17:59 +02:00
Anna Stuchlik
ceaf0c41bd doc: add support for AWS i4g instances 2022-08-05 17:18:44 +02:00
Anna Stuchlik
7711436577 doc: extend the list of supported CPUs 2022-08-05 16:55:40 +02:00
Anna Stuchlik
844c875f15 doc: add info about the time-consuming step due to resharding 2022-07-26 14:52:11 +02:00
Anna Stuchlik
ff5c4a33f5 doc: add the new KB to the toctree 2022-07-25 14:29:33 +02:00
Anna Stuchlik
f1daef4b1b doc: doc: add a KB about updating the mode in perftune.yaml after upgrade 2022-07-25 14:22:02 +02:00
1493 changed files with 74409 additions and 35958 deletions

.github/CODEOWNERS

@@ -12,7 +12,7 @@ test/cql/cdc_* @kbr- @elcallio @piodul @jul-stas
test/boost/cdc_* @kbr- @elcallio @piodul @jul-stas
# COMMITLOG / BATCHLOG
db/commitlog/* @elcallio
db/commitlog/* @elcallio @eliransin
db/batch* @elcallio
# COORDINATOR
@@ -25,7 +25,7 @@ compaction/* @raphaelsc @nyh
transport/*
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @psarna @cvybhu
cql3/* @tgrabiec @cvybhu @nyh
# COUNTERS
counters* @jul-stas
@@ -33,7 +33,7 @@ tests/counter_test* @jul-stas
# DOCS
docs/* @annastuchlik @tzach
docs/alternator @annastuchlik @tzach @nyh @psarna
docs/alternator @annastuchlik @tzach @nyh @havaker @nuivall
# GOSSIP
gms/* @tgrabiec @asias
@@ -45,9 +45,9 @@ dist/docker/*
utils/logalloc* @tgrabiec
# MATERIALIZED VIEWS
db/view/* @nyh @psarna
cql3/statements/*view* @nyh @psarna
test/boost/view_* @nyh @psarna
db/view/* @nyh @cvybhu @piodul
cql3/statements/*view* @nyh @cvybhu @piodul
test/boost/view_* @nyh @cvybhu @piodul
# PACKAGING
dist/* @syuu1228
@@ -62,9 +62,9 @@ service/migration* @tgrabiec @nyh
schema* @tgrabiec @nyh
# SECONDARY INDEXES
db/index/* @nyh @psarna
cql3/statements/*index* @nyh @psarna
test/boost/*index* @nyh @psarna
index/* @nyh @cvybhu @piodul
cql3/statements/*index* @nyh @cvybhu @piodul
test/boost/*index* @nyh @cvybhu @piodul
# SSTABLES
sstables/* @tgrabiec @raphaelsc @nyh
@@ -74,11 +74,11 @@ streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh @psarna
test/alternator/* @nyh @psarna
alternator/* @nyh @havaker @nuivall
test/alternator/* @nyh @havaker @nuivall
# HINTED HANDOFF
db/hints/* @piodul @vladzcloudius
db/hints/* @piodul @vladzcloudius @eliransin
# REDIS
redis/* @nyh @syuu1228


@@ -0,0 +1,17 @@
name: "Docs / Amplify enhanced"
on: issue_comment
jobs:
build:
runs-on: ubuntu-latest
if: ${{ github.event.issue.pull_request }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Amplify enhanced
env:
TOKEN: ${{ secrets.GITHUB_TOKEN }}
uses: scylladb/sphinx-scylladb-theme/.github/actions/amplify-enhanced@master


@@ -2,10 +2,14 @@ name: "Docs / Publish"
# For more information,
# see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
on:
push:
branches:
- master
- 'master'
- 'enterprise'
paths:
- "docs/**"
workflow_dispatch:
@@ -24,12 +28,13 @@ jobs:
with:
python-version: 3.7
- name: Set up env
run: make -C docs setupenv
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs
run: make -C docs multiversion
run: make -C docs FLAG="${{ env.FLAG }}" multiversion
- name: Build redirects
run: make -C docs redirects
run: make -C docs FLAG="${{ env.FLAG }}" redirects
- name: Deploy docs to GitHub Pages
run: ./docs/_utils/deploy.sh
if: (github.ref_name == 'master' && env.FLAG == 'opensource') || (github.ref_name == 'enterprise' && env.FLAG == 'enterprise')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


@@ -2,10 +2,14 @@ name: "Docs / Build PR"
# For more information,
# see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
on:
pull_request:
branches:
- master
- enterprise
paths:
- "docs/**"
@@ -23,6 +27,6 @@ jobs:
with:
python-version: 3.7
- name: Set up env
run: make -C docs setupenv
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs
run: make -C docs test
run: make -C docs FLAG="${{ env.FLAG }}" test

.gitignore

@@ -32,4 +32,3 @@ compile_commands.json
.ccls-cache/
.mypy_cache
.envrc
rust/Cargo.lock

.gitmodules

@@ -6,12 +6,6 @@
path = swagger-ui
url = ../scylla-swagger-ui
ignore = dirty
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate
[submodule "abseil"]
path = abseil
url = ../abseil-cpp
[submodule "scylla-jmx"]
path = tools/jmx
url = ../scylla-jmx
@@ -21,3 +15,6 @@
[submodule "scylla-python3"]
path = tools/python3
url = ../scylla-python3
[submodule "tools/cqlsh"]
path = tools/cqlsh
url = ../scylla-cqlsh


@@ -2,803 +2,200 @@ cmake_minimum_required(VERSION 3.18)
project(scylla)
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
message(STATUS "Setting build type to 'Release' as none was specified.")
set(CMAKE_BUILD_TYPE "Release" CACHE
STRING "Choose the type of build." FORCE)
# Set the possible values of build type for cmake-gui
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
"Debug" "Release" "Dev" "Sanitize")
endif()
include(CTest)
if(CMAKE_BUILD_TYPE)
string(TOLOWER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)
else()
set(BUILD_TYPE "release")
endif()
function(default_target_arch arch)
set(x86_instruction_sets i386 i686 x86_64)
if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)
set(${arch} "westmere" PARENT_SCOPE)
elseif(CMAKE_SYSTEM_PROCESSOR EQUAL "aarch64")
set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)
else()
set(${arch} "" PARENT_SCOPE)
endif()
endfunction()
default_target_arch(target_arch)
if(target_arch)
set(target_arch_flag "-march=${target_arch}")
endif()
set(cxx_coro_flag)
if (CMAKE_CXX_COMPILER_ID MATCHES GNU)
set(cxx_coro_flag -fcoroutines)
endif()
list(APPEND CMAKE_MODULE_PATH
${CMAKE_CURRENT_SOURCE_DIR}/cmake
${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)
set(CMAKE_BUILD_TYPE "${CMAKE_BUILD_TYPE}" CACHE
STRING "Choose the type of build." FORCE)
# Set the possible values of build type for cmake-gui
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
"Debug" "Release" "Dev" "Sanitize")
string(TOUPPER "${CMAKE_BUILD_TYPE}" build_mode)
include(mode.${build_mode})
include(mode.common)
add_compile_definitions(
${Seastar_DEFINITIONS_${build_mode}}
FMT_DEPRECATED_OSTREAM)
include(limit_jobs)
# Configure Seastar compile options to align with Scylla
set(Seastar_CXX_FLAGS ${cxx_coro_flag} ${target_arch_flag} CACHE INTERNAL "" FORCE)
set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)
set(CMAKE_CXX_STANDARD "20" CACHE INTERNAL "")
set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
set(Seastar_TESTING ON CACHE BOOL "" FORCE)
add_subdirectory(seastar)
add_subdirectory(abseil)
# Exclude absl::strerror from the default "all" target since it's not
# used in Scylla build and, moreover, makes use of deprecated glibc APIs,
# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,
# which happens to be the case for recent Fedora distribution versions.
#
# Need to use the internal "absl_strerror" target name instead of namespaced
# variant because `set_target_properties` does not understand the latter form,
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
# System libraries dependencies
find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)
find_package(Boost REQUIRED
COMPONENTS filesystem program_options system thread regex unit_test_framework)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc REQUIRED)
find_package(ICU COMPONENTS uc i18n REQUIRED)
find_package(absl COMPONENTS hash raw_hash_set REQUIRED)
find_package(libdeflate REQUIRED)
find_package(libxcrypt REQUIRED)
find_package(Snappy REQUIRED)
find_package(RapidJSON REQUIRED)
find_package(Thrift REQUIRED)
find_package(xxHash REQUIRED)
set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")
set(scylla_gen_build_dir "${scylla_build_dir}/gen")
file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")
set(scylla_gen_build_dir "${CMAKE_BINARY_DIR}/gen")
file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
# Place libraries, executables and archives in ${buildroot}/build/${mode}/
foreach(mode RUNTIME LIBRARY ARCHIVE)
set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")
endforeach()
# Generate C++ source files from thrift definitions
function(scylla_generate_thrift)
set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)
set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})
set(outputs
${aux_out_file_name}_types.cpp
${aux_out_file_name}_types.h
${aux_out_file_name}_constants.cpp
${aux_out_file_name}_constants.h
${args_OUT_DIR}/${args_SERVICE}.cpp
${args_OUT_DIR}/${args_SERVICE}.h)
add_custom_command(
DEPENDS
${args_IN_FILE}
thrift
OUTPUT ${outputs}
COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}
COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
scylla_generate_thrift(
TARGET scylla_thrift_gen_cassandra
VAR scylla_thrift_gen_cassandra_files
IN_FILE "${CMAKE_SOURCE_DIR}/interface/cassandra.thrift"
OUT_DIR ${scylla_gen_build_dir}
SERVICE Cassandra)
# Parse antlr3 grammar files and generate C++ sources
function(scylla_generate_antlr3)
set(one_value_args TARGET VAR IN_FILE OUT_DIR)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)
get_filename_component(stem ${in_file_pure_name} NAME_WE)
set(outputs
"${args_OUT_DIR}/${stem}Lexer.hpp"
"${args_OUT_DIR}/${stem}Lexer.cpp"
"${args_OUT_DIR}/${stem}Parser.hpp"
"${args_OUT_DIR}/${stem}Parser.cpp")
add_custom_command(
DEPENDS
${args_IN_FILE}
OUTPUT ${outputs}
# Remove #ifdef'ed code from the grammar source code
COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"
COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"
COMMAND sed -i
-e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"
-e "/^.*On :.*$/d"
-e "1i using ExceptionBaseType = int;"
-e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"
"${args_OUT_DIR}/${stem}Parser.cpp"
VERBATIM)
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
set(antlr3_grammar_files
cql3/Cql.g
alternator/expressions.g)
set(antlr3_gen_files)
foreach(f ${antlr3_grammar_files})
get_filename_component(grammar_file_name "${f}" NAME_WE)
get_filename_component(f_dir "${f}" DIRECTORY)
scylla_generate_antlr3(
TARGET scylla_antlr3_gen_${grammar_file_name}
VAR scylla_antlr3_gen_${grammar_file_name}_files
IN_FILE "${CMAKE_SOURCE_DIR}/${f}"
OUT_DIR ${scylla_gen_build_dir}/${f_dir})
list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")
endforeach()
# Generate C++ sources from ragel grammar files
seastar_generate_ragel(
TARGET scylla_ragel_gen_protocol_parser
VAR scylla_ragel_gen_protocol_parser_file
IN_FILE "${CMAKE_SOURCE_DIR}/redis/protocol_parser.rl"
OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)
# Generate C++ sources from Swagger definitions
set(swagger_files
api/api-doc/cache_service.json
api/api-doc/collectd.json
api/api-doc/column_family.json
api/api-doc/commitlog.json
api/api-doc/compaction_manager.json
api/api-doc/config.json
api/api-doc/endpoint_snitch_info.json
api/api-doc/error_injection.json
api/api-doc/failure_detector.json
api/api-doc/gossiper.json
api/api-doc/hinted_handoff.json
api/api-doc/lsa.json
api/api-doc/messaging_service.json
api/api-doc/storage_proxy.json
api/api-doc/storage_service.json
api/api-doc/stream_manager.json
api/api-doc/system.json
api/api-doc/task_manager.json
api/api-doc/task_manager_test.json
api/api-doc/utils.json)
set(swagger_gen_files)
foreach(f ${swagger_files})
get_filename_component(fname "${f}" NAME_WE)
get_filename_component(dir "${f}" DIRECTORY)
seastar_generate_swagger(
TARGET scylla_swagger_gen_${fname}
VAR scylla_swagger_gen_${fname}_files
IN_FILE "${CMAKE_SOURCE_DIR}/${f}"
OUT_DIR "${scylla_gen_build_dir}/${dir}")
list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")
endforeach()
# Create C++ bindings for IDL serializers
function(scylla_generate_idl_serializer)
set(one_value_args TARGET VAR IN_FILE OUT_FILE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)
set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")
find_package(Python3 COMPONENTS Interpreter)
add_custom_command(
DEPENDS
${args_IN_FILE}
${idl_compiler}
OUTPUT ${args_OUT_FILE}
COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}
COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})
add_custom_target(${args_TARGET}
DEPENDS ${args_OUT_FILE})
set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)
endfunction()
set(idl_serializers
idl/cache_temperature.idl.hh
idl/commitlog.idl.hh
idl/consistency_level.idl.hh
idl/frozen_mutation.idl.hh
idl/frozen_schema.idl.hh
idl/gossip_digest.idl.hh
idl/hinted_handoff.idl.hh
idl/idl_test.idl.hh
idl/keys.idl.hh
idl/messaging_service.idl.hh
idl/mutation.idl.hh
idl/paging_state.idl.hh
idl/partition_checksum.idl.hh
idl/paxos.idl.hh
idl/query.idl.hh
idl/raft.idl.hh
idl/range.idl.hh
idl/read_command.idl.hh
idl/reconcilable_result.idl.hh
idl/replay_position.idl.hh
idl/result.idl.hh
idl/ring_position.idl.hh
idl/streaming.idl.hh
idl/token.idl.hh
idl/tracing.idl.hh
idl/truncation_record.idl.hh
idl/uuid.idl.hh
idl/view.idl.hh)
set(idl_gen_files)
foreach(f ${idl_serializers})
get_filename_component(idl_name "${f}" NAME)
get_filename_component(idl_target "${idl_name}" NAME_WE)
get_filename_component(idl_dir "${f}" DIRECTORY)
string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")
scylla_generate_idl_serializer(
TARGET scylla_idl_gen_${idl_target}
VAR scylla_idl_gen_${idl_target}_files
IN_FILE "${CMAKE_SOURCE_DIR}/${f}"
OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})
list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")
endforeach()
set(scylla_sources
add_library(scylla-main STATIC)
target_sources(scylla-main
PRIVATE
absl-flat_hash_map.cc
alternator/auth.cc
alternator/conditions.cc
alternator/controller.cc
alternator/executor.cc
alternator/expressions.cc
alternator/serialization.cc
alternator/server.cc
alternator/stats.cc
alternator/streams.cc
api/api.cc
api/cache_service.cc
api/collectd.cc
api/column_family.cc
api/commitlog.cc
api/compaction_manager.cc
api/config.cc
api/endpoint_snitch.cc
api/error_injection.cc
api/failure_detector.cc
api/gossiper.cc
api/hinted_handoff.cc
api/lsa.cc
api/messaging_service.cc
api/storage_proxy.cc
api/storage_service.cc
api/stream_manager.cc
api/system.cc
api/task_manager.cc
api/task_manager_test.cc
atomic_cell.cc
auth/allow_all_authenticator.cc
auth/allow_all_authorizer.cc
auth/authenticated_user.cc
auth/authentication_options.cc
auth/authenticator.cc
auth/common.cc
auth/default_authorizer.cc
auth/password_authenticator.cc
auth/passwords.cc
auth/permission.cc
auth/permissions_cache.cc
auth/resource.cc
auth/role_or_anonymous.cc
auth/roles-metadata.cc
auth/sasl_challenge.cc
auth/service.cc
auth/standard_role_manager.cc
auth/transitional.cc
bytes.cc
caching_options.cc
canonical_mutation.cc
cdc/cdc_partitioner.cc
cdc/generation.cc
cdc/log.cc
cdc/metadata.cc
cdc/split.cc
client_data.cc
clocks-impl.cc
collection_mutation.cc
compaction/compaction.cc
compaction/compaction_manager.cc
compaction/compaction_strategy.cc
compaction/leveled_compaction_strategy.cc
compaction/size_tiered_compaction_strategy.cc
compaction/time_window_compaction_strategy.cc
compress.cc
converting_mutation_partition_applier.cc
counters.cc
cql3/abstract_marker.cc
cql3/attributes.cc
cql3/cf_name.cc
cql3/column_condition.cc
cql3/column_identifier.cc
cql3/column_specification.cc
cql3/constants.cc
cql3/cql3_type.cc
cql3/expr/expression.cc
cql3/expr/prepare_expr.cc
cql3/expr/restrictions.cc
cql3/functions/aggregate_fcts.cc
cql3/functions/castas_fcts.cc
cql3/functions/error_injection_fcts.cc
cql3/functions/functions.cc
cql3/functions/user_function.cc
cql3/index_name.cc
cql3/keyspace_element_name.cc
cql3/lists.cc
cql3/maps.cc
cql3/operation.cc
cql3/prepare_context.cc
cql3/query_options.cc
cql3/query_processor.cc
cql3/restrictions/statement_restrictions.cc
cql3/result_set.cc
cql3/role_name.cc
cql3/selection/abstract_function_selector.cc
cql3/selection/selectable.cc
cql3/selection/selection.cc
cql3/selection/selector.cc
cql3/selection/selector_factories.cc
cql3/selection/simple_selector.cc
cql3/sets.cc
cql3/statements/alter_keyspace_statement.cc
cql3/statements/alter_service_level_statement.cc
cql3/statements/alter_table_statement.cc
cql3/statements/alter_type_statement.cc
cql3/statements/alter_view_statement.cc
cql3/statements/attach_service_level_statement.cc
cql3/statements/authentication_statement.cc
cql3/statements/authorization_statement.cc
cql3/statements/batch_statement.cc
cql3/statements/cas_request.cc
cql3/statements/cf_prop_defs.cc
cql3/statements/cf_statement.cc
cql3/statements/create_aggregate_statement.cc
cql3/statements/create_function_statement.cc
cql3/statements/create_index_statement.cc
cql3/statements/create_keyspace_statement.cc
cql3/statements/create_service_level_statement.cc
cql3/statements/create_table_statement.cc
cql3/statements/create_type_statement.cc
cql3/statements/create_view_statement.cc
cql3/statements/delete_statement.cc
cql3/statements/detach_service_level_statement.cc
cql3/statements/drop_aggregate_statement.cc
cql3/statements/drop_function_statement.cc
cql3/statements/drop_index_statement.cc
cql3/statements/drop_keyspace_statement.cc
cql3/statements/drop_service_level_statement.cc
cql3/statements/drop_table_statement.cc
cql3/statements/drop_type_statement.cc
cql3/statements/drop_view_statement.cc
cql3/statements/function_statement.cc
cql3/statements/grant_statement.cc
cql3/statements/index_prop_defs.cc
cql3/statements/index_target.cc
cql3/statements/ks_prop_defs.cc
cql3/statements/list_permissions_statement.cc
cql3/statements/list_service_level_attachments_statement.cc
cql3/statements/list_service_level_statement.cc
cql3/statements/list_users_statement.cc
cql3/statements/modification_statement.cc
cql3/statements/permission_altering_statement.cc
cql3/statements/property_definitions.cc
cql3/statements/raw/parsed_statement.cc
cql3/statements/revoke_statement.cc
cql3/statements/role-management-statements.cc
cql3/statements/schema_altering_statement.cc
cql3/statements/select_statement.cc
cql3/statements/service_level_statement.cc
cql3/statements/sl_prop_defs.cc
cql3/statements/truncate_statement.cc
cql3/statements/update_statement.cc
cql3/statements/strongly_consistent_modification_statement.cc
cql3/statements/strongly_consistent_select_statement.cc
cql3/statements/use_statement.cc
cql3/type_json.cc
cql3/untyped_result_set.cc
cql3/update_parameters.cc
cql3/user_types.cc
cql3/util.cc
cql3/ut_name.cc
cql3/values.cc
data_dictionary/data_dictionary.cc
db/batchlog_manager.cc
db/commitlog/commitlog.cc
db/commitlog/commitlog_entry.cc
db/commitlog/commitlog_replayer.cc
db/config.cc
db/consistency_level.cc
db/cql_type_parser.cc
db/data_listeners.cc
db/extensions.cc
db/heat_load_balance.cc
db/hints/host_filter.cc
db/hints/manager.cc
db/hints/resource_manager.cc
db/hints/sync_point.cc
db/large_data_handler.cc
db/legacy_schema_migrator.cc
db/marshal/type_parser.cc
db/rate_limiter.cc
db/schema_tables.cc
db/size_estimates_virtual_reader.cc
db/snapshot-ctl.cc
db/sstables-format-selector.cc
db/system_distributed_keyspace.cc
db/system_keyspace.cc
db/view/row_locking.cc
db/view/view.cc
db/view/view_update_generator.cc
db/virtual_table.cc
dht/boot_strapper.cc
dht/i_partitioner.cc
dht/murmur3_partitioner.cc
dht/range_streamer.cc
dht/token.cc
replica/distributed_loader.cc
direct_failure_detector/failure_detector.cc
duration.cc
exceptions/exceptions.cc
readers/mutation_readers.cc
frozen_mutation.cc
frozen_schema.cc
generic_server.cc
gms/application_state.cc
gms/endpoint_state.cc
gms/failure_detector.cc
gms/feature_service.cc
gms/gossip_digest_ack2.cc
gms/gossip_digest_ack.cc
gms/gossip_digest_syn.cc
gms/gossiper.cc
gms/inet_address.cc
gms/versioned_value.cc
gms/version_generator.cc
hashers.cc
index/secondary_index.cc
index/secondary_index_manager.cc
debug.cc
init.cc
keys.cc
utils/lister.cc
locator/abstract_replication_strategy.cc
locator/azure_snitch.cc
locator/ec2_multi_region_snitch.cc
locator/ec2_snitch.cc
locator/everywhere_replication_strategy.cc
locator/gce_snitch.cc
locator/gossiping_property_file_snitch.cc
locator/local_strategy.cc
locator/network_topology_strategy.cc
locator/production_snitch_base.cc
locator/rack_inferring_snitch.cc
locator/simple_snitch.cc
locator/simple_strategy.cc
locator/snitch_base.cc
locator/token_metadata.cc
lang/lua.cc
main.cc
replica/memtable.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation.cc
mutation_fragment.cc
mutation_partition.cc
mutation_partition_serializer.cc
mutation_partition_view.cc
mutation_query.cc
readers/mutation_reader.cc
mutation_writer/feed_writers.cc
mutation_writer/multishard_writer.cc
mutation_writer/partition_based_splitting_writer.cc
mutation_writer/shard_based_splitting_writer.cc
mutation_writer/timestamp_based_splitting_writer.cc
partition_slice_builder.cc
partition_version.cc
querier.cc
query.cc
query_ranges_to_vnodes.cc
query-result-set.cc
raft/fsm.cc
raft/log.cc
raft/raft.cc
raft/server.cc
raft/tracker.cc
service/broadcast_tables/experimental/lang.cc
range_tombstone.cc
range_tombstone_list.cc
tombstone_gc_options.cc
tombstone_gc.cc
reader_concurrency_semaphore.cc
redis/abstract_command.cc
redis/command_factory.cc
redis/commands.cc
redis/keyspace_utils.cc
redis/lolwut.cc
redis/mutation_utils.cc
redis/options.cc
redis/query_processor.cc
redis/query_utils.cc
redis/server.cc
redis/service.cc
redis/stats.cc
release.cc
repair/repair.cc
repair/row_level.cc
replica/database.cc
replica/table.cc
row_cache.cc
schema.cc
schema_mutations.cc
schema_registry.cc
serializer.cc
service/client_state.cc
service/forward_service.cc
service/migration_manager.cc
service/misc_services.cc
service/pager/paging_state.cc
service/pager/query_pagers.cc
service/paxos/paxos_state.cc
service/paxos/prepare_response.cc
service/paxos/prepare_summary.cc
service/paxos/proposal.cc
service/priority_manager.cc
service/qos/qos_common.cc
service/qos/service_level_controller.cc
service/qos/standard_service_level_distributed_data_accessor.cc
service/raft/raft_group_registry.cc
service/raft/raft_rpc.cc
service/raft/raft_sys_table_storage.cc
service/raft/group0_state_machine.cc
service/storage_proxy.cc
service/storage_service.cc
sstables/compress.cc
sstables/integrity_checked_file_impl.cc
sstables/kl/reader.cc
sstables/metadata_collector.cc
sstables/m_format_read_helpers.cc
sstables/mx/reader.cc
sstables/mx/writer.cc
sstables/prepended_input_stream.cc
sstables/random_access_reader.cc
sstables/sstable_directory.cc
sstables/sstable_mutation_reader.cc
sstables/sstables.cc
sstables/sstable_set.cc
sstables/sstables_manager.cc
sstables/sstable_version.cc
sstables/writer.cc
streaming/consumer.cc
streaming/progress_info.cc
streaming/session_info.cc
streaming/stream_coordinator.cc
streaming/stream_manager.cc
streaming/stream_plan.cc
streaming/stream_reason.cc
streaming/stream_receive_task.cc
streaming/stream_request.cc
streaming/stream_result_future.cc
streaming/stream_session.cc
streaming/stream_session_state.cc
streaming/stream_summary.cc
streaming/stream_task.cc
streaming/stream_transfer_task.cc
sstables_loader.cc
table_helper.cc
tasks/task_manager.cc
thrift/controller.cc
thrift/handler.cc
thrift/server.cc
thrift/thrift_validation.cc
timeout_config.cc
tools/scylla-sstable-index.cc
tools/scylla-types.cc
tracing/traced_file.cc
tracing/trace_keyspace_helper.cc
tracing/trace_state.cc
tracing/tracing_backend_registry.cc
tracing/tracing.cc
transport/controller.cc
transport/cql_protocol_extension.cc
transport/event.cc
transport/event_notifier.cc
transport/messages/result_message.cc
transport/server.cc
types.cc
unimplemented.cc
utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc
utils/array-search.cc
utils/ascii.cc
utils/base64.cc
utils/big_decimal.cc
utils/bloom_calculations.cc
utils/bloom_filter.cc
utils/buffer_input_stream.cc
utils/build_id.cc
utils/config_file.cc
utils/directories.cc
utils/disk-error-handler.cc
utils/dynamic_bitset.cc
utils/error_injection.cc
utils/exceptions.cc
utils/file_lock.cc
utils/generation-number.cc
utils/gz/crc_combine.cc
utils/gz/gen_crc_combine_table.cc
utils/human_readable.cc
utils/i_filter.cc
utils/large_bitset.cc
utils/like_matcher.cc
utils/limiting_data_source.cc
utils/logalloc.cc
utils/managed_bytes.cc
utils/multiprecision_int.cc
utils/murmur_hash.cc
utils/rate_limiter.cc
utils/rjson.cc
utils/runtime.cc
utils/updateable_value.cc
utils/utf8.cc
utils/uuid.cc
utils/UUID_gen.cc
validation.cc
vint-serialization.cc
zstd.cc)
set(scylla_gen_sources
"${scylla_thrift_gen_cassandra_files}"
"${scylla_ragel_gen_protocol_parser_file}"
"${swagger_gen_files}"
"${idl_gen_files}"
"${antlr3_gen_files}")
target_link_libraries(scylla-main
PRIVATE
db
absl::hash
absl::raw_hash_set
Seastar::seastar
Snappy::snappy
systemd
ZLIB::ZLIB)
add_subdirectory(api)
add_subdirectory(alternator)
add_subdirectory(db)
add_subdirectory(auth)
add_subdirectory(cdc)
add_subdirectory(compaction)
add_subdirectory(cql3)
add_subdirectory(data_dictionary)
add_subdirectory(dht)
add_subdirectory(gms)
add_subdirectory(idl)
add_subdirectory(index)
add_subdirectory(interface)
add_subdirectory(lang)
add_subdirectory(locator)
add_subdirectory(mutation)
add_subdirectory(mutation_writer)
add_subdirectory(readers)
add_subdirectory(redis)
add_subdirectory(replica)
add_subdirectory(raft)
add_subdirectory(repair)
add_subdirectory(rust)
add_subdirectory(schema)
add_subdirectory(service)
add_subdirectory(sstables)
add_subdirectory(streaming)
add_subdirectory(test)
add_subdirectory(thrift)
add_subdirectory(tools)
add_subdirectory(tracing)
add_subdirectory(transport)
add_subdirectory(types)
add_subdirectory(utils)
include(add_version_library)
add_version_library(scylla_version
release.cc)
add_executable(scylla
${scylla_sources}
${scylla_gen_sources})
main.cc)
target_link_libraries(scylla PRIVATE
scylla-main
api
auth
alternator
db
cdc
compaction
cql3
data_dictionary
dht
gms
idl
index
lang
locator
mutation
mutation_writer
raft
readers
redis
repair
replica
schema
scylla_version
service
sstables
streaming
test-perf
thrift
tools
tracing
transport
types
utils)
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
ICU::uc)
target_link_libraries(scylla PRIVATE
seastar
# Boost dependencies
Boost::filesystem
Boost::program_options
Boost::system
Boost::thread
Boost::regex
Boost::headers
# Abseil libs
absl::hashtablez_sampler
absl::raw_hash_set
absl::synchronization
absl::graphcycles_internal
absl::stacktrace
absl::symbolize
absl::debugging_internal
absl::demangle_internal
absl::time
absl::time_zone
absl::int128
absl::city
absl::hash
absl::malloc_internal
absl::spinlock_wait
absl::base
absl::dynamic_annotations
absl::raw_logging_internal
absl::exponential_biased
absl::throw_delegate
# System libs
ZLIB::ZLIB
ICU::uc
systemd
zstd
snappy
${LUA_LIBRARIES}
thrift
crypt)
Boost::program_options)
# Force SHA1 build-id generation
set(default_linker_flags "-Wl,--build-id=sha1")
include(CheckLinkerFlag)
foreach(linker "lld" "gold")
set(linker_flag "-fuse-ld=${linker}")
check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")
if(CXX_LINKER_HAVE_${linker})
string(APPEND default_linker_flags " ${linker_flag}")
break()
endif()
endforeach()
set(CMAKE_EXE_LINKER_FLAGS "${default_linker_flags}" CACHE INTERNAL "")
target_link_libraries(scylla PRIVATE
-Wl,--build-id=sha1 # Force SHA1 build-id generation
# TODO: Use lld linker if it's available, otherwise gold, else bfd
-fuse-ld=lld)
# TODO: patch dynamic linker to match configure.py behavior
target_compile_options(scylla PRIVATE
-std=gnu++20
${cxx_coro_flag}
${target_arch_flag})
# Hacks needed to expose internal APIs for xxhash dependencies
target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
libdeflate
abseil
"${scylla_gen_build_dir}")
###
### Create crc_combine_table helper executable.
### Use it to generate crc_combine_table.cc to be used in scylla at build time.
###
add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)
target_link_libraries(crc_combine_table PRIVATE seastar)
target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_compile_options(crc_combine_table PRIVATE
-std=gnu++20
${cxx_coro_flag}
${target_arch_flag})
add_dependencies(scylla crc_combine_table)
# Generate an additional source file at build time that is needed for Scylla compilation
add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
DEPENDS crc_combine_table)
target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")
###
### Generate version file and supply appropriate compile definitions for release.cc
###
execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN --output-dir "${CMAKE_BINARY_DIR}/gen" RESULT_VARIABLE scylla_version_gen_res)
if(scylla_version_gen_res)
message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")
endif()
file(READ "${CMAKE_BINARY_DIR}/gen/SCYLLA-VERSION-FILE" scylla_version)
string(STRIP "${scylla_version}" scylla_version)
file(READ "${CMAKE_BINARY_DIR}/gen/SCYLLA-RELEASE-FILE" scylla_release)
string(STRIP "${scylla_release}" scylla_release)
get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)
list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")
set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")
###
### Custom command for building libdeflate. Link the library to scylla.
###
set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")
add_custom_command(OUTPUT "${libdeflate_lib}"
COMMAND make -C "${CMAKE_SOURCE_DIR}/libdeflate"
BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/
CC=${CMAKE_C_COMPILER}
"CFLAGS=${target_arch_flag}"
../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # The two slashes are important!
# Hack to force generating custom command to produce libdeflate.a
add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")
target_link_libraries(scylla PRIVATE "${libdeflate_lib}")
# TODO: create cmake/ directory and move utilities (generate functions etc) there
# TODO: Build tests if BUILD_TESTING=on (using CTest module)
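
The `crc_combine_table` target in the CMake diff above follows a common build-time generation pattern. A small helper executable is built first. A custom command then runs it and redirects its standard output into a generated `.cc` file, which is added to `scylla`'s sources. The sketch below shows a minimal generator of that kind. It assumes a plain CRC-32 lookup table (reflected polynomial 0xEDB88320), not whatever `utils/gz/gen_crc_combine_table.cc` actually emits. The names `make_crc32_table`, `emit_crc32_table` and `crc32_table` are hypothetical.

```cpp
// Illustrative build-time table generator; NOT Scylla's gen_crc_combine_table.cc.
// It computes the standard reflected CRC-32 lookup table and writes it out as
// C++ source, so a custom command can redirect the output into a .cc file.
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>

constexpr std::array<std::uint32_t, 256> make_crc32_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t crc = i;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right, XOR-ing in the reflected polynomial when the dropped bit was 1.
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
        table[i] = crc;
    }
    return table;
}

// Writes the table as a C++ definition (the symbol name crc32_table is hypothetical).
void emit_crc32_table(std::FILE* out) {
    constexpr auto table = make_crc32_table();
    std::fprintf(out, "#include <cstdint>\n\nextern const std::uint32_t crc32_table[256] = {\n");
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (i % 4 == 0) {
            std::fprintf(out, "   ");
        }
        std::fprintf(out, " 0x%08xu,", static_cast<unsigned>(table[i]));
        if (i % 4 == 3) {
            std::fprintf(out, "\n");
        }
    }
    std::fprintf(out, "};\n");
}
```

The helper's `main()` would just call `emit_crc32_table(stdout)`. A rule shaped like the `add_custom_command(OUTPUT ... COMMAND $<TARGET_FILE:...> > ...)` above then captures the output. Computing the table in a `constexpr` function keeps the generator trivial. Emitting it as source means the main binary does no work at startup.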

@@ -2,7 +2,7 @@
## Asking questions or requesting help
Use the [Scylla Users mailing list](https://groups.google.com/g/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
Use the [ScyllaDB Community Forum](https://forum.scylladb.com) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
Join the [Scylla Developers mailing list](https://groups.google.com/g/scylladb-dev) for deeper technical discussions and to discuss your ideas for contributions.

@@ -195,7 +195,7 @@ $ # Edit configuration options as appropriate
$ SCYLLA_HOME=$HOME/scylla build/release/scylla
```
The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.
The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories`, `commitlog_directory` and `schema_commitlog_directory` fields as appropriate.
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.

@@ -30,9 +30,9 @@ requirements - you just need to meet the frozen toolchain's prerequisites
Building Scylla with the frozen toolchain `dbuild` is as easy as:
```bash
$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
```
For further information, please see:
@@ -60,7 +60,7 @@ Please note that you need to run Scylla with `dbuild` if you built it with the f
For more run options, run:
```bash
$ ./tools/toolchain/dbuild ./build/release/scylla --help
```
## Testing
@@ -100,10 +100,10 @@ If you are a developer working on Scylla, please read the [developer guidelines]
## Contact
* The [users mailing list] and [Slack channel] are for users to discuss configuration, management, and operations of ScyllaDB Open Source.
* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of ScyllaDB Open Source.
* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
[Users mailing list]: https://groups.google.com/forum/#!forum/scylladb-users
[Community forum]: https://forum.scylladb.com/
[Slack channel]: http://slack.scylladb.com/

@@ -34,7 +34,7 @@ END
DATE=""
while [[ $# -gt 0 ]]; do
while [ $# -gt 0 ]; do
opt="$1"
case $opt in
-h|--help)
@@ -72,7 +72,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.2.0-dev
VERSION=5.3.0-dev
if test -f version
then

abseil

Submodule abseil deleted from 7f3c0d7811

alternator/CMakeLists.txt Normal file
@@ -0,0 +1,30 @@
include(generate_cql_grammar)
generate_cql_grammar(
GRAMMAR expressions.g
SOURCES cql_grammar_srcs)
add_library(alternator STATIC)
target_sources(alternator
PRIVATE
controller.cc
server.cc
executor.cc
stats.cc
serialization.cc
expressions.cc
conditions.cc
auth.cc
streams.cc
ttl.cc
${cql_grammar_srcs})
target_include_directories(alternator
PUBLIC
${CMAKE_SOURCE_DIR}
${CMAKE_BINARY_DIR}
PRIVATE
${RAPIDJSON_INCLUDE_DIRS})
target_link_libraries(alternator
cql3
idl
Seastar::seastar
xxHash::xxhash)

@@ -10,8 +10,6 @@
#include "log.hh"
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
@@ -29,99 +27,6 @@ namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string apply_sha256(const std::vector<temporary_buffer<char>>& msg) {
sha256_hasher hasher;
for (const temporary_buffer<char>& buf : msg) {
hasher.update(buf.get(), buf.size());
}
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
::tm time_buf;
// strftime prints the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error::invalid_signature(
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error::invalid_signature(
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
const std::vector<temporary_buffer<char>>& body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error::invalid_signature("X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error::invalid_signature(
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema("system_auth", "roles");
partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));
@@ -141,7 +46,7 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin
service::storage_proxy::coordinator_query_result qr = co_await proxy.query(schema, std::move(command), std::move(partition_ranges), cl,
service::storage_proxy::coordinator_query_options(executor::default_timeout(), empty_service_permit(), client_state));
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
cql3::selection::result_set_builder builder(*selection, gc_clock::now());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));
auto result_set = builder.build();
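
The removed `format_time_point()` and `check_expiry()` depend on one property of the fixed-width `YYYYMMDDTHHMMSSZ` form used by `X-Amz-Date`. Strings in that form sort lexicographically in chronological order. That is why the signature date can be checked against the 15-minute window with plain string comparisons. The sketch below illustrates the idea under a few assumptions: it uses `std::chrono::system_clock` in place of `db_clock`, returns a `bool` instead of throwing `api_error::invalid_signature`, and relies on POSIX `gmtime_r`. The function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <string>
#include <string_view>

// Formats tp (UTC) as "YYYYMMDDTHHMMSSZ", the form used by X-Amz-Date.
std::string format_amz_time(std::chrono::system_clock::time_point tp) {
    const std::time_t t = std::chrono::system_clock::to_time_t(tp);
    std::tm tm_buf{};
    gmtime_r(&t, &tm_buf);  // POSIX, thread-safe counterpart of std::gmtime
    char out[17];           // 16 characters plus the terminating NUL written by strftime
    const std::size_t n = std::strftime(out, sizeof(out), "%Y%m%dT%H%M%SZ", &tm_buf);
    assert(n == 16);        // strftime returns the length without the NUL
    return std::string(out, n);
}

// Returns true if signature_date lies within [now - skew, now + skew].
// String comparison is valid only because both sides use the same
// fixed-width, most-significant-field-first format.
bool within_signature_window(std::string_view signature_date,
                             std::chrono::system_clock::time_point now,
                             std::chrono::minutes skew = std::chrono::minutes(15)) {
    const std::string earliest = format_amz_time(now - skew);
    const std::string latest = format_amz_time(now + skew);
    return signature_date >= std::string_view(earliest) &&
           signature_date <= std::string_view(latest);
}
```

One consequence of this design: a date in any other layout (for example, one with separators) would compare incorrectly. Such input therefore has to be rejected or normalized before the check.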

@@ -20,14 +20,8 @@ class storage_proxy;
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string, 1>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
const std::vector<temporary_buffer<char>>& body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);
}

@@ -232,7 +232,14 @@ bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,
if (it2->name == "S") {
return rjson::to_string_view(it1->value).starts_with(rjson::to_string_view(it2->value));
} else /* it2->name == "B" */ {
return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
try {
return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
} catch(std::invalid_argument&) {
// determine if any of the malformed values is from query and raise an exception if so
unwrap_bytes(it1->value, v1_from_query);
unwrap_bytes(it2->value, v2_from_query);
return false;
}
}
}
@@ -241,7 +248,7 @@ static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
}
// Check if two JSON-encoded values match with the CONTAINS relation
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query) {
if (!v1) {
return false;
}
@@ -250,7 +257,12 @@ bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
if (kv1.name == "S" && kv2.name == "S") {
return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;
} else if (kv1.name == "B" && kv2.name == "B") {
return rjson::base64_decode(kv1.value).find(rjson::base64_decode(kv2.value)) != bytes::npos;
auto d_kv1 = unwrap_bytes(kv1.value, v1_from_query);
auto d_kv2 = unwrap_bytes(kv2.value, v2_from_query);
if (!d_kv1 || !d_kv2) {
return false;
}
return d_kv1->find(*d_kv2) != bytes::npos;
} else if (is_set_of(kv1.name, kv2.name)) {
for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
if (*i == kv2.value) {
@@ -273,11 +285,11 @@ bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
}
// Check if two JSON-encoded values match with the NOT_CONTAINS relation
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query) {
if (!v1) {
return false;
}
return !check_CONTAINS(v1, v2);
return !check_CONTAINS(v1, v2, v1_from_query, v2_from_query);
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
@@ -374,7 +386,12 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(rjson::base64_decode(kv1.value), rjson::base64_decode(kv2.value));
auto d_kv1 = unwrap_bytes(kv1.value, v1_from_query);
auto d_kv2 = unwrap_bytes(kv2.value, v2_from_query);
if(!d_kv1 || !d_kv2) {
return false;
}
return cmp(*d_kv1, *d_kv2);
}
// cannot reach here, as check_comparable_type() verifies the type is one
// of the above options.
@@ -464,7 +481,13 @@ static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const r
bounds_from_query);
}
if (kv_v.name == "B") {
return check_BETWEEN(rjson::base64_decode(kv_v.value), rjson::base64_decode(kv_lb.value), rjson::base64_decode(kv_ub.value), bounds_from_query);
auto d_kv_v = unwrap_bytes(kv_v.value, v_from_query);
auto d_kv_lb = unwrap_bytes(kv_lb.value, lb_from_query);
auto d_kv_ub = unwrap_bytes(kv_ub.value, ub_from_query);
if(!d_kv_v || !d_kv_lb || !d_kv_ub) {
return false;
}
return check_BETWEEN(*d_kv_v, *d_kv_lb, *d_kv_ub, bounds_from_query);
}
if (v_from_query) {
throw api_error::validation(
@@ -557,7 +580,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
return check_CONTAINS(got, arg);
return check_CONTAINS(got, arg, false, true);
}
case comparison_operator_type::NOT_CONTAINS:
{
@@ -571,7 +594,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
"got {} instead", argtype));
}
return check_NOT_CONTAINS(got, arg);
return check_NOT_CONTAINS(got, arg, false, true);
}
}
throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
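
In the hunks above, every Binary ("B") comparison now goes through unwrap_bytes(). The new from_query flags decide what a malformed base64 value means. In a value the user supplied it is an error, but in a value read back from storage it is only a non-match. The following is a minimal standalone sketch of that policy. It assumes a hand-written strict base64 decoder and uses std::runtime_error in place of api_error::serialization. decode_base64, unwrap_bytes_sketch and contains_bytes are hypothetical names, not Scylla functions.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical strict base64 decoder (standard alphabet, '=' padding).
// Throws std::invalid_argument on malformed input.
std::string decode_base64(const std::string& in) {
    auto value_of = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("base64 length must be a multiple of 4");
    }
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        int v[4];
        int pad = 0;
        for (int j = 0; j < 4; ++j) {
            char c = in[i + j];
            if (c == '=') {
                // Padding is only allowed in the last two slots of the final quantum.
                if (i + 4 != in.size() || j < 2) throw std::invalid_argument("bad padding");
                v[j] = 0;
                ++pad;
            } else {
                if (pad) throw std::invalid_argument("data after padding");
                v[j] = value_of(c);
                if (v[j] < 0) throw std::invalid_argument("invalid base64 character");
            }
        }
        unsigned triple = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];
        out.push_back(char((triple >> 16) & 0xFF));
        if (pad < 2) out.push_back(char((triple >> 8) & 0xFF));
        if (pad < 1) out.push_back(char(triple & 0xFF));
    }
    return out;
}

// Hypothetical mirror of the documented unwrap_bytes() contract: on decoding
// failure, throw if the value came from the query, else return an empty optional.
std::optional<std::string> unwrap_bytes_sketch(const std::string& b64, bool from_query) {
    try {
        return decode_base64(b64);
    } catch (const std::invalid_argument&) {
        if (from_query) {
            throw std::runtime_error("Invalid base64 data");
        }
        return std::nullopt;
    }
}

// Binary CONTAINS in the style of check_CONTAINS(): a stored value that fails
// to decode simply does not match, while a bad query operand throws.
bool contains_bytes(const std::string& stored_b64, const std::string& query_b64) {
    auto haystack = unwrap_bytes_sketch(stored_b64, /*from_query=*/false);
    auto needle = unwrap_bytes_sketch(query_b64, /*from_query=*/true);
    if (!haystack || !needle) {
        return false;
    }
    return haystack->find(*needle) != std::string::npos;
}
```

Because both operands are decoded before the null check, a malformed query operand is still reported even when the stored value is also malformed.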

@@ -38,7 +38,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req);
bool verify_expected(const rjson::value& req, const rjson::value* previous_item);
bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);
bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);
bool verify_condition_expression(

@@ -23,7 +23,7 @@ namespace alternator {
// api_error into a JSON object, and that is returned to the user.
class api_error final : public std::exception {
public:
using status_type = httpd::reply::status_type;
using status_type = http::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
@@ -77,7 +77,7 @@ public:
return api_error("TableNotFoundException", std::move(msg));
}
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), reply::status_type::internal_server_error);
return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);
}
// Provide the "std::exception" interface, to make it easier to print this

@@ -13,12 +13,12 @@
#include <seastar/core/sleep.hh>
#include "alternator/executor.hh"
#include "log.hh"
#include "schema_builder.hh"
#include "schema/schema_builder.hh"
#include "data_dictionary/keyspace_metadata.hh"
#include "exceptions/exceptions.hh"
#include "timestamp.hh"
#include "types/map.hh"
#include "schema.hh"
#include "schema/schema.hh"
#include "query-request.hh"
#include "query-result-reader.hh"
#include "cql3/selection/selection.hh"
@@ -34,13 +34,14 @@
#include "expressions.hh"
#include "conditions.hh"
#include "cql3/constants.hh"
#include "cql3/util.hh"
#include <optional>
#include "utils/overloaded_functor.hh"
#include <seastar/json/json_elements.hh>
#include <boost/algorithm/cxx11/any_of.hpp>
#include "collection_mutation.hh"
#include "db/query_context.hh"
#include "schema.hh"
#include "schema/schema.hh"
#include "db/tags/extension.hh"
#include "db/tags/utils.hh"
#include "alternator/rmw_operation.hh"
@@ -50,11 +51,13 @@
#include <unordered_set>
#include "service/storage_proxy.hh"
#include "gms/gossiper.hh"
#include "schema_registry.hh"
#include "schema/schema_registry.hh"
#include "utils/error_injection.hh"
#include "db/schema_tables.hh"
#include "utils/rjson.hh"
using namespace std::chrono_literals;
logging::logger elogger("alternator-executor");
namespace alternator {
@@ -114,8 +117,7 @@ std::string json_string::to_json() const {
void executor::supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp) {
rjson::add(descr, "CreationDateTime", rjson::value(std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()).count()));
rjson::add(descr, "TableStatus", "ACTIVE");
auto schema_id_str = schema.id().to_sstring();
rjson::add(descr, "TableId", rjson::from_string(schema_id_str));
rjson::add(descr, "TableId", rjson::from_string(schema.id().to_sstring()));
executor::supplement_table_stream_info(descr, schema, sp);
}
@@ -127,6 +129,20 @@ void executor::supplement_table_info(rjson::value& descr, const schema& schema,
// See https://github.com/scylladb/scylla/issues/4480
static constexpr int max_table_name_length = 222;
static bool valid_table_name_chars(std::string_view name) {
for (auto c : name) {
if ((c < 'a' || c > 'z') &&
(c < 'A' || c > 'Z') &&
(c < '0' || c > '9') &&
c != '_' &&
c != '-' &&
c != '.') {
return false;
}
}
return true;
}
// The DynamoDB developer guide, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules
// specifies that table names "names must be between 3 and 255 characters long
// and can contain only the following characters: a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot)
@@ -136,8 +152,7 @@ static void validate_table_name(const std::string& name) {
throw api_error::validation(
format("TableName must be at least 3 characters long and at most {} characters long", max_table_name_length));
}
static const std::regex valid_table_name_chars ("[a-zA-Z0-9_.-]*");
if (!std::regex_match(name.c_str(), valid_table_name_chars)) {
if (!valid_table_name_chars(name)) {
throw api_error::validation(
"TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+");
}
@@ -153,11 +168,10 @@ static void validate_table_name(const std::string& name) {
// The view_name() function assumes the table_name has already been validated
// but validates the legality of index_name and the combination of both.
static std::string view_name(const std::string& table_name, std::string_view index_name, const std::string& delim = ":") {
static const std::regex valid_index_name_chars ("[a-zA-Z0-9_.-]*");
if (index_name.length() < 3) {
throw api_error::validation("IndexName must be at least 3 characters long");
}
if (!std::regex_match(index_name.data(), valid_index_name_chars)) {
if (!valid_table_name_chars(index_name)) {
throw api_error::validation(
format("IndexName '{}' must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", index_name));
}
@@ -760,7 +774,6 @@ future<executor::request_return_type> executor::tag_resource(client_state& clien
co_return api_error::access_denied("Incorrect resource identifier");
}
schema_ptr schema = get_table_from_arn(_proxy, rjson::to_string_view(*arn));
std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);
const rjson::value* tags = rjson::find(request, "Tags");
if (!tags || !tags->IsArray()) {
co_return api_error::validation("Cannot parse tags");
@@ -768,8 +781,9 @@ future<executor::request_return_type> executor::tag_resource(client_state& clien
if (tags->Size() < 1) {
co_return api_error::validation("The number of tags must be at least 1") ;
}
update_tags_map(*tags, tags_map, update_tags_action::add_tags);
co_await db::update_tags(_mm, schema, std::move(tags_map));
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [tags](std::map<sstring, sstring>& tags_map) {
update_tags_map(*tags, tags_map, update_tags_action::add_tags);
});
co_return json_string("");
}
@@ -787,9 +801,9 @@ future<executor::request_return_type> executor::untag_resource(client_state& cli
schema_ptr schema = get_table_from_arn(_proxy, rjson::to_string_view(*arn));
std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);
update_tags_map(*tags, tags_map, update_tags_action::delete_tags);
co_await db::update_tags(_mm, schema, std::move(tags_map));
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [tags](std::map<sstring, sstring>& tags_map) {
update_tags_map(*tags, tags_map, update_tags_action::delete_tags);
});
co_return json_string("");
}
@@ -927,9 +941,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
if (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {
add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);
}
sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
if (!view_range_key.empty()) {
where_clause = where_clause + " AND \"" + view_hash_key + "\" IS NOT NULL";
where_clause = format("{} AND {} IS NOT NULL", where_clause,
cql3::util::maybe_quote(view_range_key));
}
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));
@@ -984,9 +999,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra
// Note above we don't need to add virtual columns, as all
// base columns were copied to view. TODO: reconsider the need
// for virtual columns when we support Projection.
sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";
sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));
if (!view_range_key.empty()) {
where_clause = where_clause + " AND \"" + view_range_key + "\" IS NOT NULL";
where_clause = format("{} AND {} IS NOT NULL", where_clause,
cql3::util::maybe_quote(view_range_key));
}
where_clauses.push_back(std::move(where_clause));
view_builders.emplace_back(std::move(view_builder));
@@ -1529,7 +1545,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
// This is the old, unsafe, read before write which does first
// a read, then a write. TODO: remove this mode entirely.
return get_previous_item(proxy, client_state, schema(), _pk, _ck, permit, stats).then(
[this, &client_state, &proxy, trace_state, permit = std::move(permit)] (std::unique_ptr<rjson::value> previous_item) mutable {
[this, &proxy, trace_state, permit = std::move(permit)] (std::unique_ptr<rjson::value> previous_item) mutable {
std::optional<mutation> m = apply(std::move(previous_item), api::new_timestamp());
if (!m) {
return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("Failed condition."));
@@ -2302,7 +2318,7 @@ void executor::describe_single_item(const cql3::selection::selection& selection,
rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(*cell, **column_it));
}
} else if (cell) {
auto deserialized = attrs_type()->deserialize(*cell, cql_serialization_format::latest());
auto deserialized = attrs_type()->deserialize(*cell);
auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);
for (auto entry : keys_and_values) {
std::string attr_name = value_cast<sstring>(entry.first);
@@ -2337,7 +2353,7 @@ std::optional<rjson::value> executor::describe_single_item(schema_ptr schema,
const std::optional<attrs_to_get>& attrs_to_get) {
rjson::value item = rjson::empty_object();
cql3::selection::result_set_builder builder(selection, gc_clock::now(), cql_serialization_format::latest());
cql3::selection::result_set_builder builder(selection, gc_clock::now());
query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));
auto result_set = builder.build();
@@ -2360,7 +2376,7 @@ std::vector<rjson::value> executor::describe_multi_item(schema_ptr schema,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get) {
cql3::selection::result_set_builder builder(selection, gc_clock::now(), cql_serialization_format::latest());
cql3::selection::result_set_builder builder(selection, gc_clock::now());
query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));
auto result_set = builder.build();
std::vector<rjson::value> ret;
@@ -3104,20 +3120,10 @@ future<executor::request_return_type> executor::get_item(client_state& client_st
});
}
// is_big() checks approximately if the given JSON value is "bigger" than
// the given big_size number of bytes. The goal is to *quickly* detect
// oversized JSON that, for example, is too large to be serialized to a
// contiguous string - we don't need an accurate size for that. Moreover,
// as soon as we detect that the JSON is indeed "big", we can return true
// and don't need to continue calculating its exact size.
// For simplicity, we use a recursive implementation. This is fine because
// Alternator limits the depth of JSONs it reads from inputs, and doesn't
// add more than a couple of levels in its own output construction.
static void check_big_object(const rjson::value& val, int& size_left);
static void check_big_array(const rjson::value& val, int& size_left);
static bool is_big(const rjson::value& val, int big_size = 100'000) {
bool is_big(const rjson::value& val, int big_size) {
if (val.IsString()) {
return ssize_t(val.GetStringLength()) > big_size;
} else if (val.IsObject()) {
@@ -3508,7 +3514,7 @@ public:
rjson::add_with_string_name(field, type_to_string((*_column_it)->type), json_key_column_value(bv, **_column_it));
}
} else {
auto deserialized = attrs_type()->deserialize(bv, cql_serialization_format::latest());
auto deserialized = attrs_type()->deserialize(bv);
auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);
for (auto entry : keys_and_values) {
std::string attr_name = value_cast<sstring>(entry.first);
@@ -3565,7 +3571,7 @@ public:
}
};
static std::tuple<rjson::value, size_t> describe_items(schema_ptr schema, const query::partition_slice& slice, const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {
static std::tuple<rjson::value, size_t> describe_items(const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {
describe_items_visitor visitor(selection.get_columns(), attrs_to_get, filter);
result_set->visit(visitor);
auto scanned_count = visitor.get_scanned_count();
@@ -3615,7 +3621,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
// We conditionally include these fields when reading CQL tables through alternator.
if (!is_alternator_keyspace(schema.ks_name()) && (!pos.has_key() || pos.get_bound_weight() != bound_weight::equal)) {
rjson::add_with_string_name(last_evaluated_key, scylla_paging_region, rjson::empty_object());
rjson::add(last_evaluated_key[scylla_paging_region.data()], "S", rjson::from_string(to_string(pos.region())));
rjson::add(last_evaluated_key[scylla_paging_region.data()], "S", rjson::from_string(fmt::to_string(pos.region())));
rjson::add_with_string_name(last_evaluated_key, scylla_paging_weight, rjson::empty_object());
rjson::add(last_evaluated_key[scylla_paging_weight.data()], "N", static_cast<int>(pos.get_bound_weight()));
}
@@ -3642,7 +3648,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
if (exclusive_start_key) {
partition_key pk = pk_from_json(*exclusive_start_key, schema);
auto pos = position_in_partition(position_in_partition::partition_start_tag_t());
auto pos = position_in_partition::for_partition_start();
if (schema->clustering_key_size() > 0) {
pos = pos_from_json(*exclusive_start_key, schema);
}
@@ -3679,7 +3685,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
}
auto paging_state = rs->get_metadata().paging_state();
bool has_filter = filter;
auto [items, size] = describe_items(schema, partition_slice, *selection, std::move(rs), std::move(attrs_to_get), std::move(filter));
auto [items, size] = describe_items(*selection, std::move(rs), std::move(attrs_to_get), std::move(filter));
if (paging_state) {
rjson::add(items, "LastEvaluatedKey", encode_paging_state(*schema, *paging_state));
}
@@ -3688,8 +3694,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
// update our "filtered_row_matched_total" for all the rows matched, despited the filter
cql_stats.filtered_rows_matched_total += size;
}
// TODO: better threshold
if (size > 10) {
if (is_big(items)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(items)));
}
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(items)));
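
The executor.cc hunks replace the two std::regex checks in validate_table_name() and view_name() with the hand-written valid_table_name_chars() loop. Below is a small self-contained comparison using only the standard library. The function names are hypothetical, and the loop copies the character classes from the diff. Besides skipping regex construction and matching, the loop takes a std::string_view directly. The old std::regex_match(index_name.data(), ...) call treated the view's data as a NUL-terminated C string.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Character-class loop equivalent to valid_table_name_chars() in the diff.
static bool valid_name_chars_loop(std::string_view name) {
    for (char c : name) {
        if ((c < 'a' || c > 'z') && (c < 'A' || c > 'Z') &&
            (c < '0' || c > '9') && c != '_' && c != '-' && c != '.') {
            return false;
        }
    }
    return true;
}

// The regex check it replaces; std::regex_match must match the whole input.
static bool valid_name_chars_regex(const std::string& name) {
    static const std::regex pattern("[a-zA-Z0-9_.-]*");
    return std::regex_match(name, pattern);
}
```

Both accept the empty string, since the pattern uses `*`. The minimum-length checks in the callers reject names that are too short.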

@@ -239,4 +239,15 @@ public:
static void supplement_table_stream_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);
};
// is_big() checks approximately if the given JSON value is "bigger" than
// the given big_size number of bytes. The goal is to *quickly* detect
// oversized JSON that, for example, is too large to be serialized to a
// contiguous string - we don't need an accurate size for that. Moreover,
// as soon as we detect that the JSON is indeed "big", we can return true
// and don't need to continue calculating its exact size.
// For simplicity, we use a recursive implementation. This is fine because
// Alternator limits the depth of JSONs it reads from inputs, and doesn't
// add more than a couple of levels in its own output construction.
bool is_big(const rjson::value& val, int big_size = 100'000);
}
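
The is_big() comment describes a quick, approximate size test that stops as soon as a byte budget is exhausted. In do_query() and get_records() it replaces the old item-count heuristics (size > 10, nrecords > 10) when choosing between make_streamed() and make_jsonable(). Below is a minimal sketch of the same early-exit idea over a hypothetical, simplified JSON tree. The Json struct stands in for rjson::value. The per-element overheads (1 byte per array element, key length plus 1 per object member) are illustrative assumptions, not the values Scylla uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified JSON tree (stand-in for rjson::value).
struct Json {
    enum class Kind { String, Array, Object } kind = Kind::String;
    std::string str;                // String payload
    std::vector<std::string> keys;  // Object member names (Object only)
    std::vector<Json> children;     // Array elements or object member values
};

// Subtracts an approximate serialized size from size_left and returns
// as soon as the budget goes negative, so huge values exit early.
static void check_big(const Json& v, long& size_left) {
    if (size_left < 0) return;
    switch (v.kind) {
    case Json::Kind::String:
        size_left -= static_cast<long>(v.str.size());
        break;
    case Json::Kind::Array:
        for (const auto& child : v.children) {
            size_left -= 1;  // assumed per-element overhead
            check_big(child, size_left);
            if (size_left < 0) return;
        }
        break;
    case Json::Kind::Object:
        for (std::size_t i = 0; i < v.children.size(); ++i) {
            size_left -= static_cast<long>(v.keys[i].size()) + 1;  // assumed key overhead
            check_big(v.children[i], size_left);
            if (size_left < 0) return;
        }
        break;
    }
}

// True if the approximate size exceeds big_size (default mirrors the header).
bool is_big_sketch(const Json& v, long big_size = 100'000) {
    long size_left = big_size;
    check_big(v, size_left);
    return size_left < 0;
}
```

As in the header comment, the recursion depth is bounded by the depth of the input.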

@@ -634,7 +634,8 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1, v2));
return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1, v2,
f._parameters[0].is_constant(), f._parameters[1].is_constant()));
}
},
};

@@ -19,7 +19,7 @@
/*
* Parsed representation of expressions and their components.
*
* Types in alternator::parse namespace are used for holding the parse
* Types in alternator::parsed namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types

@@ -14,7 +14,7 @@
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
#include "position_in_partition.hh"
#include "mutation/position_in_partition.hh"
static logging::logger slogger("alternator-serialization");
@@ -59,7 +59,9 @@ struct from_json_visitor {
bo.write(t.from_string(rjson::to_string_view(v)));
}
void operator()(const bytes_type_impl& t) const {
bo.write(rjson::base64_decode(v));
// FIXME: it's difficult at this point to get information if value was provided
// in request or comes from the storage, for now we assume it's user's fault.
bo.write(*unwrap_bytes(v, true));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
@@ -73,7 +75,7 @@ struct from_json_visitor {
}
// default
void operator()(const abstract_type& t) const {
bo.write(from_json_object(t, v, cql_serialization_format::internal()));
bo.write(from_json_object(t, v));
}
};
@@ -198,7 +200,9 @@ bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column
format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));
}
if (column.type == bytes_type) {
return rjson::base64_decode(value);
// FIXME: it's difficult at this point to get information if value was provided
// in request or comes from the storage, for now we assume it's user's fault.
return *unwrap_bytes(value, true);
} else {
return column.type->from_string(value_view);
}
@@ -210,7 +214,7 @@ rjson::value json_key_column_value(bytes_view cell, const column_definition& col
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
return rjson::from_string(reinterpret_cast<const char*>(cell.data()), cell.size());
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
@@ -261,7 +265,6 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
if (bool(region_item) != bool(weight_item)) {
throw api_error::validation("Malformed value object: region and weight has to be either both missing or both present");
}
partition_region region;
bound_weight weight;
if (region_item) {
auto region_view = rjson::to_string_view(get_typed_value(*region_item, "S", scylla_paging_region, "key region"));
@@ -279,7 +282,7 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);
}
if (ck.is_empty()) {
return position_in_partition(position_in_partition::partition_start_tag_t());
return position_in_partition::for_partition_start();
}
return position_in_partition::for_key(std::move(ck));
}
@@ -319,6 +322,17 @@ std::optional<big_decimal> try_unwrap_number(const rjson::value& v) {
}
}
std::optional<bytes> unwrap_bytes(const rjson::value& value, bool from_query) {
try {
return rjson::base64_decode(value);
} catch (...) {
if (from_query) {
throw api_error::serialization(format("Invalid base64 data"));
}
return std::nullopt;
}
}
const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return {"", nullptr};
@@ -348,7 +362,7 @@ rjson::value number_add(const rjson::value& v1, const rjson::value& v2) {
auto n1 = unwrap_number(v1, "UpdateExpression");
auto n2 = unwrap_number(v2, "UpdateExpression");
rjson::value ret = rjson::empty_object();
std::string str_ret = std::string((n1 + n2).to_string());
sstring str_ret = (n1 + n2).to_string();
rjson::add(ret, "N", rjson::from_string(str_ret));
return ret;
}
@@ -357,7 +371,7 @@ rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2) {
auto n1 = unwrap_number(v1, "UpdateExpression");
auto n2 = unwrap_number(v2, "UpdateExpression");
rjson::value ret = rjson::empty_object();
std::string str_ret = std::string((n1 - n2).to_string());
sstring str_ret = (n1 - n2).to_string();
rjson::add(ret, "N", rjson::from_string(str_ret));
return ret;
}

@@ -11,8 +11,8 @@
#include <string>
#include <string_view>
#include <optional>
#include "types.hh"
#include "schema_fwd.hh"
#include "types/types.hh"
#include "schema/schema_fwd.hh"
#include "keys.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
@@ -62,6 +62,11 @@ big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
// when the given v does not encode a number.
std::optional<big_decimal> try_unwrap_number(const rjson::value& v);
// unwrap_bytes decodes byte value, on decoding failure it either raises api_error::serialization
// iff from_query is true or returns unset optional iff from_query is false.
// Therefore it's safe to dereference returned optional when called with from_query equal true.
std::optional<bytes> unwrap_bytes(const rjson::value& value, bool from_query);
// Check if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or "NS", "BS"
// and returns set's type and a pointer to that set. If the object does not encode a set,
// returned value is {"", nullptr}

@@ -24,10 +24,13 @@
#include "gms/gossiper.hh"
#include "utils/overloaded_functor.hh"
#include "utils/fb_utilities.hh"
#include "utils/aws_sigv4.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
using request = http::request;
using reply = http::reply;
namespace alternator {
@@ -143,7 +146,7 @@ public:
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
handle_CORS(*req, *rep, false);
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
[](std::unique_ptr<reply> rep) {
rep->set_mime_type("application/x-amz-json-1.0");
rep->done();
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
@@ -317,8 +320,13 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, content, region, service, "");
std::string signature;
try {
signature = utils::aws::get_signature(user, *key_ptr, std::string_view(host), "/", req._method,
datestamp, signed_headers_str, signed_headers_map, &content, region, service, "");
} catch (const std::exception& e) {
throw api_error::invalid_signature(e.what());
}
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);

@@ -27,11 +27,11 @@ using chunked_content = rjson::chunked_content;
class server {
static constexpr size_t content_length_limit = 16*MB;
using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<request>)>;
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
http_server _http_server;
http_server _https_server;
httpd::http_server _http_server;
httpd::http_server _https_server;
executor& _executor;
service::storage_proxy& _proxy;
gms::gossiper& _gossiper;
@@ -76,8 +76,8 @@ public:
private:
void set_routes(seastar::httpd::routes& r);
// If verification succeeds, returns the authenticated user's username
future<std::string> verify_signature(const seastar::httpd::request&, const chunked_content&);
future<executor::request_return_type> handle_api_request(std::unique_ptr<request> req);
future<std::string> verify_signature(const seastar::http::request&, const chunked_content&);
future<executor::request_return_type> handle_api_request(std::unique_ptr<http::request> req);
};
}

@@ -27,13 +27,14 @@
#include "cql3/result_set.hh"
#include "cql3/type_json.hh"
#include "cql3/column_identifier.hh"
#include "schema_builder.hh"
#include "schema/schema_builder.hh"
#include "service/storage_proxy.hh"
#include "gms/feature.hh"
#include "gms/feature_service.hh"
#include "executor.hh"
#include "rmw_operation.hh"
#include "data_dictionary/data_dictionary.hh"
/**
* Base template type to implement rapidjson::internal::TypeHelper<...>:s
@@ -140,24 +141,43 @@ namespace alternator {
future<alternator::executor::request_return_type> alternator::executor::list_streams(client_state& client_state, service_permit permit, rjson::value request) {
_stats.api_operations.list_streams++;
auto limit = rjson::get_opt<int>(request, "Limit").value_or(std::numeric_limits<int>::max());
auto limit = rjson::get_opt<int>(request, "Limit").value_or(100);
auto streams_start = rjson::get_opt<stream_arn>(request, "ExclusiveStartStreamArn");
auto table = find_table(_proxy, request);
auto db = _proxy.data_dictionary();
auto cfs = db.get_tables();
auto i = cfs.begin();
auto e = cfs.end();
if (limit < 1) {
throw api_error::validation("Limit must be 1 or more");
}
// TODO: the unordered_map here is not really well suited for partial
// querying - we're sorting on local hash order, and creating a table
// between queries may or may not miss info. But that should be rare,
// and we can probably expect this to be a single call.
std::vector<data_dictionary::table> cfs;
if (table) {
auto log_name = cdc::log_name(table->cf_name());
try {
cfs.emplace_back(db.find_table(table->ks_name(), log_name));
} catch (data_dictionary::no_such_column_family&) {
cfs.clear();
}
} else {
cfs = db.get_tables();
}
// # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never
// generate duplicates in a paged listing here. Can obviously miss things if they
// are added between paged calls and end up with a "smaller" UUID/ARN, but that
// is to be expected.
if (std::cmp_less(limit, cfs.size()) || streams_start) {
std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {
return t1.schema()->id().uuid() < t2.schema()->id().uuid();
});
}
auto i = cfs.begin();
auto e = cfs.end();
if (streams_start) {
i = std::find_if(i, e, [&](data_dictionary::table t) {
i = std::find_if(i, e, [&](const data_dictionary::table& t) {
return t.schema()->id().uuid() == streams_start
&& cdc::get_base_table(db.real_database(), *t.schema())
&& is_alternator_keyspace(t.schema()->ks_name())
@@ -181,14 +201,7 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
if (!is_alternator_keyspace(ks_name)) {
continue;
}
if (table && ks_name != table->ks_name()) {
continue;
}
if (cdc::is_log_for_some_table(db.real_database(), ks_name, cf_name)) {
if (table && table != cdc::get_base_table(db.real_database(), *s)) {
continue;
}
rjson::value new_entry = rjson::empty_object();
last = i->schema()->id();
@@ -416,6 +429,8 @@ static std::chrono::seconds confidence_interval(data_dictionary::database db) {
return std::chrono::seconds(db.get_config().alternator_streams_time_window_s());
}
using namespace std::chrono_literals;
// Dynamo docs says no data shall live longer than 24h.
static constexpr auto dynamodb_streams_max_window = 24h;
@@ -493,7 +508,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
// filter out cdc generations older than the table or now() - cdc::ttl (typically dynamodb_streams_max_window - 24h)
auto low_ts = std::max(as_timepoint(schema->id()), db_clock::now() - ttl);
return _sdks.cdc_get_versioned_streams(low_ts, { normal_token_owners }).then([this, db, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc)] (std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {
return _sdks.cdc_get_versioned_streams(low_ts, { normal_token_owners }).then([db, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc)] (std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {
auto e = topologies.end();
auto prev = e;
@@ -812,7 +827,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
}
if (!schema || !base || !is_alternator_keyspace(schema->ks_name())) {
throw api_error::resource_not_found(boost::lexical_cast<std::string>(iter.table));
throw api_error::resource_not_found(fmt::to_string(iter.table));
}
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
@@ -883,7 +898,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
[this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
cql3::selection::result_set_builder builder(*selection, gc_clock::now());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));
auto result_set = builder.build();
@@ -1012,7 +1027,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
// ugh. figure out if we are at end-of-shard
auto normal_token_owners = _proxy.get_token_metadata_ptr()->count_normal_token_owners();
return _sdks.cdc_current_generation_timestamp({ normal_token_owners }).then([this, iter, high_ts, start_time, ret = std::move(ret), nrecords](db_clock::time_point ts) mutable {
return _sdks.cdc_current_generation_timestamp({ normal_token_owners }).then([this, iter, high_ts, start_time, ret = std::move(ret)](db_clock::time_point ts) mutable {
auto& shard = iter.shard;
if (shard.time < ts && ts < high_ts) {
@@ -1029,8 +1044,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
rjson::add(ret, "NextShardIterator", iter);
}
_stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);
// TODO: determine a better threshold...
if (nrecords > 10) {
if (is_big(ret)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(ret)));
}
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
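This hunk replaces the record-count heuristic (`nrecords > 10`) with `is_big(ret)`. The diff does not show how `is_big` measures the response. The sketch below assumes it compares the serialized size against a byte threshold. The `response` struct and the `big_response_threshold` value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical threshold; the value used by Alternator is not shown in this diff.
constexpr std::size_t big_response_threshold = 100 * 1024;

// Hypothetical stand-in for a JSON response: already-serialized records.
struct response {
    std::vector<std::string> records;
};

// Decide by estimated payload size rather than record count, so that a few
// huge records and many tiny records are treated according to their bytes.
bool is_big(const response& r, std::size_t threshold = big_response_threshold) {
    std::size_t total = 0;
    for (const auto& rec : r.records) {
        total += rec.size();
        if (total > threshold) {
            return true; // stop early once the answer is known
        }
    }
    return false;
}
```

In this sketch, twenty one-byte records stay on the non-streamed path, while a single 200 KiB record is streamed.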


@@ -8,6 +8,7 @@
#include <chrono>
#include <cstdint>
#include <exception>
#include <optional>
#include <seastar/core/sstring.hh>
#include <seastar/core/coroutine.hh>
@@ -17,6 +18,7 @@
#include <seastar/coroutine/maybe_yield.hh>
#include <boost/multiprecision/cpp_int.hpp>
#include "exceptions/exceptions.hh"
#include "gms/gossiper.hh"
#include "gms/inet_address.hh"
#include "inet_address_vectors.hh"
@@ -31,8 +33,8 @@
#include "service/pager/query_pagers.hh"
#include "gms/feature_service.hh"
#include "sstables/types.hh"
#include "mutation.hh"
#include "types.hh"
#include "mutation/mutation.hh"
#include "types/types.hh"
#include "types/map.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
@@ -92,24 +94,25 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
}
sstring attribute_name(v->GetString(), v->GetStringLength());
std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);
if (enabled) {
if (tags_map.contains(TTL_TAG_KEY)) {
co_return api_error::validation("TTL is already enabled");
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {
if (enabled) {
if (tags_map.contains(TTL_TAG_KEY)) {
throw api_error::validation("TTL is already enabled");
}
tags_map[TTL_TAG_KEY] = attribute_name;
} else {
auto i = tags_map.find(TTL_TAG_KEY);
if (i == tags_map.end()) {
throw api_error::validation("TTL is already disabled");
} else if (i->second != attribute_name) {
throw api_error::validation(format(
"Requested to disable TTL on attribute {}, but a different attribute {} is enabled.",
attribute_name, i->second));
}
tags_map.erase(TTL_TAG_KEY);
}
tags_map[TTL_TAG_KEY] = attribute_name;
} else {
auto i = tags_map.find(TTL_TAG_KEY);
if (i == tags_map.end()) {
co_return api_error::validation("TTL is already disabled");
} else if (i->second != attribute_name) {
co_return api_error::validation(format(
"Requested to disable TTL on attribute {}, but a different attribute {} is enabled.",
attribute_name, i->second));
}
tags_map.erase(TTL_TAG_KEY);
}
co_await db::update_tags(_mm, schema, std::move(tags_map));
});
// Prepare the response, which contains a TimeToLiveSpecification
// basically identical to the request's
rjson::value response = rjson::empty_object();
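The TTL update now validates and mutates the tags inside the callback passed to `db::modify_tags`. As a result, the checks run against the same tag map that gets written back. Here is a minimal in-memory sketch of that read-modify-write pattern. `tag_store`, `update_ttl` and the key string are hypothetical stand-ins, and `std::invalid_argument` stands in for `api_error::validation`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using tags_t = std::map<std::string, std::string>;

// Hypothetical placeholder; the real TTL_TAG_KEY value is not shown here.
const std::string ttl_tag_key = "ttl_attribute";

// Hypothetical in-memory tag store. modify() hands the mutator a copy and
// commits it only if the mutator returns normally.
class tag_store {
    tags_t _tags;
public:
    void modify(const std::function<void(tags_t&)>& mutator) {
        tags_t copy = _tags;
        mutator(copy);            // may throw a validation error
        _tags = std::move(copy);  // commit only on success
    }
    const tags_t& tags() const { return _tags; }
};

void update_ttl(tag_store& store, bool enabled, const std::string& attribute) {
    store.modify([&](tags_t& tags) {
        if (enabled) {
            if (tags.count(ttl_tag_key) != 0) {
                throw std::invalid_argument("TTL is already enabled");
            }
            tags[ttl_tag_key] = attribute;
        } else {
            auto it = tags.find(ttl_tag_key);
            if (it == tags.end()) {
                throw std::invalid_argument("TTL is already disabled");
            }
            if (it->second != attribute) {
                throw std::invalid_argument("TTL is enabled on a different attribute");
            }
            tags.erase(it);
        }
    });
}
```

Because the mutator in this sketch works on a copy, a rejected request leaves the stored tags unchanged.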
@@ -548,13 +551,34 @@ static future<> scan_table_ranges(
co_return;
}
auto units = co_await get_units(page_sem, 1);
// We don't to limit page size in number of rows because there is a
// builtin limit of the page's size in bytes. Setting this limit to 1
// is useful for debugging the paging code with moderate-size data.
// We don't need to limit page size in number of rows because there is
// a builtin limit of the page's size in bytes. Setting this limit to
// 1 is useful for debugging the paging code with moderate-size data.
uint32_t limit = std::numeric_limits<uint32_t>::max();
// FIXME: which timeout?
// FIXME: if read times out, need to retry it.
std::unique_ptr<cql3::result_set> rs = co_await p->fetch_page(limit, gc_clock::now(), executor::default_timeout());
// Read a page, and if that times out, try again after a small sleep.
// If we didn't catch the timeout exception, it would cause the scan
// to be aborted and only be restarted at the next scanning period.
// If we retry too many times, give up and restart the scan later.
std::unique_ptr<cql3::result_set> rs;
for (int retries=0; ; retries++) {
try {
// FIXME: which timeout?
rs = co_await p->fetch_page(limit, gc_clock::now(), executor::default_timeout());
break;
} catch(exceptions::read_timeout_exception&) {
tlogger.warn("expiration scanner read timed out, will retry: {}",
std::current_exception());
}
// If we didn't break out of this loop, add a minimal sleep
if (retries >= 10) {
// Don't get stuck forever asking for the same page; maybe there's
// a bug or a real problem in several replicas. Give up on
// this scan and retry the scan from a random position later,
// in the next scan period.
throw runtime_exception("scanner thread failed after too many timeouts for the same page");
}
co_await sleep_abortable(std::chrono::seconds(1), abort_source);
}
auto rows = rs->rows();
auto meta = rs->get_metadata().get_names();
std::optional<unsigned> expiration_column;
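The expiration-scanner hunk retries a timed-out page read up to a fixed number of times, pausing between attempts. After that it gives up, so the scan restarts in a later period. The sketch below shows the same bounded-retry shape in standard C++. `read_timeout` is a hypothetical stand-in for `exceptions::read_timeout_exception`. `std::this_thread::sleep_for` replaces Seastar's `sleep_abortable`, so unlike the original, this pause cannot be interrupted.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for exceptions::read_timeout_exception.
struct read_timeout : std::runtime_error {
    read_timeout() : std::runtime_error("read timed out") {}
};

// Call fetch() until it succeeds. A timeout is followed by a pause; after a
// timeout on attempt number max_retries (counting from 0), give up so the
// caller can restart the whole scan later.
template <typename T>
T fetch_with_retry(const std::function<T()>& fetch, int max_retries,
                   std::chrono::milliseconds pause) {
    for (int retries = 0; ; ++retries) {
        try {
            return fetch();
        } catch (const read_timeout&) {
            // fall through to the retry bookkeeping below
        }
        if (retries >= max_retries) {
            throw std::runtime_error("too many timeouts for the same page");
        }
        std::this_thread::sleep_for(pause);
    }
}
```

As in the diff, other exceptions are not caught and propagate immediately. Only timeouts are retried.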

amplify.yml Normal file

@@ -0,0 +1,15 @@
version: 1
applications:
- frontend:
phases:
build:
commands:
- make setupenv
- make dirhtml
artifacts:
baseDirectory: _build/dirhtml
files:
- '**/*'
cache:
paths: []
appRoot: docs

api/CMakeLists.txt Normal file

@@ -0,0 +1,70 @@
# Generate C++ sources from Swagger definitions
set(swagger_files
api-doc/authorization_cache.json
api-doc/cache_service.json
api-doc/collectd.json
api-doc/column_family.json
api-doc/commitlog.json
api-doc/compaction_manager.json
api-doc/config.json
api-doc/endpoint_snitch_info.json
api-doc/error_injection.json
api-doc/failure_detector.json
api-doc/gossiper.json
api-doc/hinted_handoff.json
api-doc/lsa.json
api-doc/messaging_service.json
api-doc/storage_proxy.json
api-doc/storage_service.json
api-doc/stream_manager.json
api-doc/system.json
api-doc/task_manager.json
api-doc/task_manager_test.json
api-doc/utils.json)
foreach(f ${swagger_files})
get_filename_component(fname "${f}" NAME_WE)
get_filename_component(dir "${f}" DIRECTORY)
seastar_generate_swagger(
TARGET scylla_swagger_gen_${fname}
VAR scylla_swagger_gen_${fname}_files
IN_FILE "${CMAKE_CURRENT_SOURCE_DIR}/${f}"
OUT_DIR "${scylla_gen_build_dir}/api/${dir}")
list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")
endforeach()
add_library(api)
target_sources(api
PRIVATE
api.cc
cache_service.cc
collectd.cc
column_family.cc
commitlog.cc
compaction_manager.cc
config.cc
endpoint_snitch.cc
error_injection.cc
authorization_cache.cc
failure_detector.cc
gossiper.cc
hinted_handoff.cc
lsa.cc
messaging_service.cc
storage_proxy.cc
storage_service.cc
stream_manager.cc
system.cc
task_manager.cc
task_manager_test.cc
${swagger_gen_files})
target_include_directories(api
PUBLIC
${CMAKE_SOURCE_DIR}
${scylla_gen_build_dir})
target_link_libraries(api
idl
wasmtime_bindings
Seastar::seastar
xxHash::xxhash)


@@ -1228,7 +1228,7 @@
"operations":[
{
"method":"POST",
"summary":"Removes token (and all data associated with enpoint that had it) from the ring",
"summary":"Removes a node from the cluster. Replicated data that logically belonged to this node is redistributed among the remaining nodes.",
"type":"void",
"nickname":"remove_node",
"produces":[
@@ -1245,7 +1245,7 @@
},
{
"name":"ignore_nodes",
"description":"List of dead nodes to ingore in removenode operation",
"description":"Comma-separated list of dead nodes to ignore in removenode operation. Use the same method for all nodes to ignore: either Host IDs or ip addresses.",
"required":false,
"allowMultiple":false,
"type":"string",

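The new `ignore_nodes` description says the parameter is a comma-separated list. Here is a minimal sketch of splitting such a value. The helper name is hypothetical. It does not enforce the documented rule that all entries use the same kind of identifier (Host IDs or IP addresses).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split "a, b,c" into {"a", "b", "c"}, trimming spaces
// and tabs around each item and skipping empty entries.
std::vector<std::string> split_comma_list(const std::string& param) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= param.size()) {
        std::size_t end = param.find(',', start);
        if (end == std::string::npos) {
            end = param.size();
        }
        std::string item = param.substr(start, end - start);
        auto first = item.find_first_not_of(" \t");
        if (first != std::string::npos) {
            auto last = item.find_last_not_of(" \t");
            out.push_back(item.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return out;
}
```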

@@ -49,6 +49,14 @@
"type":"string",
"paramType":"path"
},
{
"name":"internal",
"description":"Boolean flag indicating whether internal tasks should be shown (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"keyspace",
"description":"The keyspace to query about",
@@ -140,6 +148,57 @@
]
}
]
},
{
"path":"/task_manager/task_status_recursive/{task_id}",
"operations":[
{
"method":"GET",
"summary":"Get statuses of the task and all its descendants",
"type":"array",
"items":{
"type":"task_status"
},
"nickname":"get_task_status_recursively",
"produces":[
"application/json"
],
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/task_manager/ttl",
"operations":[
{
"method":"POST",
"summary":"Set ttl in seconds and get last value",
"type":"long",
"nickname":"get_and_update_ttl",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ttl",
"description":"The number of seconds for which a task is kept in memory after it finishes",
"required":true,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
]
}
],
"models":{
@@ -160,6 +219,26 @@
"failed"
],
"description":"The state of a task"
},
"type":{
"type":"string",
"description":"The description of the task"
},
"keyspace":{
"type":"string",
"description":"The keyspace the task is working on (if applicable)"
},
"table":{
"type":"string",
"description":"The table the task is working on (if applicable)"
},
"entity":{
"type":"string",
"description":"Task-specific entity description"
},
"sequence_number":{
"type":"long",
"description":"The running sequence number of the task"
}
}
},
@@ -236,6 +315,13 @@
"progress_completed":{
"type":"double",
"description":"The number of units completed so far"
},
"children_ids":{
"type":"array",
"items":{
"type":"string"
},
"description":"Task IDs of children of this task"
}
}
}
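The new `/task_manager/task_status_recursive/{task_id}` endpoint returns the status of a task plus all of its descendants, and `task_status` gains a `children_ids` field. One way to gather those statuses is a breadth-first walk over `children_ids`. The sketch below uses a hypothetical, trimmed-down `task_status` and an in-memory map rather than the task manager's real data structures. It skips unknown children (for example, one whose status has already expired) and only rejects an unknown root.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, trimmed-down task status with only the fields used here.
struct task_status {
    std::string id;
    std::string state;
    std::vector<std::string> children_ids;
};

// Breadth-first walk from root: the root's status first, then descendants
// level by level.
std::vector<task_status> get_task_status_recursively(
        const std::map<std::string, task_status>& tasks, const std::string& root) {
    if (tasks.count(root) == 0) {
        throw std::out_of_range("unknown task id: " + root);
    }
    std::vector<task_status> result;
    std::deque<std::string> pending{root};
    while (!pending.empty()) {
        auto it = tasks.find(pending.front());
        pending.pop_front();
        if (it == tasks.end()) {
            continue; // child no longer tracked
        }
        result.push_back(it->second);
        for (const auto& child : it->second.children_ids) {
            pending.push_back(child);
        }
    }
    return result;
}
```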


@@ -86,14 +86,6 @@
"type":"string",
"paramType":"query"
},
{
"name":"type",
"description":"The type of the task",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"entity",
"description":"Task-specific entity description",
@@ -156,30 +148,6 @@
]
}
]
},
{
"path":"/task_manager_test/ttl",
"operations":[
{
"method":"POST",
"summary":"Set ttl in seconds and get last value",
"type":"long",
"nickname":"get_and_update_ttl",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ttl",
"description":"The number of seconds for which the tasks will be kept in memory after it finishes",
"required":true,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
]
}
}
]
}
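This change drops `/task_manager_test/ttl` in favor of `/task_manager/ttl`, which is documented as "Set ttl in seconds and get last value". A get-and-update of this kind can be written with `std::exchange`. `task_ttl` is a hypothetical holder. The diff does not show where Scylla actually stores the value, although the new `db::config` parameter to `set_server_task_manager` suggests the config.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical holder for the task TTL, in seconds.
class task_ttl {
    int64_t _seconds;
public:
    explicit task_ttl(int64_t seconds) : _seconds(seconds) {}
    // Store the new TTL and return the value it replaced.
    int64_t get_and_update(int64_t new_seconds) {
        return std::exchange(_seconds, new_seconds);
    }
    int64_t get() const { return _seconds; }
};
```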


@@ -35,6 +35,7 @@
logging::logger apilog("api");
namespace api {
using namespace seastar::httpd;
static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
try {
@@ -165,9 +166,15 @@ future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {
});
}
future<> set_server_load_sstable(http_context& ctx) {
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
return register_api(ctx, "column_family",
"The column family API", set_column_family);
"The column family API", [&sys_ks] (http_context& ctx, routes& r) {
set_column_family(ctx, r, sys_ks);
});
}
future<> unset_server_load_sstable(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });
}
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {
@@ -187,6 +194,10 @@ future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_se
});
}
future<> unset_server_storage_proxy(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_proxy(ctx, r); });
}
future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_manager>& sm) {
return register_api(ctx, "stream_manager",
"The stream manager API", [&sm] (http_context& ctx, routes& r) {
@@ -253,25 +264,25 @@ future<> set_server_done(http_context& ctx) {
});
}
future<> set_server_task_manager(http_context& ctx) {
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r);
set_task_manager(ctx, r, cfg);
});
}
#ifndef SCYLLA_BUILD_MODE_RELEASE
future<> set_server_task_manager_test(http_context& ctx, lw_shared_ptr<db::config> cfg) {
future<> set_server_task_manager_test(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) mutable {
return ctx.http_server.set_routes([rb, &ctx](routes& r) mutable {
rb->register_function(r, "task_manager_test",
"The task manager test API");
set_task_manager_test(ctx, r, cfg);
set_task_manager_test(ctx, r);
});
}
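`set_server_load_sstable` now needs `sys_ks`, but `register_api` still expects a two-argument `(http_context&, routes&)` setter. The diff therefore wraps `set_column_family` in a lambda that captures `sys_ks` by reference. Below is a self-contained sketch of that adapter. `context`, `routes`, `system_keyspace` and `register_api` are hypothetical stand-ins, not the Seastar or Scylla types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for http_context, routes and db::system_keyspace.
struct context {};
struct routes { std::vector<std::string> registered; };
struct system_keyspace { std::string name = "system"; };

// A registration helper that only knows the (context&, routes&) signature.
void register_api(context& ctx, routes& r,
                  const std::function<void(context&, routes&)>& setter) {
    setter(ctx, r);
}

// A setter that needs one extra dependency.
void set_column_family(context&, routes& r, system_keyspace& sys_ks) {
    r.registered.push_back("column_family via " + sys_ks.name);
}

void set_server_load_sstable(context& ctx, routes& r, system_keyspace& sys_ks) {
    // Capture the extra dependency so the callback matches the
    // two-argument signature register_api() expects.
    register_api(ctx, r, [&sys_ks] (context& c, routes& rr) {
        set_column_family(c, rr, sys_ks);
    });
}
```

Capturing by reference assumes the captured service outlives every use of the registered routes. The matching `unset_server_load_sstable` call is what removes them.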


@@ -27,7 +27,7 @@ template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;
for (auto i : container) {
res.push_back(boost::lexical_cast<std::string>(i));
res.push_back(fmt::to_string(i));
}
return res;
}
@@ -47,8 +47,8 @@ template<class T, class MAP>
std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {
for (auto i : map) {
T val;
val.key = boost::lexical_cast<std::string>(i.first);
val.value = boost::lexical_cast<std::string>(i.second);
val.key = fmt::to_string(i.first);
val.value = fmt::to_string(i.second);
res.push_back(val);
}
return res;
@@ -65,7 +65,7 @@ template <typename MAP>
std::vector<sstring> map_keys(const MAP& map) {
std::vector<sstring> res;
for (const auto& i : map) {
res.push_back(boost::lexical_cast<std::string>(i.first));
res.push_back(fmt::to_string(i.first));
}
return res;
}
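These helpers switch from `boost::lexical_cast<std::string>` to `fmt::to_string`. To keep the sketch dependency-free, it uses a hypothetical `std::ostringstream` stand-in where the real code calls `fmt::to_string`.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for fmt::to_string(): format any value that has operator<<.
template <typename T>
std::string to_string_via_stream(const T& v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

template <typename Container>
std::vector<std::string> container_to_vec(const Container& c) {
    std::vector<std::string> res;
    res.reserve(c.size());
    for (const auto& item : c) {
        res.push_back(to_string_via_stream(item));
    }
    return res;
}

template <typename Map>
std::vector<std::string> map_keys(const Map& m) {
    std::vector<std::string> res;
    for (const auto& kv : m) {
        res.push_back(to_string_via_stream(kv.first));
    }
    return res;
}
```

Keep one difference in mind: `fmt::to_string` formats through `fmt::formatter` specializations. With recent {fmt} versions, a type that only provides `operator<<` may need an explicit formatter (for example via `fmt::ostream_formatter`).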
@@ -189,7 +189,7 @@ struct basic_ratio_holder : public json::jsonable {
typedef basic_ratio_holder<double> ratio_holder;
typedef basic_ratio_holder<int64_t> integral_ratio_holder;
class unimplemented_exception : public base_exception {
class unimplemented_exception : public httpd::base_exception {
public:
unimplemented_exception()
: base_exception("API call is not supported yet", reply::status_type::internal_server_error) {
@@ -238,7 +238,7 @@ public:
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
throw httpd::bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
}
@@ -306,6 +306,6 @@ public:
}
};
utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);
httpd::utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);
}


@@ -14,6 +14,9 @@
#include "tasks/task_manager.hh"
#include "seastarx.hh"
using request = http::request;
using reply = http::reply;
namespace service {
class load_meter;
@@ -99,10 +102,12 @@ future<> unset_server_authorization_cache(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_service>& ss);
future<> unset_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_manager>& sm);
future<> unset_server_stream_manager(http_context& ctx);
future<> set_hinted_handoff(http_context& ctx, sharded<gms::gossiper>& g);
@@ -111,7 +116,7 @@ future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, lw_shared_ptr<db::config> cfg);
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg);
future<> set_server_task_manager_test(http_context& ctx);
}


@@ -14,9 +14,10 @@
namespace api {
using namespace json;
using namespace seastar::httpd;
void set_authorization_cache(http_context& ctx, routes& r, sharded<auth::service> &auth_service) {
httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> {
httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await auth_service.invoke_on_all([] (auth::service& auth) -> future<> {
auth.reset_authorization_cache();
return make_ready_future<>();


@@ -12,7 +12,7 @@
namespace api {
void set_authorization_cache(http_context& ctx, routes& r, sharded<auth::service> &auth_service);
void unset_authorization_cache(http_context& ctx, routes& r);
void set_authorization_cache(http_context& ctx, httpd::routes& r, sharded<auth::service> &auth_service);
void unset_authorization_cache(http_context& ctx, httpd::routes& r);
}


@@ -12,127 +12,128 @@
namespace api {
using namespace json;
using namespace seastar::httpd;
namespace cs = httpd::cache_service_json;
void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// We never save the cache
// Origin uses 0 for never
return make_ready_future<json::json_return_type>(0);
});
cs::set_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::set_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto period = req->get_query_param("period");
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::get_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// We never save the cache
// Origin uses 0 for never
return make_ready_future<json::json_return_type>(0);
});
cs::set_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::set_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto period = req->get_query_param("period");
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::get_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// We never save the cache
// Origin uses 0 for never
return make_ready_future<json::json_return_type>(0);
});
cs::set_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {
cs::set_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto ccspis = req->get_query_param("ccspis");
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_row_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::get_row_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cs::set_row_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::set_row_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto rckts = req->get_query_param("rckts");
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_key_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::get_key_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cs::set_key_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::set_key_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto kckts = req->get_query_param("kckts");
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_counter_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::get_counter_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cs::set_counter_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {
cs::set_counter_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto cckts = req->get_query_param("cckts");
return make_ready_future<json::json_return_type>(json_void());
});
cs::invalidate_key_cache.set(r, [](std::unique_ptr<request> req) {
cs::invalidate_key_cache.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
cs::invalidate_counter_cache.set(r, [](std::unique_ptr<request> req) {
cs::invalidate_counter_cache.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
cs::set_row_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {
cs::set_row_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto capacity = req->get_query_param("capacity");
return make_ready_future<json::json_return_type>(json_void());
});
cs::set_key_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {
cs::set_key_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto period = req->get_query_param("period");
return make_ready_future<json::json_return_type>(json_void());
});
cs::set_counter_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {
cs::set_counter_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
auto capacity = req->get_query_param("capacity");
return make_ready_future<json::json_return_type>(json_void());
});
cs::save_caches.set(r, [](std::unique_ptr<request> req) {
cs::save_caches.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
cs::get_key_capacity.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_capacity.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -140,7 +141,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_hits.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_hits.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -148,7 +149,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_requests.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_requests.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -156,7 +157,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_hit_rate.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -164,21 +165,21 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_key_hits_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_key_requests_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_key_size.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_size.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -186,7 +187,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_key_entries.set(r, [] (std::unique_ptr<request> req) {
cs::get_key_entries.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support keys cache,
@@ -194,7 +195,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
@@ -202,26 +203,26 @@ void set_cache_service(http_context& ctx, routes& r) {
});
});
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count();
}, std::plus<uint64_t>());
});
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();
}, std::plus<uint64_t>());
});
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, ratio_holder(), [](const replica::column_family& cf) {
return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),
cf.get_row_cache().stats().hits.count());
}, std::plus<ratio_holder>());
});
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -229,7 +230,7 @@ void set_cache_service(http_context& ctx, routes& r) {
});
});
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -237,7 +238,7 @@ void set_cache_service(http_context& ctx, routes& r) {
});
});
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
@@ -247,7 +248,7 @@ void set_cache_service(http_context& ctx, routes& r) {
});
});
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
@@ -255,7 +256,7 @@ void set_cache_service(http_context& ctx, routes& r) {
});
});
cs::get_counter_capacity.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_capacity.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
@@ -263,7 +264,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_hits.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_hits.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
@@ -271,7 +272,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_requests.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_requests.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
@@ -279,7 +280,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_hit_rate.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
@@ -287,21 +288,21 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_counter_hits_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
cs::get_counter_requests_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// See above
return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));
});
cs::get_counter_size.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_size.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
@@ -309,7 +310,7 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_counter_entries.set(r, [] (std::unique_ptr<request> req) {
cs::get_counter_entries.set(r, [] (std::unique_ptr<http::request> req) {
// TBD
// FIXME
// we don't support counter cache,
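The `get_row_hit_rate` handler in this file builds a `ratio_holder(hits + misses, hits)` per table and combines the results with `std::plus<ratio_holder>`. This means the totals are summed before dividing, so busy tables weigh more than idle ones. Below is a simplified sketch. This `ratio_holder` is a hypothetical reduction of the real class that keeps only the two counters.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical simplified ratio holder: numerator and denominator are kept
// separately so per-table values can be summed before dividing.
struct ratio_holder {
    double total = 0; // denominator, e.g. hits + misses
    double sub = 0;   // numerator, e.g. hits
    ratio_holder() = default;
    ratio_holder(double t, double s) : total(t), sub(s) {}
    ratio_holder& operator+=(const ratio_holder& o) {
        total += o.total;
        sub += o.sub;
        return *this;
    }
    double ratio() const { return total == 0 ? 0 : sub / total; }
};

ratio_holder operator+(ratio_holder a, const ratio_holder& b) { return a += b; }

struct cache_stats { uint64_t hits; uint64_t misses; };

// Sum counters across tables, then divide once.
double row_hit_rate(const std::vector<cache_stats>& tables) {
    ratio_holder sum = std::accumulate(tables.begin(), tables.end(), ratio_holder(),
        [](ratio_holder acc, const cache_stats& s) {
            return acc + ratio_holder(double(s.hits + s.misses), double(s.hits));
        });
    return sum.ratio();
}
```

Take one table with a single hit and another with 999 misses. Averaging per-table ratios would report 0.5, while summing first gives 1/1000.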


@@ -12,6 +12,6 @@
namespace api {
void set_cache_service(http_context& ctx, routes& r);
void set_cache_service(http_context& ctx, httpd::routes& r);
}


@@ -52,7 +52,7 @@ static const char* str_to_regex(const sstring& v) {
}
void set_collectd(http_context& ctx, routes& r) {
cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {
cd::get_collectd.set(r, [](std::unique_ptr<request> req) {
auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],
req->get_query_param("instance"), req->get_query_param("type"),


@@ -12,6 +12,6 @@
namespace api {
void set_collectd(http_context& ctx, routes& r);
void set_collectd(http_context& ctx, httpd::routes& r);
}

@@ -17,6 +17,7 @@
#include "db/system_keyspace.hh"
#include "db/data_listeners.hh"
#include "storage_service.hh"
#include "compaction/compaction_manager.hh"
#include "unimplemented.hh"
extern logging::logger apilog;
@@ -24,7 +25,6 @@ extern logging::logger apilog;
namespace api {
using namespace httpd;
using namespace std;
using namespace json;
namespace cf = httpd::column_family_json;
@@ -56,7 +56,7 @@ const table_id& get_uuid(const sstring& name, const replica::database& db) {
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(replica::column_family&)> f) {
future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.invoke_on_all([f, uuid](replica::database& db) {
@@ -303,16 +303,16 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared
return ratio_holder(f + sst->filter_get_recent_true_positive(), f);
}
void set_column_family(http_context& ctx, routes& r) {
void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
std::vector<sstring> res;
for (auto i: ctx.db.local().get_column_families_mapping()) {
res.push_back(i.first.first + ":" + i.first.second);
}
return res;
});
cf::get_column_family.set(r, [&ctx] (std::unique_ptr<request> req){
cf::get_column_family.set(r, [&ctx] (std::unique_ptr<http::request> req){
std::list<cf::column_family_info> res;
for (auto i: ctx.db.local().get_column_families_mapping()) {
cf::column_family_info info;
@@ -325,22 +325,22 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_column_family_name_keyspace.set(r, [&ctx] (const_req req){
vector<sstring> res;
std::vector<sstring> res;
for (auto i = ctx.db.local().get_keyspaces().cbegin(); i!= ctx.db.local().get_keyspaces().cend(); i++) {
res.push_back(i->first);
}
return res;
});
cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](replica::column_family& cf) {
return cf.active_memtable().partition_count();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));
}, std::plus<>());
});
cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t{0}, [](replica::column_family& cf) {
return cf.active_memtable().partition_count();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));
}, std::plus<>());
});
@@ -352,27 +352,35 @@ void set_column_family(http_context& ctx, routes& r) {
return 0;
});
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().total_space();
}), uint64_t(0));
}, std::plus<int64_t>());
});
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().total_space();
}), uint64_t(0));
}, std::plus<int64_t>());
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
}, std::plus<int64_t>());
});
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
}, std::plus<int64_t>());
});
@@ -384,14 +392,14 @@ void set_column_family(http_context& ctx, routes& r) {
return 0;
});
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_all_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return ctx.db.map_reduce0([](const replica::database& db){
return db.dirty_memory_region_group().real_memory_used();
@@ -400,30 +408,32 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
}, std::plus<int64_t>());
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, &replica::column_family_stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
utils::estimated_histogram res(0);
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
@@ -435,7 +445,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
uint64_t res = 0;
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
@@ -446,7 +456,7 @@ void set_column_family(http_context& ctx, routes& r) {
std::plus<uint64_t>());
});
cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
utils::estimated_histogram res(0);
for (auto sstables = cf.get_sstables(); auto& i : *sstables) {
@@ -457,149 +467,149 @@ void set_column_family(http_context& ctx, routes& r) {
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<request> req) {
cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, &replica::column_family_stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&replica::column_family_stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx, &replica::column_family_stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&replica::column_family_stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_count(ctx, &replica::column_family_stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&replica::column_family_stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&replica::column_family_stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, &replica::column_family_stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, &replica::column_family_stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, &replica::column_family_stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_rate_and_histogram(ctx, &replica::column_family_stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());
return cf.estimate_pending_compactions();
}, std::plus<int64_t>());
});
cf::get_all_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());
return cf.estimate_pending_compactions();
}, std::plus<int64_t>());
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, req->param["name"], &replica::column_family_stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_stats(ctx, &replica::column_family_stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_unleveled_sstables(ctx, req->param["name"]);
});
cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, req->param["name"], false);
});
cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, false);
});
cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, req->param["name"], true);
});
cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return sum_sstable(ctx, true);
});
// FIXME: this refers to partitions, not rows.
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
// FIXME: this refers to partitions, not rows.
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -608,7 +618,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_all_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -617,7 +627,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -626,7 +636,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_all_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -635,31 +645,31 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, ratio_holder(), [] (replica::column_family& cf) {
return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());
}, std::plus<>());
});
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -668,7 +678,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -677,7 +687,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -686,7 +696,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -695,7 +705,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -704,7 +714,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {
auto sstables = cf.get_sstables();
return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {
@@ -713,7 +723,7 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
// FIXME
// We are missing the off heap memory calculation
@@ -723,33 +733,33 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cf::get_all_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
cf::get_all_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_speculative_retries.set(r, [] (std::unique_ptr<request> req) {
cf::get_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_all_speculative_retries.set(r, [] (std::unique_ptr<request> req) {
cf::get_all_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<request> req) {
cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](
const std::unordered_map<sstring, replica::column_family::snapshot_details>& sd) {
@@ -761,26 +771,26 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {
cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<request> req) {
cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_all_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<request> req) {
cf::get_all_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -788,7 +798,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -796,7 +806,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -804,7 +814,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
@@ -813,40 +823,40 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_prepare.histogram();
});
});
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_accept.histogram();
});
});
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().cas_learn.histogram();
});
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {
return cf.get_stats().estimated_sstable_per_read;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
@@ -860,7 +870,7 @@ void set_column_family(http_context& ctx, routes& r) {
return !cf.is_auto_compaction_disabled_by_user();
});
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
@@ -871,7 +881,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
@@ -882,11 +892,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {
auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);
auto&& ks = std::get<0>(ks_cf);
auto&& cf_name = std::get<1>(ks_cf);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {
return sys_ks.local().load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
@@ -920,7 +930,7 @@ void set_column_family(http_context& ctx, routes& r) {
return std::vector<sstring>();
});
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce(sum_ratio<double>(), [uuid](replica::database& db) {
@@ -931,19 +941,19 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().reads.histogram();
});
});
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {
return cf.get_stats().writes.histogram();
});
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {
sstring strategy = req->get_query_param("class_name");
return foreach_column_family(ctx, req->param["name"], [strategy](replica::column_family& cf) {
cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));
@@ -956,19 +966,19 @@ void set_column_family(http_context& ctx, routes& r) {
return ctx.db.local().find_column_family(get_uuid(req.param["name"], ctx.db.local())).get_compaction_strategy().name();
});
cf::set_compression_parameters.set(r, [&ctx](std::unique_ptr<request> req) {
cf::set_compression_parameters.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
cf::set_crc_check_chance.set(r, [&ctx](std::unique_ptr<request> req) {
cf::set_crc_check_chance.set(r, [](std::unique_ptr<http::request> req) {
// TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, req->param["name"], std::vector<uint64_t>(), [](const replica::column_family& cf) {
return cf.sstable_count_per_level();
}, concat_sstable_count_per_level).then([](const std::vector<uint64_t>& res) {
@@ -976,7 +986,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<request> req) {
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto key = req->get_query_param("key");
auto uuid = get_uuid(req->param["name"], ctx.db.local());
@@ -992,7 +1002,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
auto name = req->param["name"];
auto [ks, cf] = parse_fully_qualified_cf_name(name);
@@ -1008,15 +1018,127 @@ void set_column_family(http_context& ctx, routes& r) {
});
});
cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
if (req->get_query_param("split_output") != "") {
fail(unimplemented::cause::API);
}
return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {
return cf.compact_all_sstables();
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
auto [ks, cf] = parse_fully_qualified_cf_name(req->param["name"]);
auto keyspace = validate_keyspace(ctx, ks);
std::vector<table_id> table_infos = {ctx.db.local().find_uuid(ks, cf)};
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, std::move(table_infos));
co_await task->done();
co_return json_void();
});
}
void unset_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.unset(r);
cf::get_column_family.unset(r);
cf::get_column_family_name_keyspace.unset(r);
cf::get_memtable_columns_count.unset(r);
cf::get_all_memtable_columns_count.unset(r);
cf::get_memtable_on_heap_size.unset(r);
cf::get_all_memtable_on_heap_size.unset(r);
cf::get_memtable_off_heap_size.unset(r);
cf::get_all_memtable_off_heap_size.unset(r);
cf::get_memtable_live_data_size.unset(r);
cf::get_all_memtable_live_data_size.unset(r);
cf::get_cf_all_memtables_on_heap_size.unset(r);
cf::get_all_cf_all_memtables_on_heap_size.unset(r);
cf::get_cf_all_memtables_off_heap_size.unset(r);
cf::get_all_cf_all_memtables_off_heap_size.unset(r);
cf::get_cf_all_memtables_live_data_size.unset(r);
cf::get_all_cf_all_memtables_live_data_size.unset(r);
cf::get_memtable_switch_count.unset(r);
cf::get_all_memtable_switch_count.unset(r);
cf::get_estimated_row_size_histogram.unset(r);
cf::get_estimated_row_count.unset(r);
cf::get_estimated_column_count_histogram.unset(r);
cf::get_all_compression_ratio.unset(r);
cf::get_pending_flushes.unset(r);
cf::get_all_pending_flushes.unset(r);
cf::get_read.unset(r);
cf::get_all_read.unset(r);
cf::get_write.unset(r);
cf::get_all_write.unset(r);
cf::get_read_latency_histogram_depricated.unset(r);
cf::get_read_latency_histogram.unset(r);
cf::get_read_latency.unset(r);
cf::get_write_latency.unset(r);
cf::get_all_read_latency_histogram_depricated.unset(r);
cf::get_all_read_latency_histogram.unset(r);
cf::get_write_latency_histogram_depricated.unset(r);
cf::get_write_latency_histogram.unset(r);
cf::get_all_write_latency_histogram_depricated.unset(r);
cf::get_all_write_latency_histogram.unset(r);
cf::get_pending_compactions.unset(r);
cf::get_all_pending_compactions.unset(r);
cf::get_live_ss_table_count.unset(r);
cf::get_all_live_ss_table_count.unset(r);
cf::get_unleveled_sstables.unset(r);
cf::get_live_disk_space_used.unset(r);
cf::get_all_live_disk_space_used.unset(r);
cf::get_total_disk_space_used.unset(r);
cf::get_all_total_disk_space_used.unset(r);
cf::get_min_row_size.unset(r);
cf::get_all_min_row_size.unset(r);
cf::get_max_row_size.unset(r);
cf::get_all_max_row_size.unset(r);
cf::get_mean_row_size.unset(r);
cf::get_all_mean_row_size.unset(r);
cf::get_bloom_filter_false_positives.unset(r);
cf::get_all_bloom_filter_false_positives.unset(r);
cf::get_recent_bloom_filter_false_positives.unset(r);
cf::get_all_recent_bloom_filter_false_positives.unset(r);
cf::get_bloom_filter_false_ratio.unset(r);
cf::get_all_bloom_filter_false_ratio.unset(r);
cf::get_recent_bloom_filter_false_ratio.unset(r);
cf::get_all_recent_bloom_filter_false_ratio.unset(r);
cf::get_bloom_filter_disk_space_used.unset(r);
cf::get_all_bloom_filter_disk_space_used.unset(r);
cf::get_bloom_filter_off_heap_memory_used.unset(r);
cf::get_all_bloom_filter_off_heap_memory_used.unset(r);
cf::get_index_summary_off_heap_memory_used.unset(r);
cf::get_all_index_summary_off_heap_memory_used.unset(r);
cf::get_compression_metadata_off_heap_memory_used.unset(r);
cf::get_all_compression_metadata_off_heap_memory_used.unset(r);
cf::get_speculative_retries.unset(r);
cf::get_all_speculative_retries.unset(r);
cf::get_key_cache_hit_rate.unset(r);
cf::get_true_snapshots_size.unset(r);
cf::get_all_true_snapshots_size.unset(r);
cf::get_row_cache_hit_out_of_range.unset(r);
cf::get_all_row_cache_hit_out_of_range.unset(r);
cf::get_row_cache_hit.unset(r);
cf::get_all_row_cache_hit.unset(r);
cf::get_row_cache_miss.unset(r);
cf::get_all_row_cache_miss.unset(r);
cf::get_cas_prepare.unset(r);
cf::get_cas_propose.unset(r);
cf::get_cas_commit.unset(r);
cf::get_sstables_per_read_histogram.unset(r);
cf::get_tombstone_scanned_histogram.unset(r);
cf::get_live_scanned_histogram.unset(r);
cf::get_col_update_time_delta_histogram.unset(r);
cf::get_auto_compaction.unset(r);
cf::enable_auto_compaction.unset(r);
cf::disable_auto_compaction.unset(r);
cf::get_built_indexes.unset(r);
cf::get_compression_metadata_off_heap_memory_used.unset(r);
cf::get_compression_parameters.unset(r);
cf::get_compression_ratio.unset(r);
cf::get_read_latency_estimated_histogram.unset(r);
cf::get_write_latency_estimated_histogram.unset(r);
cf::set_compaction_strategy_class.unset(r);
cf::get_compaction_strategy_class.unset(r);
cf::set_compression_parameters.unset(r);
cf::set_crc_check_chance.unset(r);
cf::get_sstable_count_per_level.unset(r);
cf::get_sstables_for_key.unset(r);
cf::toppartitions.unset(r);
cf::force_major_compaction.unset(r);
}
}
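
Several handlers in this diff aggregate per-shard values through helpers such as `map_reduce_cf()`. Each call takes an initial value, a per-table mapper, and a reducer such as `std::plus<int64_t>()` or `utils::estimated_histogram_merge`. The following is a minimal synchronous sketch of that shape, assuming a plain vector stands in for the shards. `shard_table_stats` and `map_reduce_shards` are hypothetical names. The real helpers are asynchronous and run the mapper on each Seastar shard.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one shard's view of a table; only the counter
// being aggregated is modelled.
struct shard_table_stats {
    std::int64_t pending_compactions;
};

// Sketch of the map/reduce shape: apply `mapper` to every shard's stats and
// fold the results into an accumulator with `reducer`, starting from `init`.
// Unlike the real helpers, this runs sequentially on one thread and returns
// the result directly instead of a future.
template <typename T, typename Mapper, typename Reducer>
T map_reduce_shards(const std::vector<shard_table_stats>& shards, T init,
                    Mapper mapper, Reducer reducer) {
    T acc = std::move(init);
    for (const auto& s : shards) {
        acc = reducer(std::move(acc), mapper(s));
    }
    return acc;
}
```

Suppose three shards report 3, 0 and 5 pending compactions. Summing them with `std::plus<std::int64_t>()` from an initial value of 0 yields 8.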

@@ -14,9 +14,14 @@
#include <seastar/core/future-util.hh>
#include <any>
namespace db {
class system_keyspace;
}
namespace api {
void set_column_family(http_context& ctx, routes& r);
void set_column_family(http_context& ctx, httpd::routes& r, sharded<db::system_keyspace>& sys_ks);
void unset_column_family(http_context& ctx, httpd::routes& r);
const table_id& get_uuid(const sstring& name, const replica::database& db);
future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f);

@@ -13,6 +13,7 @@
#include <vector>
namespace api {
using namespace seastar::httpd;
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {

@@ -12,6 +12,6 @@
namespace api {
void set_commitlog(http_context& ctx, routes& r);
void set_commitlog(http_context& ctx, httpd::routes& r);
}

@@ -22,6 +22,7 @@ namespace api {
namespace cm = httpd::compaction_manager_json;
using namespace json;
using namespace seastar::httpd;
static future<json::json_return_type> get_cm_stats(http_context& ctx,
int64_t compaction_manager::stats::*f) {
@@ -41,9 +42,8 @@ static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_ha
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) {
std::vector<cm::summary> summaries;
const compaction_manager& cm = db.get_compaction_manager();
@@ -65,12 +65,12 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](replica::database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) {
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) -> future<> {
replica::table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.estimate_pending_compactions();
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
@@ -91,14 +91,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) {
auto type = req->get_query_param("type");
return ctx.db.invoke_on_all([type] (replica::database& db) {
auto& cm = db.get_compaction_manager();
@@ -108,7 +108,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto ks_name = validate_keyspace(ctx, req->param);
auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");
if (table_names.empty()) {
@@ -119,41 +119,43 @@ void set_compaction_manager(http_context& ctx, routes& r) {
auto& cm = db.get_compaction_manager();
return parallel_for_each(table_names, [&db, &cm, &ks_name, type] (sstring& table_name) {
auto& t = db.find_column_family(ks_name, table_name);
return cm.stop_compaction(type, &t.as_table_state());
return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {
return cm.stop_compaction(type, &ts);
});
});
});
co_return json_void();
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());
return cf.estimate_pending_compactions();
}, std::plus<int64_t>());
});
cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {
cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return get_cm_stats(ctx, &compaction_manager::stats::completed_tasks);
});
cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<request> req) {
cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<http::request> req) {
// FIXME
// We are currently dont have an API for compaction
// so returning a 0 as the number of total compaction is ok
return make_ready_future<json::json_return_type>(0);
});
cm::get_bytes_compacted.set(r, [] (std::unique_ptr<request> req) {
cm::get_bytes_compacted.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(0);
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first] {
return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {
cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {
std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){
return s.write("[").then([&ctx, &s, &first] {
return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
@@ -183,7 +185,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(std::move(f));
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {
cm::get_compaction_info.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
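
The `get_pending_tasks_by_table` handler accumulates counts in a `std::unordered_map` keyed by `std::pair<sstring, sstring>` (keyspace, table). It passes `utils::tuple_hash` as the hasher because the standard library provides no `std::hash` specialization for `std::pair`. The following self-contained sketch applies the same idea with `std::string` keys. `pair_hash` is a hypothetical hasher that uses a Boost-style `hash_combine` mixing step. It is not the implementation of `utils::tuple_hash`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical hasher for (keyspace, table) pairs: hash each component with
// std::hash and mix them so that ("a", "b") and ("b", "a") usually differ.
struct pair_hash {
    std::size_t operator()(const std::pair<std::string, std::string>& p) const {
        std::size_t h1 = std::hash<std::string>{}(p.first);
        std::size_t h2 = std::hash<std::string>{}(p.second);
        // Boost-style hash_combine mixing step.
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Pending compaction estimate per (keyspace, table).
using pending_by_table =
    std::unordered_map<std::pair<std::string, std::string>, std::uint64_t, pair_hash>;
```

Without the third template argument, `std::unordered_map<std::pair<...>, ...>` fails to compile because `std::hash<std::pair<...>>` is not defined.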

@@ -12,6 +12,6 @@
namespace api {
void set_compaction_manager(http_context& ctx, routes& r);
void set_compaction_manager(http_context& ctx, httpd::routes& r);
}

@@ -13,6 +13,7 @@
#include <boost/algorithm/string/replace.hpp>
namespace api {
using namespace seastar::httpd;
template<class T>
json::json_return_type get_json_return_type(const T& val) {

@@ -13,5 +13,5 @@
namespace api {
void set_config(std::shared_ptr<api_registry_builder20> rb, http_context& ctx, routes& r, const db::config& cfg);
void set_config(std::shared_ptr<httpd::api_registry_builder20> rb, http_context& ctx, httpd::routes& r, const db::config& cfg);
}

@@ -15,6 +15,7 @@
#include "utils/fb_utilities.hh"
namespace api {
using namespace seastar::httpd;
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>& snitch) {
static auto host_or_broadcast = [](const_req req) {
@@ -25,10 +26,10 @@ void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_p
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep, locator::topology::pending::yes)) {
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return sstring(locator::production_snitch_base::default_dc);
return locator::endpoint_dc_rack::default_location.dc;
}
return topology.get_datacenter(ep);
});
@@ -36,10 +37,10 @@ void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_p
httpd::endpoint_snitch_info_json::get_rack.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep, locator::topology::pending::yes)) {
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return sstring(locator::production_snitch_base::default_rack);
return locator::endpoint_dc_rack::default_location.rack;
}
return topology.get_rack(ep);
});

@@ -16,7 +16,7 @@ class snitch_ptr;
namespace api {
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>&);
void unset_endpoint_snitch(http_context& ctx, routes& r);
void set_endpoint_snitch(http_context& ctx, httpd::routes& r, sharded<locator::snitch_ptr>&);
void unset_endpoint_snitch(http_context& ctx, httpd::routes& r);
}

@@ -15,6 +15,7 @@
#include <seastar/core/future-util.hh>
namespace api {
using namespace seastar::httpd;
namespace hf = httpd::error_injection_json;

@@ -12,6 +12,6 @@
namespace api {
void set_error_injection(http_context& ctx, routes& r);
void set_error_injection(http_context& ctx, httpd::routes& r);
}

@@ -8,10 +8,11 @@
#include "failure_detector.hh"
#include "api/api-doc/failure_detector.json.hh"
#include "gms/failure_detector.hh"
#include "gms/application_state.hh"
#include "gms/gossiper.hh"
namespace api {
using namespace seastar::httpd;
namespace fd = httpd::failure_detector_json;
@@ -20,18 +21,18 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
std::vector<fd::endpoint_state> res;
for (auto i : g.get_endpoint_states()) {
fd::endpoint_state val;
val.addrs = boost::lexical_cast<std::string>(i.first);
val.addrs = fmt::to_string(i.first);
val.is_alive = i.second.is_alive();
val.generation = i.second.get_heart_beat_state().get_generation();
val.version = i.second.get_heart_beat_state().get_heart_beat_version();
val.generation = i.second.get_heart_beat_state().get_generation().value();
val.version = i.second.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = i.second.get_update_timestamp().time_since_epoch().count();
for (auto a : i.second.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not it's name to stay compatible to origin
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
version_val.version = a.second.version;
version_val.value = a.second.value();
version_val.version = a.second.version().value();
val.application_state.push(version_val);
}
res.push_back(val);
@@ -62,7 +63,9 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::set_phi_convict_threshold.set(r, [](std::unique_ptr<request> req) {
double phi = atof(req->get_query_param("phi").c_str());
// TBD
unimplemented();
std::ignore = atof(req->get_query_param("phi").c_str());
return make_ready_future<json::json_return_type>("");
});
@@ -77,15 +80,9 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {
std::map<gms::inet_address, gms::arrival_window> map;
// We no longer have a phi failure detector,
// just returning the empty value is good enough.
std::vector<fd::endpoint_phi_value> res;
auto now = gms::arrival_window::clk::now();
for (auto& p : map) {
fd::endpoint_phi_value val;
val.endpoint = p.first.to_sstring();
val.phi = p.second.phi(now);
res.emplace_back(std::move(val));
}
return make_ready_future<json::json_return_type>(res);
});
}
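
In the failure detector and gossiper hunks, heartbeat generation and version values change from plain `int` to `gms::generation_type` / `gms::version_type`. The API layer now calls `.value()` to obtain the integer it serializes to JSON. The following minimal strong-typedef sketch shows that pattern, assuming a 32-bit underlying type. `tagged_integer` and the `sketch` namespace are hypothetical and do not reproduce Scylla's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical strong typedef: each Tag yields a distinct type, so a
// generation cannot be passed where a version is expected. The explicit
// constructor blocks implicit conversion from a raw integer, and value()
// is the only way to get the raw integer back out.
template <typename Tag, typename T>
class tagged_integer {
    T _value;
public:
    using value_type = T;
    constexpr tagged_integer() noexcept : _value(0) {}
    explicit constexpr tagged_integer(T v) noexcept : _value(v) {}
    constexpr T value() const noexcept { return _value; }
    friend constexpr bool operator==(tagged_integer a, tagged_integer b) noexcept {
        return a._value == b._value;
    }
};

using generation_type = tagged_integer<struct generation_type_tag, std::int32_t>;
using version_type = tagged_integer<struct version_type_tag, std::int32_t>;

// Both aliases wrap std::int32_t, yet they are distinct, non-interchangeable types.
static_assert(!std::is_same_v<generation_type, version_type>);
static_assert(!std::is_convertible_v<std::int32_t, generation_type>);

} // namespace sketch
```

The compile-time checks show the point of the change. Mixing up a generation and a version, or passing a bare `int`, becomes a compile error. A JSON serializer that expects a plain integer then needs the explicit `.value()` call seen in the diff.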

@@ -18,6 +18,6 @@ class gossiper;
namespace api {
void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g);
void set_failure_detector(http_context& ctx, httpd::routes& r, gms::gossiper& g);
}

@@ -11,6 +11,7 @@
#include "gms/gossiper.hh"
namespace api {
using namespace seastar::httpd;
using namespace json;
void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
@@ -19,9 +20,11 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
return container_to_vec(res);
});
httpd::gossiper_json::get_live_endpoint.set(r, [&g] (const_req req) {
auto res = g.get_live_members();
return container_to_vec(res);
httpd::gossiper_json::get_live_endpoint.set(r, [&g] (std::unique_ptr<request> req) {
return g.get_live_members_synchronized().then([] (auto res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
httpd::gossiper_json::get_endpoint_downtime.set(r, [&g] (const_req req) {
@@ -29,21 +32,21 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
return g.get_endpoint_downtime(ep);
});
httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<request> req) {
httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->param["addr"]);
return g.get_current_generation_number(ep).then([] (int res) {
return make_ready_future<json::json_return_type>(res);
return g.get_current_generation_number(ep).then([] (gms::generation_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
});
httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<request> req) {
httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->param["addr"]);
return g.get_current_heart_beat_version(ep).then([] (int res) {
return make_ready_future<json::json_return_type>(res);
return g.get_current_heart_beat_version(ep).then([] (gms::version_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
});
httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<request> req) {
httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
if (req->get_query_param("unsafe") != "True") {
return g.assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
@@ -54,7 +57,7 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
});
});
httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<request> req) {
httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
gms::inet_address ep(req->param["addr"]);
return g.force_remove_endpoint(ep).then([] {
return make_ready_future<json::json_return_type>(json_void());

@@ -18,6 +18,6 @@ class gossiper;
namespace api {
void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g);
void set_gossiper(http_context& ctx, httpd::routes& r, gms::gossiper& g);
}

@@ -19,10 +19,11 @@
namespace api {
using namespace json;
using namespace seastar::httpd;
namespace hh = httpd::hinted_handoff_json;
void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {
hh::create_hints_sync_point.set(r, [&ctx, &g] (std::unique_ptr<request> req) -> future<json::json_return_type> {
hh::create_hints_sync_point.set(r, [&ctx, &g] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto parse_hosts_list = [&g] (sstring arg) {
std::vector<sstring> hosts_str = split(arg, ",");
std::vector<gms::inet_address> hosts;
@@ -52,7 +53,7 @@ void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {
});
});
hh::get_hints_sync_point.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
hh::get_hints_sync_point.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
db::hints::sync_point sync_point;
const sstring encoded = req->get_query_param("id");
try {
@@ -93,42 +94,42 @@ void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {
});
});
hh::list_endpoints_pending_hints.set(r, [] (std::unique_ptr<request> req) {
hh::list_endpoints_pending_hints.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
std::vector<sstring> res;
return make_ready_future<json::json_return_type>(res);
});
hh::truncate_all_hints.set(r, [] (std::unique_ptr<request> req) {
hh::truncate_all_hints.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
sstring host = req->get_query_param("host");
return make_ready_future<json::json_return_type>(json_void());
});
hh::schedule_hint_delivery.set(r, [] (std::unique_ptr<request> req) {
hh::schedule_hint_delivery.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
sstring host = req->get_query_param("host");
return make_ready_future<json::json_return_type>(json_void());
});
hh::pause_hints_delivery.set(r, [] (std::unique_ptr<request> req) {
hh::pause_hints_delivery.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
sstring pause = req->get_query_param("pause");
return make_ready_future<json::json_return_type>(json_void());
});
hh::get_create_hint_count.set(r, [] (std::unique_ptr<request> req) {
hh::get_create_hint_count.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
sstring host = req->get_query_param("host");
return make_ready_future<json::json_return_type>(0);
});
hh::get_not_stored_hints_count.set(r, [] (std::unique_ptr<request> req) {
hh::get_not_stored_hints_count.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
sstring host = req->get_query_param("host");

@@ -18,7 +18,7 @@ class gossiper;
namespace api {
void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g);
void unset_hinted_handoff(http_context& ctx, routes& r);
void set_hinted_handoff(http_context& ctx, httpd::routes& r, gms::gossiper& g);
void unset_hinted_handoff(http_context& ctx, httpd::routes& r);
}

@@ -16,6 +16,7 @@
#include "replica/database.hh"
namespace api {
using namespace seastar::httpd;
static logging::logger alogger("lsa-api");

@@ -12,6 +12,6 @@
namespace api {
void set_lsa(http_context& ctx, routes& r);
void set_lsa(http_context& ctx, httpd::routes& r);
}

@@ -13,6 +13,7 @@
#include <iostream>
#include <sstream>
using namespace seastar::httpd;
using namespace httpd::messaging_service_json;
using namespace netw;
@@ -28,7 +29,7 @@ std::vector<message_counter> map_to_message_counters(
std::vector<message_counter> res;
for (auto i : map) {
res.push_back(message_counter());
res.back().key = boost::lexical_cast<sstring>(i.first);
res.back().key = fmt::to_string(i.first);
res.back().value = i.second;
}
return res;

@@ -14,7 +14,7 @@ namespace netw { class messaging_service; }
namespace api {
void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);
void unset_messaging_service(http_context& ctx, routes& r);
void set_messaging_service(http_context& ctx, httpd::routes& r, sharded<netw::messaging_service>& ms);
void unset_messaging_service(http_context& ctx, httpd::routes& r);
}

@@ -20,6 +20,7 @@ namespace api {
namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace seastar::httpd;
using namespace json;
utils::time_estimated_histogram timed_rate_moving_average_summary_merge(utils::time_estimated_histogram a, const utils::timed_rate_moving_average_summary_and_histogram& b) {
@@ -184,75 +185,75 @@ sum_timer_stats_storage_proxy(distributed<proxy>& d,
}
void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_service>& ss) {
sp::get_total_hints.set(r, [](std::unique_ptr<request> req) {
sp::get_total_hints.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<http::request> req) {
const auto& filter = ctx.sp.local().get_hints_host_filter();
return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
sp::set_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto enable = req->get_query_param("enable");
auto filter = (enable == "true" || enable == "1")
? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})
: db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return ctx.sp.invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req) {
sp::get_hinted_handoff_enabled_by_dc.set(r, [&ctx](std::unique_ptr<http::request> req) {
std::vector<sstring> res;
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
const auto& filter = ctx.sp.local().get_hints_host_filter();
const auto& dcs = filter.get_dcs();
res.reserve(res.size());
std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));
return make_ready_future<json::json_return_type>(res);
});
sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req) {
sp::set_hinted_handoff_enabled_by_dc_list.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto dcs = req->get_query_param("dcs");
auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return ctx.sp.invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req) {
sp::get_max_hint_window.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::set_max_hint_window.set(r, [](std::unique_ptr<request> req) {
sp::set_max_hint_window.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("ms");
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_max_hints_in_progress.set(r, [](std::unique_ptr<request> req) {
sp::get_max_hints_in_progress.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(1);
});
sp::set_max_hints_in_progress.set(r, [](std::unique_ptr<request> req) {
sp::set_max_hints_in_progress.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("qs");
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_hints_in_progress.set(r, [](std::unique_ptr<request> req) {
sp::get_hints_in_progress.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
@@ -262,7 +263,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().request_timeout_in_ms()/1000.0;
});
sp::set_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -273,7 +274,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().read_request_timeout_in_ms()/1000.0;
});
sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -284,7 +285,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().write_request_timeout_in_ms()/1000.0;
});
sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -295,7 +296,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().counter_write_request_timeout_in_ms()/1000.0;
});
sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -306,7 +307,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().cas_contention_timeout_in_ms()/1000.0;
});
sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -317,7 +318,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().range_request_timeout_in_ms()/1000.0;
});
sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
@@ -328,32 +329,32 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return ctx.db.local().get_config().truncate_request_timeout_in_ms()/1000.0;
});
sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(json_void());
});
sp::reload_trigger_classes.set(r, [](std::unique_ptr<request> req) {
sp::reload_trigger_classes.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_attempts);
});
sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_blocking);
});
sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_background);
});
sp::get_schema_versions.set(r, [&ss](std::unique_ptr<request> req) {
sp::get_schema_versions.set(r, [&ss](std::unique_ptr<http::request> req) {
return ss.local().describe_schema_versions().then([] (auto result) {
std::vector<sp::mapper_list> res;
for (auto e : result) {
@@ -366,122 +367,122 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
});
});
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
});
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
});
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
});
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
});
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
});
sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_failed_read_round_optimization);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
});
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);
});
sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);
});
sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);
});
sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);
});
sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);
});
sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);
});
sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);
});
sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
});
sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);
});
sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_view_write_metrics_latency_histogram.set(r, [](std::unique_ptr<http::request> req) {
//TBD
// FIXME
// No View metrics are available, so just return empty moving average
@@ -489,32 +490,101 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se
return make_ready_future<json::json_return_type>(get_empty_moving_average());
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::read);
});
sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::read);
});
sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::write);
});
sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::write);
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {
return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {
return total_latency(ctx, &service::storage_proxy_stats::stats::range);
});
}
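The hinted-handoff handlers above turn query parameters into a host filter. An `enable` value of `true` or `1` enables hints for every host, and any other value disables them. A `dcs` value restricts hints to a list of datacenters. The sketch below models that mapping with a hypothetical `hints_filter_sketch` type rather than `db::hints::host_filter`. The comma splitting, whitespace trimming and rejection of empty DC names in `filter_from_dc_list` are assumptions about what `parse_from_dc_list` does, not a copy of it.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for db::hints::host_filter; not Scylla's real type.
struct hints_filter_sketch {
    enum class mode { enabled_for_all, disabled_for_all, enabled_for_dcs };
    mode m = mode::disabled_for_all;
    std::set<std::string> dcs; // only meaningful when m == enabled_for_dcs
};

// Mirrors the handler's check: only "true" or "1" enable hints for all hosts.
hints_filter_sketch filter_from_enable_param(const std::string& enable) {
    hints_filter_sketch f;
    f.m = (enable == "true" || enable == "1")
        ? hints_filter_sketch::mode::enabled_for_all
        : hints_filter_sketch::mode::disabled_for_all;
    return f;
}

// Trims ASCII spaces from both ends of a string.
static std::string trim(const std::string& s) {
    auto b = s.find_first_not_of(' ');
    if (b == std::string::npos) {
        return "";
    }
    auto e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Assumed semantics: split "dc1, dc2" on commas, trim each name, reject empty names.
hints_filter_sketch filter_from_dc_list(const std::string& dcs) {
    hints_filter_sketch f;
    f.m = hints_filter_sketch::mode::enabled_for_dcs;
    std::size_t start = 0;
    while (true) {
        auto comma = dcs.find(',', start);
        auto name = trim(dcs.substr(start, comma == std::string::npos ? std::string::npos : comma - start));
        if (name.empty()) {
            throw std::invalid_argument("empty DC name");
        }
        f.dcs.insert(name);
        if (comma == std::string::npos) {
            break;
        }
        start = comma + 1;
    }
    return f;
}
```

Note that any unrecognized `enable` value (for example `yes`) silently disables hints, exactly as the handler's ternary does.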
void unset_storage_proxy(http_context& ctx, routes& r) {
sp::get_total_hints.unset(r);
sp::get_hinted_handoff_enabled.unset(r);
sp::set_hinted_handoff_enabled.unset(r);
sp::get_hinted_handoff_enabled_by_dc.unset(r);
sp::set_hinted_handoff_enabled_by_dc_list.unset(r);
sp::get_max_hint_window.unset(r);
sp::set_max_hint_window.unset(r);
sp::get_max_hints_in_progress.unset(r);
sp::set_max_hints_in_progress.unset(r);
sp::get_hints_in_progress.unset(r);
sp::get_rpc_timeout.unset(r);
sp::set_rpc_timeout.unset(r);
sp::get_read_rpc_timeout.unset(r);
sp::set_read_rpc_timeout.unset(r);
sp::get_write_rpc_timeout.unset(r);
sp::set_write_rpc_timeout.unset(r);
sp::get_counter_write_rpc_timeout.unset(r);
sp::set_counter_write_rpc_timeout.unset(r);
sp::get_cas_contention_timeout.unset(r);
sp::set_cas_contention_timeout.unset(r);
sp::get_range_rpc_timeout.unset(r);
sp::set_range_rpc_timeout.unset(r);
sp::get_truncate_rpc_timeout.unset(r);
sp::set_truncate_rpc_timeout.unset(r);
sp::reload_trigger_classes.unset(r);
sp::get_read_repair_attempted.unset(r);
sp::get_read_repair_repaired_blocking.unset(r);
sp::get_read_repair_repaired_background.unset(r);
sp::get_schema_versions.unset(r);
sp::get_cas_read_timeouts.unset(r);
sp::get_cas_read_unavailables.unset(r);
sp::get_cas_write_timeouts.unset(r);
sp::get_cas_write_unavailables.unset(r);
sp::get_cas_write_metrics_unfinished_commit.unset(r);
sp::get_cas_write_metrics_contention.unset(r);
sp::get_cas_write_metrics_condition_not_met.unset(r);
sp::get_cas_write_metrics_failed_read_round_optimization.unset(r);
sp::get_cas_read_metrics_unfinished_commit.unset(r);
sp::get_cas_read_metrics_contention.unset(r);
sp::get_read_metrics_timeouts.unset(r);
sp::get_read_metrics_unavailables.unset(r);
sp::get_range_metrics_timeouts.unset(r);
sp::get_range_metrics_unavailables.unset(r);
sp::get_write_metrics_timeouts.unset(r);
sp::get_write_metrics_unavailables.unset(r);
sp::get_read_metrics_timeouts_rates.unset(r);
sp::get_read_metrics_unavailables_rates.unset(r);
sp::get_range_metrics_timeouts_rates.unset(r);
sp::get_range_metrics_unavailables_rates.unset(r);
sp::get_write_metrics_timeouts_rates.unset(r);
sp::get_write_metrics_unavailables_rates.unset(r);
sp::get_range_metrics_latency_histogram_depricated.unset(r);
sp::get_write_metrics_latency_histogram_depricated.unset(r);
sp::get_read_metrics_latency_histogram_depricated.unset(r);
sp::get_range_metrics_latency_histogram.unset(r);
sp::get_write_metrics_latency_histogram.unset(r);
sp::get_cas_write_metrics_latency_histogram.unset(r);
sp::get_cas_read_metrics_latency_histogram.unset(r);
sp::get_view_write_metrics_latency_histogram.unset(r);
sp::get_read_metrics_latency_histogram.unset(r);
sp::get_read_estimated_histogram.unset(r);
sp::get_read_latency.unset(r);
sp::get_write_estimated_histogram.unset(r);
sp::get_write_latency.unset(r);
sp::get_range_estimated_histogram.unset(r);
sp::get_range_latency.unset(r);
}
}
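Most metric getters registered in `set_storage_proxy` delegate to helpers such as `sum_stats_storage_proxy`, passing a pointer to a member of `storage_proxy_stats::stats` so that one helper can aggregate any counter. Below is a minimal sketch of that pointer-to-member pattern. It assumes a plain `std::vector` stands in for the per-shard instances, whereas the real helpers reduce asynchronously across `sharded<storage_proxy>`. The names `shard_stats_sketch` and `sum_over_shards` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical per-shard counters; stands in for storage_proxy_stats::stats.
struct shard_stats_sketch {
    std::uint64_t read_repair_attempts = 0;
    std::uint64_t read_repair_repaired_blocking = 0;
};

// Sums one counter, selected by pointer-to-member, across all "shards".
// A vector stands in for the shards; Seastar code would map-reduce over a sharded service.
std::uint64_t sum_over_shards(const std::vector<shard_stats_sketch>& shards,
                              std::uint64_t shard_stats_sketch::* field) {
    return std::accumulate(shards.begin(), shards.end(), std::uint64_t{0},
        [field] (std::uint64_t acc, const shard_stats_sketch& s) { return acc + s.*field; });
}
```

Passing `&shard_stats_sketch::read_repair_attempts` or `&shard_stats_sketch::read_repair_repaired_blocking` selects the counter, which is why a single helper can serve many endpoints.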

@@ -15,6 +15,7 @@ namespace service { class storage_service; }
namespace api {
void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_service>& ss);
void set_storage_proxy(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss);
void unset_storage_proxy(http_context& ctx, httpd::routes& r);
}
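The header now pairs `set_storage_proxy` with `unset_storage_proxy`, which removes every route the former registered. This presumably lets the API stop serving handlers that capture `ctx.sp` before the storage proxy is torn down. The sketch below illustrates the symmetric set/unset idea using a hypothetical `std::map`-based route table and made-up paths. It is not seastar's `httpd::routes` API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for httpd::routes: path -> handler.
using handler_sketch = std::function<std::string()>;
using routes_sketch = std::map<std::string, handler_sketch>;

// Hypothetical stand-in for a generated path descriptor such as sp::get_total_hints.
struct path_sketch {
    std::string path;
    void set(routes_sketch& r, handler_sketch h) const { r[path] = std::move(h); }
    void unset(routes_sketch& r) const { r.erase(path); }
};

const path_sketch total_hints_path{"/hypothetical/total_hints"};
const path_sketch read_latency_path{"/hypothetical/read_latency"};

// Registers handlers; the first one captures a reference to external service state.
void set_sketch(routes_sketch& r, const int& service_state) {
    total_hints_path.set(r, [&service_state] { return std::to_string(service_state); });
    read_latency_path.set(r, [] { return "0"; });
}

// Removes exactly the routes set_sketch added, so no handler outlives the captured state.
void unset_sketch(routes_sketch& r) {
    total_hints_path.unset(r);
    read_latency_path.unset(r);
}
```

Keeping the unset list in the same order as the set list, as `unset_storage_proxy` does, makes it easy to check that no route is missed.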

File diff suppressed because it is too large.

@@ -8,6 +8,8 @@
#pragma once
#include <iostream>
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "db/data_listeners.hh"
@@ -34,28 +36,52 @@ class gossiper;
namespace api {
// verify that the keyspace is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(http_context& ctx, sstring ks_name);
// verify that the keyspace parameter is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective keyspace error.
sstring validate_keyspace(http_context& ctx, const parameters& param);
sstring validate_keyspace(http_context& ctx, const httpd::parameters& param);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns an empty vector if no parameter was found.
// If the parameter is found and empty, returns a list of all table names in the keyspace.
std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, gms::gossiper& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ls);
void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>& sst_loader);
void unset_sstables_loader(http_context& ctx, routes& r);
void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb);
void unset_view_builder(http_context& ctx, routes& r);
void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair);
void unset_repair(http_context& ctx, routes& r);
void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);
void unset_transport_controller(http_context& ctx, routes& r);
void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);
void unset_rpc_controller(http_context& ctx, routes& r);
void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl);
void unset_snapshot(http_context& ctx, routes& r);
struct table_info {
sstring name;
table_id id;
};
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns a vector of all table infos given by the parameter, or
// if the parameter is not found or is empty, returns a list of all table infos in the keyspace.
std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, gms::gossiper& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ls);
void set_sstables_loader(http_context& ctx, httpd::routes& r, sharded<sstables_loader>& sst_loader);
void unset_sstables_loader(http_context& ctx, httpd::routes& r);
void set_view_builder(http_context& ctx, httpd::routes& r, sharded<db::view::view_builder>& vb);
void unset_view_builder(http_context& ctx, httpd::routes& r);
void set_repair(http_context& ctx, httpd::routes& r, sharded<repair_service>& repair);
void unset_repair(http_context& ctx, httpd::routes& r);
void set_transport_controller(http_context& ctx, httpd::routes& r, cql_transport::controller& ctl);
void unset_transport_controller(http_context& ctx, httpd::routes& r);
void set_rpc_controller(http_context& ctx, httpd::routes& r, thrift_controller& ctl);
void unset_rpc_controller(http_context& ctx, httpd::routes& r);
void set_snapshot(http_context& ctx, httpd::routes& r, sharded<db::snapshot_ctl>& snap_ctl);
void unset_snapshot(http_context& ctx, httpd::routes& r);
seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request = false);
}
} // namespace api
namespace std {
std::ostream& operator<<(std::ostream& os, const api::table_info& ti);
} // namespace std
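The comment on `parse_table_infos` describes its contract. A comma-separated list of table names is resolved to `table_info` entries, an unknown name raises an error, and a missing or empty parameter selects every table in the keyspace. The sketch below follows that contract under three assumptions: a simple name-to-id map serves as the keyspace catalog, an `int` replaces `table_id`, and `std::invalid_argument` replaces `bad_param_exception`. All names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for api::table_info (the real id is a table_id UUID).
struct table_info_sketch {
    std::string name;
    int id;
};

std::vector<table_info_sketch> parse_table_infos_sketch(
        const std::map<std::string, int>& tables_in_ks,
        const std::unordered_map<std::string, std::string>& query_params,
        const std::string& param_name) {
    std::vector<table_info_sketch> res;
    auto it = query_params.find(param_name);
    // Missing or empty parameter: return every table in the keyspace.
    if (it == query_params.end() || it->second.empty()) {
        for (const auto& [name, id] : tables_in_ks) {
            res.push_back({name, id});
        }
        return res;
    }
    // Otherwise resolve each comma-separated name, failing on the first unknown one.
    const std::string& list = it->second;
    std::size_t start = 0;
    while (true) {
        auto comma = list.find(',', start);
        std::string name = list.substr(start, comma == std::string::npos ? std::string::npos : comma - start);
        auto t = tables_in_ks.find(name);
        if (t == tables_in_ks.end()) {
            throw std::invalid_argument("no such table: " + name);
        }
        res.push_back({t->first, t->second});
        if (comma == std::string::npos) {
            break;
        }
        start = comma + 1;
    }
    return res;
}
```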

@@ -14,6 +14,7 @@
#include "gms/gossiper.hh"
namespace api {
using namespace seastar::httpd;
namespace hs = httpd::stream_manager_json;
@@ -21,7 +22,7 @@ static void set_summaries(const std::vector<streaming::stream_summary>& from,
json::json_list<hs::stream_summary>& to) {
if (!from.empty()) {
hs::stream_summary res;
res.cf_id = boost::lexical_cast<std::string>(from.front().cf_id);
res.cf_id = fmt::to_string(from.front().cf_id);
// For each stream_session, we pretend we are sending/receiving one
// file, to make it compatible with nodetool.
res.files = 1;
@@ -38,7 +39,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)
res.current_bytes = info.current_bytes;
res.direction = info.dir;
res.file_name = info.file_name;
res.peer = boost::lexical_cast<std::string>(info.peer);
res.peer = fmt::to_string(info.peer);
res.session_index = 0;
res.total_bytes = info.total_bytes;
return res;
@@ -61,7 +62,7 @@ static hs::stream_state get_state(
state.plan_id = result_future.plan_id.to_sstring();
for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {
hs::stream_info si;
si.peer = boost::lexical_cast<std::string>(info.peer);
si.peer = fmt::to_string(info.peer);
si.session_index = 0;
si.state = info.state;
si.connecting = si.peer;

@@ -12,7 +12,7 @@
namespace api {
void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_manager>& sm);
void unset_stream_manager(http_context& ctx, routes& r);
void set_stream_manager(http_context& ctx, httpd::routes& r, sharded<streaming::stream_manager>& sm);
void unset_stream_manager(http_context& ctx, httpd::routes& r);
}

@@ -17,6 +17,7 @@
extern logging::logger apilog;
namespace api {
using namespace seastar::httpd;
namespace hs = httpd::system_json;

@@ -12,6 +12,6 @@
namespace api {
void set_system(http_context& ctx, routes& r);
void set_system(http_context& ctx, httpd::routes& r);
}

@@ -22,6 +22,7 @@ namespace api {
namespace tm = httpd::task_manager_json;
using namespace json;
using namespace seastar::httpd;
inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {
return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&
@@ -30,17 +31,32 @@ inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<
struct full_task_status {
tasks::task_manager::task::status task_status;
std::string type;
tasks::task_manager::task::progress progress;
std::string module;
tasks::task_id parent_id;
tasks::is_abortable abortable;
std::vector<std::string> children_ids;
};
struct task_stats {
task_stats(tasks::task_manager::task_ptr task) : task_id(task->id().to_sstring()), state(task->get_status().state) {}
task_stats(tasks::task_manager::task_ptr task)
: task_id(task->id().to_sstring())
, state(task->get_status().state)
, type(task->type())
, keyspace(task->get_status().keyspace)
, table(task->get_status().table)
, entity(task->get_status().entity)
, sequence_number(task->get_status().sequence_number)
{ }
sstring task_id;
tasks::task_manager::task_state state;
std::string type;
std::string keyspace;
std::string table;
std::string entity;
uint64_t sequence_number;
};
tm::task_status make_status(full_task_status status) {
@@ -52,7 +68,7 @@ tm::task_status make_status(full_task_status status) {
tm::task_status res{};
res.id = status.task_status.id.to_sstring();
res.type = status.task_status.type;
res.type = status.type;
res.state = status.task_status.state;
res.is_abortable = bool(status.abortable);
res.start_time = st;
@@ -67,37 +83,45 @@ tm::task_status make_status(full_task_status status) {
res.progress_units = status.task_status.progress_units;
res.progress_total = status.progress.total;
res.progress_completed = status.progress.completed;
res.children_ids = std::move(status.children_ids);
return res;
}
future<json::json_return_type> retrieve_status(tasks::task_manager::foreign_task_ptr task) {
future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task_ptr& task) {
if (task.get() == nullptr) {
co_return coroutine::return_exception(httpd::bad_param_exception("Task not found"));
}
auto progress = co_await task->get_progress();
full_task_status s;
s.task_status = task->get_status();
s.type = task->type();
s.parent_id = task->get_parent_id();
s.abortable = task->is_abortable();
s.module = task->get_module_name();
s.progress.completed = progress.completed;
s.progress.total = progress.total;
co_return make_status(s);
std::vector<std::string> ct{task->get_children().size()};
boost::transform(task->get_children(), ct.begin(), [] (const auto& child) {
return child->id().to_sstring();
});
s.children_ids = std::move(ct);
co_return s;
}
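`filter_tasks` and the `get_tasks` handler combine two checks. Tasks flagged as internal are listed only when the request passes `internal=true`, and the remaining query parameters must match the task's status. The sketch below shows that combination on a hypothetical `task_sketch` record. It models only the keyspace condition visible in this diff and omits the other conditions of `filter_tasks`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified task record.
struct task_sketch {
    std::string id;
    std::string keyspace;
    bool internal;
};

// Keyspace check mirroring the visible part of filter_tasks; other conditions are omitted.
bool matches_query(const task_sketch& t, const std::unordered_map<std::string, std::string>& params) {
    auto it = params.find("keyspace");
    return it == params.end() || it->second == t.keyspace;
}

// Internal tasks are listed only when the caller asked for them.
std::vector<std::string> list_task_ids(const std::vector<task_sketch>& tasks,
                                       const std::unordered_map<std::string, std::string>& params,
                                       bool internal) {
    std::vector<std::string> ids;
    for (const auto& t : tasks) {
        if ((internal || !t.internal) && matches_query(t, params)) {
            ids.push_back(t.id);
        }
    }
    return ids;
}
```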
void set_task_manager(http_context& ctx, routes& r) {
tm::get_modules.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {
tm::get_modules.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(ctx.tm.local().get_modules() | boost::adaptors::map_keys);
co_return v;
});
tm::get_tasks.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tm::get_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
using chunked_stats = utils::chunked_vector<task_stats>;
std::vector<chunked_stats> res = co_await ctx.tm.map([&req] (tasks::task_manager& tm) {
auto internal = tasks::is_internal{req_param<bool>(*req, "internal", false)};
std::vector<chunked_stats> res = co_await ctx.tm.map([&req, internal] (tasks::task_manager& tm) {
chunked_stats local_res;
auto module = tm.find_module(req->param["module"]);
const auto& filtered_tasks = module->get_tasks() | boost::adaptors::filtered([&params = req->query_parameters] (const auto& task) {
return filter_tasks(task.second, params);
const auto& filtered_tasks = module->get_tasks() | boost::adaptors::filtered([&params = req->query_parameters, internal] (const auto& task) {
return (internal || !task.second->is_internal()) && filter_tasks(task.second, params);
});
for (auto& [task_id, task] : filtered_tasks) {
local_res.push_back(task_stats{task});
@@ -124,7 +148,7 @@ void set_task_manager(http_context& ctx, routes& r) {
co_return std::move(f);
});
tm::get_task_status.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tm::get_task_status.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
auto state = task->get_status().state;
@@ -133,10 +157,11 @@ void set_task_manager(http_context& ctx, routes& r) {
}
co_return std::move(task);
}));
co_return co_await retrieve_status(std::move(task));
auto s = co_await retrieve_status(task);
co_return make_status(s);
});
tm::abort_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tm::abort_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
if (!task->is_abortable()) {
@@ -147,7 +172,7 @@ void set_task_manager(http_context& ctx, routes& r) {
co_return json_void();
});
tm::wait_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tm::wait_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) {
return task->done().then_wrapped([task] (auto f) {
@@ -156,7 +181,55 @@ void set_task_manager(http_context& ctx, routes& r) {
return make_foreign(task);
});
}));
co_return co_await retrieve_status(std::move(task));
auto s = co_await retrieve_status(task);
co_return make_status(s);
});
tm::get_task_status_recursively.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& _ctx = ctx;
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
std::queue<tasks::task_manager::foreign_task_ptr> q;
utils::chunked_vector<full_task_status> res;
// Get requested task.
auto task = co_await tasks::task_manager::invoke_on_task(_ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {
auto state = task->get_status().state;
if (state == tasks::task_manager::task_state::done || state == tasks::task_manager::task_state::failed) {
task->unregister_task();
}
co_return task;
}));
// Push children's statuses in BFS order.
q.push(co_await task.copy()); // The task cannot be moved, since it must stay alive for the whole loop.
while (!q.empty()) {
auto& current = q.front();
res.push_back(co_await retrieve_status(current));
for (size_t i = 0; i < current->get_children().size(); ++i) {
q.push(co_await current->get_children()[i].copy());
}
q.pop();
}
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& status: res) {
co_await s.write(std::exchange(delim, ", "));
co_await formatter::write(s, make_status(status));
}
co_await s.write("]");
co_await s.close();
};
co_return f;
});
tm::get_and_update_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
uint32_t ttl = cfg.task_ttl_seconds();
co_await cfg.task_ttl_seconds.set_value_on_all_shards(req->query_parameters["ttl"], utils::config_file::config_source::API);
co_return json::json_return_type(ttl);
});
}
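get_task_status_recursively collects the requested task and all of its descendants breadth-first, then streams them as a JSON array. It uses std::exchange so that the separator is empty before the first element and ", " before every later one. A minimal standalone sketch of that traversal and serialization, assuming a hypothetical in-memory `task_node` tree in place of the sharded foreign task pointers and plain ids in place of full_task_status:

```cpp
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task: an id plus its children.
struct task_node {
    std::string id;
    std::vector<task_node> children;
};

// Collect ids breadth-first: the requested task first, then its children,
// then its grandchildren, and so on.
std::vector<std::string> collect_bfs(const task_node& root) {
    std::vector<std::string> res;
    std::queue<const task_node*> q;
    q.push(&root);
    while (!q.empty()) {
        const task_node* current = q.front();
        q.pop();
        res.push_back(current->id);
        for (const auto& child : current->children) {
            q.push(&child);
        }
    }
    return res;
}

// Write the items as a JSON array. std::exchange returns the old delimiter
// ("" on the first iteration) and stores ", " for the following ones.
std::string to_json_array(const std::vector<std::string>& items) {
    std::ostringstream os;
    os << '[';
    std::string delim;
    for (const auto& item : items) {
        os << std::exchange(delim, ", ") << '"' << item << '"';
    }
    os << ']';
    return os.str();
}
```

The queue holds pointers into a tree that outlives the loop, which plays the role of the copied foreign pointers that keep each task alive in the real handler.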

@@ -9,9 +9,10 @@
#pragma once
#include "api.hh"
#include "db/config.hh"
namespace api {
void set_task_manager(http_context& ctx, routes& r);
void set_task_manager(http_context& ctx, httpd::routes& r, db::config& cfg);
}

@@ -18,9 +18,10 @@ namespace api {
namespace tmt = httpd::task_manager_test_json;
using namespace json;
using namespace seastar::httpd;
void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
tmt::register_test_module.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
void set_task_manager_test(http_context& ctx, routes& r) {
tmt::register_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) {
auto m = make_shared<tasks::test_module>(tm);
tm.register_module("test", m);
@@ -28,7 +29,7 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
co_return json_void();
});
tmt::unregister_test_module.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tmt::unregister_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) -> future<> {
auto module_name = "test";
auto module = tm.find_module(module_name);
@@ -37,7 +38,7 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
co_return json_void();
});
tmt::register_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tmt::register_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
sharded<tasks::task_manager>& tms = ctx.tm;
auto it = req->query_parameters.find("task_id");
auto id = it != req->query_parameters.end() ? tasks::task_id{utils::UUID{it->second}} : tasks::task_id::create_null_id();
@@ -47,12 +48,10 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
std::string keyspace = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("table");
std::string table = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("type");
std::string type = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("entity");
std::string entity = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("parent_id");
tasks::task_manager::parent_data data;
tasks::task_info data;
if (it != req->query_parameters.end()) {
data.id = tasks::task_id{utils::UUID{it->second}};
auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(ctx.tm, data.id);
@@ -60,7 +59,7 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
}
auto module = tms.local().find_module("test");
id = co_await module->make_task<tasks::test_task_impl>(shard, id, keyspace, table, type, entity, data);
id = co_await module->make_task<tasks::test_task_impl>(shard, id, keyspace, table, entity, data);
co_await tms.invoke_on(shard, [id] (tasks::task_manager& tm) {
auto it = tm.get_all_tasks().find(id);
if (it != tm.get_all_tasks().end()) {
@@ -70,7 +69,7 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
co_return id.to_sstring();
});
tmt::unregister_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tmt::unregister_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};
co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {
tasks::test_task test_task{task};
@@ -79,7 +78,7 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
co_return json_void();
});
tmt::finish_test_task.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {
tmt::finish_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};
auto it = req->query_parameters.find("error");
bool fail = it != req->query_parameters.end();
@@ -96,12 +95,6 @@ void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg) {
});
co_return json_void();
});
tmt::get_and_update_ttl.set(r, [&ctx, &cfg] (std::unique_ptr<request> req) -> future<json::json_return_type> {
uint32_t ttl = cfg.task_ttl_seconds();
cfg.task_ttl_seconds.set(boost::lexical_cast<uint32_t>(req->query_parameters["ttl"]));
co_return json::json_return_type(ttl);
});
}
}
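register_test_task reads several optional query parameters (keyspace, table, entity, ...) with the same `find(...) != end() ? it->second : ""` idiom. A small sketch of that lookup-with-default, where `param_or` is a hypothetical helper name and a plain std::unordered_map stands in for the request's query_parameters container:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: the value stored under `key`, or `fallback` if absent.
std::string param_or(const std::unordered_map<std::string, std::string>& params,
                     const std::string& key, const std::string& fallback = "") {
    auto it = params.find(key);
    return it != params.end() ? it->second : fallback;
}
```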

@@ -11,11 +11,10 @@
#pragma once
#include "api.hh"
#include "db/config.hh"
namespace api {
void set_task_manager_test(http_context& ctx, routes& r, db::config& cfg);
void set_task_manager_test(http_context& ctx, httpd::routes& r);
}

auth/CMakeLists.txt

@@ -0,0 +1,35 @@
include(add_whole_archive)
add_library(scylla_auth STATIC)
target_sources(scylla_auth
PRIVATE
allow_all_authenticator.cc
allow_all_authorizer.cc
authenticated_user.cc
authenticator.cc
common.cc
default_authorizer.cc
password_authenticator.cc
passwords.cc
permission.cc
permissions_cache.cc
resource.cc
role_or_anonymous.cc
roles-metadata.cc
sasl_challenge.cc
service.cc
standard_role_manager.cc
transitional.cc)
target_include_directories(scylla_auth
PUBLIC
${CMAKE_SOURCE_DIR})
target_link_libraries(scylla_auth
PUBLIC
Seastar::seastar
xxHash::xxhash
PRIVATE
cql3
idl
wasmtime_bindings)
add_whole_archive(auth scylla_auth)

@@ -10,24 +10,12 @@
#include "auth/authenticated_user.hh"
#include <iostream>
namespace auth {
authenticated_user::authenticated_user(std::string_view name)
: name(sstring(name)) {
}
std::ostream& operator<<(std::ostream& os, const authenticated_user& u) {
if (!u.name) {
os << "anonymous";
} else {
os << *u.name;
}
return os;
}
static const authenticated_user the_anonymous_user{};
const authenticated_user& anonymous_user() noexcept {

@@ -12,7 +12,6 @@
#include <string_view>
#include <functional>
#include <iosfwd>
#include <optional>
#include <seastar/core/sstring.hh>
@@ -38,11 +37,6 @@ public:
explicit authenticated_user(std::string_view name);
};
///
/// The user name, or "anonymous".
///
std::ostream& operator<<(std::ostream&, const authenticated_user&);
inline bool operator==(const authenticated_user& u1, const authenticated_user& u2) noexcept {
return u1.name == u2.name;
}
@@ -59,6 +53,21 @@ inline bool is_anonymous(const authenticated_user& u) noexcept {
}
///
/// The user name, or "anonymous".
///
template <>
struct fmt::formatter<auth::authenticated_user> : fmt::formatter<std::string_view> {
template <typename FormatContext>
auto format(const auth::authenticated_user& u, FormatContext& ctx) const {
if (u.name) {
return fmt::format_to(ctx.out(), "{}", *u.name);
} else {
return fmt::format_to(ctx.out(), "{}", "anonymous");
}
}
};
namespace std {
template <>

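The new fmt::formatter<auth::authenticated_user> keeps the behavior of the removed operator<<: it prints the user name when one is set and "anonymous" otherwise. A dependency-free sketch of that rule, using a hypothetical `user_sketch` with a std::optional<std::string> name in place of the real class:

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for auth::authenticated_user: no name means anonymous.
struct user_sketch {
    std::optional<std::string> name;
};

// Same rule as the formatter: the name if present, otherwise "anonymous".
std::string display_name(const user_sketch& u) {
    return u.name ? *u.name : std::string("anonymous");
}
```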
@@ -1,24 +0,0 @@
/*
* Copyright (C) 2018-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "auth/authentication_options.hh"
#include <iostream>
namespace auth {
std::ostream& operator<<(std::ostream& os, authentication_option a) {
switch (a) {
case authentication_option::password: os << "PASSWORD"; break;
case authentication_option::options: os << "OPTIONS"; break;
}
return os;
}
}

@@ -26,8 +26,6 @@ enum class authentication_option {
options
};
std::ostream& operator<<(std::ostream&, authentication_option);
using authentication_option_set = std::unordered_set<authentication_option>;
using custom_options = std::unordered_map<sstring, sstring>;
@@ -49,3 +47,18 @@ public:
};
}
template <>
struct fmt::formatter<auth::authentication_option> : fmt::formatter<std::string_view> {
template <typename FormatContext>
auto format(const auth::authentication_option a, FormatContext& ctx) const {
using enum auth::authentication_option;
switch (a) {
case password:
return formatter<std::string_view>::format("PASSWORD", ctx);
case options:
return formatter<std::string_view>::format("OPTIONS", ctx);
}
std::abort();
}
};
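The formatter for authentication_option (and the one for resource_kind in resource.hh) switches over every enumerator with no `default:` and calls std::abort() after the switch. With this shape, GCC and Clang can warn (-Wswitch) when a new enumerator has no case, and a value that is not a valid enumerator fails loudly at run time. The same shape as a hypothetical free function returning std::string_view:

```cpp
#include <cstdlib>
#include <string_view>

enum class authentication_option { password, options };

// Exhaustive switch without a default: -Wswitch flags a missing case, and
// std::abort() catches out-of-range values at run time.
std::string_view to_string_view(authentication_option a) {
    switch (a) {
    case authentication_option::password:
        return "PASSWORD";
    case authentication_option::options:
        return "OPTIONS";
    }
    std::abort();
}
```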

@@ -14,7 +14,7 @@
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "replica/database.hh"
#include "schema_builder.hh"
#include "schema/schema_builder.hh"
#include "service/migration_manager.hh"
#include "timeout_config.hh"

@@ -30,8 +30,6 @@ namespace replica {
class database;
}
class timeout_config;
namespace service {
class migration_manager;
}

@@ -74,7 +74,7 @@ future<bool> default_authorizer::any_granted() const {
query,
db::consistency_level::LOCAL_ONE,
{},
cql3::query_processor::cache_internal::yes).then([this](::shared_ptr<cql3::untyped_result_set> results) {
cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
});
}

@@ -18,7 +18,7 @@ extern "C" {
namespace auth::passwords {
static thread_local crypt_data tlcrypt = { 0, };
static thread_local crypt_data tlcrypt = {};
namespace detail {

@@ -21,7 +21,8 @@ const auth::permission_set auth::permissions::ALL = auth::permission_set::of<
auth::permission::SELECT,
auth::permission::MODIFY,
auth::permission::AUTHORIZE,
auth::permission::DESCRIBE>();
auth::permission::DESCRIBE,
auth::permission::EXECUTE>();
const auth::permission_set auth::permissions::NONE;
@@ -34,7 +35,8 @@ static const std::unordered_map<sstring, auth::permission> permission_names({
{"SELECT", auth::permission::SELECT},
{"MODIFY", auth::permission::MODIFY},
{"AUTHORIZE", auth::permission::AUTHORIZE},
{"DESCRIBE", auth::permission::DESCRIBE}});
{"DESCRIBE", auth::permission::DESCRIBE},
{"EXECUTE", auth::permission::EXECUTE}});
const sstring& auth::permissions::to_string(permission p) {
for (auto& v : permission_names) {

@@ -38,6 +38,8 @@ enum class permission {
AUTHORIZE, // required for GRANT and REVOKE.
DESCRIBE, // required on the root-level role resource to list all roles.
// required for function/aggregate/procedure calls.
EXECUTE,
};
typedef enum_set<
@@ -51,7 +53,8 @@ typedef enum_set<
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE,
permission::DESCRIBE>> permission_set;
permission::DESCRIBE,
permission::EXECUTE>> permission_set;
bool operator<(const permission_set&, const permission_set&);
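Adding EXECUTE touches the enum, the permission_set type, the ALL set and the name map. Roughly, a permission set is a bitmask over the enumerators, so a resource kind grants only the bits in its applicable set. For example, applicable_permissions(const functions_resource_view&) in resource.cc allows EXECUTE but not SELECT. A minimal sketch of that idea, with a hypothetical `permission_set_sketch` class standing in for Scylla's enum_set and assuming the enumerators not shown in these hunks are CREATE, ALTER and DROP:

```cpp
#include <cstdint>
#include <initializer_list>

// Assumed enumerator list; only SELECT..EXECUTE are visible in the hunks.
enum class permission { CREATE, ALTER, DROP, SELECT, MODIFY, AUTHORIZE, DESCRIBE, EXECUTE };

// Hypothetical stand-in for enum_set<...>: one bit per enumerator.
class permission_set_sketch {
    std::uint32_t _mask = 0;
    static std::uint32_t bit(permission p) {
        return std::uint32_t(1) << static_cast<unsigned>(p);
    }
public:
    permission_set_sketch(std::initializer_list<permission> ps) {
        for (auto p : ps) {
            _mask |= bit(p);
        }
    }
    bool contains(permission p) const {
        return (_mask & bit(p)) != 0;
    }
};

// Mirrors applicable_permissions(const functions_resource_view&).
const permission_set_sketch functions_permissions{
    permission::CREATE, permission::ALTER, permission::DROP,
    permission::AUTHORIZE, permission::EXECUTE};
```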

@@ -16,30 +16,26 @@
#include <boost/algorithm/string/join.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "service/storage_proxy.hh"
#include "data_dictionary/user_types_metadata.hh"
#include "cql3/util.hh"
#include "db/marshal/type_parser.hh"
namespace auth {
std::ostream& operator<<(std::ostream& os, resource_kind kind) {
switch (kind) {
case resource_kind::data: os << "data"; break;
case resource_kind::role: os << "role"; break;
case resource_kind::service_level: os << "service_level"; break;
}
return os;
}
static const std::unordered_map<resource_kind, std::string_view> roots{
{resource_kind::data, "data"},
{resource_kind::role, "roles"},
{resource_kind::service_level, "service_levels"}};
{resource_kind::service_level, "service_levels"},
{resource_kind::functions, "functions"}};
static const std::unordered_map<resource_kind, std::size_t> max_parts{
{resource_kind::data, 2},
{resource_kind::role, 1},
{resource_kind::service_level, 0}};
{resource_kind::service_level, 0},
{resource_kind::functions, 2}};
static permission_set applicable_permissions(const data_resource_view& dv) {
if (dv.table()) {
@@ -82,6 +78,15 @@ static permission_set applicable_permissions(const service_level_resource_view &
permission::AUTHORIZE>();
}
static permission_set applicable_permissions(const functions_resource_view& fv) {
return permission_set::of<
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::AUTHORIZE,
permission::EXECUTE>();
}
resource::resource(resource_kind kind) : _kind(kind) {
_parts.emplace_back(roots.at(kind));
}
@@ -106,6 +111,31 @@ resource::resource(role_resource_t, std::string_view role) : resource(resource_k
resource::resource(service_level_resource_t): resource(resource_kind::service_level) {
}
resource::resource(functions_resource_t) : resource(resource_kind::functions) {
}
resource::resource(functions_resource_t, std::string_view keyspace) : resource(resource_kind::functions) {
_parts.emplace_back(keyspace);
}
resource::resource(functions_resource_t, std::string_view keyspace, std::string_view function_signature) : resource(resource_kind::functions) {
_parts.emplace_back(keyspace);
_parts.emplace_back(function_signature);
}
resource::resource(functions_resource_t, std::string_view keyspace, std::string_view function_name, std::vector<::shared_ptr<cql3::cql3_type::raw>> function_args) : resource(resource_kind::functions) {
_parts.emplace_back(keyspace);
_parts.emplace_back(function_name);
if (function_args.empty()) {
_parts.emplace_back("");
return;
}
for (auto& arg_type : function_args) {
// We can't validate the UDTs here, so we just use the raw cql type names.
_parts.emplace_back(arg_type->to_string());
}
}
sstring resource::name() const {
return boost::algorithm::join(_parts, "/");
}
@@ -127,6 +157,7 @@ permission_set resource::applicable_permissions() const {
case resource_kind::data: ps = ::auth::applicable_permissions(data_resource_view(*this)); break;
case resource_kind::role: ps = ::auth::applicable_permissions(role_resource_view(*this)); break;
case resource_kind::service_level: ps = ::auth::applicable_permissions(service_level_resource_view(*this)); break;
case resource_kind::functions: ps = ::auth::applicable_permissions(functions_resource_view(*this)); break;
}
return ps;
@@ -149,6 +180,7 @@ std::ostream& operator<<(std::ostream& os, const resource& r) {
case resource_kind::data: return os << data_resource_view(r);
case resource_kind::role: return os << role_resource_view(r);
case resource_kind::service_level: return os << service_level_resource_view(r);
case resource_kind::functions: return os << functions_resource_view(r);
}
return os;
@@ -165,6 +197,109 @@ std::ostream &operator<<(std::ostream &os, const service_level_resource_view &v)
return os;
}
sstring encode_signature(std::string_view name, std::vector<data_type> args) {
return format("{}[{}]", name,
fmt::join(args | boost::adaptors::transformed([] (const data_type t) {
return t->name();
}), "^"));
}
std::pair<sstring, std::vector<data_type>> decode_signature(std::string_view encoded_signature) {
auto name_delim = encoded_signature.find_last_of('[');
std::string_view function_name = encoded_signature.substr(0, name_delim);
encoded_signature.remove_prefix(name_delim + 1);
encoded_signature.remove_suffix(1);
if (encoded_signature.empty()) {
return {sstring(function_name), {}};
}
std::vector<std::string_view> raw_types;
boost::split(raw_types, encoded_signature, boost::is_any_of("^"));
std::vector<data_type> decoded_types = boost::copy_range<std::vector<data_type>>(
raw_types | boost::adaptors::transformed([] (std::string_view raw_type) {
return db::marshal::type_parser::parse(raw_type);
})
);
return {sstring(function_name), decoded_types};
}
// Purely for Cassandra compatibility, types in the function signature are
// decoded from their verbose form (org.apache.cassandra.db.marshal.Int32Type)
// to the short form (int)
static sstring decoded_signature_string(std::string_view encoded_signature) {
auto [function_name, arg_types] = decode_signature(encoded_signature);
return format("{}({})", cql3::util::maybe_quote(sstring(function_name)),
boost::algorithm::join(arg_types | boost::adaptors::transformed([] (data_type t) {
return t->cql3_type_name();
}), ", "));
}
std::ostream &operator<<(std::ostream &os, const functions_resource_view &v) {
const auto keyspace = v.keyspace();
const auto function_signature = v.function_signature();
const auto name = v.function_name();
const auto args = v.function_args();
if (!keyspace) {
os << "<all functions>";
} else if (name) {
os << "<function " << *keyspace << '.' << cql3::util::maybe_quote(sstring(*name)) << '(';
if (args) { // function_args() is nullopt when the function takes no arguments
for (auto arg : *args) {
os << arg << ',';
}
}
os << ")>";
} else if (!function_signature) {
os << "<all functions in " << *keyspace << '>';
} else {
os << "<function " << *keyspace << '.' << decoded_signature_string(*function_signature) << '>';
}
return os;
}
functions_resource_view::functions_resource_view(const resource& r) : _resource(r) {
if (r._kind != resource_kind::functions) {
throw resource_kind_mismatch(resource_kind::functions, r._kind);
}
}
std::optional<std::string_view> functions_resource_view::keyspace() const {
if (_resource._parts.size() == 1) {
return {};
}
return _resource._parts[1];
}
std::optional<std::string_view> functions_resource_view::function_signature() const {
if (_resource._parts.size() <= 2 || _resource._parts.size() > 3) {
return {};
}
return _resource._parts[2];
}
std::optional<std::string_view> functions_resource_view::function_name() const {
if (_resource._parts.size() <= 3) {
return {};
}
return _resource._parts[2];
}
std::optional<std::vector<std::string_view>> functions_resource_view::function_args() const {
if (_resource._parts.size() <= 3) {
return {};
}
std::vector<std::string_view> parts;
if (_resource._parts[3] == "") {
return {};
}
for (size_t i = 3; i < _resource._parts.size(); i++) {
parts.push_back(_resource._parts[i]);
}
return parts;
}
data_resource_view::data_resource_view(const resource& r) : _resource(r) {
if (r._kind != resource_kind::data) {
throw resource_kind_mismatch(resource_kind::data, r._kind);

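encode_signature stores a function signature as `name[type1^type2^...]`, and decode_signature splits it back at the last '[' and at each '^'. A minimal round-trip sketch of that format, assuming plain type-name strings in place of data_type. Per the comment on decoded_signature_string, the stored names are the verbose marshal class names such as org.apache.cassandra.db.marshal.Int32Type:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Encode a name and argument type names as "name[t1^t2^...]".
std::string encode_signature_sketch(std::string_view name, const std::vector<std::string>& args) {
    std::string out(name);
    out += '[';
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) {
            out += '^';
        }
        out += args[i];
    }
    out += ']';
    return out;
}

// Inverse of encode_signature_sketch: the name ends at the last '['.
std::pair<std::string, std::vector<std::string>> decode_signature_sketch(std::string_view encoded) {
    auto name_delim = encoded.find_last_of('[');
    std::string name(encoded.substr(0, name_delim));
    encoded.remove_prefix(name_delim + 1);
    encoded.remove_suffix(1); // drop the closing ']'
    std::vector<std::string> args;
    if (encoded.empty()) {
        return {name, args}; // "name[]": no arguments
    }
    std::size_t start = 0;
    while (true) {
        auto pos = encoded.find('^', start);
        args.emplace_back(encoded.substr(start, pos - start));
        if (pos == std::string_view::npos) {
            break;
        }
        start = pos + 1;
    }
    return {name, args};
}
```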
@@ -18,6 +18,7 @@
#include <vector>
#include <unordered_set>
#include <boost/range/adaptor/transformed.hpp>
#include <seastar/core/print.hh>
#include <seastar/core/sstring.hh>
@@ -25,6 +26,7 @@
#include "seastarx.hh"
#include "utils/hash.hh"
#include "utils/small_vector.hh"
#include "cql3/cql3_type.hh"
namespace auth {
@@ -36,11 +38,9 @@ public:
};
enum class resource_kind {
data, role, service_level
data, role, service_level, functions
};
std::ostream& operator<<(std::ostream&, resource_kind);
///
/// Type tag for constructing data resources.
///
@@ -56,10 +56,15 @@ struct role_resource_t final {};
///
struct service_level_resource_t final {};
///
/// Type tag for constructing function resources.
///
struct functions_resource_t final {};
///
/// Resources are entities that users can be granted permissions on.
///
/// There are data (keyspaces and tables) and role resources. There may be other kinds of resources in the future.
/// There are data (keyspaces and tables), role, service level and function resources. There may be other kinds of resources in the future.
///
/// When they are stored as system metadata, resources have the form `root/part_0/part_1/.../part_n`. Each kind of
/// resource has a specific root prefix, followed by a maximum of `n` parts (where `n` is distinct for each kind of
@@ -83,6 +88,11 @@ public:
resource(data_resource_t, std::string_view keyspace, std::string_view table);
resource(role_resource_t, std::string_view role);
resource(service_level_resource_t);
explicit resource(functions_resource_t);
resource(functions_resource_t, std::string_view keyspace);
resource(functions_resource_t, std::string_view keyspace, std::string_view function_signature);
resource(functions_resource_t, std::string_view keyspace, std::string_view function_name,
std::vector<::shared_ptr<cql3::cql3_type::raw>> function_args);
resource_kind kind() const noexcept {
return _kind;
@@ -104,6 +114,7 @@ private:
friend class data_resource_view;
friend class role_resource_view;
friend class service_level_resource_view;
friend class functions_resource_view;
friend bool operator<(const resource&, const resource&);
friend bool operator==(const resource&, const resource&);
@@ -182,6 +193,25 @@ public:
std::ostream& operator<<(std::ostream&, const service_level_resource_view&);
///
/// A "function" view of \ref resource.
///
class functions_resource_view final {
const resource& _resource;
public:
///
/// \throws \ref resource_kind_mismatch if the argument is not a "function" resource.
///
explicit functions_resource_view(const resource&);
std::optional<std::string_view> keyspace() const;
std::optional<std::string_view> function_signature() const;
std::optional<std::string_view> function_name() const;
std::optional<std::vector<std::string_view>> function_args() const;
};
std::ostream& operator<<(std::ostream&, const functions_resource_view&);
///
/// Parse a resource from its name.
///
@@ -210,8 +240,49 @@ inline resource make_service_level_resource() {
return resource(service_level_resource_t{});
}
const resource& root_function_resource();
inline resource make_functions_resource() {
return resource(functions_resource_t{});
}
inline resource make_functions_resource(std::string_view keyspace) {
return resource(functions_resource_t{}, keyspace);
}
inline resource make_functions_resource(std::string_view keyspace, std::string_view function_signature) {
return resource(functions_resource_t{}, keyspace, function_signature);
}
inline resource make_functions_resource(std::string_view keyspace, std::string_view function_name, std::vector<::shared_ptr<cql3::cql3_type::raw>> function_signature) {
return resource(functions_resource_t{}, keyspace, function_name, function_signature);
}
sstring encode_signature(std::string_view name, std::vector<data_type> args);
std::pair<sstring, std::vector<data_type>> decode_signature(std::string_view encoded_signature);
}
template <>
struct fmt::formatter<auth::resource_kind> : fmt::formatter<std::string_view> {
template <typename FormatContext>
auto format(const auth::resource_kind kind, FormatContext& ctx) const {
using enum auth::resource_kind;
switch (kind) {
case data:
return formatter<std::string_view>::format("data", ctx);
case role:
return formatter<std::string_view>::format("role", ctx);
case service_level:
return formatter<std::string_view>::format("service_level", ctx);
case functions:
return formatter<std::string_view>::format("functions", ctx);
}
std::abort();
}
};
namespace std {
template <>
@@ -228,6 +299,10 @@ struct hash<auth::resource> {
return utils::tuple_hash()(std::make_tuple(auth::resource_kind::service_level));
}
static size_t hash_function(const auth::functions_resource_view& fv) {
return utils::tuple_hash()(std::make_tuple(auth::resource_kind::functions, fv.keyspace(), fv.function_signature()));
}
size_t operator()(const auth::resource& r) const {
std::size_t value;
@@ -235,6 +310,7 @@ struct hash<auth::resource> {
case auth::resource_kind::data: value = hash_data(auth::data_resource_view(r)); break;
case auth::resource_kind::role: value = hash_role(auth::role_resource_view(r)); break;
case auth::resource_kind::service_level: value = hash_service_level(auth::service_level_resource_view(r)); break;
case auth::resource_kind::functions: value = hash_function(auth::functions_resource_view(r)); break;
}
return value;

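Following the `root/part_0/.../part_n` layout described in the resource comment, a function resource name takes one of four shapes: `functions`, `functions/<keyspace>`, `functions/<keyspace>/<encoded signature>`, or, when built from a name and raw CQL argument types, `functions/<keyspace>/<name>/<arg>...` (with a single empty part when there are no arguments). functions_resource_view tells these apart purely by the number of parts. A standalone sketch of that dispatch, where `split_parts` and the `view_*` functions are hypothetical names:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Split "functions/ks/..." into its '/'-separated parts.
std::vector<std::string_view> split_parts(std::string_view name) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        auto pos = name.find('/', start);
        parts.push_back(name.substr(start, pos - start));
        if (pos == std::string_view::npos) {
            return parts;
        }
        start = pos + 1;
    }
}

// Same size checks as functions_resource_view::keyspace().
std::optional<std::string_view> view_keyspace(const std::vector<std::string_view>& p) {
    if (p.size() == 1) return std::nullopt;
    return p[1];
}

// function_signature(): only when there are exactly three parts.
std::optional<std::string_view> view_function_signature(const std::vector<std::string_view>& p) {
    if (p.size() != 3) return std::nullopt;
    return p[2];
}

// function_name(): only in the name + argument-types form (four or more parts).
std::optional<std::string_view> view_function_name(const std::vector<std::string_view>& p) {
    if (p.size() <= 3) return std::nullopt;
    return p[2];
}
```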
@@ -20,17 +20,19 @@
#include "auth/allow_all_authorizer.hh"
#include "auth/common.hh"
#include "auth/role_or_anonymous.hh"
#include "cql3/functions/functions.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/config.hh"
#include "db/consistency_level_type.hh"
#include "db/functions/function_name.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "locator/abstract_replication_strategy.hh"
#include "data_dictionary/keyspace_metadata.hh"
#include "mutation.hh"
#include "mutation/mutation.hh"
namespace auth {
@@ -346,6 +348,22 @@ future<bool> service::exists(const resource& r) const {
}
case resource_kind::service_level:
return make_ready_future<bool>(true);
case resource_kind::functions: {
const auto& db = _qp.db();
functions_resource_view v(r);
const auto keyspace = v.keyspace();
if (!keyspace) {
return make_ready_future<bool>(true);
}
const auto function_signature = v.function_signature();
if (!function_signature) {
return make_ready_future<bool>(db.has_keyspace(sstring(*keyspace)));
}
auto [name, function_args] = auth::decode_signature(*function_signature);
return make_ready_future<bool>(cql3::functions::functions::find(db::functions::function_name{sstring(*keyspace), name}, function_args));
}
}
return make_ready_future<bool>(false);

@@ -470,7 +470,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol
future<>
standard_role_manager::revoke(std::string_view revokee_name, std::string_view role_name) {
return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {
return this->exists(role_name).then([role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(sstring(role_name));
}

@@ -14,7 +14,7 @@
#endif
#ifndef STRINGIFY
// We need to levels of indirection
// We need two levels of indirection
// to make a string out of the macro name.
// The outer level expands the macro
// and the inner level makes a string out of the expanded macro.
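The comment describes the usual two-step stringification idiom: `#` stringizes its operand without expanding it, so an outer macro is needed to expand the argument first. A minimal sketch with hypothetical macro names (the file's own macro definitions are outside this hunk):

```cpp
#include <string_view>

// Hypothetical macro names illustrating the two-level pattern.
#define SKETCH_STRINGIFY_INNER(x) #x
// x is not an operand of # here, so it is macro-expanded before being passed
// to the inner macro, which then stringizes the expanded tokens.
#define SKETCH_STRINGIFY(x) SKETCH_STRINGIFY_INNER(x)

#define SKETCH_VERSION 42

// One level: # stringizes the argument as written.
constexpr std::string_view one_level = SKETCH_STRINGIFY_INNER(SKETCH_VERSION);
// Two levels: the argument is expanded to 42 first.
constexpr std::string_view two_levels = SKETCH_STRINGIFY(SKETCH_VERSION);

static_assert(one_level == "SKETCH_VERSION");
static_assert(two_levels == "42");
```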

@@ -50,15 +50,7 @@ bytes from_hex(sstring_view s) {
}
sstring to_hex(bytes_view b) {
static char digits[] = "0123456789abcdef";
sstring out = uninitialized_string(b.size() * 2);
unsigned end = b.size();
for (unsigned i = 0; i != end; ++i) {
uint8_t x = b[i];
out[2*i] = digits[x >> 4];
out[2*i+1] = digits[x & 0xf];
}
return out;
return fmt::to_string(fmt_hex(b));
}
sstring to_hex(const bytes& b) {
@@ -70,12 +62,14 @@ sstring to_hex(const bytes_opt& b) {
}
std::ostream& operator<<(std::ostream& os, const bytes& b) {
return os << to_hex(b);
fmt::print(os, "{}", b);
return os;
}
std::ostream& operator<<(std::ostream& os, const bytes_opt& b) {
if (b) {
return os << *b;
fmt::print(os, "{}", *b);
return os;
}
return os << "null";
}
@@ -83,11 +77,13 @@ std::ostream& operator<<(std::ostream& os, const bytes_opt& b) {
namespace std {
std::ostream& operator<<(std::ostream& os, const bytes_view& b) {
return os << to_hex(b);
fmt::print(os, "{}", fmt_hex(b));
return os;
}
}
std::ostream& operator<<(std::ostream& os, const fmt_hex& b) {
return os << to_hex(b.v);
fmt::print(os, "{}", b);
return os;
}

@@ -9,8 +9,9 @@
#pragma once
#include "seastarx.hh"
#include <fmt/format.h>
#include <seastar/core/sstring.hh>
#include "hashing.hh"
#include "utils/hashing.hh"
#include <optional>
#include <iosfwd>
#include <functional>
@@ -37,8 +38,8 @@ inline bytes_view to_bytes_view(sstring_view view) {
}
struct fmt_hex {
bytes_view& v;
fmt_hex(bytes_view& v) noexcept : v(v) {}
const bytes_view& v;
fmt_hex(const bytes_view& v) noexcept : v(v) {}
};
std::ostream& operator<<(std::ostream& os, const fmt_hex& hex);
@@ -51,6 +52,89 @@ sstring to_hex(const bytes_opt& b);
std::ostream& operator<<(std::ostream& os, const bytes& b);
std::ostream& operator<<(std::ostream& os, const bytes_opt& b);
template <>
struct fmt::formatter<fmt_hex> {
size_t _group_size_in_bytes = 0;
char _delimiter = ' ';
public:
// format_spec := [group_size[delimiter]]
// group_size := a digit from '0' to '8' (larger values are rejected)
// delimiter := a char other than '{' or '}'
//
// by default, the given bytes are printed without a delimiter, just
// like a string. so a bytes_view of {0x20, 0x01, 0x0d, 0xb8} is
// printed like:
// "20010db8".
//
// but the format specifier can be used to customize how the bytes
// are printed. for instance, to print the bytes
// {0x20, 0x01, 0x0d, 0xb8, 0x00, 0x00} like an IPv6 address, the
// format specifier would be "{:2:}", where
// - "2": bytes are printed in groups of 2 bytes
// - ":": each group is delimited by ":"
// and the formatted output will look like:
// "2001:0db8:0000"
//
// or we can mimic the default format used by hexdump using
// "{:2 }", where
// - "2": bytes are printed in groups of 2 bytes
// - " ": each group is delimited by " "
// and the formatted output will look like:
// "2001 0db8 0000"
//
// or we can just print each byte and separate them with a dash using
// "{:1-}"
// and the formatted output will look like:
// "20-01-0d-b8-00-00"
constexpr auto parse(fmt::format_parse_context& ctx) {
// parse the optional group size and delimiter
auto it = ctx.begin();
auto end = ctx.end();
if (it != end && *it != '}') {
int group_size = *it++ - '0';
if (group_size < 0 ||
static_cast<size_t>(group_size) > sizeof(uint64_t)) {
throw format_error("invalid group_size");
}
_group_size_in_bytes = group_size;
if (it != end && *it != '}') {
// optional delimiter
_delimiter = *it++;
}
}
if (it != end && *it != '}') {
throw format_error("invalid format");
}
return it;
}
template <typename FormatContext>
auto format(const ::fmt_hex& s, FormatContext& ctx) const {
auto out = ctx.out();
const auto& v = s.v;
if (_group_size_in_bytes > 0) {
for (size_t i = 0, size = v.size(); i < size; i++) {
if (i != 0 && i % _group_size_in_bytes == 0) {
fmt::format_to(out, "{}{:02x}", _delimiter, std::byte(v[i]));
} else {
fmt::format_to(out, "{:02x}", std::byte(v[i]));
}
}
} else {
for (auto b : v) {
fmt::format_to(out, "{:02x}", std::byte(b));
}
}
return out;
}
};
template <>
struct fmt::formatter<bytes> : fmt::formatter<fmt_hex> {
template <typename FormatContext>
auto format(const ::bytes& s, FormatContext& ctx) const {
return fmt::formatter<::fmt_hex>::format(::fmt_hex(bytes_view(s)), ctx);
}
};
namespace std {
// Must be in std:: namespace, or ADL fails

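fmt::formatter<fmt_hex> prints each byte as two lowercase hex digits and, when a group size is given, inserts the delimiter before every group except the first. The same grouping rule as a dependency-free sketch; `hex_grouped` is a hypothetical helper, not part of bytes.hh, and it reuses the nibble lookup from the removed hand-written to_hex loop:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// group_size == 0 means "no grouping", as in the formatter.
std::string hex_grouped(const std::vector<std::uint8_t>& v,
                        std::size_t group_size = 0, char delimiter = ' ') {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(v.size() * 3);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (group_size > 0 && i != 0 && i % group_size == 0) {
            out += delimiter;          // start of a new group
        }
        out += digits[v[i] >> 4];      // high nibble
        out += digits[v[i] & 0xf];     // low nibble
    }
    return out;
}
```

With the bytes {0x20, 0x01, 0x0d, 0xb8, 0x00, 0x00}, the calls (b), (b, 2, ':') and (b, 1, '-') correspond to the "{}", "{:2:}" and "{:1-}" specs described in the formatter's comment.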
@@ -12,7 +12,7 @@
#include "bytes.hh"
#include "utils/managed_bytes.hh"
#include "hashing.hh"
#include "utils/hashing.hh"
#include <seastar/core/simple-stream.hh>
#include <seastar/core/loop.hh>
#include <bit>
@@ -457,7 +457,9 @@ public:
_begin.ptr->size = _size;
_current = nullptr;
_size = 0;
return managed_bytes(std::exchange(_begin.ptr, {}));
auto begin_ptr = _begin.ptr;
_begin.ptr = nullptr;
return managed_bytes(begin_ptr);
} else {
return managed_bytes();
}
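The cache_flat_mutation_reader diff that follows deletes range_tombstone_change_merger. That class merged the range tombstone change streams coming from the cache and from the underlying reader: it remembered the latest tombstone from each source and emitted a change only when the effective (maximum) tombstone actually changed. A simplified sketch of the same logic, assuming int positions and tombstones reduced to a timestamp (the real code uses position_in_partition with a schema-aware comparator and full tombstone objects):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <optional>

enum class source { cache = 0, underlying = 1 };

// Simplified range_tombstone_change: a position and a timestamp (0 = none).
struct rt_change {
    int pos;
    long tomb;
};

class rtc_merger_sketch {
    int _pos = std::numeric_limits<int>::min(); // "before all clustered rows"
    long _current = 0;                          // last emitted effective tombstone
    std::array<long, 2> _tombs{};               // latest tombstone per source

    std::optional<rt_change> do_flush(int pos, bool end_of_range) {
        std::optional<rt_change> ret;
        const bool should_flush = end_of_range ? _pos <= pos : _pos < pos;
        if (should_flush) {
            const long merged = std::max(_tombs[0], _tombs[1]);
            if (merged != _current) {           // emit only real changes
                _current = merged;
                ret = rt_change{_pos, _current};
            }
            _pos = pos;
        }
        return ret;
    }
public:
    // Flush what was pending before rtc.pos, then record the new tombstone.
    std::optional<rt_change> apply(source src, rt_change rtc) {
        auto ret = do_flush(rtc.pos, false);
        _tombs[static_cast<std::size_t>(src)] = rtc.tomb;
        return ret;
    }
    std::optional<rt_change> flush(int pos, bool end_of_range) {
        return do_flush(pos, end_of_range);
    }
};
```

In this sketch, when the cache tombstone ends while the older underlying one is still active, no change is emitted at that point as long as the maximum stays the same.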

@@ -10,10 +10,10 @@
#include <vector>
#include "row_cache.hh"
#include "mutation_fragment.hh"
#include "mutation/mutation_fragment.hh"
#include "query-request.hh"
#include "partition_snapshot_row_cursor.hh"
#include "range_tombstone_assembler.hh"
#include "mutation/range_tombstone_assembler.hh"
#include "read_context.hh"
#include "readers/delegating_v2.hh"
#include "clustering_key_filter.hh"
@@ -41,7 +41,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
move_to_underlying,
// Invariants:
// - Upper bound of the read is min(_next_row.position(), _upper_bound)
// - Upper bound of the read is *_underlying_upper_bound
// - _next_row_in_range = _next.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
@@ -51,46 +51,6 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
end_of_stream
};
enum class source {
cache = 0,
underlying = 1,
};
// Merges range tombstone change streams coming from underlying and the cache.
// Ensures no range tombstone change fragment is emitted when there is no
// actual change in the effective tombstone.
class range_tombstone_change_merger {
const schema& _schema;
position_in_partition _pos;
tombstone _current_tombstone;
std::array<tombstone, 2> _tombstones;
private:
std::optional<range_tombstone_change> do_flush(position_in_partition pos, bool end_of_range) {
std::optional<range_tombstone_change> ret;
position_in_partition::tri_compare cmp(_schema);
const auto res = cmp(_pos, pos);
const auto should_flush = end_of_range ? res <= 0 : res < 0;
if (should_flush) {
auto merged_tomb = std::max(_tombstones.front(), _tombstones.back());
if (merged_tomb != _current_tombstone) {
_current_tombstone = merged_tomb;
ret.emplace(_pos, _current_tombstone);
}
_pos = std::move(pos);
}
return ret;
}
public:
range_tombstone_change_merger(const schema& s) : _schema(s), _pos(position_in_partition::before_all_clustered_rows()), _tombstones{}
{ }
std::optional<range_tombstone_change> apply(source src, range_tombstone_change&& rtc) {
auto ret = do_flush(rtc.position(), false);
_tombstones[static_cast<size_t>(src)] = rtc.tombstone();
return ret;
}
std::optional<range_tombstone_change> flush(position_in_partition_view pos, bool end_of_range) {
return do_flush(position_in_partition(pos), end_of_range);
}
};
partition_snapshot_ptr _snp;
query::clustering_key_filter_ranges _ck_ranges; // Query schema domain, reversed reads use native order
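The `range_tombstone_change_merger` removed above records the latest tombstone from each source (cache, underlying). It emits a change at the pending position only once a strictly greater position arrives, or an equal one when `end_of_range` is set. It emits only if `max` of the two sources differs from the tombstone already emitted. The model below is a simplified sketch of that logic. Positions are plain `int`s and tombstones are timestamps where 0 means "no tombstone". `toy_rtc`, `toy_source`, and `toy_rtc_merger` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical simplified change fragment: position and tombstone timestamp (0 = none).
struct toy_rtc { int pos; std::int64_t tomb; };
enum class toy_source : std::size_t { cache = 0, underlying = 1 };

class toy_rtc_merger {
    int _pos = INT_MIN;                  // pending position, "before all rows"
    std::int64_t _current = 0;           // last emitted effective tombstone
    std::array<std::int64_t, 2> _tombs{}; // latest tombstone per source

    std::optional<toy_rtc> do_flush(int pos, bool end_of_range) {
        std::optional<toy_rtc> ret;
        bool should_flush = end_of_range ? _pos <= pos : _pos < pos;
        if (should_flush) {
            std::int64_t merged = std::max(_tombs[0], _tombs[1]);
            if (merged != _current) { // emit only on an actual change
                _current = merged;
                ret = toy_rtc{_pos, _current};
            }
            _pos = pos;
        }
        return ret;
    }
public:
    std::optional<toy_rtc> apply(toy_source src, toy_rtc c) {
        auto ret = do_flush(c.pos, false);
        _tombs[static_cast<std::size_t>(src)] = c.tomb;
        return ret;
    }
    std::optional<toy_rtc> flush(int pos, bool end_of_range) { return do_flush(pos, end_of_range); }
};
```

Changes arriving at the same position are coalesced. A source dropping its tombstone produces nothing while the other source still covers the range with an equal or higher tombstone.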
@@ -103,8 +63,11 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
// Holds the lower bound of a position range which hasn't been processed yet.
// Only rows with positions < _lower_bound have been emitted, and only
// range_tombstones with positions <= _lower_bound.
// range_tombstone_changes with positions <= _lower_bound.
//
// Invariant: !_lower_bound.is_clustering_row()
position_in_partition _lower_bound; // Query schema domain
// Invariant: !_upper_bound.is_clustering_row()
position_in_partition_view _upper_bound; // Query schema domain
std::optional<position_in_partition> _underlying_upper_bound; // Query schema domain
@@ -121,22 +84,19 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
read_context& _read_context;
partition_snapshot_row_cursor _next_row;
range_tombstone_change_generator _rt_gen; // cache -> reader
range_tombstone_assembler _rt_assembler; // underlying -> cache
range_tombstone_change_merger _rt_merger; // {cache, underlying} -> reader
// When the read moves to the underlying, the read range will be
// (_lower_bound, x], where x is either _next_row.position() or _upper_bound.
// In the former case (x is _next_row.position()), underlying can emit
// a range tombstone change for after_key(x), which is outside the range.
// We can't push this fragment into the buffer straight away, the cache may
// have fragments with smaller position. So we save it here and flush it when
// a fragment with a larger position is seen.
std::optional<mutation_fragment_v2> _queued_underlying_fragment;
// Holds the currently active range tombstone of the output mutation fragment stream.
// While producing the stream, at any given time, _current_tombstone applies to the
// key range which extends at least to _lower_bound. When consuming subsequent interval,
// which will advance _lower_bound further, be it from underlying or from cache,
// a decision is made whether the range tombstone in the next interval is the same as
// the current one or not. If it is different, then range_tombstone_change is emitted
// with the old _lower_bound value (start of the next interval).
tombstone _current_tombstone;
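The comment above describes the emission rule. The reader remembers the tombstone it has already emitted. It pushes a `range_tombstone_change` at the start of the next interval only when that interval's tombstone differs. `move_to_next_range()` closes the range with a null change at `_upper_bound` if a tombstone is still active. The sketch below models this with plain `int` positions and timestamp tombstones (0 = none). `interval`, `toy_rtc`, and `emit_changes` are hypothetical names, and population of the cache is omitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplified types: tombstone timestamps, 0 = no tombstone.
struct toy_rtc { int pos; std::int64_t tomb; };
struct interval { int start; std::int64_t tomb; }; // tombstone covering [start, next start)

std::vector<toy_rtc> emit_changes(const std::vector<interval>& intervals, int range_end) {
    std::vector<toy_rtc> out;
    std::int64_t current = 0; // plays the role of _current_tombstone
    for (const auto& iv : intervals) {
        if (iv.tomb != current) { // emit only when the effective tombstone changes
            out.push_back({iv.start, iv.tomb});
            current = iv.tomb;
        }
    }
    if (current != 0) {
        out.push_back({range_end, 0}); // close the range, as move_to_next_range() does
    }
    return out;
}
```

For intervals starting at 0, 10, 20, 30 with tombstones 5, 5, 0, 7 and a range end of 40, the output is four changes: (0, 5), (20, 0), (30, 7), (40, 0).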
state _state = state::before_static_row;
bool _next_row_in_range = false;
bool _has_rt = false;
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
@@ -145,11 +105,6 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
// Valid when _state == reading_from_underlying.
bool _population_range_starts_before_all_rows;
// Whether _lower_bound was changed within current fill_buffer().
// If it did not then we cannot break out of it (e.g. on preemption) because
// forward progress is not guaranteed in case iterators are getting constantly invalidated.
bool _lower_bound_changed = false;
// Points to the underlying reader conforming to _schema,
// either to *_underlying_holder or _read_context.underlying().underlying().
flat_mutation_reader_v2* _underlying = nullptr;
@@ -163,14 +118,11 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
void move_to_next_range();
void move_to_range(query::clustering_row_ranges::const_iterator);
void move_to_next_entry();
void maybe_drop_last_entry() noexcept;
void flush_tombstones(position_in_partition_view, bool end_of_range = false);
void maybe_drop_last_entry(tombstone) noexcept;
void add_to_buffer(const partition_snapshot_row_cursor&);
void add_clustering_row_to_buffer(mutation_fragment_v2&&);
void add_to_buffer(range_tombstone_change&&, source);
void do_add_to_buffer(range_tombstone_change&&);
void add_range_tombstone_to_buffer(range_tombstone&&);
void add_to_buffer(mutation_fragment_v2&&);
void add_to_buffer(range_tombstone_change&&);
void offer_from_underlying(mutation_fragment_v2&&);
future<> read_from_underlying();
void start_reading_from_underlying();
bool after_current_range(position_in_partition_view position);
@@ -189,7 +141,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {
bool ensure_population_lower_bound();
void maybe_add_to_cache(const mutation_fragment_v2& mf);
void maybe_add_to_cache(const clustering_row& cr);
void maybe_add_to_cache(const range_tombstone_change& rtc);
bool maybe_add_to_cache(const range_tombstone_change& rtc);
void maybe_add_to_cache(const static_row& sr);
void maybe_set_static_row_continuous();
void finish_reader() {
@@ -244,8 +196,6 @@ public:
, _read_context_holder()
, _read_context(ctx) // ctx is owned by the caller, who's responsible for closing it.
, _next_row(*_schema, *_snp, false, _read_context.is_reversed())
, _rt_gen(*_schema)
, _rt_merger(*_schema)
{
clogger.trace("csm {}: table={}.{}, reversed={}, snap={}", fmt::ptr(this), _schema->ks_name(), _schema->cf_name(), _read_context.is_reversed(),
fmt::ptr(&*_snp));
@@ -373,13 +323,31 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {
}
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed();
_underlying_upper_bound = _next_row_in_range ? position_in_partition::before_key(_next_row.position())
: position_in_partition(_upper_bound);
if (!_read_context.partition_exists()) {
clogger.trace("csm {}: partition does not exist", fmt::ptr(this));
if (_current_tombstone) {
clogger.trace("csm {}: move_to_underlying: emit rtc({}, null)", fmt::ptr(this), _lower_bound);
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, {})));
_current_tombstone = {};
}
return read_from_underlying();
}
_underlying_upper_bound = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _underlying->fast_forward_to(position_range{_lower_bound, *_underlying_upper_bound}).then([this] {
return read_from_underlying();
if (!_current_tombstone) {
return read_from_underlying();
}
return _underlying->peek().then([this] (mutation_fragment_v2* mf) {
position_in_partition::equal_compare eq(*_schema);
if (!mf || !mf->is_range_tombstone_change()
|| !eq(mf->as_range_tombstone_change().position(), _lower_bound)) {
clogger.trace("csm {}: move_to_underlying: emit rtc({}, null)", fmt::ptr(this), _lower_bound);
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, {})));
_current_tombstone = {};
}
return read_from_underlying();
});
});
}
if (_state == state::reading_from_underlying) {
@@ -388,8 +356,8 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {
// assert(_state == state::reading_from_cache)
return _lsa_manager.run_in_read_section([this] {
auto next_valid = _next_row.iterators_valid();
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", fmt::ptr(this), _lower_bound,
_upper_bound, _next_row.position(), next_valid);
clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}, rt={}", fmt::ptr(this), _lower_bound,
_upper_bound, _next_row.position(), next_valid, _current_tombstone);
// We assume that if there was eviction, and thus the range may
// no longer be continuous, the cursor was invalidated.
if (!next_valid) {
@@ -403,13 +371,9 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {
}
_next_row.maybe_refresh();
clogger.trace("csm {}: next={}", fmt::ptr(this), _next_row);
_lower_bound_changed = false;
while (_state == state::reading_from_cache) {
copy_from_cache_to_buffer();
// We need to check _lower_bound_changed even if is_buffer_full() because
// we may have emitted only a range tombstone which overlapped with _lower_bound
// and thus didn't cause _lower_bound to change.
if ((need_preempt() || is_buffer_full()) && _lower_bound_changed) {
if (need_preempt() || is_buffer_full()) {
break;
}
}
@@ -423,37 +387,38 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment_v2 mf) {
_read_context.cache().on_row_miss();
maybe_add_to_cache(mf);
add_to_buffer(std::move(mf));
offer_from_underlying(std::move(mf));
},
[this] {
_lower_bound = std::move(*_underlying_upper_bound);
_underlying_upper_bound.reset();
_state = state::reading_from_cache;
_lsa_manager.run_in_update_section([this] {
auto same_pos = _next_row.maybe_refresh();
clogger.trace("csm {}: underlying done, in_range={}, same={}, next={}", fmt::ptr(this), _next_row_in_range, same_pos, _next_row);
if (!same_pos) {
_read_context.cache().on_mispopulate(); // FIXME: Insert dummy entry at _upper_bound.
_read_context.cache().on_mispopulate(); // FIXME: Insert dummy entry at _lower_bound.
_next_row_in_range = !after_current_range(_next_row.position());
if (!_next_row.continuous()) {
_last_row = nullptr; // We did not populate the full range up to _lower_bound, break continuity
start_reading_from_underlying();
}
return;
}
if (_next_row_in_range) {
maybe_update_continuity();
if (!_next_row.dummy()) {
_lower_bound = position_in_partition::before_key(_next_row.key());
} else {
_lower_bound = _next_row.position();
}
} else {
if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {
this->maybe_update_continuity();
} else if (can_populate()) {
if (can_populate()) {
const schema& table_s = table_schema();
rows_entry::tri_compare cmp(table_s);
auto& rows = _snp->version()->partition().mutable_clustered_rows();
if (query::is_single_row(*_schema, *_ck_ranges_curr)) {
// If there are range tombstones which apply to the row then
// we cannot insert an empty entry here because if those range
// tombstones got evicted by now, we will insert an entry
// with missing range tombstone information.
// FIXME: try to set the range tombstone when possible.
if (!_has_rt) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));
@@ -466,9 +431,10 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
// Also works in reverse read mode.
// It preserves the continuity of the range the entry falls into.
it->set_continuous(next->continuous());
clogger.trace("csm {}: inserted empty row at {}, cont={}", fmt::ptr(this), it->position(), it->continuous());
clogger.trace("csm {}: inserted empty row at {}, cont={}, rt={}", fmt::ptr(this), it->position(), it->continuous(), it->range_tombstone());
}
});
}
} else if (ensure_population_lower_bound()) {
with_allocator(_snp->region().allocator(), [&] {
auto e = alloc_strategy_unique_ptr<rows_entry>(
@@ -476,17 +442,19 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
// Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e), cmp);
if (insert_result.second) {
clogger.trace("csm {}: inserted dummy at {}", fmt::ptr(this), _upper_bound);
clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, _upper_bound);
_snp->tracker()->insert(*insert_result.first);
}
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), _last_row.position(), insert_result.first->position(), _current_tombstone);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(_current_tombstone);
} else {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), insert_result.first->position());
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(), _last_row.position(), _current_tombstone);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
}
maybe_drop_last_entry();
maybe_drop_last_entry(_current_tombstone);
});
}
} else {
@@ -515,55 +483,103 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {
// Continuity flag we will later set for the upper bound extends to the previous row in the same version,
// so we need to ensure we have an entry in the latest version.
if (!_last_row.is_in_latest_version()) {
with_allocator(_snp->region().allocator(), [&] {
auto& rows = _snp->version()->partition().mutable_clustered_rows();
rows_entry::tri_compare cmp(table_schema());
// FIXME: Avoid the copy by inserting an incomplete clustering row
auto e = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_schema(), *_last_row));
e->set_continuous(false);
auto insert_result = rows.insert_before_hint(rows.end(), std::move(e), cmp);
if (insert_result.second) {
auto it = insert_result.first;
clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), it->position());
_snp->tracker()->insert(*it);
}
_last_row.set_latest(insert_result.first);
rows_entry::tri_compare cmp(*_schema);
partition_snapshot_row_cursor cur(*_schema, *_snp, false, _read_context.is_reversed());
if (!cur.advance_to(_last_row.position())) {
return false;
}
if (cmp(cur.position(), _last_row.position()) != 0) {
return false;
}
auto res = with_allocator(_snp->region().allocator(), [&] {
return cur.ensure_entry_in_latest();
});
_last_row.set_latest(res.it);
if (res.inserted) {
clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), _last_row.position());
}
}
return true;
}
inline
void cache_flat_mutation_reader::maybe_update_continuity() {
if (can_populate() && ensure_population_lower_bound()) {
position_in_partition::equal_compare eq(*_schema);
if (can_populate()
&& ensure_population_lower_bound()
&& !eq(_last_row.position(), _next_row.position())) {
with_allocator(_snp->region().allocator(), [&] {
rows_entry& e = _next_row.ensure_entry_in_latest().row;
auto& rows = _snp->version()->partition().mutable_clustered_rows();
const schema& table_s = table_schema();
rows_entry::tri_compare table_cmp(table_s);
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
_last_row->set_continuous(true);
if (_current_tombstone != _last_row->range_tombstone() && !_last_row->dummy()) {
with_allocator(_snp->region().allocator(), [&] {
auto e2 = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_s,
position_in_partition_view::before_key(_last_row->position()),
is_dummy::yes,
is_continuous::yes));
auto insert_result = rows.insert(std::move(e2), table_cmp);
if (insert_result.second) {
clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());
_snp->tracker()->insert(*insert_result.first);
}
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
_last_row.position(), _current_tombstone);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
_last_row->set_continuous(true);
});
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), _current_tombstone);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(_current_tombstone);
}
} else {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
e.set_continuous(true);
if (_current_tombstone != e.range_tombstone() && !e.dummy()) {
with_allocator(_snp->region().allocator(), [&] {
auto e2 = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_s,
position_in_partition_view::before_key(e.position()),
is_dummy::yes,
is_continuous::yes));
// Use _next_row iterator only as a hint because there could be insertions before
// _next_row.get_iterator_in_latest_version(), either from concurrent reads,
// from _next_row.ensure_entry_in_latest().
auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e2), table_cmp);
if (insert_result.second) {
clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());
_snp->tracker()->insert(*insert_result.first);
}
clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),
_last_row.position(), _current_tombstone);
insert_result.first->set_continuous(true);
insert_result.first->set_range_tombstone(_current_tombstone);
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
e.set_continuous(true);
});
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), _current_tombstone);
e.set_range_tombstone(_current_tombstone);
e.set_continuous(true);
}
}
maybe_drop_last_entry();
maybe_drop_last_entry(_current_tombstone);
});
} else {
_read_context.cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const mutation_fragment_v2& mf) {
if (mf.is_range_tombstone_change()) {
maybe_add_to_cache(mf.as_range_tombstone_change());
} else {
assert(mf.is_clustering_row());
const clustering_row& cr = mf.as_clustering_row();
maybe_add_to_cache(cr);
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
@@ -572,16 +588,9 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
_read_context.cache().on_mispopulate();
return;
}
auto rt_opt = _rt_assembler.flush(*_schema, position_in_partition::after_key(cr.key()));
clogger.trace("csm {}: populate({})", fmt::ptr(this), clustering_row::printer(*_schema, cr));
_lsa_manager.run_in_update_section_with_allocator([this, &cr, &rt_opt] {
mutation_partition& mp = _snp->version()->partition();
if (rt_opt) {
clogger.trace("csm {}: populate flushed rt({})", fmt::ptr(this), *rt_opt);
mp.mutable_row_tombstones().apply_monotonically(table_schema(), to_table_domain(range_tombstone(*rt_opt)));
}
clogger.trace("csm {}: populate({}), rt={}", fmt::ptr(this), clustering_row::printer(*_schema, cr), _current_tombstone);
_lsa_manager.run_in_update_section_with_allocator([this, &cr] {
mutation_partition_v2& mp = _snp->version()->partition();
rows_entry::tri_compare cmp(table_schema());
if (_read_context.digest_requested()) {
@@ -590,6 +599,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_schema(), cr.key(), cr.as_deletable_row()));
new_entry->set_continuous(false);
new_entry->set_range_tombstone(_current_tombstone);
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(cr.key(), cmp);
auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);
@@ -603,9 +613,14 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());
_last_row->set_continuous(true);
// _current_tombstone must also apply to _last_row itself (if it's non-dummy)
// because otherwise there would be a rtc after it, either creating a different entry,
// or clearing _last_row if population did not happen.
_last_row->set_range_tombstone(_current_tombstone);
} else {
clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());
e.set_continuous(true);
e.set_range_tombstone(_current_tombstone);
}
} else {
_read_context.cache().on_mispopulate();
@@ -617,6 +632,72 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
});
}
inline
bool cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone_change& rtc) {
rows_entry::tri_compare q_cmp(*_schema);
clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rtc);
// Don't emit the closing range tombstone change, we may continue from cache with the same tombstone.
// The following relies on !_underlying_upper_bound->is_clustering_row()
if (q_cmp(rtc.position(), *_underlying_upper_bound) == 0) {
_lower_bound = rtc.position();
return false;
}
auto prev = std::exchange(_current_tombstone, rtc.tombstone());
if (_current_tombstone == prev) {
return false;
}
if (!can_populate()) {
// _current_tombstone is now invalid and remains so for this reader. No need to change it.
_last_row = nullptr;
_population_range_starts_before_all_rows = false;
_read_context.cache().on_mispopulate();
return true;
}
_lsa_manager.run_in_update_section_with_allocator([&] {
mutation_partition_v2& mp = _snp->version()->partition();
rows_entry::tri_compare cmp(table_schema());
auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
current_allocator().construct<rows_entry>(table_schema(), to_table_domain(rtc.position()), is_dummy::yes, is_continuous::no));
auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
: mp.clustered_rows().lower_bound(to_table_domain(rtc.position()), cmp);
auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);
it = insert_result.first;
if (insert_result.second) {
_snp->tracker()->insert(*it);
}
rows_entry& e = *it;
if (ensure_population_lower_bound()) {
// underlying may emit range_tombstone_change fragments with the same position.
// In such case, the range to which the tombstone from the first fragment applies is empty and should be ignored.
if (q_cmp(_last_row.position(), it->position()) < 0) {
if (_read_context.is_reversed()) [[unlikely]] {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), prev);
_last_row->set_continuous(true);
_last_row->set_range_tombstone(prev);
} else {
clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), prev);
e.set_continuous(true);
e.set_range_tombstone(prev);
}
}
} else {
_read_context.cache().on_mispopulate();
}
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it, true);
});
_population_range_starts_before_all_rows = false;
});
return true;
}
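The new `maybe_add_to_cache(const range_tombstone_change&)` now returns whether the change should be forwarded, and `offer_from_underlying()` pushes the fragment only when it returns `true`. A change at `*_underlying_upper_bound` only advances `_lower_bound`, because reading may continue from cache under the same tombstone. Other changes are forwarded only if `std::exchange` shows the tombstone actually changed. The sketch below models just that filtering and omits cache population. `toy_rtc` and `toy_filter` are hypothetical names, with `int` positions and timestamp tombstones.

```cpp
#include <cstdint>
#include <utility>

struct toy_rtc { int pos; std::int64_t tomb; };

// Hypothetical model of the forwarding decision.
struct toy_filter {
    int upper_bound;          // stands in for *_underlying_upper_bound
    int lower_bound = 0;      // stands in for _lower_bound
    std::int64_t current = 0; // stands in for _current_tombstone

    // Returns true if the change should be pushed to the output buffer.
    bool accept(const toy_rtc& c) {
        if (c.pos == upper_bound) {
            // Closing change at the end of the underlying range: swallow it
            // and only advance the lower bound.
            lower_bound = c.pos;
            return false;
        }
        std::int64_t prev = std::exchange(current, c.tomb);
        return current != prev; // forward only real changes
    }
};
```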
inline
bool cache_flat_mutation_reader::after_current_range(position_in_partition_view p) {
position_in_partition::tri_compare cmp(*_schema);
@@ -632,19 +713,35 @@ void cache_flat_mutation_reader::start_reading_from_underlying() {
inline
void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", fmt::ptr(this), _next_row.position(), _next_row_in_range);
clogger.trace("csm {}: copy_from_cache, next_row_in_range={}, next={}", fmt::ptr(this), _next_row_in_range, _next_row);
_next_row.touch();
position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
auto upper_bound = _next_row_in_range ? next_lower_bound : _upper_bound;
if (_snp->range_tombstones(_lower_bound, upper_bound, [&] (range_tombstone rts) {
add_range_tombstone_to_buffer(std::move(rts));
return stop_iteration(_lower_bound_changed && is_buffer_full());
}, _read_context.is_reversed()) == stop_iteration::no) {
return;
if (_next_row.range_tombstone() != _current_tombstone) {
position_in_partition::equal_compare eq(*_schema);
auto upper_bound = _next_row_in_range ? position_in_partition_view::before_key(_next_row.position()) : _upper_bound;
if (!eq(_lower_bound, upper_bound)) {
position_in_partition new_lower_bound(upper_bound);
auto tomb = _next_row.range_tombstone();
clogger.trace("csm {}: rtc({}, {}) ...{}", fmt::ptr(this), _lower_bound, tomb, new_lower_bound);
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, tomb)));
_current_tombstone = tomb;
_lower_bound = std::move(new_lower_bound);
_read_context.cache()._tracker.on_range_tombstone_read();
}
}
// We add the row to the buffer even when it's full.
// This simplifies the code. For more info see #3139.
if (_next_row_in_range) {
if (_next_row.range_tombstone_for_row() != _current_tombstone) [[unlikely]] {
auto tomb = _next_row.range_tombstone_for_row();
auto new_lower_bound = position_in_partition::before_key(_next_row.position());
clogger.trace("csm {}: rtc({}, {})", fmt::ptr(this), new_lower_bound, tomb);
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(new_lower_bound, tomb)));
_lower_bound = std::move(new_lower_bound);
_current_tombstone = tomb;
_read_context.cache()._tracker.on_range_tombstone_read();
}
add_to_buffer(_next_row);
move_to_next_entry();
} else {
@@ -660,10 +757,11 @@ void cache_flat_mutation_reader::move_to_end() {
inline
void cache_flat_mutation_reader::move_to_next_range() {
if (_queued_underlying_fragment) {
add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));
if (_current_tombstone) {
clogger.trace("csm {}: move_to_next_range: emit rtc({}, null)", fmt::ptr(this), _upper_bound);
push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_upper_bound, {})));
_current_tombstone = {};
}
flush_tombstones(position_in_partition::for_range_end(*_ck_ranges_curr), true);
auto next_it = std::next(_ck_ranges_curr);
if (next_it == _ck_ranges_end) {
move_to_end();
@@ -680,8 +778,6 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
_last_row = nullptr;
_lower_bound = std::move(lb);
_upper_bound = std::move(ub);
_rt_gen.trim(_lower_bound);
_lower_bound_changed = true;
_ck_ranges_curr = next_it;
auto adjacent = _next_row.advance_to(_lower_bound);
_next_row_in_range = !after_current_range(_next_row.position());
@@ -722,7 +818,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
// _next_row must have a greater position than _last_row.
// Invalidates references but keeps the _next_row valid.
inline
void cache_flat_mutation_reader::maybe_drop_last_entry() noexcept {
void cache_flat_mutation_reader::maybe_drop_last_entry(tombstone rt) noexcept {
// Drop dummy entry if it falls inside a continuous range.
// This prevents unnecessary dummy entries from accumulating in cache and slowing down scans.
//
@@ -733,9 +829,12 @@ void cache_flat_mutation_reader::maybe_drop_last_entry() noexcept {
&& !_read_context.is_reversed() // FIXME
&& _last_row->dummy()
&& _last_row->continuous()
&& _last_row->range_tombstone() == rt
&& _snp->at_latest_version()
&& _snp->at_oldest_version()) {
clogger.trace("csm {}: dropping unnecessary dummy at {}", fmt::ptr(this), _last_row->position());
with_allocator(_snp->region().allocator(), [&] {
cache_tracker& tracker = _read_context.cache()._tracker;
tracker.get_lru().remove(*_last_row);
@@ -769,57 +868,38 @@ void cache_flat_mutation_reader::move_to_next_entry() {
if (!_next_row.continuous()) {
start_reading_from_underlying();
} else {
maybe_drop_last_entry();
maybe_drop_last_entry(_next_row.range_tombstone());
}
}
}
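`maybe_drop_last_entry()` now takes the tombstone of the following interval and drops the dummy only if `_last_row->range_tombstone() == rt`. In our reading of this diff, an entry's continuity flag and range tombstone describe the interval ending at that entry. Erasing a dummy therefore merges its interval into the next one, which loses information unless both carry the same tombstone. The sketch below models only that check. The reversed-read and single-version conditions are omitted, and `toy_entry` and `maybe_drop` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache entry; `continuous` and `tomb` describe the interval ending here.
struct toy_entry {
    int pos;
    bool dummy;
    bool continuous;
    std::int64_t tomb; // 0 = no range tombstone
};

// Drop entries[i] if it is a dummy inside a continuous range and its interval
// carries the same tombstone as the next one, so merging them loses nothing.
bool maybe_drop(std::vector<toy_entry>& entries, std::size_t i) {
    if (i + 1 >= entries.size()) {
        return false;
    }
    const toy_entry& last = entries[i];
    const toy_entry& next = entries[i + 1];
    if (last.dummy && last.continuous && next.continuous && last.tomb == next.tomb) {
        entries.erase(entries.begin() + static_cast<std::ptrdiff_t>(i));
        return true;
    }
    return false;
}
```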
void cache_flat_mutation_reader::flush_tombstones(position_in_partition_view pos, bool end_of_range) {
// Ensure position is appropriate for range tombstone bound
pos = position_in_partition_view::after_key(pos);
clogger.trace("csm {}: flush_tombstones({}) end_of_range: {}", fmt::ptr(this), pos, end_of_range);
_rt_gen.flush(pos, [this] (range_tombstone_change&& rtc) {
add_to_buffer(std::move(rtc), source::cache);
}, end_of_range);
if (auto rtc_opt = _rt_merger.flush(pos, end_of_range)) {
do_add_to_buffer(std::move(*rtc_opt));
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(mutation_fragment_v2&& mf) {
clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));
position_in_partition::less_compare less(*_schema);
if (_underlying_upper_bound && less(*_underlying_upper_bound, mf.position())) {
_queued_underlying_fragment = std::move(mf);
return;
}
flush_tombstones(mf.position());
void cache_flat_mutation_reader::offer_from_underlying(mutation_fragment_v2&& mf) {
clogger.trace("csm {}: offer_from_underlying({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));
if (mf.is_clustering_row()) {
maybe_add_to_cache(mf.as_clustering_row());
add_clustering_row_to_buffer(std::move(mf));
} else {
assert(mf.is_range_tombstone_change());
add_to_buffer(std::move(mf).as_range_tombstone_change(), source::underlying);
auto& chg = mf.as_range_tombstone_change();
if (maybe_add_to_cache(chg)) {
add_to_buffer(std::move(mf).as_range_tombstone_change());
}
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {
position_in_partition::less_compare less(*_schema);
if (_queued_underlying_fragment && less(_queued_underlying_fragment->position(), row.position())) {
add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));
}
if (!row.dummy()) {
_read_context.cache().on_row_hit();
if (_read_context.digest_requested()) {
row.latest_row().cells().prepare_hash(table_schema(), column_kind::regular_column);
}
flush_tombstones(position_in_partition_view::for_key(row.key()));
add_clustering_row_to_buffer(mutation_fragment_v2(*_schema, _permit, row.row()));
} else {
if (less(_lower_bound, row.position())) {
_lower_bound = row.position();
_lower_bound_changed = true;
}
_read_context.cache()._tracker.on_dummy_row_hit();
}
@@ -832,67 +912,24 @@ inline
void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment_v2&& mf) {
clogger.trace("csm {}: add_clustering_row_to_buffer({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));
auto& row = mf.as_clustering_row();
auto new_lower_bound = position_in_partition::after_key(row.key());
auto new_lower_bound = position_in_partition::after_key(*_schema, row.key());
push_mutation_fragment(std::move(mf));
_lower_bound = std::move(new_lower_bound);
_lower_bound_changed = true;
if (row.tomb()) {
_read_context.cache()._tracker.on_row_tombstone_read();
}
}
inline
void cache_flat_mutation_reader::add_to_buffer(range_tombstone_change&& rtc, source src) {
void cache_flat_mutation_reader::add_to_buffer(range_tombstone_change&& rtc) {
clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rtc);
if (auto rtc_opt = _rt_merger.apply(src, std::move(rtc))) {
do_add_to_buffer(std::move(*rtc_opt));
}
}
inline
void cache_flat_mutation_reader::do_add_to_buffer(range_tombstone_change&& rtc) {
clogger.trace("csm {}: push({})", fmt::ptr(this), rtc);
_has_rt = true;
position_in_partition::less_compare less(*_schema);
auto lower_bound_changed = less(_lower_bound, rtc.position());
_lower_bound = position_in_partition(rtc.position());
_lower_bound_changed = lower_bound_changed;
push_mutation_fragment(*_schema, _permit, std::move(rtc));
_read_context.cache()._tracker.on_range_tombstone_read();
}
inline
void cache_flat_mutation_reader::add_range_tombstone_to_buffer(range_tombstone&& rt) {
position_in_partition::less_compare less(*_schema);
if (_queued_underlying_fragment && less(_queued_underlying_fragment->position(), rt.position())) {
add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));
}
clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rt);
if (!less(_lower_bound, rt.position())) {
rt.set_start(_lower_bound);
}
flush_tombstones(rt.position());
_rt_gen.consume(std::move(rt));
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone_change& rtc) {
clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rtc);
auto rt_opt = _rt_assembler.consume(*_schema, range_tombstone_change(rtc));
if (!rt_opt) {
return;
}
const auto& rt = *rt_opt;
if (can_populate()) {
clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rt);
_lsa_manager.run_in_update_section_with_allocator([&] {
_snp->version()->partition().mutable_row_tombstones().apply_monotonically(
table_schema(), to_table_domain(rt));
});
} else {
_read_context.cache().on_mispopulate();
}
}
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
if (can_populate()) {

cdc/CMakeLists.txt

@@ -0,0 +1,17 @@
add_library(cdc STATIC)
target_sources(cdc
PRIVATE
cdc_partitioner.cc
generation.cc
log.cc
metadata.cc
split.cc)
target_include_directories(cdc
PUBLIC
${CMAKE_SOURCE_DIR})
target_link_libraries(cdc
PUBLIC
Seastar::seastar
xxHash::xxhash
PRIVATE
replica)

@@ -15,7 +15,7 @@
#include "serializer.hh"
#include "db/extensions.hh"
#include "cdc/cdc_options.hh"
-#include "schema.hh"
+#include "schema/schema.hh"
#include "serializer_impl.hh"
namespace cdc {

@@ -8,7 +8,7 @@
#include "cdc_partitioner.hh"
#include "dht/token.hh"
-#include "schema.hh"
+#include "schema/schema.hh"
#include "sstables/key.hh"
#include "utils/class_registrator.hh"
#include "cdc/generation.hh"

@@ -8,7 +8,7 @@
#pragma once
-#include "mutation.hh"
+#include "mutation/mutation.hh"
/*
* This file contains a general abstraction for walking over mutations,

Some files were not shown because too many files have changed in this diff.